Personalization

09 / ~ / 2003 to Present

As more of people's activity and more content move online, personalization becomes an increasingly useful and important form of information management. My interest in personalization lies in two main areas:

  1. Personalized Search. Historically, information retrieval has been concerned with matching documents to a query. However, much useful information lies outside the query itself and is instead associated with the user issuing it. In particular, the user's location, recent searches, and past search behavior are all signals that can help a search engine return more relevant results for a query.
  2. Personalized Homepages. Personalized homepages have been around since the mid-1990s as a way of organizing consumable content. I have been primarily interested in the personalized homepage as an open platform for content and application developers and, in that context, in developing APIs for homepage development as well as algorithmic and social means of organizing content.
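To make the idea in item 1 concrete, here is a minimal sketch of re-ranking results with two of those signals: the user's location and past click behavior. Everything in it (the `Result` and `UserProfile` types, the `personalized_rank` function, the weights, and the linear blend itself) is a hypothetical illustration, not a description of any particular search engine.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Result:
    url: str
    domain: str
    base_score: float             # query-document relevance from a non-personalized ranker
    region: Optional[str] = None  # place the page is about, if any


@dataclass
class UserProfile:
    location: str
    # Past search behavior: how often the user clicked results from each domain.
    clicked_domains: Counter = field(default_factory=Counter)


def personalized_rank(results: List[Result], user: UserProfile,
                      w_history: float = 0.5, w_location: float = 0.3) -> List[Result]:
    """Re-rank results by adding user-specific signals to the query score (hypothetical weights)."""
    total_clicks = sum(user.clicked_domains.values()) or 1  # avoid division by zero

    def score(r: Result) -> float:
        history = user.clicked_domains[r.domain] / total_clicks  # share of past clicks
        local = 1.0 if r.region == user.location else 0.0        # location match
        return r.base_score + w_history * history + w_location * local

    return sorted(results, key=score, reverse=True)


if __name__ == "__main__":
    user = UserProfile("Palo Alto", Counter({"example.org": 3, "news.example": 1}))
    results = [
        Result("http://a.example/x", "a.example", 0.80),
        Result("http://example.org/y", "example.org", 0.70),
        Result("http://c.example/z", "c.example", 0.60, region="Palo Alto"),
    ]
    # Prints example.org/y, then c.example/z, then a.example/x.
    for r in personalized_rank(results, user):
        print(r.url)
```

A linear blend is only one simple choice, but it keeps each signal's contribution separate, which makes it easy to show a user which data influenced a ranking. Recent searches could be added as another weighted term in the same way.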

    A discussion of personalization and search is not complete without addressing its implications for privacy. For me, the following four principles are important in protecting user privacy when it comes to personalized search:

    1. Choice - The user should be able to choose which data is used to personalize his or her search results.
    2. Transparency - A search engine should show the user all the data it uses to personalize that user's search results.
    3. Control - Furthermore, the user should be able to delete any data that he or she does not want used for personalization.
    4. Data portability - Data that is collected by a search engine for the purposes of personalization belongs to the user and not the search engine. As such, the user should be able to download it and transfer it to another search engine.

Publications

  • Extrapolation Methods for Accelerating PageRank Computations - Proceedings of The Twelfth International World Wide Web Conference, May, 2003

    By Sepandar D. Kamvar, Taher H. Haveliwala, Christopher D. Manning, and Gene H. Golub

  • Exploiting the Block Structure of the Web for Computing PageRank - Technical Report, June, 2003

    By Sepandar D. Kamvar, Taher H. Haveliwala, Christopher D. Manning, and Gene H. Golub

  • The Second Eigenvalue of the Google Matrix - Technical Report, June, 2003

    By Taher H. Haveliwala and Sepandar D. Kamvar

  • Adaptive Methods for the Computation of PageRank - Linear Algebra and its Applications, Special Issue on the Numerical Solution of Markov Chains, November, 2003

    By Sepandar D. Kamvar, Taher H. Haveliwala, and Gene H. Golub

  • An Analytical Comparison of Approaches to Personalizing PageRank - Technical Report, June, 2003

    By Taher H. Haveliwala, Sepandar D. Kamvar, and Glen Jeh

  • The Condition Number of the PageRank Problem - Technical Report, June, 2003

    By Sepandar D. Kamvar and Taher H. Haveliwala

From My Blog

  • Organizing The World’s Push Content: The iGoogle Ecosystem - 09 / 26 / 2007

Talks

  • Extrapolation Techniques for Computing PageRank - Proceedings of The Twelfth International World Wide Web Conference, May, 2003

Data

  • Stanford Web Matrix - 281,903 pages, ~2.3 million links. 11.7 MB zipped, 64.2 MB unzipped. From a September 2002 crawl. Matlab format.

    Note: loadStanfordMatrix.m loads the matrix so that row i holds the inlinks of page i and column j holds the outlinks of page j.

  • Stanford-Berkeley Web Matrix - 683,446 pages, ~7.6 million links. 32.5 MB zipped, 125.8 MB unzipped. From a December 2002 crawl. Matlab format.

    Note: loadSBMatrix.m loads the matrix so that row i holds the inlinks of page i and column j holds the outlinks of page j.
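The orientation described in these notes means that entry (i, j) is nonzero when page j links to page i, so row counts give in-degrees and column counts give out-degrees. The sketch below mirrors that convention on a toy edge list. It does not read the distributed Matlab files, it uses 0-based indices (Matlab is 1-based), and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# Sparse matrix stored as {row: {col: value}}.
Matrix = Dict[int, Dict[int, float]]


def build_inlink_matrix(edges: List[Tuple[int, int]], n: int) -> Matrix:
    """A[i][j] = 1 iff page j links to page i (row = inlinks, column = outlinks)."""
    A: Matrix = {i: {} for i in range(n)}
    for src, dst in edges:
        A[dst][src] = 1.0  # duplicate links collapse to a single entry
    return A


def in_degrees(A: Matrix) -> List[int]:
    """Row counts: number of distinct pages linking to each page."""
    return [len(A[i]) for i in range(len(A))]


def out_degrees(A: Matrix) -> List[int]:
    """Column counts: number of distinct pages each page links to."""
    out = [0] * len(A)
    for row in A.values():
        for j in row:
            out[j] += 1
    return out


def column_normalize(A: Matrix) -> Matrix:
    """Divide column j by page j's out-degree, giving the link matrix PageRank
    iterates on. Columns of pages with no outlinks stay all zero."""
    out = out_degrees(A)
    return {i: {j: v / out[j] for j, v in row.items()} for i, row in A.items()}


if __name__ == "__main__":
    A = build_inlink_matrix([(0, 1), (0, 2), (1, 2), (2, 0)], n=3)
    print(in_degrees(A))   # [1, 1, 2]
    print(out_degrees(A))  # [2, 1, 1]
```

With this orientation, every column of a page that has outlinks sums to 1 after normalization, which is the column-stochastic form used when PageRank is written as a matrix-vector iteration.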

Code

  • Basic PageRank Algorithm - in Matlab.
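The Matlab code linked above is not reproduced here. As a hedged stand-in, the following is a minimal power-iteration sketch of the standard PageRank model in Python, assuming a damping factor of 0.85 and a list-of-outlinks input. The function name, its parameters, and the choice to redistribute the rank of dangling pages according to the teleportation vector are assumptions of this sketch, not details taken from the linked code. Passing a non-uniform teleportation vector `v` gives a personalized PageRank.

```python
from typing import List, Optional


def pagerank(outlinks: List[List[int]], damping: float = 0.85,
             v: Optional[List[float]] = None,
             tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """PageRank by power iteration.

    outlinks[j] lists the distinct pages that page j links to.
    v is the teleportation vector (uniform if None); a non-uniform v gives a
    personalized PageRank. Rank held by dangling pages (no outlinks) is
    redistributed according to v.
    """
    n = len(outlinks)
    if v is None:
        v = [1.0 / n] * n
    x = list(v)  # start from the teleportation distribution
    for _ in range(max_iter):
        new = [0.0] * n
        dangling = 0.0
        for j, targets in enumerate(outlinks):
            if targets:
                share = damping * x[j] / len(targets)
                for i in targets:
                    new[i] += share  # follow a link out of page j
            else:
                dangling += x[j]
        # Teleportation plus dangling mass, both distributed according to v.
        jump = damping * dangling + (1.0 - damping)
        for i in range(n):
            new[i] += jump * v[i]
        diff = sum(abs(a - b) for a, b in zip(new, x))  # L1 change between iterates
        x = new
        if diff < tol:
            break
    return x


if __name__ == "__main__":
    # Pages 1 and 2 both link to page 0; page 0 links to page 1.
    # Page 0 receives the most rank.
    print(pagerank([[1], [0], [0]]))
```

The power method converges at a rate governed by the second eigenvalue of the Google matrix, whose magnitude is at most the damping factor; speeding up this basic iteration is the goal of the extrapolation and adaptive methods listed under Publications.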