In April 1995, Lycos had the largest index of the web, with 3.6 million pages. Today, every major search engine indexes several billion pages, along with images, video, real-time blog posts, and other content.
The proliferation of online data over the past ten years has increased the visibility and importance of data mining, and it has also fundamentally changed the methods used for data mining.
This project course will focus on methods for mining large-scale unstructured data sets. The format is seminar-style: students will read recent research papers in data mining and present them in class.
Students should have a basic knowledge of machine learning and statistics. A large portion of the course is the quarter-long project, in which students work in groups of 2-3 on a data mining project they define themselves.
We will provide some basic datasets for these projects, including the Enron e-mail corpus and a portion of Google's crawl.
All projects will be presented to a panel of venture capitalists (VCs) and thought leaders from industry and academia.
Week 1 - Search
Week 2 - Personalized Search
Week 3 - Collaborative Filtering / Recommender Systems
Week 4 - Latent Semantic Indexing
Week 5 - Classification and Feature Selection
Week 6 - Clustering
Week 7 - Information Visualization
Week 8 - Peer-to-Peer Networks