Stanford University

Computational Methods in Data Mining

CME 340 - Winter 2007

In April of 1995, Lycos had the largest index of the web with 3.6 million web pages. Today, all of the major search engines index several billion pages, as well as images, video, real-time blogs, etc.

The proliferation of online data in the past ten years has increased the visibility and importance of data mining, and has also caused some fundamental changes in methods for data mining.

This project course focuses on methods for mining large-scale unstructured data sets. The format is seminar-style: students read recent research papers in data mining and present them in class.

Students should have basic knowledge of machine learning and statistics. A large portion of this course is the quarter-long project, in which students work in groups of 2-3 on a self-defined data mining project.

We will provide some basic data sets for these projects, including the Enron e-mail corpus and a portion of Google's crawl.

All projects will be presented to a panel of VCs and thought leaders in industry and academia.

Prerequisites

  • Familiarity with the basic concepts of probability theory. (Stat116 is sufficient but not necessary.)
  • Familiarity with linear algebra. (Math 113 or CS237A is sufficient but not necessary.)
  • Knowledge of basic computer science principles and skills at the level of CS103.
  • Mathematical ability and the ability to understand and analyze fairly complicated algorithms and data structures. (CS161 is sufficient but not necessary.)

Syllabus

  1. Week 1 - Search

  2. Week 2 - Personalized Search

  3. Week 3 - Collaborative Filtering / Recommender Systems

  4. Week 4 - Latent Semantic Indexing

  5. Week 5 - Classification and Feature Selection

  6. Week 6 - Clustering

  7. Week 7 - Information Visualization

  8. Week 8 - Peer-to-peer Networks