Learning Objectives

This course focuses on both concepts and practice. We will introduce (a) the core data mining concepts and (b) practical skills for applying data mining techniques to solve real-world problems.

Concepts

  • Study the major data mining problems as different types of computational tasks (prediction, classification, clustering, etc.) and the algorithms appropriate for addressing these tasks
  • Learn how to analyze data through statistical and graphical summarization and through supervised and unsupervised learning algorithms
  • Systematically evaluate data mining algorithms and understand how to choose algorithms for different analysis tasks

Practice

  • Learn how to gather and process raw data into suitable input for a range of data mining algorithms
  • Critique the methods and results of a data mining analysis
  • Design and implement data mining applications using real-world datasets, and evaluate and select proper data mining algorithms to apply to practical scenarios

Course Content

Topics to be covered:

  • Data exploration, visualization, and probabilistic thinking
  • Supervised learning (or predictive analysis): Regression, Classification
  • Unsupervised learning (or descriptive analysis): Clustering, Dimension reduction
  • Evaluation and model assessment
  • Special topics: Network mining, Time series analysis, Simulation

See the course schedule for weekly topics.

Computing/Coding

This course will use Python for coding. We will use Jupyter Notebook for creating reproducible data science documents.

Prerequisites

Students are expected to be familiar with Python programming and the basics of linear algebra and probability.

Grading

Grades are based on the major activities listed below.

  • 30% homework
  • 20% quizzes
  • 25% in-person midterm
  • 25% in-person final exam
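
As a rough illustration of how these weights combine, the sketch below computes a weighted course percentage in Python. It assumes each component is scored on a 0-100 scale and that no curving, dropped scores, or late penalties apply; the syllabus does not specify any of this. The function name `course_grade` and the example scores are hypothetical.

```python
# Weights taken from the grading breakdown above (they sum to 100%).
WEIGHTS = {
    "homework": 0.30,
    "quizzes": 0.20,
    "midterm": 0.25,
    "final": 0.25,
}


def course_grade(scores):
    """Return the weighted course percentage.

    Assumes each component score is already on a 0-100 scale
    (an assumption; the syllabus does not state the scale).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"homework": 90, "quizzes": 80, "midterm": 70, "final": 85}
print(round(course_grade(example), 2))
```

With these made-up scores the components contribute 27, 16, 17.5, and 21.25 points, for a total of 81.75.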

University Policies

See the university policies page.