This page is subject to change before the start of the
course.
Learning Objectives
This course focuses on both concepts and
practice. We will introduce (a) the core data mining
concepts and (b) practical skills for applying data mining techniques to
solve real-world problems.
Concepts
- Study the major data mining problems as different types of
computational tasks (prediction, classification, clustering, etc.) and
the algorithms appropriate for addressing these tasks
- Learn how to analyze data through statistical and graphical
summarization and through supervised and unsupervised learning algorithms
- Systematically evaluate data mining algorithms and understand how to
choose algorithms for different analysis tasks
Practice
- Learn how to gather and process raw data into suitable input for a
range of data mining algorithms
- Critique the methods and results of a data mining project
- Design and implement data mining applications using real-world
datasets, and evaluate and select proper data mining algorithms to apply
to practical scenarios
Course Content
Topics to be covered:
- Data exploration, visualization, and probabilistic thinking
- Supervised learning (or predictive analysis): Regression,
Classification
- Unsupervised learning (or descriptive analysis): Clustering,
Dimension reduction
- Evaluation and model assessment
- Special topics: Network mining, Time series analysis,
Simulation
See the course schedule
for weekly topics.
Computing/Coding
This course will use Python for
coding. We will use Jupyter Notebook
for creating reproducible data science documents.
Prerequisites
Students are expected to be familiar with Python programming and the
basics of linear algebra and probability.
Grading
Grades are based on the major activities listed below.
- 30% homework
- 20% quizzes
- 25% in-person midterm
- 25% in-person final exam