Learning Objectives

This course focuses on both concepts and practice. We will introduce (a) the core data mining concepts and (b) practical skills for applying data mining techniques to solve real-world problems.

Concepts

  • Study the major data mining problems as different types of computational tasks (prediction, classification, clustering, etc.) and the algorithms appropriate for addressing these tasks
  • Learn how to analyze data through statistical and graphical summarization and through supervised and unsupervised learning algorithms
  • Systematically evaluate data mining algorithms and understand how to choose algorithms for different analysis tasks

Practice

  • Learn how to gather and process raw data into suitable input for a range of data mining algorithms
  • Critique the methods and results of a data mining analysis
  • Design and implement data mining applications using real-world datasets, and evaluate and select appropriate algorithms for practical scenarios

Course Content

Topics to be covered:

  • Data exploration, visualization, and probabilistic thinking
  • Supervised learning (or predictive analysis): Regression, Classification
  • Unsupervised learning (or descriptive analysis): Clustering, Dimension reduction
  • Evaluation and model assessment
  • Special topics: Network mining, Time series analysis, Simulation

See the course schedule for weekly topics.

Computing/Coding

This course will use Python for coding. We will use Jupyter Notebook for creating reproducible data science documents.

Prerequisites

Prerequisites for this course include prior experience with Python programming and a foundational understanding of Linear Algebra and Probability.

Assessment & Grading

Your grade will be based on homework, quizzes, and major assessments, giving you several opportunities to demonstrate your progress and understanding.

  • 40% Homework
  • 30% Quizzes
  • 30% In-person Exams / Final Project
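
As a rough illustration of how the weights above combine, the following sketch computes a weighted course grade. It assumes each component is already averaged onto a 0-100 scale. The function name, dictionary keys, and example scores are hypothetical, and attendance adjustments and late deductions are not included.

```python
# Component weights in percent, taken from the grading breakdown above.
WEIGHTS = {"homework": 40, "quizzes": 30, "exams_or_project": 30}


def weighted_grade(scores):
    """Combine component scores (assumed 0-100 each) into one course grade."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Weighted sum, divided by 100 because the weights are percentages.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = weighted_grade(
    {"homework": 90, "quizzes": 80, "exams_or_project": 85}
)
print(example)
```

How individual homework and quiz scores are averaged within each component is not specified here, so the sketch takes the component averages as given.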

Exams & Final Project:

You will complete one in-person exam (15%) around Week 8. For the second major assessment, which accounts for the remaining 15%, you may choose the format that works best for you. This flexibility is meant to give you different ways to demonstrate your learning and strengths.

  • Option A (default) is an in-person Exam 2 during Week 16.
  • Option B is a Final Project, completed individually or in pairs (maximum of two). The project includes submitting an abstract in Week 9 (with feedback and revision), giving an in-person presentation in Week 15, and submitting the final paper in Week 16.

Attendance Policy:

You may miss up to two classes during the semester without penalty. Each additional absence will lower your final grade by one point. Students who attend every class will receive extra credit added to their final grade. If you expect challenges with attendance, please let me know early so we can explore possible solutions together.

Submission Policy:

All assignments are due on their scheduled dates. Late submissions will receive a deduction of 10% per day, with the exception of the final project paper, which must be submitted on time. Please ensure all written work follows the required format provided in class.
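
One possible reading of the late deduction, sketched in Python. It assumes the 10% is taken from the earned score for each full day late and that the score cannot drop below zero. The policy text does not settle either point, so treat this as an illustration only. The function name is hypothetical, and the final project paper is left out because it must be submitted on time.

```python
def late_adjusted_score(raw_score, days_late):
    """Apply a 10%-per-day deduction to a regular assignment score.

    Assumptions (not stated in the policy): the deduction is a share of the
    earned score, days_late is a whole number, and the result floors at zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    # Percent of the earned score that remains after the deduction.
    remaining_percent = max(0, 100 - 10 * days_late)
    return raw_score * remaining_percent / 100


# Hypothetical example: a score of 80 submitted two days late.
print(late_adjusted_score(80, 2))
```

Under these assumptions, an assignment submitted ten or more days late would receive no credit.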

University Policies

See the university policies page.