Learning Objectives

  • Understand the fundamentals of data mining techniques, particularly in the graph setting
  • Learn recent advanced methods and algorithms for analyzing large, complex datasets
  • Gain experience in conducting research in data mining
  • Develop the skills to implement data mining algorithms and apply them to real-world problems

Course Content

Topics to be covered:

  • network analysis
  • probabilistic models
  • neural networks and deep learning
  • text mining and representation
  • graph mining, including ranking, classification, clustering and community detection, summarization, similarity, and representation learning in the graph setting
  • recommendation systems and matrix factorization
  • graph neural networks
  • selected topics, e.g., sequence modeling, spatiotemporal modeling, and causal modeling

See the course schedule for the most up-to-date weekly topics.

Computing / Programming Skill Requirements

This course will use R and/or Python for computing, so fluency in at least one of these languages is required. GitHub will be used for homework and project assignments, with tools such as Jupyter Notebooks or R Markdown used to create reproducible data science documents. Examples in lectures and homework assignments will mainly be in R or Python.

Prerequisites / Prior Knowledge

  • Successful completion of an introductory course in data mining (e.g., INFSCI 2160) or machine learning (e.g., INFSCI 2725), or an equivalent course.
  • Students are expected to have background knowledge of data structures, algorithms, basic linear algebra, probability theory, and statistics.
  • Every student is assumed to be familiar with basic data mining topics (regression, classification, clustering, etc.) and to have experience programming with one or more data mining tools (R/Python).

Grading

Grades are based on the three major activities listed below. Assignments are due as scheduled, and the grade for late work will be reduced by 10% for each day it is late. See the assignment page for more details.

  • 40% class participation, presentations, and readings (including quizzes and reading reflections/discussions)
  • 30% homework and midterm report
  • 30% final project (including several milestones)

Class Participation

Class participation will be assessed through both online and in-class discussions. Online participation will be evaluated based on students’ contributions to post-reading discussions on the course’s online forum (i.e., Teams).

Textbooks / Readings

There is no required textbook for this course; instead, the course will use online materials and academic readings. Reading assignments will be given throughout the semester, and links to electronic copies of these readings will be provided.

Reading assignments will be evaluated through reading reflection submissions and in-class discussion.

University Policies

See the university policies page.