This page is subject to change before the start of the course.
Spring 2023 Special Requirement (source)
During this pandemic, it is extremely important that you abide by the public health regulations, the University of Pittsburgh’s health standards and guidelines, and Pitt’s Health Rules. These rules have been developed to protect the health and safety of all of us. The University’s requirements for face coverings will at a minimum be consistent with CDC guidance and masks are required indoors (campus buildings and shuttles) on campuses in which COVID-19 Community Levels are High. This means that when COVID-19 Community Levels are High, you must wear a face covering that properly covers your nose and mouth when you are in the classroom. If you do not comply, you will be asked to leave class. It is your responsibility to have the required face covering when entering a university building or classroom. Masks are optional indoors for campuses in which county levels are Medium or Low. Be aware of your Community Level as it changes each Thursday. Read answers to frequently asked questions regarding face coverings. For the most up-to-date information and guidance, please visit the Power of Pitt site and check your Pitt email for updates before each class.
If you are required to isolate or quarantine, become sick, or are unable to come to class, contact me as soon as possible to discuss arrangements.
This course focuses on both concepts and practice. We will introduce (a) the core data mining concepts and (b) practical skills for applying data mining techniques to solve real-world problems.
Topics to be covered:
See the course schedule for weekly topics.
This course will use R for computing. R is freely available online. Our default IDE will be RStudio, which can also be downloaded for free. We will use R Markdown to create reproducible data science documents.
Students are expected to be familiar with the basics of linear algebra, probability, and statistics, and should be comfortable with programming. Because we will use R for computing, familiarity with R is preferred. If you have never programmed before, get started with the list of learning resources on the course website.
Grades are based on three major activities listed below. Assignments are due as scheduled, and grades on late work will be decreased by 10% per day late.
Class participation will be assessed through online quizzes assigned each week, as well as the students’ participation in class.
This course does not have a single textbook. It will use materials from several recommended books listed below. These books are available online (some over the Pitt network). There will be reading assignments over the course of the semester, and links to electronic copies of these readings will be provided. There are also other recommended books for further reading and for learning R.
Readings will be assigned throughout the semester, roughly one reading assignment per week. Each reading assignment is relevant to the weekly topic and is chosen to help you connect the technical tools to more practical and creative uses in the real world. The reading assignments are meant to enrich your data science problem-solving skills and help you develop project ideas. A tentative list of readings is available on the course website.
Reading assignments will be evaluated via post-class quizzes.
See the university policies page.