Course Description

Data-driven models are increasingly used in many domains to assist human decision-making that significantly affects people’s lives, from hiring and promotion, college admissions, and judicial decisions to business and public service delivery. These decision aids have been made possible by both voluminous data and new data science tools that can exploit complex structures and patterns in data.

Learning Objectives

This course covers both concepts and practice so that students can understand and address the ethical challenges in data science and data-driven decision-making. We will introduce (a) the core concepts of fairness and interpretability/explainability and (b) analytic and technical tools to mitigate emerging problems in the real world.

Concepts

  • Recognize where and understand why (un)fairness and ethical issues arise when applying data science to real-world problems
  • Learn how to conceptualize, measure, and mitigate bias in data-driven decision-making
  • Learn how to evaluate models and make data-driven decision-making more interpretable and explainable
  • Learn to think critically about data-driven decisions and policy questions, and evaluate a project with these concerns in mind

Practice

  • Develop fluency in the key technical, ethical, policy, and legal terms and concepts that are relevant to a normative assessment of data science
  • Learn common approaches and emerging tools for measuring, mitigating or managing these ethical concerns
  • Gain exposure to technical, legal and policy documents that help students understand the current regulatory environment and anticipate future developments
  • Design and implement data science applications using real-world datasets, and systematically evaluate and justify the approach chosen to address the ethical concerns

Course Content

Topics to be covered:

  • Big data’s disparate impact
  • Decision-making by humans and machines
  • Decision-making by machines and big data
  • Sources of unfairness/biases
  • Formal notions and statistical measures of fairness
  • Fair ML and bias mitigation
  • Interpretability & explainability in AI
  • Ethics and privacy
  • Legal and policy perspectives, etc.

See the course schedule for weekly topics.

Computing

This course will use R and/or Python for computing. Homework and project assignments will be managed on GitHub, using tools such as Jupyter Notebooks or R Markdown to create reproducible data science documents.

Prerequisites

Students are expected to be familiar with the basics of Probability and Statistics and of Data Mining/Machine Learning (DM/ML), and should be comfortable programming with DM/ML toolkits. Students should also be willing to do interdisciplinary research and be comfortable learning concepts by reading technical, legal and policy documents.

Grading

Grades are based on the three major activities listed below. Assignments are due as scheduled, and the grade on late work will be reduced by 10% for each day it is late. See the assignment page for more details.

  • 40% in-class participation and reading (including quizzes and reading reflection/discussion)
  • 30% homework and midterm
  • 30% final project (including several milestones)
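
As a rough illustration of how these weights and the late penalty might combine, here is a minimal Python sketch. The function and key names are hypothetical, scores are assumed to be on a 0-100 scale, and the penalty is assumed to remove 10% of the earned score per late day with a floor of zero. The assignment page, not this sketch, defines the actual rules.

```python
# Weights in percent, taken from the grading list above.
# The dictionary keys are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "participation": 40,      # in-class participation and reading
    "homework_midterm": 30,   # homework and midterm
    "final_project": 30,      # final project, including milestones
}


def apply_late_penalty(score, days_late, penalty_per_day=0.10):
    """Reduce an earned score by 10% for each day late.

    Assumption: the penalty scales the earned score and never
    drives it below zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    factor = max(0.0, 1.0 - penalty_per_day * days_late)
    return score * factor


def course_grade(scores):
    """Combine component scores (0-100 each) into a weighted course grade."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    # A homework scored 90 and submitted two days late.
    print(apply_late_penalty(90, days_late=2))
    # Overall grade from the three component scores.
    print(course_grade({
        "participation": 80,
        "homework_midterm": 90,
        "final_project": 70,
    }))
```

Under these assumptions, work that is ten or more days late earns no credit.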

Class Participation

Class participation will be assessed through online quizzes and discussions assigned each week.

Readings

This course will use online materials and academic readings, with reading assignments throughout the semester. Links to electronic copies of the readings will be provided. There are no textbooks.

University Policies

See the university policies page.