Causal Inference in Data Science
A computational introduction to causality and counterfactual reasoning with Python
This training provides an invaluable, hands-on guide to applying causal inference in the wild to solve real-world data science tasks. Using an end-to-end example, we will walk through the process of posing a causal hypothesis, modeling our beliefs with causal graphs, estimating causal effects with the DoWhy library in Python, and finally evaluating the soundness of our results. Rather than taking an abstract, mathematical approach to these steps, the training focuses on accessible computational methods for answering causal questions in the context of a data science workflow.
What you'll learn and how you can apply it
 Understand how to reason causally and why it is necessary for modern data science.
 Use the DoWhy library to build, estimate, and evaluate causal models.
 Learn how to practically apply causal inference to real-world data science problems.
This training course is for you because...
 You have taken an introductory data science or statistics course but want to take the next step to understand the foundations of causal inference and how to effectively apply the theory to real-world problems.
 You have heard about the power of causal reasoning, but do not know how to get started learning its basics or applying it to your own problems.
 You are an aspiring data scientist looking to break into the field and need to learn the practical skills necessary for what you will encounter on the job.
 You are a quantitative researcher interested in applying theory to real projects by taking a computational approach to causal inference.
 You are a software engineer interested in leveraging analytics to augment your application development process.
Prerequisites
 Experience with an object-oriented programming language, e.g., Python (all code demos during the training will be in Python).
 Familiarity with basic probability and statistics (e.g., distributions and hypothesis testing).
 A working knowledge of the scientific Python libraries (numpy, pandas, and scikit-learn) is helpful but not required.
Course Setup
 Download the appropriate Python 3.7 Anaconda Distribution for your operating system: https://www.anaconda.com/distribution/
Recommended Preparation
 Data Science Fundamentals Part 2: Machine Learning and Statistical Analysis (Lessons 7 and 8): https://learning.oreilly.com/videos/datasciencefundamentals/9780134778877
Recommended Follow-up
About your instructor

Jonathan Dinu is currently pursuing a Ph.D. in Computer Science at Carnegie Mellon's Human-Computer Interaction Institute (HCII), where he is working to democratize machine learning and artificial intelligence through interpretable and interactive algorithms. Previously, he co-founded Zipfian Academy (an immersive data science training program acquired by Galvanize), has taught classes at the University of San Francisco, and has built a Data Visualization MOOC with Udacity.
In addition to his professional data science experience, he has run data science trainings for a Fortune 100 company and taught workshops at Strata, PyData, and DataWeek (among others). He first discovered his love of all things data while studying Computer Science and Physics at UC Berkeley, and in a former life he worked for Alpine Data Labs developing distributed machine learning algorithms for predictive analytics on Hadoop.
Schedule
The time frames are only estimates and may vary according to how the class is progressing.
Identifying Causal Effects (50 min)
 Randomized Controlled Trials
 Counterfactuals and Potential Outcomes
 Causal Graphical Models
 Q&A
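As a taste of this segment, the sketch below illustrates why randomization matters for identifying a causal effect. It is a minimal simulation using only the Python standard library, not the course's own material: the data-generating process, the true effect of 2.0, and the names `simulate` and `diff_in_means` are all assumptions made up for illustration.

```python
import random
import statistics

def simulate(n, randomized, seed=0):
    """Simulate (treatment, outcome) pairs; the true treatment effect is 2.0 (assumed)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = rng.gauss(0, 1)  # hypothetical confounder that also drives the outcome
        if randomized:
            # Randomized trial: treatment is a coin flip, independent of w
            t = rng.random() < 0.5
        else:
            # Observational data: units with high w are more likely to be treated
            t = rng.random() < (0.8 if w > 0 else 0.2)
        y = 2.0 * t + 3.0 * w + rng.gauss(0, 1)
        rows.append((t, y))
    return rows

def diff_in_means(rows):
    """Naive estimate: mean outcome of treated minus mean outcome of untreated."""
    treated = [y for t, y in rows if t]
    control = [y for t, y in rows if not t]
    return statistics.mean(treated) - statistics.mean(control)

naive_estimate = diff_in_means(simulate(20000, randomized=False))
rct_estimate = diff_in_means(simulate(20000, randomized=True))
print(f"observational: {naive_estimate:.2f}, randomized: {rct_estimate:.2f}")
```

Under these assumptions, the observational difference in means should overstate the effect because the confounder raises both treatment probability and outcome, while the randomized comparison should land close to 2.0.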
Estimating Causal Effects (50 min)
 Propensity Score Matching
 Instrumental Variables
 Causal Effect Inference with Machine Learning
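To make the idea of propensity-based estimation concrete, here is a small sketch of inverse propensity weighting, a simpler relative of propensity score matching. It uses only the standard library and is not DoWhy's implementation: the simulated data, the single binary confounder, the true effect of 2.0, and the helper `weighted_mean` are illustrative assumptions.

```python
import random

rng = random.Random(1)

# Simulated observational data (assumed): w confounds treatment t and outcome y
rows = []
for _ in range(20000):
    w = rng.random() < 0.5                    # binary confounder
    t = rng.random() < (0.8 if w else 0.2)    # treatment depends on w
    y = 2.0 * t + 3.0 * w + rng.gauss(0, 1)   # true effect of t is 2.0
    rows.append((w, t, y))

# Estimate the propensity score P(T=1 | W=w) as the treated share in each stratum
propensity = {}
for level in (False, True):
    stratum = [t for w, t, _ in rows if w == level]
    propensity[level] = sum(stratum) / len(stratum)

def weighted_mean(pairs):
    """Weighted mean of outcomes given (weight, outcome) pairs."""
    total_weight = sum(weight for weight, _ in pairs)
    return sum(weight * y for weight, y in pairs) / total_weight

# Weight each unit by the inverse probability of the treatment it received
treated = [(1 / propensity[w], y) for w, t, y in rows if t]
control = [(1 / (1 - propensity[w]), y) for w, t, y in rows if not t]
ipw_estimate = weighted_mean(treated) - weighted_mean(control)

# Unadjusted comparison for contrast
naive_estimate = (sum(y for _, t, y in rows if t) / sum(1 for _, t, _ in rows if t)
                  - sum(y for _, t, y in rows if not t) / sum(1 for _, t, _ in rows if not t))
print(f"naive: {naive_estimate:.2f}, IPW: {ipw_estimate:.2f}")
```

The weighting rebalances the confounder across the treated and untreated groups, so in this simulation the weighted estimate should sit near 2.0 while the unadjusted one should not.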
Break (10 min)
Evaluating Causal Models (30 min)
 Random Confounders and Placebos
 Cross Validation
 Sensitivity Analysis
 Q&A
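The placebo idea from this segment can be sketched in a few lines: replace the real treatment with a random one and check that the estimated effect collapses toward zero. This is a standard-library illustration in the spirit of a placebo refutation test, not the DoWhy API. The simulated data, the true effect of 2.0, and the `stratified_effect` helper are assumptions introduced here.

```python
import random
import statistics

rng = random.Random(2)

# Simulated observational data (assumed): w confounds treatment t and outcome y
rows = []
for _ in range(20000):
    w = rng.random() < 0.5
    t = rng.random() < (0.8 if w else 0.2)
    y = 2.0 * t + 3.0 * w + rng.gauss(0, 1)   # true effect of t is 2.0
    rows.append((w, t, y))

def stratified_effect(rows):
    """Difference in means within each level of w, averaged by stratum size."""
    effect = 0.0
    for level in (False, True):
        treated = [y for w, t, y in rows if w == level and t]
        control = [y for w, t, y in rows if w == level and not t]
        share = (len(treated) + len(control)) / len(rows)
        effect += share * (statistics.mean(treated) - statistics.mean(control))
    return effect

original_estimate = stratified_effect(rows)

# Placebo check: swap the real treatment for a coin flip unrelated to the outcome
placebo_rows = [(w, rng.random() < 0.5, y) for w, _, y in rows]
placebo_estimate = stratified_effect(placebo_rows)
print(f"original: {original_estimate:.2f}, placebo: {placebo_estimate:.2f}")
```

If the placebo estimate were far from zero, that would suggest the estimator is picking up something other than the treatment effect.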
Discovering Causal Structure (25 min)
 Guess and Test
 Automated Graph Discovery
 Q&A