Live online training

Causal Inference in Data Science

A computational introduction to causality and counterfactual reasoning with Python

Jonathan Dinu

This training is a hands-on guide to applying causal inference to real-world data science tasks. Using an end-to-end example, we will walk through posing a causal hypothesis, modeling our beliefs with causal graphs, estimating causal effects with the DoWhy library in Python, and evaluating the soundness of the results. Rather than taking an abstract, mathematical approach to these steps, the training focuses on accessible computational methods for answering causal questions within a data science workflow.

What you'll learn and how you can apply it

  • Understand how to reason causally and why it is necessary for modern data science.
  • Use the DoWhy library to build, estimate, and evaluate causal models.
  • Learn how to practically apply causal inference to real-world data science problems.

This training course is for you because...

  • You have taken an introductory data science course or statistics course but want to take the next step to understand the foundations of causal inference and how to effectively apply the theory to real-world problems.
  • You have heard about the power of causal reasoning, but do not know how to get started learning its basics or applying it to your own problems.
  • You are an aspiring data scientist looking to break into the field and need to learn the practical skills necessary for what you will encounter on the job.
  • You are a quantitative researcher interested in applying theory to real projects by taking a computational approach to causal inference.
  • You are a software engineer interested in leveraging analytics to augment your application development process.


Prerequisites

  • Experience with an object-oriented programming language, e.g., Python (all code demos during the training will be in Python).
  • Familiarity with basic probability and statistics (e.g., distributions and hypothesis testing).
  • A working knowledge of the scientific Python libraries (NumPy, pandas, and scikit-learn) is helpful but not required.

Course Set-up

Recommended Preparation

  • Data Science Fundamentals Part 2: Machine Learning and Statistical Analysis (Lessons 7 and 8): https://learning.oreilly.com/videos/data-science-fundamentals/9780134778877

Recommended Follow-up

About your instructor

  • Jonathan Dinu is currently pursuing a Ph.D. in Computer Science at Carnegie Mellon’s Human-Computer Interaction Institute (HCII), where he is working to democratize machine learning and artificial intelligence through interpretable and interactive algorithms. Previously, he co-founded Zipfian Academy (an immersive data science training program acquired by Galvanize), taught classes at the University of San Francisco, and built a Data Visualization MOOC with Udacity.

    In addition to his professional data science experience, he has run data science trainings for a Fortune 100 company and taught workshops at Strata, PyData, and DataWeek (among others). He first discovered his love of all things data while studying Computer Science and Physics at UC Berkeley, and in a former life he worked for Alpine Data Labs developing distributed machine learning algorithms for predictive analytics on Hadoop.


Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Identifying Causal Effects 50 min

  • Randomized Controlled Trials
  • Counterfactuals and Potential Outcomes
  • Causal Graphical Models
  • Q&A
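
To make the first two topics concrete, here is a minimal sketch (not course material) of why randomization identifies a causal effect when a naive observational comparison does not. The data-generating process, the effect size of 2.0, and every name in the code are assumptions chosen for illustration; only the Python standard library is used.

```python
import random

TRUE_EFFECT = 2.0  # hypothetical causal effect of the treatment on the outcome


def simulate(n, randomized, seed=0):
    """Return (treatment, outcome) pairs from a toy data-generating process."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5               # confounder, hidden from the analyst
        if randomized:
            p_treat = 0.5                    # coin flip that ignores z
        else:
            p_treat = 0.8 if z else 0.2      # z drives who takes the treatment
        t = rng.random() < p_treat
        y = TRUE_EFFECT * t + 3.0 * z + rng.gauss(0, 1)
        rows.append((t, y))
    return rows


def difference_in_means(rows):
    """Mean outcome of the treated minus mean outcome of the untreated."""
    treated = [y for t, y in rows if t]
    control = [y for t, y in rows if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)


observational = difference_in_means(simulate(20_000, randomized=False))
experimental = difference_in_means(simulate(20_000, randomized=True))
print(f"observational estimate: {observational:.2f}")
print(f"randomized estimate:    {experimental:.2f}")
```

In this toy setup the observational comparison overstates the effect because treated units are more likely to have the confounder, while the randomized comparison lands close to the assumed effect of 2.0.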

Estimating Causal Effects 50 min

  • Propensity Score Matching
  • Instrumental Variables
  • Causal Effect Inference with Machine Learning
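
As a rough illustration of propensity-based estimation, the sketch below uses inverse propensity weighting, a simpler relative of the propensity score matching listed above, not matching itself. It assumes a single observed binary confounder `z`, a hypothetical true effect of 2.0, and propensities estimated as the treated share within each value of `z`; none of these choices come from the course.

```python
import random


def simulate(n, seed=1):
    """Toy observational data: rows of (z, t, y) with z confounding t and y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = int(rng.random() < 0.5)
        t = int(rng.random() < (0.8 if z else 0.2))
        y = 2.0 * t + 3.0 * z + rng.gauss(0, 1)   # assumed true effect: 2.0
        rows.append((z, t, y))
    return rows


def propensity_by_stratum(rows):
    """Estimate P(T=1 | Z=z) as the treated share within each value of z."""
    counts = {}
    for z, t, _ in rows:
        total, treated = counts.get(z, (0, 0))
        counts[z] = (total + 1, treated + t)
    return {z: treated / total for z, (total, treated) in counts.items()}


def ipw_effect(rows):
    """Normalized inverse-propensity-weighted difference in mean outcomes."""
    e = propensity_by_stratum(rows)
    num1 = den1 = num0 = den0 = 0.0
    for z, t, y in rows:
        if t:
            w = 1.0 / e[z]            # up-weight treated units that were unlikely to be treated
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - e[z])    # up-weight controls that were likely to be treated
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0


rows = simulate(20_000)
ipw_estimate = ipw_effect(rows)
print(f"IPW estimate: {ipw_estimate:.2f}")
```

Reweighting only removes the bias here because the confounder is observed and every stratum contains both treated and untreated units; with an unobserved confounder the same code would still return a number, just not a trustworthy one.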

Break 10 min

Evaluating Causal Models 30 min

  • Random Confounders and Placebos
  • Cross-Validation
  • Sensitivity Analysis
  • Q&A
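
The placebo idea can be sketched in a few lines: replace the real treatment with a shuffled copy and re-run the estimator, which should then report an effect near zero. This is a hand-rolled illustration in plain Python under assumed data (one observed binary confounder, hypothetical true effect of 2.0), not the DoWhy API and not the course's own example.

```python
import random


def simulate(n, seed=2):
    """Toy observational data: rows of (z, t, y) with z confounding t and y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = int(rng.random() < 0.5)
        t = int(rng.random() < (0.8 if z else 0.2))
        y = 2.0 * t + 3.0 * z + rng.gauss(0, 1)   # assumed true effect: 2.0
        rows.append((z, t, y))
    return rows


def adjusted_effect(rows):
    """Stratify on z, then average the within-stratum differences in means."""
    total = 0.0
    for z_value in (0, 1):
        stratum = [(t, y) for z, t, y in rows if z == z_value]
        treated = [y for t, y in stratum if t]
        control = [y for t, y in stratum if not t]
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        total += diff * len(stratum) / len(rows)
    return total


def placebo_effect(rows, seed=3):
    """Re-estimate the effect after shuffling treatment labels across units."""
    rng = random.Random(seed)
    fake_treatment = [t for _, t, _ in rows]
    rng.shuffle(fake_treatment)       # breaks any link between treatment and outcome
    placebo_rows = [(z, t_fake, y) for (z, _, y), t_fake in zip(rows, fake_treatment)]
    return adjusted_effect(placebo_rows)


rows = simulate(20_000)
real_estimate = adjusted_effect(rows)
placebo_estimate = placebo_effect(rows)
print(f"estimate with real treatment:    {real_estimate:.2f}")
print(f"estimate with placebo treatment: {placebo_estimate:.2f}")
```

A placebo estimate far from zero would suggest the estimator is picking up something other than the treatment; a value near zero is reassuring but does not by itself prove the model is correct.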

Discovering Causal Structure 25 min

  • Guess and Test
  • Automated Graph Discovery
  • Q&A