O'Reilly Live Online Training

Linear Regression with Python: Essential Math for Data Science


Take control of your data by honing your fundamental math skills

Michael Cullan

Linear regression is a simple but often powerful tool for quantifying the relationship between a value you want to predict and a set of explanatory variables. Once a relationship has been established, further analysis becomes possible, such as understanding the degree to which each explanatory variable affects the predicted value. A solid understanding of the linear regression model is an essential foundation before learning about more powerful machine learning models.
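
To make this concrete, here is a minimal sketch using scikit-learn (the library this course is built around) on synthetic data rather than the course's dataset: fitting the model recovers coefficients that quantify each explanatory variable's effect on the predicted value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y depends linearly on two explanatory variables, plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = LinearRegression()
model.fit(X, y)

# The fitted coefficients quantify each variable's effect on the prediction
print(model.coef_)       # close to [3.0, -1.5]
print(model.intercept_)  # close to 0.0
```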

This is the second course in a four-part series focused on essential math topics. These courses are grouped in pairs with this natural progression:

  1. Linear Regression with Python
  2. Linear Algebra with Python


  1. Probability with Python
  2. Statistics and Hypothesis Testing with Python

What you'll learn, and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • How linear regression works, and its limitations
  • Model fitting metrics, like mean squared error, used to determine how well a model works
  • Variations of standard linear regression, like Ridge regression, and how they prevent overfitting

And you’ll be able to:

  • Apply linear regression on a data set to create a predictive model
  • Quantify how well a linear regression model is performing
  • Determine which variables have the greatest impact on the predicted value

This training course is for you because...

  • You work in a technical role but are looking for the foundational knowledge to transition into a data science position
  • You work with data and want to start building predictive models
  • You want to become a data analyst or data scientist


Prerequisites:

  • Basic math: addition, subtraction, multiplication, and division
  • Basic understanding of linear algebra: matrices and vectors
  • Basic Python: variable creation, conditional statements, functions, loops

Recommended preparation:

Recommended follow-up:

About your instructor

  • Michael studied mathematics and music as an undergraduate at the University of Arizona before earning a master's degree in computational statistics at Arizona State University. He has developed usable software tools alongside cutting-edge statistical theory, putting new ideas into the hands of researchers and practitioners. This research experience, along with his time in academic and professional teaching positions, sparked a love for sharing knowledge and helping others grow.

    As an undergraduate research assistant in an artificial intelligence and music lab, Michael wrote Python code to model musical data for a jazz-improvisation robot. Later, he developed an R package and a novel procedure that let biologists automatically compare and select among many statistical models with confidence. Michael has also taught GRE preparation to prospective graduate students for Kaplan Test Prep and served as a TA for upper-level mathematics courses.

    In his free time, Michael turns this passion for math and science toward art, creating code-based visual art and organizing events for digital art and music.


The timeframes are only estimates and may vary according to how the class is progressing.

Getting Started (5 minutes)

  • Presentation: Introduction to Jupyter Notebook environment
  • Pulse check: Everyone ready to get started?

Introduction to Linear Regression (5 minutes)

  • Presentation: What is linear regression? What aspects of it are covered in this class?

Regression Metrics (20 minutes)

  • Presentation: Mean squared error
  • Presentation: Mean absolute error
  • Exercise: Determining MSE for California Housing Data
  • Presentation: R-squared, coefficient of determination
  • Presentation: Optimization
  • Q&A and Discussion (10 minutes)
  • Break (5 minutes)
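
As a quick illustration of the metrics in this section, scikit-learn exposes each one as a function. This sketch uses small hand-made arrays rather than the California Housing data from the exercise:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])  # observed values
y_pred = np.array([2.5, 5.0, 7.5, 9.0])  # model predictions

mse = mean_squared_error(y_true, y_pred)   # average of squared errors
mae = mean_absolute_error(y_true, y_pred)  # average of absolute errors
r2 = r2_score(y_true, y_pred)              # fraction of variance explained

print(mse, mae, r2)  # 0.125 0.25 0.975
```

MSE penalizes large errors more heavily than MAE because the errors are squared, which is one reason the two metrics can rank models differently.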

Creating a Linear Regressor in Scikit-Learn (15 minutes)

  • Lecture: The Scikit-Learn workflow
  • Exercise: Creating a regressor (Jupyter Notebook)
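
The Scikit-Learn workflow follows the same pattern for nearly every model: split the data, instantiate an estimator, fit, predict, score. A minimal sketch on synthetic data (the class exercise uses the California Housing data instead):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data with a known linear signal
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.2, size=200)

# Split, instantiate, fit, predict, score
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
reg = LinearRegression().fit(X_train, y_train)
y_pred = reg.predict(X_test)

print(reg.score(X_test, y_test))  # R-squared on held-out data
```

Scoring on held-out data, rather than the training data, gives an honest estimate of how the model will perform on values it has not seen.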

Adding Features to Improve Performance (15 minutes)

  • Lecture: How creating features can improve performance
  • Exercise: How to augment the California Housing data (Jupyter Notebook)

Motivation of Regularization (10 minutes)

  • Lecture: What is overfitting?
  • Lecture: What is regularization and how does it prevent overfitting?

Regularization in Action (15 minutes)

  • Lecture: How does Ridge Regression improve Linear Regression?
  • Exercise: Setting up a Ridge regressor in Scikit-Learn
  • Q&A and Discussion (10 minutes)
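
A sketch of the idea behind this section, on synthetic data: with few samples relative to features, ordinary least squares overfits, while Ridge regression's penalty shrinks the coefficients and generalizes better.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Few samples relative to features: a setting where least squares overfits
rng = np.random.default_rng(7)
n_samples, n_features = 30, 25
X = rng.normal(size=(n_samples, n_features))
y = X[:, 0] + rng.normal(scale=0.5, size=n_samples)  # only one feature matters

X_test = rng.normal(size=(200, n_features))
y_test = X_test[:, 0] + rng.normal(scale=0.5, size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha sets the strength of the penalty

# Ridge shrinks the coefficients, trading a little bias for much less variance
print(np.linalg.norm(ols.coef_), np.linalg.norm(ridge.coef_))
print(ols.score(X_test, y_test), ridge.score(X_test, y_test))
```

The `alpha` value here is an arbitrary choice for illustration; in practice it is tuned, for example with cross-validation.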

Next Steps: Stochastic Gradient Descent (10 minutes)

  • Presentation: Working with large datasets
  • Exercise: Example of an SGD pipeline
  • Q&A
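
As a preview of this section, an SGD pipeline can be sketched as follows (synthetic data and illustrative settings, not the course's notebook): stochastic gradient descent fits the model incrementally rather than via an exact solve, so it scales to datasets too large for ordinary least squares, and it works best when the features are standardized first.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A larger synthetic dataset with a known linear signal
rng = np.random.default_rng(3)
X = rng.normal(loc=5.0, scale=2.0, size=(10_000, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=10_000)

# Scale the features, then fit incrementally with stochastic gradient descent
sgd = make_pipeline(StandardScaler(), SGDRegressor(max_iter=1000, random_state=0))
sgd.fit(X, y)

print(sgd.score(X, y))  # R-squared; should be close to 1 on this easy signal
```

For data that does not fit in memory at all, `SGDRegressor` also supports `partial_fit`, which updates the model one batch at a time.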