Live Online training

Understanding data science algorithms in R: Regression

Jamie Owen

Build on your foundational knowledge of R as a tool for data science by exploring regression models. These methods are applied routinely by practitioners, although not always appropriately. Jamie Owen walks you through common regression methods, explaining when they are useful for performing data analytics and detailing some of their limitations. You'll gain hands-on experience with key concepts using small toy datasets and understand the bigger picture through complex, interesting datasets such as OkCupid registrations and James Bond behavioral statistics. If you’re a programmer who is interested in data science, a manager who wants to summarize datasets, or simply someone who uses data and wants to learn how to analyze and summarize it, this course is for you.

What you'll learn and how you can apply it

By the end of this live, online course, you’ll understand:

  • How to interpret linear regression models
  • How lasso, ridge, and elastic net relate to standard linear regression
  • Why regularization and shrinkage are essential when fitting large regression models

And you’ll be able to:

  • Tune regression models using cross-validation techniques
  • Fit advanced regression models

This training course is for you because...

  • You're a programmer who is interested in data science but has little or no experience with statistics or a background in mathematics.
  • You're a manager who wants to summarize datasets.
  • You use data but don't have the necessary training to analyze and summarize it.

Prerequisites

  • A working knowledge of any programming language (Python, MATLAB, C, Java, etc.)
  • Familiarity with R is not required

Required materials and setup:

  • A machine with the latest version of R and RStudio installed
  • Download the course R package (link to come)

Recommended preparation:

About your instructor

  • Dr. Jamie Owen is a Senior Data Scientist and Lead Trainer at Jumping Rivers. Having obtained a PhD focusing on computational statistics, Jamie was one of the founding members of Jumping Rivers. He has been delivering R training since 2011 at levels ranging from beginner to advanced. Jamie has taught courses for universities, government agencies, and some of the largest UK companies, including Newcastle University, Virgin Media, the NHS, the Ministry of Defence, and Shell.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Multiple linear regression (50 minutes)

  • Lecture: Overview of simple linear regression; the difference between the statistical and machine learning approaches; multiple linear regression (geometric interpretation for two covariates and model fit); why multiple linear regression fails
  • Hands-on exercise
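
As a rough illustration of the geometric interpretation listed above, a linear model with two covariates fits a plane through the data. The notation here is an assumption for illustration and is not taken from the course materials.

```latex
% Two-covariate linear model: the fitted surface is a plane in (x_1, x_2, y) space.
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i, \qquad i = 1, \dots, n

% Least squares estimate in matrix form, where X is the design matrix
% (a column of ones followed by one column per covariate).
\hat{\beta} = (X^\top X)^{-1} X^\top y
```

The estimate requires the matrix X^T X to be invertible. That condition breaks when covariates are perfectly collinear or when there are more covariates than observations, which is one common way multiple linear regression can fail.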

Break (10 minutes)

Regression: Shrinkage and regularization (50 minutes)

  • Lecture: Best subset regression and why you should avoid it; lasso and ridge regression—shrinkage and regularization, how normalization changes interpretation, L1 and L2 norms, the relationship with multiple linear regression
  • Hands-on exercise
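
One common way to write the relationship between these methods and multiple linear regression is as penalized least squares problems. This is a sketch in assumed notation, not the course's own; scaling constants differ between software packages, and the intercept is usually left unpenalized.

```latex
% Ordinary least squares: no penalty.
\hat{\beta}^{\text{OLS}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2

% Ridge: squared L2 norm penalty shrinks coefficients toward zero.
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2

% Lasso: L1 norm penalty can set some coefficients exactly to zero.
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1

% Elastic net: a mix of both penalties, with 0 \le \alpha \le 1.
\hat{\beta}^{\text{enet}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
    + \lambda \left( \alpha \lVert \beta \rVert_1 + (1 - \alpha) \lVert \beta \rVert_2^2 \right)
```

Setting \lambda = 0 recovers ordinary least squares in each case. Because the penalties depend on the size of the coefficients, the covariates are typically put on a common scale before fitting, which changes how the coefficients are interpreted.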

Break (10 minutes)

Model assessment (50 minutes)

  • Lecture: Cross-validation—training and test datasets, running in parallel; bootstrapping
  • Hands-on exercise
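
A minimal sketch of the K-fold cross-validation estimate of prediction error, in assumed notation: the data are split into K folds, each fold is held out in turn as the test set, and the model is fitted on the remaining folds.

```latex
% Mean squared error on fold k, where F_k is the set of held-out indices,
% n_k is the number of observations in F_k, and \hat{y}_i^{(-k)} is the
% prediction from the model fitted without fold k.
\text{MSE}_k = \frac{1}{n_k} \sum_{i \in F_k} \left( y_i - \hat{y}_i^{(-k)} \right)^2

% Cross-validation estimate: the average over the K folds.
\text{CV}_{(K)} = \frac{1}{K} \sum_{k=1}^{K} \text{MSE}_k
```

For a penalized model, this quantity can be computed over a grid of penalty values, with the tuning parameter chosen where the estimate is smallest. The K model fits are independent of one another, which is why they can be run in parallel.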

Wrap-up and Q&A (10 minutes)