Live Online training

Statistical Literacy: Linear Models as a Unifying Concept (using R)

Uncovering the foundational concepts that link inferential statistics to deep learning

Rick Scavetta

The big idea of this course is that linear models are an essential concept underlying many statistical and machine learning techniques. They provide a unifying framework that links everything from basic quantities like the mean and variance to modern, complex methods like deep learning.

We’ll cover the fundamentals of linear models and reveal major themes in data analysis using insightful connections and examples.

Like “Statistical Literacy: Inferential Statistics using R”, this course focuses on developing a deeper understanding of the key concepts that unite what seem, to newcomers, to be disparate techniques, laying the foundation for further study.

What you'll learn, and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • What linear models are
  • The mean and two-sample t-tests as linear models
  • Models as best-guess predictions
  • The curse of dimensionality
  • The minimization of loss functions (e.g., the sum of squared residuals)
  • Similarities among equations for various situations
  • Bias-variance trade-off
  • Complex methods as elaborations of concepts present in simple linear models

And you’ll be able to:

  • Understand reported results based on linear models
  • Have a solid basis for further independent study

This training course is for you because...

  • You encounter linear models but are unclear about what they mean.
  • You don’t understand that the mean & variance, t-tests, Ordinary Least Squares regression and ANOVA are literally built on the same fundamental concepts.
  • You don’t see how more complex or iterative methods like clustering, gradient descent and deep learning are also connected to linear models.
  • You apply linear models, but are unclear how to interpret the results.

Prerequisites

  • Basic-to-intermediate knowledge of R and RStudio
  • An understanding of fundamental concepts in statistics that are useful but not explicitly covered in this course:
    • Simple random samples
    • Systematic vs random error & types of selection bias
    • Measures for location and spread

Recommended preparation:

  • Datasets used will be built-in datasets available in R or provided via a GitHub repository for use after the class.
  • An RStudio account is needed for the in-course exercises. RStudio Cloud projects, pre-loaded with exercise scripts and datasets, will be provided shortly before the course.

About your instructor

  • Rick Scavetta has worked as an independent data science trainer since 2012. Operating as Scavetta Academy, Rick has a close and recurring presence at primary research institutes all over Germany, including many Max Planck Institutes and Excellence Clusters, in fields as varied as primatology, earth sciences, marine biology, molecular genetics, and behavioral psychology.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Introduction (20 minutes)

  • Discussion: What are models? Linear models? Where do they appear?
  • Lecture: Overview of methods explored in this course
  • Q&A
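
As a small illustration of the claim that the mean is itself a linear model, the sketch below fits an intercept-only model and compares it with the sample mean and variance. The choice of the built-in `cars` dataset and the variable names are assumptions made for this example, not necessarily what is used in class.

```r
# Built-in dataset: stopping distances of cars
y <- cars$dist

# An intercept-only model has one coefficient: a single best guess for every observation
fit_mean <- lm(dist ~ 1, data = cars)

# The estimated intercept equals the sample mean
intercept <- unname(coef(fit_mean)[1])
all.equal(intercept, mean(y))

# The residual variance of this model equals the sample variance
resid_var <- sum(residuals(fit_mean)^2) / df.residual(fit_mean)
all.equal(resid_var, var(y))
```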

Classic OLS regression (60 minutes)

  • Lecture: Defining models, bias-variance trade-off, minimizing loss functions
  • Demonstration: The basics of linear models in R
  • Hands-on Exercise: Coding OLS regression from scratch
  • Q&A
  • 5-minute break
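
To give a flavor of what coding OLS regression from scratch can look like, here is a minimal sketch that solves the normal equations directly and checks the result against `lm()`. The dataset (`cars`) and the object names are assumptions for illustration; the in-class exercise may take a different route.

```r
# Simple regression of stopping distance on speed (built-in cars dataset)
X <- cbind(Intercept = 1, speed = cars$speed)  # design matrix with an intercept column
y <- cars$dist

# Solve the normal equations (X'X) b = X'y for the coefficients
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Residual sum of squares: the loss that OLS minimizes
rss <- sum((y - X %*% beta_hat)^2)

# Compare with R's built-in fit
fit <- lm(dist ~ speed, data = cars)
all.equal(as.numeric(beta_hat), unname(coef(fit)))
all.equal(rss, deviance(fit))
```

Solving the linear system with `solve(A, b)` avoids forming the inverse of X'X explicitly, which is the more numerically stable habit even in a small example like this one.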

Other statistical tests (30 minutes)

  • Lecture: Understanding two-sample t-tests and ANOVA, the curse of dimensionality
  • Discussion: Similarities to regression
  • Demonstration: Executing t-test and ANOVA as linear models
  • Hands-on Exercise: Performing tests in R
  • Q&A
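
The idea of executing a t-test and an ANOVA as linear models can be sketched as follows. The built-in `sleep` and `PlantGrowth` datasets are assumptions chosen for this example, and `sleep` is treated here as two independent groups purely for illustration. The equal-variance t-test is used because it is the version that matches `lm()` exactly.

```r
# Two-sample t-test (equal variances) versus a linear model with a two-level factor
tt <- t.test(extra ~ group, data = sleep, var.equal = TRUE)
fit_t <- lm(extra ~ group, data = sleep)
p_lm <- summary(fit_t)$coefficients["group2", "Pr(>|t|)"]
all.equal(tt$p.value, p_lm)  # same p-value from both approaches

# One-way ANOVA versus a linear model with a three-level factor
fit_a <- lm(weight ~ group, data = PlantGrowth)
f_lm <- unname(summary(fit_a)$fstatistic["value"])
f_aov <- summary(aov(weight ~ group, data = PlantGrowth))[[1]][["F value"]][1]
all.equal(f_lm, f_aov)       # same F statistic from both approaches
```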

Extending linear models (30 minutes)

  • Lecture: Elaborating on simple models for regression and ANOVA
  • Hands-on Exercises: Exploring model forms in R
  • Q&A
  • 5-minute break

Complex methods (30 minutes)

  • Lecture: Analytical versus iterative approaches to minimizing the loss function
  • Discussion: Linear models as the basis for advanced methods
  • Exercise: Executing advanced methods in R
  • Wrap-up and Q&A
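
The contrast between analytical and iterative minimization of the loss function can be sketched on a simple regression. The example below is only an illustration under stated assumptions: the `cars` dataset, the step size `learning_rate`, and the fixed count of 500 iterations are all choices made for this sketch, not course material.

```r
# Standardize the predictor so gradient descent converges quickly
x <- as.numeric(scale(cars$speed))
y <- cars$dist
X <- cbind(1, x)  # design matrix with an intercept column

# Analytical solution: solve the normal equations in one step
beta_exact <- as.numeric(solve(crossprod(X), crossprod(X, y)))

# Iterative solution: gradient descent on the mean squared error
beta_gd <- c(0, 0)      # starting guess
learning_rate <- 0.1    # hypothetical step size chosen for this sketch
for (i in 1:500) {
  errors <- as.numeric(X %*% beta_gd) - y                      # current prediction errors
  gradient <- 2 * as.numeric(crossprod(X, errors)) / length(y) # gradient of the MSE
  beta_gd <- beta_gd - learning_rate * gradient                # step downhill
}

# Both routes arrive at the same coefficients
all.equal(beta_gd, beta_exact, tolerance = 1e-6)
```

For a linear model the analytical route is available and cheaper; the iterative route matters because it still works for models, such as neural networks, where no closed-form solution exists.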