Live Online Training

Hands-on Machine Learning with Python: Clustering, Dimension Reduction, and Time Series Analysis

Matt Harrison

It's common knowledge that when undertaking a machine learning project, most of your time is spent preparing and tweaking your data so that the libraries and algorithms will work on it. But many don't know that you can take advantage of Python's optimized libraries to run your algorithms more quickly.

Join Matt Harrison for an overview of machine learning with Python using Jupyter and pandas, the same tools used throughout industry to prepare data for analysis. You'll review key Jupyter and pandas features, explore dimension reduction to visualize data and shrink datasets, and use clustering to group similar items and see which features tie them together. Matt also demonstrates time series forecasting with the Prophet library, helping you predict future performance from past observations.

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • Basic machine learning tasks
  • How to use Python and Jupyter to perform machine learning

And you’ll be able to:

  • Use pandas to load and preprocess data
  • Run dimension reduction, clustering, and time series analysis

This training course is for you because...

  • You are a programmer and would like to see how to use Python for machine learning tasks such as clustering, dimension reduction, and time series analysis.
  • You are a data scientist with experience in SAS or R and would like an introduction to the Python ecosystem.

Prerequisites

  • Programming experience in any language
  • Familiarity with the Python programming language (useful but not required)

Materials or Downloads Needed in Advance:

  • A machine with Anaconda and Jupyter installed and set up. (Please try them out to get comfortable before the course.)

Recommended preparation:

Introduction to Pandas for Developers (video)

Learning the Pandas Library (book)

Chapters 2 and 5-10 of Python for Data Analysis (book)

About your instructor

  • Matt runs MetaSnake, a Python and Data Science training and consulting company. He has over 15 years of experience using Python across a breadth of domains: Data Science, BI, Storage, Testing and Automation, Open Source Stack Management, and Search.

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Introduction to Jupyter - 20 min

  • Explore the functionality you will need to be successful with Jupyter

Common Data Cleaning Operations - 30 min

  • Most machine learning algorithms require some data preparation before they will run. We will cover the common preparation steps here.
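
As a rough illustration of the kind of preparation meant here, the sketch below fills missing values and turns a text column into numeric indicator columns with pandas. The toy data, the column names, and the `clean` helper are all hypothetical, and the specific steps (median fill, one-hot encoding) are assumptions about what "common" cleaning might include, not the course's actual material.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing numbers and a missing category.
raw = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Austin", "Boise", None, "Austin"],
    "income": [50_000.0, 62_000.0, 58_000.0, np.nan],
})


def clean(df):
    """Return a copy with no missing values and only numeric/boolean columns."""
    out = df.copy()
    # Fill gaps in numeric columns with each column's median.
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    # Give missing categories an explicit label, then one-hot encode.
    out["city"] = out["city"].fillna("unknown")
    return pd.get_dummies(out, columns=["city"])


cleaned = clean(raw)
print(cleaned)
```

Most scikit-learn estimators reject missing values and raw strings, which is why steps like these usually come first.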

Break - 5 min

Dimension Reduction - 35 min

  • We will use scikit-learn and Yellowbrick to explore Principal Component Analysis. This is a powerful tool not only for dimensionality reduction but also for understanding and visualizing a dataset.
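
For a sense of what this looks like in practice, here is a minimal sketch of Principal Component Analysis using scikit-learn alone (Yellowbrick's visualizers are left out). The synthetic data, the choice of two components, and the scaling step are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption): 5 columns driven by 2 hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

# PCA is sensitive to scale, so standardize the columns first.
X_scaled = StandardScaler().fit_transform(X)

# Project the 5 columns down to 2 components, e.g. for a scatter plot.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Fraction of the total variance captured by each component.
explained = pca.explained_variance_ratio_
print(X_2d.shape, explained)
```

The `explained_variance_ratio_` attribute is one way to judge how much information survives the reduction.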

Break - 5 min

Clustering - 35 min

  • We will look at two clustering techniques to divide data into similar segments. We will use visualization to help determine the appropriate number of divisions.
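
The listing does not name the two techniques, so the sketch below assumes k-means as one plausible choice. It fits models for several cluster counts and records the inertia (within-cluster sum of squares) for each; plotting those values and looking for the "elbow" is one common way to pick the number of divisions. The blob data is synthetic.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data (an assumption): 300 points around 3 centers.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

# Fit k-means for k = 1..6 and record the inertia of each fit.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))

# Assign each point to a cluster using the chosen k (3 for this toy data).
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
```

In a notebook, passing the keys and values of `inertias` to a line plot gives the usual elbow chart.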

Break - 5 min

Time Series Forecasting - 40 min

  • Prophet, from Facebook, is a powerful library for extracting trends from time series data and forecasting into the future. We will introduce it and use it to predict future events.

Conclusion/Q&A - 10 min