Live online training

Cleaning data at scale

Boosting performance of industrial data science

Dr. Philip Winder

The internet is full of examples of how to train models. In practice, though, most of the time on industrial projects is spent working with data, so the largest improvements in performance often come from improving the underlying data.

In this hands-on three-hour course, expert Philip Winder teaches you fundamental techniques to improve and make the best use of your data. You'll learn how to impute missing data, clean corrupted data, remove anomalies, and convert features into a suitable format. You'll also discover how and why you should be transforming features and how to generate new features to boost performance.
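
To make the techniques named above concrete, here is a minimal sketch of imputing missing values, removing anomalies, and converting a categorical feature, using only the Python standard library. The function names (`impute_median`, `remove_outliers_iqr`, `one_hot`) and the sample readings are hypothetical, and median imputation and the interquartile-range rule are just common choices assumed for illustration, not necessarily the methods taught in the course.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def remove_outliers_iqr(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed anomaly rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def one_hot(labels):
    """Encode each label as a 0/1 vector over the sorted set of categories."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


# Hypothetical sensor readings: two missing values and one corrupted spike.
readings = [20.1, 19.8, None, 20.4, 250.0, 20.0, None, 19.9]

filled = impute_median(readings)       # no None values remain
cleaned = remove_outliers_iqr(filled)  # the 250.0 spike is dropped
encoded = one_hot(["red", "green", "red"])
```

The median is used for the fill value here because, unlike the mean, it is barely affected by the corrupted spike that is still present when the imputation runs.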

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • Why improving data quality improves results and performance
  • The many ways in which data can become corrupt
  • Why the type of data affects data cleaning
  • Why derived data can be better than the original

And you’ll be able to:

  • Determine when and how to clean data
  • Spot different types of corruption
  • Transform the data to produce better representations of the original
  • Clean all types of data: categorical, continuous, time series, etc.
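
As a rough illustration of transforming data and deriving new features, the sketch below applies a log transform to a skewed continuous feature, standardizes it, and computes a rolling mean over a short time series. The helper names and sample values are hypothetical, and these particular transforms are assumptions chosen for illustration rather than the course's own material.

```python
import math
from statistics import mean, pstdev


def log_transform(values):
    """Compress a right-skewed, non-negative feature with log(1 + x)."""
    return [math.log1p(v) for v in values]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def rolling_mean(values, window):
    """Derive a smoothed series: the mean of each trailing window."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


# Hypothetical skewed feature spanning several orders of magnitude.
skewed = [1, 10, 100, 1000, 10000]
scaled = standardize(log_transform(skewed))

# Hypothetical time series; the derived feature has len - window + 1 points.
series = [1, 2, 3, 4, 5]
smoothed = rolling_mean(series, window=3)
```

A derived feature such as the rolling mean can be more useful than the raw series when individual points are noisy, which is one sense in which derived data can be better than the original.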

This training course is for you because...

  • You're an engineer who has to clean and improve data to remove anomalies (such as for monitoring purposes).
  • You're a data scientist who has to clean and improve data to make solutions more robust, more performant, and simpler.


Prerequisites:

  • Familiarity with Python
  • A working knowledge of basic statistics

About your instructor

  • Dr. Philip Winder is a multidisciplinary engineer who creates data-driven software products. His work incorporates data science, cloud native, and traditional software development using a range of languages and tools.

    Phil is the CEO of Winder, a data science consultancy in the UK that operates throughout Europe, delivering training, development, and consultancy services. He has a Ph.D. and a master's degree in electronics from the University of Hull, UK.


The timeframes are only estimates and may vary according to how the class is progressing.