Getting started with pandas
Data ingestion, tweaking, and summarizing
Python’s pandas library can make your data or programming life easier because it enables painless ingestion, exporting, transformation, and visualization of your data. It’s no surprise then that pandas is very popular among data scientists, quants, Excel junkies, and Python developers. But if you're only familiar with Python, you may encounter a few gotchas as you get started with pandas.
Join Matt Harrison to jumpstart your pandas journey. By the end of this three-hour hands-on course, you’ll be importing, exploring, and tweaking data with pandas, using the Jupyter Notebook as the basis for your exploratory analysis. You’ll also be prepared for the second course in this series, Mastering pandas, where you’ll learn more advanced skills, such as filtering, plotting, and pivoting your data.
Special Note: This course is paired with Mastering pandas: Visualization, missing data, and pivoting. Although these courses are designed to be taken in either order, we recommend taking that course after this one.
What you'll learn and how you can apply it
By the end of this live online course, you’ll understand:
- How to use Jupyter to interact with Python scripts
- How pandas can make life easier for data scientists and programmers
And you’ll be able to:
- Import, explore, and tweak data with pandas
- Understand how to get help when you get stuck
- Practice debugging while doing analytics with pandas
This training course is for you because...
- You're a data scientist with experience in R or SAS who wants to learn about pandas and the Python ecosystem.
- You're a developer with programming experience in Python who wants to start using pandas.
- All of the coding exercises in the course will be hosted on JupyterHub, and we'll send the URL out at the start of class. Purely browser-based, no installations required.
Materials or downloads needed in advance:
To test whether you will be able to run the Jupyter notebooks in your upcoming training, please:
- Navigate to https://attendee-testing.oreilly-jupyterhub.com (this is the link to the test site)
- Sign in with your Safari credentials
- Click "start my server"
- Click "notebook .ipynb"
- Run each of the code cells: click the cell, then either press Shift+Return or click the triangle in the top menu
There may be a delay of a few seconds, but you should eventually see the graphs. If you do not, your firewall is probably blocking JupyterHub's websockets. Please turn off your company VPN or ask your system administrator to allow websocket connections to the site.
About your instructor
Matt runs MetaSnake, a Python and Data Science training and consulting company. He has over 15 years of experience using Python across a breadth of domains: Data Science, BI, Storage, Testing and Automation, Open Source Stack Management, and Search.
The timeframes are only estimates and may vary depending on how the class is progressing.
Set up and introduction to Jupyter (15 minutes)
- Lecture: Jupyter features
Introduction to pandas (10 minutes)
- Lecture: pandas basic data structures
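As a rough preview of the basic data structures, the sketch below builds a Series and a DataFrame. The city and temperature values are made up for illustration and are not course material.

```python
import pandas as pd

# A Series is a one-dimensional array with an index of labels.
# (Illustrative values only; not taken from the course.)
temps = pd.Series([-3.5, 24.1], index=["Oslo", "Lima"], name="temp_c")

# A DataFrame is a two-dimensional table; each column is a Series
# sharing the same index.
df = pd.DataFrame({"temp_c": temps, "country": ["Norway", "Peru"]})

print(temps)
print(df)
```

Pulling a single column back out of the DataFrame, as in `df["temp_c"]`, returns a Series again.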
Loading data (25 minutes)
- Lecture: Ingesting data from the web and CSV files; exploring some of the options for manipulation during loading
- Hands-on exercise: Load data
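A minimal sketch of loading CSV data with a few load-time options might look like the following. The CSV text, column names, and the `load_readings` helper are hypothetical; an in-memory string stands in for a file path or URL, either of which `pd.read_csv` also accepts.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a local file or a web URL.
CSV_TEXT = """date,city,temp_c,notes
2021-01-01,Oslo,-3.5,cold
2021-01-02,Oslo,-1.0,
2021-01-01,Lima,24.1,warm
"""


def load_readings(source):
    """Read the CSV, keeping three columns and parsing the date column."""
    return pd.read_csv(
        source,
        usecols=["date", "city", "temp_c"],  # skip the free-text notes column
        parse_dates=["date"],                # convert strings to datetimes
        dtype={"temp_c": "float64"},         # be explicit about the numeric type
    )


df = load_readings(io.StringIO(CSV_TEXT))
print(df.dtypes)
```

Setting types and dropping unneeded columns while loading tends to save cleanup work later, though which options make sense depends on the data.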
Break (10 minutes)
Inspecting data (30 minutes)
- Lecture: Examining your data, characterizing it, and seeing what it looks like
- Hands-on exercise: Inspect your data
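Inspection usually starts with a handful of quick calls. This sketch runs them on a small made-up DataFrame, which is an assumption for illustration rather than the course dataset.

```python
import pandas as pd

# Made-up data with one missing temperature.
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", "Lima"],
    "temp_c": [-3.5, -1.0, 24.1, None],
})

print(df.shape)    # (rows, columns)
print(df.head(2))  # first two rows
print(df.dtypes)   # type of each column
df.info()          # index, column types, and non-null counts

# Count missing values per column.
missing = df.isna().sum()
print(missing)
```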
Tweaking data (30 minutes)
- Lecture: Changing the types of the values for your data, fixing them, or ignoring them
- Hands-on exercise: Tweak your data
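The three options named in the lecture (changing types, fixing values, or ignoring them) can be sketched as follows. The data is invented, and filling with the mean is just one possible fix, not necessarily the approach taught in class.

```python
import pandas as pd

# Made-up data: numbers stored as text, with one unparseable value.
raw = pd.DataFrame({
    "city": ["Oslo", "Lima", "Pune"],
    "temp_c": ["-3.5", "24.1", "n/a"],
})

# Change the type: values that cannot be parsed become NaN instead of raising.
temps = pd.to_numeric(raw["temp_c"], errors="coerce")

# Fix: replace the missing value with the mean of the valid ones.
fixed = raw.assign(temp_c=temps.fillna(temps.mean()))

# Ignore: drop the rows whose temperature is missing.
ignored = raw.assign(temp_c=temps).dropna(subset=["temp_c"])

print(fixed)
print(ignored)
```

Both `fixed` and `ignored` are new DataFrames; `raw` is left unchanged, which makes it easy to compare the approaches side by side.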
Break (10 minutes)
Basic stats (40 minutes)
- Lecture: The functionality that pandas provides to easily look at descriptive analytics for your data
- Hands-on exercise: Use basic stats to gain insight from your data
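For a sense of what descriptive statistics look like in pandas, here is a small sketch on invented data. The column names and values are assumptions chosen to keep the arithmetic easy to check by hand.

```python
import pandas as pd

# Made-up temperatures for two cities.
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", "Lima"],
    "temp_c": [-4.0, -2.0, 24.0, 26.0],
})

# describe() reports count, mean, std, min, quartiles, and max in one call.
summary = df["temp_c"].describe()
print(summary)

# Individual statistics are available as methods too.
mean_temp = df["temp_c"].mean()      # (-4 - 2 + 24 + 26) / 4 = 11.0
median_temp = df["temp_c"].median()  # midpoint of -2 and 24 = 11.0

# value_counts() tallies how often each category appears.
counts = df["city"].value_counts()
print(counts)
```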
Wrap-up and Q&A (10 minutes)