Live Online Training

Medium R Programming: Beyond the Basics

Jared Lander

This online training is for those who already have a foundation in R and want to take their knowledge to the next level. You’ll learn how to create reproducible documents with RMarkdown for knowledge transfer. Then you’ll learn how to manipulate lists with purrr, reshape data.frames with tidyr, manipulate data in databases using dplyr, and run code in parallel with doParallel. Additionally, you’ll learn how to read data from XML and JSON and how to scrape data from websites. Lastly, you’ll plot data on maps and globes using Leaflet and threejs.

What you'll learn and how you can apply it

  • RMarkdown
  • Iterating over lists
  • Reshaping data
  • dplyr in databases
  • Reading JSON and XML
  • Scraping websites
  • Working in parallel

This training course is for you because...

  • You want to learn more about sharing your code and results
  • You want more powerful data manipulation tools
  • You want to harvest all kinds of data

Prerequisites

  • Working knowledge of the basics of R

Recommended Preparation:

Chapters 1-5 in R for Everyone, Second Edition: https://www.safaribooksonline.com/library/view/r-for-everyone/9780134546988/

Course Set-up

About your instructor

  • Jared P. Lander is the Chief Data Scientist of Lander Analytics, a data science and artificial intelligence consulting and training firm based in New York City; the organizer of the New York Open Statistical Programming Meetup (the world’s largest R meetup) and the New York R Conference; the author of R for Everyone; and an adjunct professor at Columbia University. With an M.A. from Columbia University in statistics and a B.S. from Muhlenberg College in mathematics, he has experience in both academic research and industry. Very active in the data community, Jared is a frequent speaker at conferences, universities and meetups around the world. His writings on statistics can be found at jaredlander.com and his work has been featured in publications such as Forbes and the Wall Street Journal.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Day 1

1 Reproducible Documents with RMarkdown (90 minutes)

  • YAML Metadata Header
  • Markdown Primer
  • Sections
  • Subsections
  • Formatting Text
  • Lists
  • Links
  • Including R Code
  • R Chunks
  • R Chunk Options
  • Building Slideshows with RMarkdown

Break: 15 minutes

2 Iterating Over Data with purrr (45 minutes)

  • Apply a function to each element of a list with purrr
  • Use lambda/inline functions for quick manipulations
  • Specify output type with map_int, map_dbl, map_chr and similar functions
  • Iterate without returning results with walk

3 Reshape Data with tidyr (30 minutes)

  • Switch between wide and long formats with gather and spread
  • Split and combine columns with unite and separate

Day 2

Day 1 recap and check-in for Day 2 (10 minutes)

4 Use dplyr on Databases (30 minutes)

  • Connect to databases using the dbplyr package
  • Execute dplyr commands on databases
  • Join data from different sources

5 Run Code in Parallel (45 minutes)

  • Iterate over data with a for loop
  • Create a parallel compute environment with doParallel
  • Iterate over data in parallel with foreach
  • Iterate in parallel automatically with parLapply

Break: 10 minutes

6 Read JSON and XML Data (20 minutes)

  • Read JSON data with tidyjson
  • Read XML data with xml2

7 Scrape Websites with rvest (20 minutes)

  • Read a webpage into R
  • Read an HTML table
  • Inspect elements to identify CSS selectors
  • Extract content with html_nodes

Break: 5 minutes

8 Plot Data on a Map and Globe (30 minutes)

  • Plot points of interest on a map with leaflet
  • Plot paths on a globe with threejs