The major educational initiative of the Johns Hopkins Data Science Lab (JHUDSL) is to create open-source online courses delivered through a range of platforms, including YouTube, GitHub, Leanpub, and Coursera. We currently have six active MOOC programs that you can enroll in at any time. Join over 8 million other students in taking a course produced by the JHUDSL!


Tidyverse Skills for Data Science in R

This Specialization is intended for data scientists with some familiarity with the R programming language who want to do data science using the tidyverse family of packages. Across five courses, you will cover importing, wrangling, visualizing, and modeling data with the tidyverse. The tidyverse packages provide a simple but powerful approach to data science that scales from the most basic analyses to massive data deployments. The Specialization covers the entire life cycle of a data science project and presents specific tidy tools for each stage. At completion, students will have a portfolio demonstrating their mastery of the material.


Class | Course page | Links
Introduction to the Tidyverse | Coursera course | bookdown chapter
Importing Data in the Tidyverse | Coursera course | bookdown chapter
Wrangling Data in the Tidyverse | Coursera course | bookdown chapter
Visualizing Data in the Tidyverse | Coursera course | bookdown chapter
Modeling Data in the Tidyverse | Coursera course | bookdown chapter

Data Science

This Specialization covers the concepts and tools you’ll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results. In the final Capstone Project, you’ll apply what you’ve learned by building a data product using real-world data. At completion, students will have a portfolio demonstrating their mastery of the material.


Class | Course page | Links
Data Scientist’s Toolbox | Coursera | Lecture notes
R Programming | Coursera | Lecture notes, Book
Getting and Cleaning Data | Coursera | Lecture notes
Exploratory Data Analysis | Coursera | Lecture notes, Book
Reproducible Research | Coursera | Lecture notes, Book
Statistical Inference | Coursera | Lecture notes, Book
Regression Models | Coursera | Lecture notes, Book
Practical Machine Learning | Coursera | Lecture notes
Developing Data Products | Coursera | Lecture notes, Book

Genomic Data Science

This Specialization covers the concepts and tools needed to understand, analyze, and interpret data from next-generation sequencing experiments. It teaches the most common tools used in genomic data science, including the command line, Python, R, Bioconductor, and Galaxy. The sequence serves as a standalone introduction to genomic data science or as a perfect complement to a primary degree or postdoc in biology, molecular biology, or genetics.

Class | Course page | Links
Introduction to Genomic Technologies | Coursera |
Genomic Data Science with Galaxy | Coursera | Videos
Python for Genomic Data Science | Coursera |
Algorithms for DNA Sequencing | Coursera | Videos, Slides, Notebooks
Command Line Tools for Genomic Data Science | Coursera |
Bioconductor for Genomic Data Science | Coursera | Book, Videos, Notes
Statistics for Genomic Data Science | Coursera | Notes, R package

Executive Data Science

In four intensive courses, you will learn what you need to know to begin assembling and leading a data science enterprise, even if you have never worked in data science before. You’ll get a crash course in data science so that you’ll be conversant in the field and understand your role as a leader. You’ll also learn how to recruit, assemble, evaluate, and develop a team with complementary skill sets and roles. You’ll learn the structure of the data science pipeline, the goals of each stage, and how to keep your team on target throughout. Finally, you’ll learn down-to-earth practical skills that will help you overcome the challenges that frequently derail data science projects.

Class | Course page | Links
A Crash Course in Data Science | Coursera | Book
Building a Data Science Team | Coursera | Book
Managing Data Analysis | Coursera | Book
Data Science in Real Life | Coursera | Book

Mastering Software Development in R

This Specialization covers R software development for building data science tools. As the field of data science evolves, it has become clear that software development skills are essential for producing useful data science results and products. You will obtain rigorous training in the R language, including the skills for handling complex data, building R packages, and developing custom data visualizations. You will learn modern software development practices to build tools that are highly reusable, modular, and suitable for use in a team-based environment or a community of developers.

Class | Course page | Links
The R Programming Environment | Coursera | Book
Advanced R Programming | Coursera | Book
Building R Packages | Coursera | Book
Building Data Visualization Tools | Coursera | Book

Advanced Statistics

Class | Course page | Links
Advanced Linear Models for Data Science 1: Least Squares | Coursera | Book
Advanced Linear Models for Data Science 2: Statistical Linear Models | Coursera | Book