9.2 Analyze Existing Data

In addition to bringing your own data, you can use existing data on AnVIL. The following resources can help you discover data to use in your analyses.

9.2.1 AnVIL Data Library

The Datasets Library is a good place to get started and familiarize yourself with existing data. Here, you can find curated datasets from thousands of participants. Some are open access, such as the 1000 Genomes dataset, while others require you to request access.

Image shows a screenshot of the Datasets Library landing page.

Featured Workspaces are a quick way to get started. Remember that when you clone a Workspace, AnVIL automatically cross-links the clone to the original data referenced in the Data Tables.

Image shows a screenshot of the Featured Workspaces landing page.

Image shows a screenshot of the Featured Workspaces tab on AnVIL. The featured tab is highlighted.

9.2.2 AnVIL Dataset Catalog

The AnVIL Dataset Catalog displays key NHGRI datasets accessible in AnVIL, including CCDG (Centers for Common Disease Genomics), CMG (Centers for Mendelian Genomics), and eMERGE (Electronic Medical Records and Genomics), as well as other relevant datasets. To use controlled-access data, you will need to arrange access first.

Image shows a screenshot of the AnVIL Dataset Catalog website landing page.

9.2.3 Gen3 Data Explorer

The Gen3 Data Explorer and Data Commons provide an API for data queries and downloads, which supports cross-project analyses. Gen3 provides access to open and protected datasets that can be exported to an AnVIL Workspace. For example, you can find the 1000 Genomes dataset on Gen3 and filter by ancestry, age, and other features before performing analyses on AnVIL.

Image shows a screenshot of the Gen3 on AnVIL Data Explorer website landing page.
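
The kind of filtered query described above can be sketched in Python. This is a minimal illustration, not the Gen3 client or the exact AnVIL schema: it assumes the deployment exposes a GraphQL endpoint, and the `subject` node, the `ancestry` property, the `"EUR"` value, and the helper `build_subject_query` are all hypothetical names chosen for the example. The code only builds the request body; sending it would additionally require the endpoint URL and an access token for your Gen3 deployment.

```python
import json


def build_subject_query(filters, fields, first=10):
    """Build a GraphQL request body for a hypothetical `subject` node.

    filters: dict of property name -> value to filter on (names are assumptions)
    fields:  list of property names to return
    first:   maximum number of records to request
    """
    # Start with the page size, then add one argument per filter.
    args = [f"first: {int(first)}"]
    # json.dumps quotes string values and leaves numbers unquoted.
    args += [f"{name}: {json.dumps(value)}" for name, value in sorted(filters.items())]
    query = "{ subject(" + ", ".join(args) + ") { " + " ".join(fields) + " } }"
    return {"query": query}


body = build_subject_query(
    filters={"ancestry": "EUR"},          # hypothetical property and value
    fields=["submitter_id", "ancestry"],  # hypothetical field names
)
print(json.dumps(body, indent=2))
```

Before adapting this, check the field names in the data dictionary of the Gen3 instance you are using, since the real node and property names may differ from the ones assumed here.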