
Chapter 13 Why use AnVIL?
The NHGRI AnVIL (Genomic Data Science Analysis, Visualization, and Informatics Lab-space) is a project powered by Terra for biomedical researchers to access data, run analysis tools, and collaborate. Both biology researchers and educators can benefit from using AnVIL (anvil.terra.bio) for their research and in the classroom.
This guide acts as a resource answering the question “why use AnVIL?”. It will discuss the research, classroom, and general benefits of using AnVIL and point to related resources throughout.
13.1 Benefits of using AnVIL for research
13.1.1 Ease of platform access
The primary means of accessing the AnVIL platform (anvil.terra.bio) is through a web browser, so users do not need to download data or install software.
13.1.2 Variety of analysis solutions
AnVIL supports an assortment of frameworks and tools. Researchers can use their favorite tool to work with data interactively or through non-interactive batch processing. Due to this variety and interoperability with other platforms, researchers can stay within a single environment for their analysis without having to shift between platforms.
13.1.3 Data: yours or cloud-hosted open & controlled access
AnVIL securely stores diverse cloud-hosted datasets, both open and controlled access. A browsable summary catalog helps researchers identify relevant datasets, including those for which they may need to request access.
13.1.4 Data & analysis in same place
AnVIL is a unified computing environment for data storage, management, and analysis. The AnVIL portal serves as an entry point to access all parts of the AnVIL system as well as training materials and announcements.
13.1.5 Scalability
AnVIL supports analysis at massive scale as well as data exploration and training. Researchers get access to dedicated compute resources, avoiding the queue times and limited access found at some institutions. Researchers can also launch light environments or run test analyses without incurring much cost or spending much time on configuration.
13.1.6 Rent needed resources
AnVIL allows you to rent the computational resources you need for occasional high-demand work, rather than obtaining and maintaining the same resources yourself or paying a subscription for an allocation or constant access that sees little consistent use over time. AnVIL can also provide different hardware and software setups, so you do not have to prepare the environment yourself (or rely on an institutional core to do it and wait in the queue).
13.1.7 Role-based permissions
Group management can be used to control who can access specific data, analysis workspaces, and your billing resources. Workspaces provide a collaborative environment with role-based permissions. These permissions include reading, writing, or owning, with additional permissions for running compute and sharing. AnVIL's role-based group management permission structure is especially useful when working with sensitive data or large amounts of data.
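The permission structure described above can be pictured with a small model. The following Python sketch is illustrative only: the names `Role`, `Member`, `may_run_compute`, and `may_share`, and the exact rules they encode, are assumptions made for this example and are not the AnVIL or Terra API.

```python
from dataclasses import dataclass
from enum import IntEnum


class Role(IntEnum):
    """Workspace roles, ordered so that a higher role includes the lower ones."""
    READER = 1
    WRITER = 2
    OWNER = 3


@dataclass(frozen=True)
class Member:
    """A hypothetical workspace member with a role and two optional extras."""
    email: str
    role: Role
    can_compute: bool = False  # additional permission: run compute
    can_share: bool = False    # additional permission: share the workspace


def can_read(member: Member) -> bool:
    return member.role >= Role.READER


def can_write(member: Member) -> bool:
    return member.role >= Role.WRITER


def may_run_compute(member: Member) -> bool:
    # Assumed rule: owners can always run compute; writers need the extra flag.
    return member.role == Role.OWNER or (
        member.role == Role.WRITER and member.can_compute
    )


def may_share(member: Member) -> bool:
    # Assumed rule: owners can always share; others need the extra flag.
    return member.role == Role.OWNER or member.can_share


student = Member("student@example.org", Role.READER)
analyst = Member("analyst@example.org", Role.WRITER, can_compute=True)
pi = Member("pi@example.org", Role.OWNER)

for m in (student, analyst, pi):
    print(m.email, m.role.name, can_write(m), may_run_compute(m), may_share(m))
```

In this sketch a classroom student can view a workspace without being able to change it or spend from the billing resources, while an analyst can write and launch compute but cannot re-share the workspace.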
13.1.9 Repository compliant with DMS Policy
The AnVIL serves as a cloud data repository compliant with the Data Management and Sharing (DMS) Policy. Data access controls can be specified to limit data access and use.
By submitting their data to AnVIL, not only can researchers meet the requirements of DMS Policy, they can also contribute to the expanding network of NIH funded data housed in the AnVIL, furthering scientific discovery.
13.2 Benefits of using AnVIL in the classroom
AnVIL provides all the advantages of a cloud computing environment, such as version control and a unified computing system, so instructors do not need to provide physical computers that meet certain specifications. Additionally, AnVIL gives students authentic experience working in the cloud, which is becoming common in today's research environment. Students can also gain experience with a variety of tools (e.g., Galaxy, RStudio, Jupyter notebooks, WDL workflows) all in one place while working with relevant datasets and prepared exercises.
13.3 Overall benefits of AnVIL
13.3.1 Ability to control costs
Cloud computing is not free, and estimating costs may seem daunting to those considering use of the AnVIL. However, Terra provides thorough, transparent documentation explaining data storage and cloud computing costs. Terra has also been working to improve the transparency and management of costs for AnVIL users through cost reporting, cost controls and estimates, and cost optimizations. Additionally, to debug or benchmark your work, you can test analyses or workflows with smaller-scale test datasets or light environments without incurring much cost or spending much time configuring environments.
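As a rough illustration of why testing on a small subset keeps costs low, the arithmetic can be sketched as below. This is a minimal sketch: the function `estimate_cost` is hypothetical, and the rates are placeholders chosen for easy arithmetic, not actual AnVIL, Terra, or cloud provider prices. Consult the cost documentation for real figures.

```python
def estimate_cost(compute_hours: float, hourly_rate: float,
                  storage_gb: float, storage_rate_per_gb_month: float) -> float:
    """Rough one-month estimate: compute time plus one month of storage."""
    return compute_hours * hourly_rate + storage_gb * storage_rate_per_gb_month


# Placeholder rates (assumptions for illustration only, not real prices).
HOURLY_RATE = 0.50         # currency units per compute hour
STORAGE_RATE = 0.02        # currency units per GB per month

# A full-scale analysis versus a small test run on a subset of the data.
full_run = estimate_cost(200, HOURLY_RATE, 1000, STORAGE_RATE)
test_run = estimate_cost(2, HOURLY_RATE, 10, STORAGE_RATE)

print(f"full run estimate: {full_run:.2f}")
print(f"test run estimate: {test_run:.2f}")
```

Under these placeholder numbers the test run costs about one hundredth of the full run, which is the point of debugging and benchmarking at small scale before scaling up.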
13.3.2 Work with protected data safely
Because AnVIL maintains compliance with FedRAMP policies, clinical data containing PHI and PII can be safely and securely stored and analyzed on AnVIL. This includes the ability to export data from clinical data collection and management tools like REDCap and import it into AnVIL Terra Tables.
13.3.3 Maintenance is handled
Since AnVIL handles the support and maintenance of the platform (including the hardware and software), you can focus on performing your work on AnVIL rather than setting up and maintaining the platform, freeing up effort for your science. This is immensely valuable for researchers who do not have deep institutional IT and system administrator support for research infrastructure.
13.3.4 Training and support are available
To equip researchers and students to work on the AnVIL, the AnVIL team
- provides and maintains training materials and documentation in multiple formats (e.g., Getting Started on AnVIL)
- moderates a support forum
- hosts demos (e.g., https://anvilproject.org/events/anvil2023-december-demos)
- hosts workshops (e.g., https://anvilproject.org/events/anvil2024-nhgri-intramural-workshop)
13.3.5 Collaborative community
AnVIL has begun hosting community conferences, where participants innovate collaboratively during CoFests! and discuss research performed with the platform. The community can work directly with the AnVIL team to learn about current development, raise feature requests, and understand the roadmap and future directions for the platform.
Additionally, AnVIL values and routinely solicits user feedback to improve the user experience and provide the most beneficial features and enhancements for biomedical research. Feedback is gathered:
- at the community conference
- through State of the AnVIL community polls
- through voluntary user interviews
- 24/7 at the support forum help.anvilproject.org