Making a Data Request
- Data in St. Jude Cloud is grouped into different Data Access Units (DAUs) which usually correspond to large-scale sequencing initiatives at St. Jude.
- Individuals can apply for access to DAUs on a case-by-case basis for a specific amount of time (usually 1 year).
- Access to data in a given DAU is assessed by the corresponding Data Access Committee, which reviews a variety of factors before granting access.
Creating a data request is the premier way to access raw St. Jude next-generation sequencing data in the cloud. You can get a free copy of the data in a secure cloud environment powered by Microsoft Azure and DNAnexus, or you can elect to download the data to your local computing environment.
If you would like to download the data to local storage, there are extra steps you'll need to follow, such as getting additional signatures on your data access agreement. We recommend that you work with the data in the cloud if it's feasible: the data provided by St. Jude is free, the compute charges are reasonable, and working in the cloud eliminates the long, error-prone downloading process. Porting your tools to run in the cloud is also easy. We recommend you follow this guide to get started.
There are two ways to make your data selection. You can peruse our raw genomic data by diagnosis, publication, or dataset using our Data Browser, a tabular view with a number of filtering options. Or you can select samples associated with specific diagnoses, gene expression, or gene mutations while exploring curated data from the donut and bubble charts on the Pediatric Cancer portal (PeCan) homepage.
Selecting Data in the Data Browser
Go to the Data Browser here, or navigate there from the St. Jude Cloud home page by clicking Access Data and then Explore Data.
From the Data Browser, you can view samples grouped by Diagnosis, Publication, or Dataset by toggling the tabs above the table. Use the search bar to look for something specific. On the Publication tab, you can search by title or PubMed ID.
You can further refine your data selection by using the filters for sequencing type, sample type, file type, and tissue type in the left sidebar. Filters of the same type are combined using “OR” logic. Filters of different types are combined using “AND” logic. Note that filtering is dynamic, so as you make selections the table will update to show all of the files we have that match your filters. Filters reset when you move from tab to tab.
The summary panel above the filters in the left sidebar shows statistics about the data currently displayed in the table. As shown in the GIFs above, this panel updates as you change what data is displayed by switching tabs, searching, filtering, or making selections.
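The filter semantics described above can be illustrated with a minimal Python sketch. This is not the Data Browser's actual implementation: the record fields, the sample values, and the `apply_filters` helper are all hypothetical and exist only to show how "OR" within a filter type and "AND" across filter types combine.

```python
# Hypothetical file records; field names and values are illustrative only
# and do not reflect the real Data Browser schema.
FILES = [
    {"name": "s1.WGS.bam", "sequencing_type": "WGS", "sample_type": "diagnosis", "file_type": "BAM"},
    {"name": "s1.WES.bam", "sequencing_type": "WES", "sample_type": "relapse", "file_type": "BAM"},
    {"name": "s2.WGS.g.vcf", "sequencing_type": "WGS", "sample_type": "germline", "file_type": "gVCF"},
    {"name": "s3.RNA.bam", "sequencing_type": "RNA-Seq", "sample_type": "diagnosis", "file_type": "BAM"},
]


def matches(record, filters):
    """Return True if the record passes every active filter type (AND),
    where a filter type passes if the record's value equals any of the
    selected values for that type (OR)."""
    return all(
        record.get(field) in selected_values
        for field, selected_values in filters.items()
        if selected_values  # an empty selection means the filter is inactive
    )


def apply_filters(records, filters):
    """Return the records that match the given filter selections."""
    return [record for record in records if matches(record, filters)]


# (WGS OR WES) AND (BAM)
selected = apply_filters(
    FILES,
    {"sequencing_type": {"WGS", "WES"}, "file_type": {"BAM"}},
)
for record in selected:
    print(record["name"])
```

With these sample records, selecting two sequencing types widens the match within that filter type, while adding a file type narrows the result across filter types.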
Selecting Data via PeCan
Using these visualizations along with ProteinPaint, you can:
- Add samples to your cart by diagnosis.
- Add samples to your cart by gene mutation.
- Add samples to your cart by gene expression.
Clicking Submit to SJCloud from the PeCan checkout window will return you to the Data Browser with your checked-out data selected.
Once you have made your selections, click the red Request Data button at the bottom of the table.
You must have created an account and be logged in to make a data request. If you have not yet created an account or you are not logged in, the red Request Data button will say Log In.
On the Request Data page, fill out your name, institution, and project name. Give your data request a project name that makes sense to you, as this will be the name of the DNAnexus project to which the data will be vended.
Finally, click the green button. If you already have access to the data you selected in the browser, the button will read Get Data Now. If you are submitting a DAA and requesting data access, the button will read Submit Request.
Applying for Data Access
If you are requesting access to a dataset you have not yet been approved for, you will see a section called Controlled Access Data on the Request Data page (see image above). Under this section, a bulleted list indicates the dataset(s) or Data Access Unit(s) for which you must request access by submitting a form called the Data Access Agreement (DAA). Please use this list to fill in the Datasets section of the DAA. For more information on filling out the DAA, see Filling out the DAA. You must upload a DAA to proceed.
Managing Your Access
Clicking Submit Request on the Request Data page will direct you to the Manage Data page, where you can see the status of the data request you just made, as well as the history of any of your previous data requests.
If you already have access to the data that you requested, your data will be vended to you immediately. Otherwise, the status of your request will say Pending while your request is routed to the respective Data Access Committee(s) for evaluation. Request approval typically takes a week or two if your data access agreement is correctly and completely filled out. You will receive automated emails from firstname.lastname@example.org at the time that your request is received and once your request is approved.
If you receive an email from us saying that your DAA is incomplete, you may edit your DAA and upload the revised copy using the 'Add a Form' button on the Manage Data page.
Viewing Your Data
From the Manage Data page, you can click on a request to navigate to the DNAnexus platform where a project will have been created with the project name that you entered on the Request Data page. Once your request is approved, the data will be vended to your DNAnexus account and will be accessible in this project. You can also follow the link in the email from email@example.com to view your DNAnexus project page. When the data is vended, the directory structure will typically look something like:
project_space/
├── restricted/
│   ├── bam/
│   ├── gVCF/
│   ├── Somatic_VCF/
│   └── CNV/
└── SAMPLE_INFO.txt
The SAMPLE_INFO.txt file provides all the metadata associated with the request, and the restricted folder contains all the data for which you were approved, separated by file type.
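If you have elected to download the data, a quick way to check that a local copy matches this layout is to count the files in each file-type folder. The following is a minimal sketch that assumes the project was downloaded to a local directory with the structure shown above; the `count_files_by_type` helper and the `project_space` path are hypothetical, and nothing here calls DNAnexus or St. Jude Cloud.

```python
from pathlib import Path


def count_files_by_type(project_root):
    """Count the files under each file-type folder inside restricted/.

    Assumes a local copy of the project laid out as
    <project_root>/restricted/<file type>/... (an assumption based on the
    typical structure described above).
    """
    restricted = Path(project_root) / "restricted"
    if not restricted.is_dir():
        return {}
    counts = {}
    for type_dir in sorted(p for p in restricted.iterdir() if p.is_dir()):
        # Count regular files at any depth below the file-type folder.
        counts[type_dir.name] = sum(1 for f in type_dir.rglob("*") if f.is_file())
    return counts


if __name__ == "__main__":
    # "project_space" is a placeholder for wherever the project was downloaded.
    for file_type, n_files in count_files_by_type("project_space").items():
        print(f"{file_type}: {n_files} file(s)")
```

Comparing these counts with the number of entries you expect from SAMPLE_INFO.txt is one way to spot an incomplete download.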