Mutational Spectrum is an upcoming St. Jude Cloud tool and is not yet publicly available. See Mutational Spectrum on GitHub for more information.
| Authors | Scott Newman, Michael Macias |
| --- | --- |
| Technical Support | Contact Us |
Mutational Spectrum finds and quantifies COSMIC mutational signatures across samples. It builds a mutation matrix from multiple single-sample VCFs and, by default, fits that matrix to the COSMIC mutational signatures by finding the optimal non-negative linear combination of signatures that reconstructs it.
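To illustrate the fitting idea only, the sketch below solves a tiny non-negative least-squares problem for one sample. Given a signature matrix S (rows are mutation types, columns are signatures) and a count vector m, it looks for exposures h ≥ 0 that minimize ||S h − m||². It uses projected gradient descent for brevity; this is not necessarily the solver mutspec uses, and the function name `fit_exposures` and the toy signatures are hypothetical.

```python
from typing import List

def fit_exposures(signatures: List[List[float]], counts: List[float],
                  iterations: int = 5000) -> List[float]:
    """Approximately minimize ||S h - m||^2 subject to h >= 0 (hypothetical helper).

    signatures: matrix S, one row per mutation type, one column per signature.
    counts: vector m of observed counts, one entry per mutation type.
    """
    n_types, n_sigs = len(signatures), len(signatures[0])
    # A step of 1 / ||S||_F^2 never exceeds 1 / (largest eigenvalue of S^T S),
    # so each projected gradient step cannot increase the objective.
    step = 1.0 / sum(v * v for row in signatures for v in row)
    h = [0.0] * n_sigs
    for _ in range(iterations):
        # Residual r = S h - m, one entry per mutation type.
        residual = [sum(signatures[i][j] * h[j] for j in range(n_sigs)) - counts[i]
                    for i in range(n_types)]
        # Gradient of 0.5 * ||S h - m||^2 with respect to h is S^T r.
        grad = [sum(signatures[i][j] * residual[i] for i in range(n_types))
                for j in range(n_sigs)]
        # Take a gradient step, then project back onto h >= 0.
        h = [max(0.0, h[j] - step * grad[j]) for j in range(n_sigs)]
    return h

# Hypothetical toy data: 4 mutation types, 2 signatures (each column sums to 1).
S = [
    [0.5, 0.0],
    [0.5, 0.25],
    [0.0, 0.25],
    [0.0, 0.5],
]
m = [20.0, 25.0, 5.0, 10.0]  # exactly 40 * signature A + 20 * signature B
print([round(x, 3) for x in fit_exposures(S, m)])  # [40.0, 20.0]
```

When the unconstrained least-squares solution would assign a negative weight to a signature, the projection clamps that weight to zero, which is what makes the fit non-negative.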
Mutational Spectrum supports both hg19 (GRCh37) and hg38 (GRCh38).
| Input | Type | Description | Example |
| --- | --- | --- | --- |
| VCF(s) | Array of files | List of VCF inputs. Can be single-sample or multi-sample, and uncompressed or gzipped. | [ |
| Sample sheet | File | Tab-delimited file (no headers) with sample ID and tag pairs [optional] | |
| Genome build | String | Genome build used as reference. Can be either "GRCh37" or "GRCh38". [default: "GRCh38"] | GRCh38 |
| Minimum mutation burden | Integer | Minimum number of somatic SNVs a sample must have to be considered for analysis [default: 9] | 15 |
| Minimum signature contribution | Integer | Minimum number of mutations attributable to a single signature [default: 9] | 100 |
| Output prefix | String | Prefix added to output filenames [optional] | mutspec |
| Disabled VCF column | Integer | VCF column (counted from the first sample name, zero-based) to ignore when reading VCFs [optional] | 1 |
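The two integer thresholds act as filters around the fitting step. The sketch below shows one plausible interpretation, which is an assumption rather than documented behavior: samples with fewer somatic SNVs than the minimum mutation burden are dropped before fitting, and any signature whose fitted contribution in a sample falls below the minimum signature contribution is set to zero. Both function names and all data are hypothetical.

```python
from typing import Dict

def filter_by_burden(snv_counts: Dict[str, int], min_burden: int = 9) -> Dict[str, int]:
    """Keep only samples with at least `min_burden` somatic SNVs (assumed rule)."""
    return {sample: n for sample, n in snv_counts.items() if n >= min_burden}

def drop_small_contributions(contributions: Dict[str, float],
                             min_contribution: int = 9) -> Dict[str, float]:
    """Zero out signatures explaining fewer than `min_contribution` mutations (assumed rule)."""
    return {sig: (value if value >= min_contribution else 0.0)
            for sig, value in contributions.items()}

# Hypothetical per-sample SNV counts; sample_b falls below the default of 9.
burden = {"sample_a": 120, "sample_b": 5, "sample_c": 9}
print(filter_by_burden(burden))  # {'sample_a': 120, 'sample_c': 9}

# Hypothetical fitted contributions for one sample.
contribs = {"SBS1": 42.0, "SBS5": 6.5, "SBS13": 71.5}
print(drop_small_contributions(contribs))  # {'SBS1': 42.0, 'SBS5': 0.0, 'SBS13': 71.5}
```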
| Output | Type | Description |
| --- | --- | --- |
| Raw signatures | File | Tab-delimited file of the raw results with sample contributions for each signature |
| Signatures visualization | File | HTML file for interactive plotting |
| Sample sheet | File | Tab-delimited file (no headers) with sample ID and tag pairs |
Mutational Spectrum runs four steps, each using a subcommand of `mutspec`:
- Split VCFs (single- or multi-sample) into single-sample VCFs.
- If not given, generate a sample sheet from the directory of single-sample VCFs.
- Build a mutation matrix and reconstruct/fit it using COSMIC mutation signatures.
- Create a visualization file using the fitted signatures.
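The mutation matrix built in step 3 is conventionally indexed by the 96 COSMIC single-base-substitution (SBS) classes: the substitution written with a pyrimidine (C or T) reference base, plus its 5' and 3' neighboring bases. The sketch below shows how one SNV could be mapped to such a class and tallied per sample. It assumes the reference trinucleotide context has already been looked up in the genome, and the function names are hypothetical rather than part of mutspec.

```python
from collections import Counter
from typing import Iterable, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def sbs96_class(context: str, alt: str) -> str:
    """Map a reference trinucleotide (middle base = REF) and ALT base to an SBS-96 label.

    Purine references (A/G) are flipped to the opposite strand so that every
    class has a C or T reference base.
    """
    ref = context[1]
    if ref in "AG":
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
        ref = context[1]
    return f"{context[0]}[{ref}>{alt}]{context[2]}"

def count_classes(snvs: Iterable[Tuple[str, str]]) -> Counter:
    """Tally SBS-96 classes for one sample's (context, alt) pairs."""
    return Counter(sbs96_class(context, alt) for context, alt in snvs)

# Hypothetical SNVs for one sample. CGT with G>A is the same event as ACG with
# C>T on the opposite strand, so A[C>T]G is counted twice.
sample_snvs = [("ACG", "T"), ("CGT", "A"), ("TCA", "T")]
print(count_classes(sample_snvs))  # Counter({'A[C>T]G': 2, 'T[C>T]A': 1})
```

One such 96-entry count vector per sample forms one column (or row) of the mutation matrix that is then fitted.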
Mutational Spectrum only requires VCFs as inputs. This can be a single multi-sample VCF, multiple single-sample VCFs, or a combination of both. All other inputs are optional.
Sample sheet is a tab-delimited file (no headers) with two columns: the sample ID and a tag. The tag is an arbitrary identifier used to group the samples, typically a disease abbreviation or tissue of origin.
If not given, a sample sheet will be generated automatically.
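A sample sheet in this format can be read with the Python standard library. The sketch below is a hypothetical reader, not mutspec's own code: it expects exactly two tab-separated fields per non-blank line and rejects anything else. The sample IDs and tags in the example are made up.

```python
import csv
import io
from typing import Dict, TextIO

def read_sample_sheet(handle: TextIO) -> Dict[str, str]:
    """Return a mapping of sample ID -> tag from a headerless, tab-delimited sheet."""
    tags: Dict[str, str] = {}
    # QUOTE_NONE keeps quote characters literal instead of treating them specially.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_number, row in enumerate(reader, start=1):
        if not row:  # skip blank lines
            continue
        if len(row) != 2:
            raise ValueError(f"line {line_number}: expected 2 columns, got {len(row)}")
        sample_id, tag = row
        tags[sample_id] = tag
    return tags

example = io.StringIO("sample_1\tEPD\nsample_2\tHGG\n")
print(read_sample_sheet(example))  # {'sample_1': 'EPD', 'sample_2': 'HGG'}
```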
Output prefix is the prefix added to the output filenames. By default, if a single input VCF is given, its basename is used as the output prefix; if multiple input VCFs are given, the default prefix "mutspec" is used. A user-defined prefix overrides this default.
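The default-prefix rule can be expressed as a small function. This is a sketch of the behavior described above, not mutspec's code; in particular, how the basename is derived (here, by stripping a trailing `.vcf` or `.vcf.gz`) is an assumption, as is the function name `output_prefix`.

```python
from pathlib import PurePath
from typing import List, Optional

def output_prefix(vcfs: List[str], user_prefix: Optional[str] = None) -> str:
    """Choose the output filename prefix (hypothetical helper)."""
    if user_prefix:  # a user-defined prefix always wins
        return user_prefix
    if len(vcfs) == 1:
        name = PurePath(vcfs[0]).name
        # Assumption: the basename drops the VCF extension, compressed or not.
        for suffix in (".vcf.gz", ".vcf"):
            if name.endswith(suffix):
                return name[: -len(suffix)]
        return name
    return "mutspec"

print(output_prefix(["cohort/tumors.vcf.gz"]))  # tumors
print(output_prefix(["a.vcf", "b.vcf"]))  # mutspec
print(output_prefix(["a.vcf"], "run1"))  # run1
```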
| VCF(s) | Prefix | Output filename for raw signatures |
| --- | --- | --- |
Disabled VCF column
Disabled VCF column is the column index to ignore when reading VCFs. This is useful when the inputs are tumor-normal VCFs, in which one sample column (typically the germline sample) should be ignored; otherwise, the results would likely be duplicated.
The argument is a zero-based index relative to the sample names in the VCF header, not to the fixed VCF columns. For example, in a tumor-normal VCF, the germline sample (SJEPD003_G) can be discarded by setting the disabled VCF column to that sample's zero-based position among the sample names.
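Because the index counts from the first sample column rather than from the start of the line, sample index k corresponds to zero-based field 9 + k of a VCF line, after the nine fixed columns CHROM through FORMAT. The hypothetical helpers below show that arithmetic on a header line and a data line; the sample names and genotypes are made up for illustration.

```python
from typing import List, Optional

FIXED_COLUMNS = 9  # CHROM POS ID REF ALT QUAL FILTER INFO FORMAT

def kept_samples(header_line: str, disabled: Optional[int] = None) -> List[str]:
    """Return sample names from a #CHROM header line, skipping the disabled one."""
    samples = header_line.rstrip("\n").split("\t")[FIXED_COLUMNS:]
    if disabled is None:
        return samples
    if not 0 <= disabled < len(samples):
        raise IndexError(f"disabled column {disabled} out of range for {len(samples)} samples")
    return [name for i, name in enumerate(samples) if i != disabled]

def sample_fields(record_line: str, disabled: Optional[int] = None) -> List[str]:
    """Return the per-sample fields of a data line, skipping the disabled one."""
    fields = record_line.rstrip("\n").split("\t")[FIXED_COLUMNS:]
    return [value for i, value in enumerate(fields) if i != disabled]

header = "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
                    "FORMAT", "tumor_sample", "germline_sample"])
record = "\t".join(["chr1", "100", ".", "C", "T", ".", "PASS", ".", "GT", "0/1", "0/0"])
print(kept_samples(header, disabled=1))  # ['tumor_sample']
print(sample_fields(record, disabled=1))  # ['0/1']
```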
Running the tool
Monitoring run progress
Analysis of results
Upon a successful run of Mutational Spectrum, three files are saved to the results directory: raw signature contributions, a visualization file, and a sample sheet.
Raw signatures is a tab-delimited file of the raw results with sample contributions for each signature. Column 1 is the sample name, columns 2 to N-1 are the contribution counts for each COSMIC signature, and column N is the group tag, where N is the total number of columns. The number of columns varies because a signature with no contributions in any sample is omitted entirely.
Note that the name of the last column, `tissue`, is a misnomer: the column holds the arbitrary tag given in the sample sheet.
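Because the set of signature columns varies between runs, a reader should take the signature names from the header rather than assuming a fixed list. The sketch below assumes the file has a single header row whose first column is the sample name and whose last column is the tag (headed `tissue`); the header name `sample`, the function name, and the example values are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

def read_raw_signatures(
    handle: TextIO,
) -> Tuple[List[str], Dict[str, Dict[str, float]], Dict[str, str]]:
    """Parse a raw signatures TSV into (signature names, contributions, tags)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    signature_names = header[1:-1]  # columns 2 to N-1
    contributions: Dict[str, Dict[str, float]] = {}
    tags: Dict[str, str] = {}
    for row in reader:
        if not row:
            continue
        sample, values, tag = row[0], row[1:-1], row[-1]
        contributions[sample] = {name: float(v) for name, v in zip(signature_names, values)}
        tags[sample] = tag  # last column, despite its "tissue" header
    return signature_names, contributions, tags

example = io.StringIO(
    "sample\tSBS1\tSBS5\ttissue\n"
    "sample_1\t120\t35.5\tEPD\n"
    "sample_2\t0\t210\tHGG\n"
)
names, contributions, tags = read_raw_signatures(example)
print(names)  # ['SBS1', 'SBS5']
```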
Signatures visualization is an HTML file that can be used for interactive plotting.
When the file is opened in a web browser, a set of controls allows plotting three kinds of stacked bar charts: total contributions by signature, total contributions by tag, and total contributions by sample per tag. Contributions can be stacked as absolute values or as percentages of each bar's total.
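The percentage view is a per-bar normalization: each bar's segments are divided by that bar's total. The sketch below shows that arithmetic for a "total contributions by tag" chart, using hypothetical in-memory data rather than anything read from the HTML file; the function names are also hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict

def totals_by_tag(contributions: Dict[str, Dict[str, float]],
                  tags: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Sum each signature's contributions over all samples sharing a tag."""
    totals: DefaultDict[str, DefaultDict[str, float]] = defaultdict(lambda: defaultdict(float))
    for sample, per_signature in contributions.items():
        for signature, value in per_signature.items():
            totals[tags[sample]][signature] += value
    return {tag: dict(sigs) for tag, sigs in totals.items()}

def as_percentages(stack: Dict[str, float]) -> Dict[str, float]:
    """Convert one stacked bar from absolute counts to percentages of its total."""
    total = sum(stack.values())
    if total == 0:
        return {key: 0.0 for key in stack}
    return {key: 100.0 * value / total for key, value in stack.items()}

contributions = {
    "sample_1": {"SBS1": 30.0, "SBS5": 10.0},
    "sample_2": {"SBS1": 10.0, "SBS5": 30.0},
    "sample_3": {"SBS1": 20.0, "SBS5": 60.0},
}
tags = {"sample_1": "EPD", "sample_2": "EPD", "sample_3": "HGG"}

by_tag = totals_by_tag(contributions, tags)
print(by_tag)  # {'EPD': {'SBS1': 40.0, 'SBS5': 40.0}, 'HGG': {'SBS1': 20.0, 'SBS5': 60.0}}
print(as_percentages(by_tag["HGG"]))  # {'SBS1': 25.0, 'SBS5': 75.0}
```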
When no sample sheet is given as an input, one is generated automatically, but the derived tags are not guaranteed to be meaningful. The generated sample sheet is included in the outputs so that its tags can be edited manually and the job resubmitted with the edited sheet as an input.
When a sample sheet is given as an input, the sample sheet output is a copy of the input.
See also the description for the input sample sheet.