4bf8cd5b339e46ed05d34677f0fed28eb2b03463 lrnassar Thu May 26 17:14:43 2022 -0700 Setting MTR track to QA ready. Choosing new .ra setting to highlight low values, updating the make script, and improving the desc page. Refs #29152 diff --git src/hg/makeDb/trackDb/human/constraintSuper.html src/hg/makeDb/trackDb/human/constraintSuper.html index 7c40d0a..bfdb885 100644 --- src/hg/makeDb/trackDb/human/constraintSuper.html +++ src/hg/makeDb/trackDb/human/constraintSuper.html @@ -1,165 +1,167 @@

Description

The "Constraint scores" container track includes several subtracks showing the results of constraint prediction algorithms. These try to find regions of negative selection, where variations likely have functional impact. The algorithms do not use multi-species alignments to derive evolutionary constraint, but use primarily human variation, usually from variants collected by gnomAD (see the gnomAD V2 or V3 tracks on hg19 and hg38) or TOPMED (contained in our dbSNP tracks and available as a filter). Another constraint score, gnomAD constraint, is not part of this container but can be found in the hg38 gnomAD track. The algorithms covered here are:

JARVIS - "Junk" Annotation genome-wide Residual Variation Intolerance Score: JARVIS first scans the entire genome with a sliding-window approach (using a 1-nucleotide step), recording the number of all TOPMED variants and common variants, irrespective of their predicted effect, within each window, to calculate a single-nucleotide resolution genome-wide residual variation intolerance score (gwRVIS). It then combines gwRVIS, primary genomic sequence context, and additional genomic annotations in a multi-module deep learning framework to infer the pathogenicity of noncoding regions while remaining naive to existing phylogenetic conservation metrics. The higher the score, the more deleterious the prediction.
HMC - Homologous Missense Constraint: HMC is an amino-acid-level measure of genetic intolerance of missense variants within human populations. For all assessable amino-acid positions in Pfam domains, the number of missense substitutions directly observed in gnomAD (Observed) was counted and compared to the expected value under a neutral evolution model (Expected). The upper limit of a 95% confidence interval for the Observed/Expected ratio is defined as the HMC score. Missense variants disrupting amino-acid positions with HMC<0.8 are predicted to be likely deleterious.
MTR - Missense Tolerance Ratio (hg19 only): MTR scores aim to quantify the amount of purifying selection acting specifically on missense variants in a given window of protein-coding sequence. The score is estimated across sliding windows of 31 codons (default) and uses observed standing variation data from the WES component of gnomAD / the Exome Aggregation Consortium Database (ExAC), version 2.0. Scores were computed using the Ensembl v95 release.
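
To make the sliding-window idea behind MTR concrete, here is a minimal sketch. It is not the MTR authors' code. It assumes the ratio described in the MTR publications, that is, the observed missense proportion in a window divided by the proportion of all possible missense changes in that window. The function name, the per-codon input lists, and the truncation of windows at transcript ends are all illustrative assumptions.

```python
from typing import List, Optional


def mtr_scores(obs_mis: List[int], obs_syn: List[int],
               pos_mis: List[int], pos_syn: List[int],
               window: int = 31) -> List[Optional[float]]:
    """Per-codon MTR-style ratio over a centered sliding window.

    obs_* are observed variant counts per codon, pos_* are counts of all
    possible changes per codon (hypothetical inputs). Windows are simply
    truncated at the ends of the sequence, which is an assumption.
    """
    n = len(obs_mis)
    half = window // 2
    scores: List[Optional[float]] = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        om, osyn = sum(obs_mis[lo:hi]), sum(obs_syn[lo:hi])
        pm, psyn = sum(pos_mis[lo:hi]), sum(pos_syn[lo:hi])
        if om + osyn == 0 or pm == 0:
            # No observed variation (or no possible missense): undefined.
            scores.append(None)
            continue
        observed = om / (om + osyn)    # observed missense proportion
        expected = pm / (pm + psyn)    # possible missense proportion
        scores.append(observed / expected)
    return scores
```

A ratio below 1 means fewer missense variants were observed in the window than the sequence context alone would suggest, which is the signal the track is meant to surface.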

Display Conventions and Configuration

JARVIS

JARVIS scores are shown as a signal ("wiggle") track, with one score per genome position. Mousing over the bars displays the exact values. The scores were downloaded and converted to a single bigWig file. See hg19 makeDoc and hg38 makeDoc.

HMC

HMC scores are displayed as a signal ("wiggle") track, with one score per genome position. Mousing over the bars displays the exact values. The highly-constrained cutoff of 0.8 is indicated with a line.

The HMC scores were downloaded and converted to .bedGraph files with a custom Python script. The bedGraph files were then converted to bigWig files, as documented in our makeDoc hg19 build log.

MTR

MTR data can be found on two tracks, MTR All data and MTR Scores. In the MTR Scores track the data has been converted into 4 separate signal tracks representing each base pair mutation, with the lowest possible score represented when multiple transcripts overlap. A horizontal line is drawn on the 0.8 score line to roughly represent the 25th percentile, meaning the items below may be of particular interest. It is recommended that the data be explored using this version of the track, as it condenses the information substantially while retaining the magnitude of the data.

Any specific point mutations of interest can then be researched in the MTR All data track. This track contains all of the information from MTR V2, including more than 3 possible scores per base when transcripts overlap. A mouse-over on this track shows the ref and alt allele, as well as the MTR score and the MTR score percentile. Filters are available for MTR score, False Discovery Rate (FDR), MTR percentile, and variant consequence. By default, only items in the bottom 25th percentile are shown. Items in the track are colored according to their MTR percentile:

Green items: MTR percentiles over 75
Black items: MTR percentiles between 25 and 75
Red items: MTR percentiles below 25
Blue items: No MTR score
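
The coloring rule above can be sketched as a small function that returns an itemRgb-style string. The function name and the exact RGB values are hypothetical; only the percentile bands come from the list above.

```python
from typing import Optional


def mtr_item_rgb(percentile: Optional[float]) -> str:
    """Map an MTR percentile to an itemRgb string (RGB values are assumed)."""
    if percentile is None:
        return "0,0,255"    # blue: no MTR score
    if percentile > 75:
        return "0,128,0"    # green: percentile over 75
    if percentile < 25:
        return "255,0,0"    # red: percentile below 25
    return "0,0,0"          # black: percentile between 25 and 75
```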

Interpretation: Regions with low MTR scores were seen to be enriched with pathogenic variants. For example, ClinVar pathogenic variants had an average score of 0.77, whereas ClinVar benign variants had an average score of 0.92. Further validation using the FATHMM cancer-associated training dataset showed that scores below 0.5 captured 8.6% of the pathogenic variants but only 0.9% of the neutral variants. In summary, variants with lower scores are more likely to be pathogenic, whereas variants with higher scores could still be pathogenic but have a higher chance of being false positives. For more information see the MTR-Viewer publication.

Methods

HMC

Scores were downloaded and converted to .bedGraph files with a custom Python script. The bedGraph files were then converted to bigWig files, as documented in our makeDoc hg19 build log.
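
The custom script itself is recorded in the makeDoc. As a rough illustration only of the bedGraph step, the sketch below collapses per-base scores into bedGraph lines (chrom, 0-based start, end, value). The function name and the input tuple layout are assumptions, not the actual HMC file format.

```python
from typing import Iterable, List, Tuple


def to_bedgraph_lines(records: Iterable[Tuple[str, int, float]]) -> List[str]:
    """Turn (chrom, 0-based position, score) records into bedGraph lines.

    Records must be sorted by chrom and position. Runs of adjacent bases
    with the same score are merged into one interval.
    """
    lines: List[str] = []
    cur = None  # [chrom, start, end, score] of the interval being built
    for chrom, pos, score in records:
        if cur and cur[0] == chrom and cur[2] == pos and cur[3] == score:
            cur[2] = pos + 1  # extend the current interval by one base
        else:
            if cur:
                lines.append("\t".join(map(str, cur)))
            cur = [chrom, pos, pos + 1, score]
    if cur:
        lines.append("\t".join(map(str, cur)))
    return lines
```

The resulting file can then be passed to the UCSC bedGraphToBigWig utility together with a chromosome sizes file to produce the bigWig.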

JARVIS

Scores were downloaded and converted to a single bigWig file. See hg19 makeDoc and hg38 makeDoc.

MTR

The V2 file was downloaded; for the MTR All data track, its columns were reordered and an itemRgb column was added. For the MTR Scores track, the file was parsed with a Python script to pull out the highest possible MTR score for each of the 3 possible mutations at each base pair, and 4 tracks, one per mutation, were built from these values.

See the hg19 makeDoc entry on MTR for more info.
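
As a hedged sketch of the per-mutation split described above (not the script from the makeDoc), the function below keeps one score per position and alternate allele and groups the results into four per-base lists. The function name and the row layout are hypothetical. The reducer is a parameter, so either the highest or the lowest score can be kept when transcripts overlap.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[str, int, str, float]           # (chrom, 0-based pos, alt, MTR)
Interval = Tuple[str, int, int, float]      # (chrom, start, end, MTR)


def split_by_alt(rows: Iterable[Row],
                 pick: Callable[[float, float], float] = max
                 ) -> Dict[str, List[Interval]]:
    """Collapse overlapping-transcript scores and split them by alt base."""
    best: Dict[Tuple[str, int, str], float] = {}
    for chrom, pos, alt, mtr in rows:
        key = (chrom, pos, alt)
        # Keep a single score per (position, alt) using the chosen reducer.
        best[key] = mtr if key not in best else pick(best[key], mtr)
    tracks: Dict[str, List[Interval]] = {base: [] for base in "ACGT"}
    for (chrom, pos, alt), mtr in sorted(best.items()):
        tracks[alt].append((chrom, pos, pos + 1, mtr))
    return tracks
```

Each of the four lists corresponds to one signal track and could be written out as bedGraph before conversion to bigWig.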

Credits

Thanks to Jean-Madeleine Desainteagathe (APHP Paris, France) for suggesting the JARVIS, MTR, and HMC tracks. Thanks to Xiaolei Zhang for providing the HMC data file and to Dimitrios Vitsios for helping clean up the hg38 JARVIS files.

References

Vitsios D, Dhindsa RS, Middleton L, Gussow AB, Petrovski S. Prioritizing non-coding regions based on human genomic constraint and sequence context with deep learning. Nat Commun. 2021 Mar 8;12(1):1504. PMID: 33686085; PMC: PMC7940646

Zhang X, Theotokis PI, Li N, the SHaRe Investigators, Wright CF, Samocha KE, Whiffin N, Ware JS. Genetic constraint at single amino acid resolution improves missense variant prioritisation and gene discovery. medRxiv. 2022.02.16.22271023

Silk M, Petrovski S, Ascher DB. MTR-Viewer: identifying regions within genes under purifying selection. Nucleic Acids Res. 2019 Jul 2;47(W1):W121-W126. PMID: 31170280; PMC: PMC6602522