e7bfc71ae21d7da0b0b5001c651d5ba1586e99c6
lrnassar
Wed Jun 29 18:44:29 2022 -0700
Creating metaDome track, refs #23883

diff --git src/hg/makeDb/trackDb/human/constraintSuper.html src/hg/makeDb/trackDb/human/constraintSuper.html
index 8850ccb..20c5b5a 100644
--- src/hg/makeDb/trackDb/human/constraintSuper.html
+++ src/hg/makeDb/trackDb/human/constraintSuper.html
@@ -30,30 +30,38 @@
phylogenetic conservation metrics. The higher the score, the more deleterious the prediction.
  • HMC - Homologous Missense Constraint: Homologous Missense Constraint (HMC) is an amino-acid-level measure of genetic intolerance of missense variants within human populations. For all assessable amino-acid positions in Pfam domains, the number of missense substitutions directly observed in gnomAD (Observed) was counted and compared to the expected value under a neutral evolution model (Expected). The upper limit of a 95% confidence interval for the Observed/Expected ratio is defined as the HMC score. Missense variants disrupting the amino-acid positions with HMC<0.8 are predicted to be likely deleterious. +
  • + MetaDome - Tolerance Landscape Score (hg19 only): + MetaDome Tolerance Landscape scores are computed as a missense over synonymous + variant count ratio, which is calculated in a sliding window manner to provide + a per-position indication of regional tolerance to missense variation. The + variants are based on gnomAD and corrected for codon composition. Scores + <0.7 are considered intolerant. +
  • MTR - Missense Tolerance Ratio (hg19 only): Missense Tolerance Ratio (MTR) scores aim to quantify the amount of purifying selection acting specifically on missense variants in a given window of protein-coding sequence. It is estimated across sliding windows of 31 codons (default) and uses observed standing variation data from the WES component of gnomAD / the Exome Aggregation Consortium Database (ExAC), version 2.0. Scores were computed using Ensembl v95 release.

    Display Conventions and Configuration

    JARVIS

    JARVIS scores are shown as a signal ("wiggle") track, with one score per genome position.
@@ -82,30 +90,60 @@

    HMC

    HMC scores are displayed as a signal ("wiggle") track, with one score per genome position. Mousing over the bars displays the exact values. The highly-constrained cutoff of 0.8 is indicated with a line.

    Interpretation: A protein residue with HMC score <1 indicates that missense variants affecting the homologous residues are significantly under negative selection (P-value < 0.05) and likely to be deleterious. A more stringent score threshold of HMC<0.8 is recommended to prioritize predicted disease-associated variants.

    +

    MetaDome

    +

    +MetaDome data can be found on two tracks, MetaDome and MetaDome All Data. +The MetaDome track should be used by default for data exploration. In this track +the raw data containing the MetaDome tolerance scores were converted into a signal ("wiggle") +track. Since this data was computed on the proteome, there was a small amount of coordinate +overlap, roughly 0.42%. In these regions the lowest possible score was chosen for display +in the track to maintain sensitivity. For this reason, if a protein variant is being evaluated, +the MetaDome All Data track can be used to validate the score. More information +on this data can be found in the MetaDome FAQ.

    +

    +Interpretation: The authors suggest the following guidelines for evaluating +intolerance. By default, the MetaDome track displays a horizontal line at 0.7 which +signifies the first intolerant bin. For more information see the MetaDome publication.

    + +

    + + + + + + + + + +
    Classification         MetaDome Tolerance Score
    Highly intolerant      ≤ 0.175
    Intolerant             ≤ 0.525
    Slightly intolerant    ≤ 0.7
    +

    +
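
    The bins in the table above can be expressed as a small lookup. The following is a minimal Python sketch, not part of the track build. The function name classify_metadome and the label returned for scores above 0.7 are assumptions made for illustration; the page itself only lists the three intolerant bins.

```python
def classify_metadome(score: float) -> str:
    """Map a MetaDome tolerance score to the bins listed on this page.

    Thresholds (0.175, 0.525, 0.7) come from the table above; the label
    for scores above 0.7 is a placeholder, since the page lists no bin there.
    """
    if score <= 0.175:
        return "Highly intolerant"
    if score <= 0.525:
        return "Intolerant"
    if score <= 0.7:
        return "Slightly intolerant"
    return "Above 0.7 (no intolerant bin listed)"


if __name__ == "__main__":
    for value in (0.1, 0.4, 0.7, 0.95):
        print(value, classify_metadome(value))
```

    Because each threshold is inclusive (≤), a score of exactly 0.7 falls in the "Slightly intolerant" bin, which matches the horizontal line drawn at 0.7 in the MetaDome track.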

    MTR

    MTR data can be found on two tracks, MTR All data and MTR Scores. In the MTR Scores track the data has been converted into 4 separate signal tracks representing each base pair mutation, with the lowest possible score shown when multiple transcripts overlap at a position. Overlaps can happen since this score is derived from transcripts and multiple transcripts can overlap. A horizontal line is drawn on the 0.8 score line to roughly represent the 25th percentile, meaning the items below may be of particular interest. It is recommended that the data be explored using this version of the track, as it condenses the information substantially while retaining the magnitude of the data.

    Any specific point mutations of interest can then be researched in the MTR All data track. This track contains all of the information from
@@ -139,88 +177,112 @@

    Scores were downloaded and converted to a single bigWig file. See the hg19 makeDoc and the hg38 makeDoc for more info.

    HMC

    Scores were downloaded and converted to .bedGraph files with a custom Python script. The bedGraph files were then converted to bigWig files, as documented in our makeDoc hg19 build log.

    +

    MetaDome

    +

    +The authors provided a bed file containing codon coordinates along with the scores. +This file was parsed with a Python script to create the two tracks. For the first track +the scores were aggregated for each coordinate, then the lowest score was chosen for any +overlaps and the result was written out in bedGraph format. The file was then converted +to bigWig with the bedGraphToBigWig utility. For the second track the file +was reorganized into a bed 4+3 and converted to bigBed with the bedToBigBed +utility.

    +

    +See the hg19 makeDoc for details including the build script.

    +
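
    The first-track step described above (keep the lowest score wherever codon coordinates overlap, then write bedGraph rows) could look roughly like the sketch below. It is not the actual build script, which is in the hg19 makeDoc. The function name min_score_bedgraph, the (chrom, start, end, score) input tuples, and the merging of adjacent equal-score bases are all assumptions made for illustration.

```python
def min_score_bedgraph(records):
    """Collapse overlapping scored intervals to one score per base.

    records: iterable of (chrom, start, end, score), 0-based half-open,
             as in BED (this input layout is an assumption).
    Returns a list of (chrom, start, end, score) bedGraph rows, sorted by
    chromosome name and start, keeping the lowest score at each base.
    """
    per_base = {}
    for chrom, start, end, score in records:
        for pos in range(start, end):
            key = (chrom, pos)
            # Keep the lowest score where intervals overlap.
            if key not in per_base or score < per_base[key]:
                per_base[key] = score

    rows = []
    for (chrom, pos), score in sorted(per_base.items()):
        last = rows[-1] if rows else None
        # Extend the previous row when this base is adjacent and has the same score.
        if last and last[0] == chrom and last[2] == pos and last[3] == score:
            last[2] = pos + 1
        else:
            rows.append([chrom, pos, pos + 1, score])
    return [tuple(row) for row in rows]


if __name__ == "__main__":
    example = [("chr1", 10, 13, 0.9), ("chr1", 12, 15, 0.4)]
    for row in min_score_bedgraph(example):
        print("\t".join(str(field) for field in row))
```

    In the example, base 12 is covered by both intervals and receives the lower score, 0.4. The rows come out sorted by chromosome name and start position, which is the ordering bedGraphToBigWig requires of its input.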

    MTR

    The V2 file was downloaded, its columns were reshuffled, and itemRgb was added for the MTR All data track. For the MTR Scores track, the file was parsed with a Python script to pull out the highest possible MTR score for each of the 3 possible mutations at each base pair, and 4 tracks were built from these values, representing each mutation.

    See the hg19 makeDoc entry on MTR for more info.

    Data Access

    The raw data can be explored interactively with the Table Browser, or the Data Integrator. For automated access, this track, like all others, is available via our API. However, for bulk processing, it is recommended to download the dataset.

    For automated download and analysis, the genome annotation is stored at UCSC in bigWig and bigBed files that can be downloaded from our download server. Individual regions or the whole genome annotation can be obtained using our tools bigWigToWig or bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tools can also be used to obtain features confined to a given range, e.g., -
    +

    bigWigToBedGraph -chrom=chr1 -start=100000 -end=100500 http://hgdownload.soe.ucsc.edu/gbdb/$db/hmc/hmc.bw stdout

    Please refer to our Data Access FAQ for more information.

    Credits

    -Thanks to Jean-Madeleine Desainteagathe (APHP Paris, France) for suggesting the JARVIS, MTR, HMC tracks. Thanks to Xialei Zhang for providing the HMC data file and to Dimitrios Vitsios and Slavé Petrovski for helping clean up the hg38 JARVIS files for providing guidance on interpretation. +Thanks to Jean-Madeleine Desainteagathe (APHP Paris, France) for suggesting the JARVIS, MTR, and HMC tracks. Thanks to Xiaolei Zhang for providing the HMC data file and to Dimitrios Vitsios and Slave Petrovski for helping clean up the hg38 JARVIS files and for providing guidance on interpretation. Additional +thanks to Laurens van de Wiel for providing the MetaDome data as well as guidance on the track development and interpretation.

    References

    Vitsios D, Dhindsa RS, Middleton L, Gussow AB, Petrovski S. Prioritizing non-coding regions based on human genomic constraint and sequence context with deep learning. Nat Commun. 2021 Mar 8;12(1):1504. PMID: 33686085; PMC: PMC7940646

    Xiaolei Zhang, Pantazis I. Theotokis, Nicholas Li, the SHaRe Investigators, Caroline F. Wright, Kaitlin E. Samocha, Nicola Whiffin, James S. Ware. Genetic constraint at single amino acid resolution improves missense variant prioritisation and gene discovery. medRxiv 2022.02.16.22271023

    +Wiel L, Baakman C, Gilissen D, Veltman JA, Vriend G, Gilissen C. + +MetaDome: Pathogenicity analysis of genetic variants through aggregation of homologous human protein +domains. +Hum Mutat. 2019 Aug;40(8):1030-1038. +PMID: 31116477; PMC: PMC6772141 +

    + +

    Silk M, Petrovski S, Ascher DB. MTR-Viewer: identifying regions within genes under purifying selection. Nucleic Acids Res. 2019 Jul 2;47(W1):W121-W126. PMID: 31170280; PMC: PMC6602522