8c2f7318d8d821de9b2a25750586a94ab5e8c1bb lrnassar Fri Nov 15 18:50:19 2024 -0800 Giving the UI link cronjob some love by fixing all the 301 redirects. These are the bulk of the items listed on the cron. No RM. diff --git src/hg/makeDb/trackDb/human/constraintSuper.html src/hg/makeDb/trackDb/human/constraintSuper.html index 6c7688b..52ba63f 100644 --- src/hg/makeDb/trackDb/human/constraintSuper.html +++ src/hg/makeDb/trackDb/human/constraintSuper.html @@ -1,324 +1,324 @@
The "Constraint scores" container track includes several subtracks showing the results of constraint prediction algorithms. These algorithms try to find regions under negative selection, where variants are likely to have a functional impact. They do not use multi-species alignments to derive evolutionary constraint. Instead, they primarily use human variation, usually from variants collected by gnomAD (see the gnomAD V2 or V3 tracks on hg19 and hg38) or TOPMED (contained in our dbSNP tracks and available as a filter). One of the subtracks is based on UK Biobank variants, which are not publicly available, so we have no track with the raw data. The numbers of human genomes used as input for these scores are 76k, 53k, and 110k for gnomAD, TOPMED, and UK Biobank, respectively.
Note that another important constraint score, gnomAD constraint, is not part of this container track but can be found in the hg38 gnomAD track.
The algorithms included in this track are:
JARVIS scores are shown as a signal ("wiggle") track, with one score per genome position. Mousing over the bars displays the exact values. The scores were downloaded and converted to a single bigWig file. A horizontal line is drawn at 0.733, which signifies the 90th percentile.
See the hg19 makeDoc and the hg38 makeDoc.
Interpretation: The authors suggest a threshold of > 0.9998 for identifying higher-confidence calls and minimizing false positives. In addition to this strict threshold, the two more relaxed cutoffs below can be used to explore additional hits. Note that these thresholds are offered as guidelines and are not necessarily indicative of pathogenicity.
Percentile | JARVIS score threshold |
---|---|
99th | 0.9998 |
95th | 0.9826 |
90th | 0.7338 |
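As a minimal sketch, the thresholds in the table above can be applied to a single JARVIS score as follows. The function name `jarvis_tier` and the strict ">" comparison (taken from the "> 0.9998" wording) are assumptions for illustration, not part of the JARVIS release.

```python
from typing import Optional

# Percentile cutoffs copied from the JARVIS threshold table, most stringent first.
JARVIS_CUTOFFS = [
    (0.9998, "99th"),
    (0.9826, "95th"),
    (0.7338, "90th"),
]


def jarvis_tier(score: float) -> Optional[str]:
    """Return the most stringent percentile tier whose cutoff the score exceeds.

    Hypothetical helper: uses a strict '>' comparison and returns None
    for scores at or below the 90th-percentile cutoff.
    """
    for cutoff, label in JARVIS_CUTOFFS:
        if score > cutoff:
            return label
    return None
```

For example, a score of 0.99 falls in the 95th-percentile tier, since it exceeds 0.9826 but not 0.9998.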
HMC scores are displayed as a signal ("wiggle") track, with one score per genome position. Mousing over the bars displays the exact values. The highly constrained cutoff of 0.8 is marked with a horizontal line.
Interpretation: An HMC score < 1 for a protein residue indicates that missense variants affecting the homologous residues are under significant negative selection (P-value < 0.05) and are likely to be deleterious. A more stringent threshold of HMC < 0.8 is recommended for prioritizing predicted disease-associated variants.
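A minimal sketch of applying the two HMC cutoffs described above to a residue score; the function name `hmc_flag` and its returned labels are hypothetical.

```python
from typing import Optional


def hmc_flag(hmc: float) -> Optional[str]:
    """Classify an HMC score using the two cutoffs from the track description."""
    # HMC < 0.8: the more stringent threshold recommended for prioritization.
    if hmc < 0.8:
        return "stringent"
    # HMC < 1: homologous residues significantly under negative selection (P < 0.05).
    if hmc < 1.0:
        return "constrained"
    # Scores of 1 or above are not flagged.
    return None
```

Checking the stricter cutoff first means a score of 0.5 is reported as "stringent" rather than only "constrained".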
Interpretation: The authors suggest the following guidelines for evaluating intolerance. By default, the MetaDome track displays a horizontal line at 0.7, which marks the upper bound of the least stringent intolerant bin ("Slightly intolerant"). For more information, see the MetaDome publication.
Classification | MetaDome Tolerance Score |
---|---|
Highly intolerant | ≤ 0.175 |
Intolerant | ≤ 0.525 |
Slightly intolerant | ≤ 0.7 |
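The MetaDome bins in the table can be expressed as a simple lookup. This sketch assumes that scores above 0.7 fall outside all three intolerant bins listed here; the names `METADOME_BINS` and `metadome_class` are hypothetical.

```python
from typing import Optional

# Upper bounds (inclusive) from the MetaDome tolerance table, most intolerant first.
METADOME_BINS = [
    (0.175, "Highly intolerant"),
    (0.525, "Intolerant"),
    (0.7, "Slightly intolerant"),
]


def metadome_class(score: float) -> Optional[str]:
    """Return the first bin whose upper bound the score does not exceed."""
    for upper, label in METADOME_BINS:
        if score <= upper:
            return label
    # Above 0.7: not in any of the intolerant bins listed in the table.
    return None
```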
MTR data can be found in two tracks, MTR All data and MTR Scores. In the MTR Scores track, the data has been converted into 4 separate signal tracks, one for each possible alternate base, with the lowest score shown where multiple transcripts overlap a position. Such overlaps occur because the score is derived from transcripts, and transcripts can overlap. A horizontal line is drawn at a score of 0.8 to roughly represent the 25th percentile; items below this line may be of particular interest. We recommend exploring the data with this version of the track, as it condenses the information substantially while retaining the magnitude of the data.
Specific point mutations of interest can then be researched in the MTR All data track. This track contains all of the information from MTRV2, including more than 3 possible scores per base where transcripts overlap. Mousing over an item shows the ref and alt alleles, as well as the MTR score and the MTR score percentile. Filters are available for MTR score, False Discovery Rate (FDR), MTR percentile, and variant consequence. By default, only items in the bottom 25th percentile are shown. Items in the track are colored according to their MTR percentile:
Interpretation: Regions with low MTR scores were found to be enriched for pathogenic variants. For example, ClinVar pathogenic variants had an average score of 0.77, whereas ClinVar benign variants had an average score of 0.92. Further validation using the FATHMM cancer-associated training dataset showed that scores below 0.5 contained 8.6% of the pathogenic variants but only 0.9% of the neutral variants. In summary, lower scores are more likely to represent pathogenic variants, whereas variants with higher scores could still be pathogenic but have a higher chance of being false positives. For more information, see the MTR-Viewer publication.
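The default MTR All data view keeps only items in the bottom 25th percentile. Assuming records have been parsed into the hypothetical `MtrItem` fields below (the actual column names in the bigBed may differ), the same filter, with an optional FDR cutoff, might look like this. Treating the 25th percentile as inclusive is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class MtrItem:
    # Hypothetical record layout; field names are illustrative only.
    chrom: str
    pos: int
    ref: str
    alt: str
    mtr: float
    fdr: float
    percentile: float


def default_view(items: Iterable[MtrItem],
                 max_percentile: float = 25.0,
                 max_fdr: Optional[float] = None) -> List[MtrItem]:
    """Keep items at or below max_percentile and, optionally, at or below max_fdr."""
    kept = []
    for item in items:
        if item.percentile > max_percentile:
            continue  # outside the bottom-percentile window
        if max_fdr is not None and item.fdr > max_fdr:
            continue  # optional FDR filter
        kept.append(item)
    return kept
```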
Scores were downloaded and converted to a single bigWig file. See the hg19 makeDoc and the hg38 makeDoc for more info.
Scores were downloaded and converted to .bedGraph files with a custom Python script. The bedGraph files were then converted to bigWig files, as documented in our makeDoc hg19 build log.
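As an illustration of the bedGraph step only (this is not the actual conversion script), the sketch below assumes sorted, 1-based per-position scores and merges adjacent positions with identical scores into 0-based, half-open bedGraph intervals, the coordinate convention bedGraphToBigWig expects. The function name `to_bedgraph` is hypothetical.

```python
from typing import Iterable, List, Tuple


def to_bedgraph(chrom: str, scores: Iterable[Tuple[int, float]]) -> List[str]:
    """Collapse sorted, 1-based (position, score) pairs into bedGraph lines.

    A 1-based position p becomes the 0-based, half-open interval [p-1, p).
    Adjacent positions with equal scores are merged into one interval.
    """
    lines: List[str] = []
    run_start = run_end = None
    run_score = None
    for pos, score in scores:
        start = pos - 1
        if run_start is not None and start == run_end and score == run_score:
            run_end = start + 1  # extend the current run by one base
            continue
        if run_start is not None:
            lines.append(f"{chrom}\t{run_start}\t{run_end}\t{run_score}")
        run_start, run_end, run_score = start, start + 1, score
    if run_start is not None:
        lines.append(f"{chrom}\t{run_start}\t{run_end}\t{run_score}")
    return lines
```

For example, `to_bedgraph("chr1", [(101, 0.5), (102, 0.5), (103, 0.7)])` yields two intervals, `chr1 100 102 0.5` and `chr1 102 103 0.7`.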
The authors provided a bed file containing codon coordinates along with the scores.
This file was parsed with a Python script to create the two tracks. For the first track,
the scores were aggregated for each coordinate, the lowest score was chosen for any
overlaps, and the result was written out in bedGraph format. The file was then converted
to bigWig with the bedGraphToBigWig
utility. For the second track, the file
was reorganized into a bed 4+3 and converted to bigBed with the bedToBigBed
utility.
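The overlap handling described above (keep the lowest score wherever records overlap) can be sketched as follows, assuming BED-style `(chrom, start, end, score)` tuples. This is a hypothetical reimplementation for illustration, not the build script referenced in the makeDoc.

```python
from typing import Dict, Iterable, List, Tuple


def min_per_base(records: Iterable[Tuple[str, int, int, float]]) -> List[str]:
    """Keep the lowest score at every base covered by any record.

    records use BED-style 0-based, half-open coordinates.
    Returns single-base bedGraph lines sorted by chrom, then start,
    which is the order bedGraphToBigWig requires.
    """
    best: Dict[Tuple[str, int], float] = {}
    for chrom, start, end, score in records:
        for base in range(start, end):
            key = (chrom, base)
            # Overlapping records: the lower score wins.
            if key not in best or score < best[key]:
                best[key] = score
    return [f"{c}\t{b}\t{b + 1}\t{s}" for (c, b), s in sorted(best.items())]
```

Two overlapping codons, `("chr1", 10, 13, 0.9)` and `("chr1", 12, 15, 0.4)`, produce 0.9 at bases 10 and 11 and 0.4 at bases 12 through 14.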
See the hg19 makeDoc for details including the build script.
The raw MetaDome data can also be accessed via their Zenodo handle.
The MTRV2 file was downloaded, its columns were reshuffled, and itemRgb was added for the MTR All data track. For the MTR Scores track, the file was parsed with a Python script to pull out the highest possible MTR score for each of the 3 possible mutations at each base pair, and 4 tracks were built from these values, one for each possible alternate base.
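A hypothetical sketch of the per-alternate-base split described above. The row layout is assumed, and the aggregator defaults to `max` to match the "highest possible MTR score" wording in this paragraph; pass `min` instead to mirror the lowest-score behavior described for the MTR Scores display.

```python
from typing import Callable, Dict, Iterable, Tuple

Track = Dict[Tuple[str, int], float]


def split_by_alt(rows: Iterable[Tuple[str, int, str, float]],
                 agg: Callable[[float, float], float] = max) -> Dict[str, Track]:
    """Build one track per alternate base (A, C, G, T).

    rows: (chrom, pos, alt, mtr_score), possibly repeated across transcripts.
    agg combines scores for the same (chrom, pos, alt).
    An alt base outside ACGT raises KeyError.
    """
    tracks: Dict[str, Track] = {base: {} for base in "ACGT"}
    for chrom, pos, alt, score in rows:
        track = tracks[alt]
        key = (chrom, pos)
        track[key] = agg(track[key], score) if key in track else score
    return tracks
```

At any given position, the reference base's own track stays empty, so each position appears in at most 3 of the 4 tracks.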
See the hg19 makeDoc entry on MTR for more info.
The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated access, this track, like all others, is available via our API. However, for bulk processing, we recommend downloading the dataset.
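For the API route, a request URL can be assembled as below. The `getData/track` endpoint and its `genome`, `track`, `chrom`, `start`, and `end` parameters belong to the public UCSC REST API, but the track name in the example is a hypothetical placeholder; check the subtrack's actual table name before use.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_url(genome: str, track: str, chrom: str, start: int, end: int) -> str:
    """Build a getData/track URL, using ';' separators as in the API examples."""
    params = {"genome": genome, "track": track, "chrom": chrom,
              "start": start, "end": end}
    return API_BASE + "?" + ";".join(f"{k}={quote(str(v))}" for k, v in params.items())


def fetch_track(url: str) -> dict:
    """Download and decode the JSON response (requires network access)."""
    with urlopen(url) as resp:
        return json.load(resp)
```

For example, `track_url("hg38", "jarvis", "chr1", 100000, 100500)`, where `jarvis` is a placeholder track name, produces `https://api.genome.ucsc.edu/getData/track?genome=hg38;track=jarvis;chrom=chr1;start=100000;end=100500`.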
For automated download and analysis, the genome annotation is stored at UCSC in bigWig and bigBed
files that can be downloaded from
our download server.
Individual regions or the whole genome annotation can be obtained using our tools bigWigToWig
or bigBedToBed which can be compiled from the source code or downloaded as a precompiled
binary for your system. Instructions for downloading source code and binaries can be found
here.
The tools can also be used to obtain features confined to a given range, e.g.,
bigWigToBedGraph -chrom=chr1 -start=100000 -end=100500 http://hgdownload.soe.ucsc.edu/gbdb/$db/hmc/hmc.bw stdout
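To post-process the output of the command above in Python, the bedGraph text can be parsed and expanded to per-base values. This is a sketch; the helper names are hypothetical, and positions not covered by any bedGraph interval are reported as None.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, int, int, float]

# The text could come from, e.g.:
# out = subprocess.run(["bigWigToBedGraph", "-chrom=chr1", "-start=100000",
#                       "-end=100500", url, "stdout"],
#                      capture_output=True, text=True, check=True).stdout


def parse_bedgraph(text: str) -> List[Row]:
    """Parse bedGraph lines into (chrom, start, end, value) tuples."""
    rows: List[Row] = []
    for line in text.splitlines():
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        rows.append((chrom, int(start), int(end), float(value)))
    return rows


def per_base(rows: List[Row], start: int, end: int) -> List[Optional[float]]:
    """Expand intervals to one value per base in [start, end), None if uncovered."""
    values: List[Optional[float]] = [None] * (end - start)
    for _, s, e, v in rows:
        for base in range(max(s, start), min(e, end)):
            values[base - start] = v
    return values
```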
Please refer to our Data Access FAQ for more information.
Thanks to Jean-Madeleine Desainteagathe (APHP Paris, France) for suggesting the JARVIS, MTR, and HMC tracks. Thanks to Xiaolei Zhang for providing the HMC data file, and to Dimitrios Vitsios and Slave Petrovski for helping clean up the hg38 JARVIS files and for providing guidance on interpretation. Additional thanks to Laurens van de Wiel for providing the MetaDome data as well as guidance on the track development and interpretation.
Vitsios D, Dhindsa RS, Middleton L, Gussow AB, Petrovski S. Prioritizing non-coding regions based on human genomic constraint and sequence context with deep learning. Nat Commun. 2021 Mar 8;12(1):1504. PMID: 33686085; PMC: PMC7940646
Zhang X, Theotokis PI, Li N, SHaRe Investigators, Wright CF, Samocha KE, Whiffin N, Ware JS. Genetic constraint at single amino acid resolution improves missense variant prioritisation and gene discovery. medRxiv 2022.02.16.22271023
Wiel L, Baakman C, Gilissen D, Veltman JA, Vriend G, Gilissen C. MetaDome: Pathogenicity analysis of genetic variants through aggregation of homologous human protein domains. Hum Mutat. 2019 Aug;40(8):1030-1038. PMID: 31116477; PMC: PMC6772141
Silk M, Petrovski S, Ascher DB. MTR-Viewer: identifying regions within genes under purifying selection. Nucleic Acids Res. 2019 Jul 2;47(W1):W121-W126. PMID: 31170280; PMC: PMC6602522
Halldorsson BV, Eggertsson HP, Moore KHS, Hauswedell H, Eiriksson O, Ulfarsson MO, Palsson G, Hardarson MT, Oddsson A, Jensson BO et al. The sequences of 150,119 genomes in the UK Biobank. Nature. 2022 Jul;607(7920):732-740. PMID: 35859178; PMC: PMC9329122