680c0cb830ff738f2e9e8fe559a0b2cb894945a6
jnavarr5
Fri May 30 15:42:30 2025 -0700
Adding a hypen for 'non-coding', and add a link to the references section on the track description page for the linsight track, refs #35730

diff --git src/hg/makeDb/trackDb/human/constraintSuper.html src/hg/makeDb/trackDb/human/constraintSuper.html
index 19a058b1431..106f16aed8d 100644
--- src/hg/makeDb/trackDb/human/constraintSuper.html
+++ src/hg/makeDb/trackDb/human/constraintSuper.html
@@ -57,37 +57,37 @@

MTR - Missense Tolerance Ratio (hg19 only): Missense Tolerance Ratio (MTR) scores aim to quantify the amount of purifying selection acting specifically on missense variants in a given window of protein-coding sequence. It is estimated across sliding windows of 31 codons (default) and uses observed standing variation data from the WES component of gnomAD version 2.0. Scores were computed using Ensembl v95 release. The number of gnomAD 2 exomes used here is higher than the number of gnomAD 3 samples (125k exomes versus 76k full genomes), and this score only covers coding regions so gnomAD 2 was more appropriate.

LINSIGHT (hg19 only): LINSIGHT is a statistical model for estimating negative selection on noncoding sequences in the human genome. The LINSIGHT score measures the
- probability of negative selection on noncoding sites which can be used to
+ probability of negative selection on non-coding sites which can be used to
prioritize SNVs associated with genetic diseases or quantify evolutionary constraint on regulatory sequences, e.g., enhancers or promoters. More
- specifically, if a noncoding site is under negative selection, it will be
+ specifically, if a non-coding site is under negative selection, it will be
less likely to have a substitution or SNV in the human lineage. In addition, even if we see a SNV at the site, it will tend to segregate at
- low frequency because of selection. See (Huang et al, Nat Genet 2017).
+ low frequency because of selection. See (Huang et al, Nat Genet 2017).

UK Biobank depletion rank score (hg38 only): Halldorsson et al. tabulated the number of UK Biobank variants in each 500bp window of the genome and compared this number to an expected number given the heptamer nucleotide composition of the window and the fraction of heptamers with a sequence variant across the genome and their mutational classes. A variant depletion score was computed for every overlapping set of 500-bp windows in the genome with a 50-bp step size. They then assigned a rank (depletion rank (DR)) from 0 (most depletion) to 100 (least depletion) for each 500-bp window. Since the windows are overlapping, we plot the value only in the central 50bp of the 500bp window, following advice from the author of the score, Hakon Jonsson, deCODE Genetics. He suggested that the value of the central window, rather than the worst possible score of all overlapping windows, is
@@ -274,30 +274,31 @@

Please refer to our Data Access FAQ for more information.

Credits

Thanks to Jean-Madeleine Desainteagathe (APHP Paris, France) for suggesting the JARVIS, MTR, HMC tracks. Thanks to Xiaolei Zhang for providing the HMC data file and to Dimitrios Vitsios and Slave Petrovski for helping clean up the hg38 JARVIS files and for providing guidance on interpretation. Additional thanks to Laurens van de Wiel for providing the MetaDome data as well as guidance on the track development and interpretation.

References

Vitsios D, Dhindsa RS, Middleton L, Gussow AB, Petrovski S. Prioritizing non-coding regions based on human genomic constraint and sequence context with deep learning. Nat Commun. 2021 Mar 8;12(1):1504. PMID: 33686085; PMC: PMC7940646

Xiaolei Zhang, Pantazis I. Theotokis, Nicholas Li, the SHaRe Investigators, Caroline F. Wright, Kaitlin E. Samocha, Nicola Whiffin, James S. Ware