b34331f456005e4f28bb0d095c081b50d94f7ebd gperez2 Tue Dec 2 10:59:48 2025 -0800
Updating the text for the gnomAD MPC, refs #36531
diff --git src/hg/makeDb/trackDb/human/hg19/gnomadMpc.html src/hg/makeDb/trackDb/human/hg19/gnomadMpc.html
index 44881937e1a..87e28e4dc07 100644
--- src/hg/makeDb/trackDb/human/hg19/gnomadMpc.html
+++ src/hg/makeDb/trackDb/human/hg19/gnomadMpc.html
@@ -1,150 +1,150 @@
The ${longLabel} track shows a score that tries to identify missense-depleted regions using the patterns of rare missense variation in 125,748 gnomAD v2.1.1 exomes, compared to a null mutational model. Missense-depleted regions are enriched for ClinVar pathogenic variants, de novo missense variants in individuals with neurodevelopmental disorders (NDDs), and complex trait heritability. The score's publication suggests that regions with less than 20% of their expected missense variation achieve moderate support for pathogenicity according to ACMG criteria.
-Transcripts with constraint predictions are colored with the viridis palette, where yellow
-indicates the lowest OE values and dark blue-purple indicates the highest.
+Regions of transcripts with constraint predictions are colored using the viridis palette, where
+yellow indicates the lowest OE values and dark blue-purple indicates the highest.
OE Constraint Legend
Yellow = strongest constraint
Purple = weakest constraint
| Color | OE Range |
|---|---|
| OE = 0.0066884 | |
| OE = 0.36229 | |
| OE = 0.66993 | |
| OE = 0.93385 | |
| OE = 2.2429 | |
Mouseovers on an item show the observed and expected number of missense mutations, the observed/expected (OE) ratio, and the associated Chi-square statistic and P-value.
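As a rough illustration of how the mouseover values relate to each other, the sketch below computes an OE ratio from an observed and an expected count and checks it against the 20% threshold quoted from the publication. The function names, the optional `cap` parameter, and the handling of a zero expected count (taken from the 10^-9 floor described in the methods) are assumptions made for this example, not the authors' code. The cap is left optional because the legend above lists OE values greater than 1.

```python
import math

EXPECTED_FLOOR = 1e-9      # floor for zero expected counts, as described in the methods
DEPLETION_THRESHOLD = 0.2  # "less than 20% of expected missense variation"


def oe_ratio(observed, expected, cap=None):
    """Return observed/expected, optionally capped (hypothetical helper)."""
    if observed < 0 or expected < 0:
        raise ValueError("counts must be non-negative")
    # Avoid division by zero by substituting a tiny expected count.
    safe_expected = expected if expected > 0 else EXPECTED_FLOOR
    ratio = observed / safe_expected
    if cap is not None:
        ratio = min(ratio, cap)
    return ratio


def is_strongly_depleted(oe):
    """True when a region has less than 20% of its expected missense variation."""
    return oe < DEPLETION_THRESHOLD


if __name__ == "__main__":
    oe = oe_ratio(observed=5, expected=50)
    print(oe, is_strongly_depleted(oe))
```

Passing `cap=1.0` mimics the capping applied during the breakpoint search, where sections with more observed than expected variants are treated as unconstrained.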
The study analyzed only canonical, coding transcripts as defined by GENCODE v19/Ensembl v74. Transcripts with outlier variant counts relative to those expected under neutrality were excluded: zero expected pLoF, missense, or synonymous variants; too many observed pLoF, missense, or synonymous variants compared to expectation; or too few observed synonymous variants compared to expectation. In total, the study analyzed 18,629 transcripts.
The analysis used 125,748 gnomAD v2.1.1 exomes on the hg19 assembly. Median coverage was calculated on a random subset of the gnomAD exomes. The set of sites with possible missense variants was described using a synthetic Hail Table (HT) containing all possible single nucleotide variants in the exome. This HT was annotated with Ensembl VEP against GENCODE version 19 and filtered to variants with the consequence "missense_variant" in the canonical, coding transcripts as defined above. Variants were then filtered by the following criteria: (1) allele count (AC) > 0 and allele frequency (AF) < 0.001, variant QC PASS, and median coverage > 0 in gnomAD v2.1.1 exomes; or (2) AC = 0, i.e. variants not seen in gnomAD v2.1.1 exomes.

A likelihood ratio test was applied to assess whether the missense observed/expected (OE) ratio was uniform along each transcript or whether distinct regions of missense constraint were present. Observed and expected missense counts were modeled using a Poisson distribution, with the null hypothesis assuming no regional variability in missense depletion and the alternative allowing for subsections with differing OE ratios. A chi-square significance threshold of p = 0.001 was used to identify significant breakpoints dividing transcripts into two or more sections, requiring at least 16 expected missense variants per subsection. Transcripts lacking a single significant breakpoint were further analyzed for two simultaneous breakpoints, with all significant results merged across search types. Recursive testing was then performed, treating each identified subsection as an independent transcript until no additional significant breakpoints were detected. To focus on missense depletion, subsections with observed counts exceeding expectations were capped at an OE of 1, and subsections with zero expected variants were assigned an expected count of 10^-9 to avoid nonfinite OE values.
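The single-breakpoint step of the likelihood ratio test can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it assumes per-position lists of observed and expected missense counts, searches for one breakpoint only, compares the statistic against the chi-square critical value for one degree of freedom at p = 0.001 (about 10.828), and omits the OE cap, the two-breakpoint search, and the recursion. All function and constant names are hypothetical.

```python
import math

CHI2_CRITICAL = 10.828   # chi-square, 1 degree of freedom, p = 0.001
MIN_EXPECTED = 16.0      # minimum expected missense variants per subsection


def poisson_loglik(obs, exp, oe):
    """Poisson log-likelihood of `obs` given mean oe * exp."""
    mu = oe * exp
    if obs == 0:
        return -mu  # avoids log(0) when the fitted mean is zero
    return obs * math.log(mu) - mu - math.lgamma(obs + 1)


def best_single_breakpoint(obs, exp):
    """Return (k, statistic) for the most significant split obs[:k] | obs[k:],
    or None if no split passes the threshold. Assumes sum(exp) > 0."""
    total_obs, total_exp = sum(obs), sum(exp)
    null_oe = total_obs / total_exp  # one OE ratio for the whole transcript
    best = None
    left_obs = left_exp = 0.0
    for k in range(1, len(obs)):
        left_obs += obs[k - 1]
        left_exp += exp[k - 1]
        right_obs = total_obs - left_obs
        right_exp = total_exp - left_exp
        if left_exp < MIN_EXPECTED or right_exp < MIN_EXPECTED:
            continue
        # Alternative: each side gets its own maximum-likelihood OE ratio.
        alt = (poisson_loglik(left_obs, left_exp, left_obs / left_exp)
               + poisson_loglik(right_obs, right_exp, right_obs / right_exp))
        null = (poisson_loglik(left_obs, left_exp, null_oe)
                + poisson_loglik(right_obs, right_exp, null_oe))
        stat = 2.0 * (alt - null)
        if stat >= CHI2_CRITICAL and (best is None or stat > best[1]):
            best = (k, stat)
    return best


if __name__ == "__main__":
    # Toy transcript: first half fully depleted, second half at expectation.
    observed = [0] * 40 + [1] * 40
    expected = [1.0] * 80
    print(best_single_breakpoint(observed, expected))
```

In a fuller version, each side of a significant breakpoint would be passed back through the same search until no further split is significant, as the text describes.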
The Obs/Exp annotation data was downloaded and reformatted at UCSC to bigBed with a script (mpcToBed.py) available in our GitHub repo. As with all our tracks, the file makeDb/doc/hg19.txt in our GitHub repo describes the commands for the entire download and conversion.
The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated access, this track, like all others, is available via our API. Our command-line tool bigBedToBed can be used to convert the bigBed file on our server directly to a tab-separated text file. The data can also be found on the gnomAD 2.1.1 downloads page.
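For automated access, a request to the API might be assembled along these lines. This is a sketch under stated assumptions: the track name `gnomadMpc` is inferred from this page's file name and may differ from the actual track identifier, and the helper names are hypothetical. The `/getData/track` endpoint and its `genome`, `track`, `chrom`, `start`, and `end` parameters follow the public UCSC REST API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"
TRACK_NAME = "gnomadMpc"  # assumption: inferred from the file name gnomadMpc.html


def build_track_url(chrom, start, end, genome="hg19", track=TRACK_NAME):
    """Build a getData/track URL for one genomic interval."""
    if start < 0 or end <= start:
        raise ValueError("need 0 <= start < end")
    query = urlencode({
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    })
    return f"{API_BASE}?{query}"


def fetch_track_items(chrom, start, end):
    """Fetch the JSON response for an interval (requires network access)."""
    with urlopen(build_track_url(chrom, start, end)) as response:
        return json.load(response)


if __name__ == "__main__":
    print(build_track_url("chr1", 1000000, 1100000))
```

Keeping URL construction separate from the network call makes it possible to check the request before sending it and to reuse the same URL with other tools.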
Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information.
More information about using and understanding the gnomAD data can be found on the gnomAD FAQ site.
Thanks to gnomAD for releasing this data and to Luis Nassar for finding it.
Chao KR, Wang L, Panchal R, Liao C, Abderrazzaq H, Ye R, Schultz P, Compitello J, Grant RH, Kosmicki JA et al. The landscape of regional missense mutational intolerance quantified from 125,748 exomes. bioRxiv. 2024 May 3. PMID: 38645134; PMC: PMC11030311