971ae4b10f79d994d419c94d932e5b9163b72098 gperez2 Tue Dec 2 10:39:58 2025 -0800
Updating the Display Conventions and Configuration section for gnomAD MPC, refs #36531
diff --git src/hg/makeDb/trackDb/human/hg19/gnomadMpc.html src/hg/makeDb/trackDb/human/hg19/gnomadMpc.html
index aa07dfe5e08..44881937e1a 100644
--- src/hg/makeDb/trackDb/human/hg19/gnomadMpc.html
+++ src/hg/makeDb/trackDb/human/hg19/gnomadMpc.html
@@ -1,147 +1,150 @@

Description

The ${longLabel} track shows a score that tries to identify missense-depleted regions using the patterns of rare missense variation in 125,748 gnomAD v2.1.1 exomes, compared to a null mutational model. Missense-depleted regions are enriched for ClinVar pathogenic variants, de novo missense variants in individuals with neurodevelopmental disorders (NDDs), and complex trait heritability. The score's publication suggests that regions with less than 20% of their expected missense variation achieve moderate support for pathogenicity according to ACMG criteria.

Display Conventions and Configuration

-Transcripts with constraint predictions are highlighted. Observed and expected
-number of missense mutations are shown on mouse overs, as well as their
-Observed/expected ratio (OE), and the Chi-square and P-value of the ratio.
-Regions are colored using the viridis palette, with yellow for the lowest OE
-values and dark blue for the highest values.
+Transcripts with constraint predictions are colored with the viridis palette, where yellow
+indicates the lowest OE values and dark blue-purple indicates the highest.

OE Constraint Legend
-Yellow = strongest constraint, Purple = weakest
+Yellow = strongest constraint
+Purple = weakest constraint

Color OE Range
OE = 0.0066884
OE = 0.36229
OE = 0.66993
OE = 0.93385
OE = 2.2429
+Mouseovers on an item show the observed and expected number of missense mutations, the
+observed/expected (OE) ratio, and the associated Chi-square statistic and P-value.

Methods

The study analyzed only canonical, coding transcripts as defined by GENCODE v19/Ensembl v74. Some were excluded: transcripts that had outlier counts of variants expected under neutrality (zero expected pLoF, missense, or synonymous variants; too many observed pLoF, missense, or synonymous variants compared to expectation; or too few observed synonymous variants compared to expectation). In total, the study analyzed 18,629 transcripts.

125,748 gnomAD v2.1.1 exomes were used on hg19. Median coverage was calculated on a random subset of the gnomAD exomes. The set of sites with possible missense variants was described using a synthetic Hail Table (HT) containing all possible single nucleotide variants in the exome. Ensembl VEP annotated this HT against GENCODE version 19, and filtered to variants with the consequence "missense_variant" in the canonical, coding transcripts as defined above. Variants were filtered by the following criteria: (1) allele count (AC) > 0 and AF < 0.001, variant QC PASS, and median3 coverage > 0 in gnomAD v2.1.1 exomes; or (2) AC = 0, i.e. variants not seen in gnomAD v2.1.1 exomes.
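The two filtering criteria above can be read as a simple predicate. The sketch below is illustrative only and is not the study's Hail code; the parameter names (`ac`, `af`, `passed_qc`, `median_coverage`) are hypothetical and do not come from the gnomAD schema.

```python
AF_MAX = 0.001  # allele frequency ceiling quoted in the text


def keep_variant(ac, af=0.0, passed_qc=False, median_coverage=0.0):
    """Return True if a possible missense variant meets criterion (1) or (2).

    All parameter names are hypothetical stand-ins for the annotations
    described in the text.
    """
    # Criterion (2): the variant was never observed in the exomes.
    if ac == 0:
        return True
    # Criterion (1): observed, rare, passing variant QC, and covered.
    return ac > 0 and af < AF_MAX and passed_qc and median_coverage > 0
```

For instance, `keep_variant(ac=3, af=0.0005, passed_qc=True, median_coverage=30)` satisfies criterion (1), while an unobserved variant (`ac=0`) satisfies criterion (2) regardless of the other fields.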

A likelihood ratio test was applied to assess whether the missense observed/expected (OE) ratio was uniform along each transcript or if distinct regions of missense constraint were present. Observed and expected missense counts were modeled using a Poisson distribution, with the null hypothesis assuming no regional variability in missense depletion and the alternative allowing for subsections with differing OE ratios. Chi-square statistics (p = 0.001) were used to identify significant breakpoints dividing transcripts into two or more sections, requiring at least 16 expected missense variants per subsection. Transcripts lacking a single significant breakpoint were further analyzed for two simultaneous breakpoints, with all significant results merged across search types. Recursive testing was then performed, treating each identified subsection as an independent transcript until no additional significant breakpoints were detected. To focus on missense depletion, subsections with observed counts exceeding expectations were capped at an OE of 1, and subsections with zero expected variants were assigned an expected count of 10^-9 to avoid nonfinite OE values.
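The single-breakpoint step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: it assumes per-position lists of observed and expected counts with a positive total expectation, assumes one degree of freedom for the Chi-square test, and omits the two-breakpoint and recursive searches. The function names are hypothetical.

```python
import math

ALPHA = 0.001          # significance threshold quoted in the text
MIN_EXPECTED = 16.0    # minimum expected missense variants per subsection
ZERO_EXPECTED = 1e-9   # stand-in expected count for zero-expectation sections


def segment_loglik(obs, exp, oe):
    """Poisson log-likelihood of a segment with mean oe * exp, dropping
    terms that are identical under the null and the alternative."""
    return (obs * math.log(oe) if obs > 0 else 0.0) - oe * exp


def best_single_breakpoint(obs, exp):
    """Return (index, chi_square, p_value) for the most significant split
    into [0:index] and [index:], or None if no split reaches ALPHA."""
    total_obs, total_exp = sum(obs), sum(exp)
    # Null hypothesis: one OE ratio for the whole transcript.
    null_ll = segment_loglik(total_obs, total_exp, total_obs / total_exp)
    best = None
    left_obs = left_exp = 0.0
    for i in range(1, len(obs)):
        left_obs += obs[i - 1]
        left_exp += exp[i - 1]
        right_obs, right_exp = total_obs - left_obs, total_exp - left_exp
        if left_exp < MIN_EXPECTED or right_exp < MIN_EXPECTED:
            continue
        # Alternative hypothesis: each side has its own OE ratio.
        alt_ll = (segment_loglik(left_obs, left_exp, left_obs / left_exp)
                  + segment_loglik(right_obs, right_exp, right_obs / right_exp))
        chi_square = max(2.0 * (alt_ll - null_ll), 0.0)
        if best is None or chi_square > best[1]:
            best = (i, chi_square)
    if best is None:
        return None
    # Survival function of a Chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(best[1] / 2.0))
    return (best[0], best[1], p_value) if p_value < ALPHA else None


def capped_oe(obs, exp):
    """OE ratio capped at 1, with zero expectation replaced by 1e-9."""
    return min(obs / (exp if exp > 0 else ZERO_EXPECTED), 1.0)
```

Only the section totals enter the test statistic, because the per-position terms of the Poisson likelihood cancel between the two hypotheses.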

The Obs/Exp genome annotation data was downloaded and reformatted at UCSC to bigBed with a script (mpcToBed.py) available in our GitHub repo. As with all our tracks, the file makeDb/doc/hg19.txt in our GitHub repo describes the commands for the entire download and conversion.

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated access, this track, like all others, is available via our API. Our command line tool bigBedToBed can be used to transform the bigBed file from our server directly to a tab-separated text file. The data can also be found on the gnomAD 2.1.1 downloads page.
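As an illustration of automated access, the sketch below builds a query URL for the UCSC REST API `getData/track` endpoint. The track name `gnomadMpc` is an assumption inferred from this page's file name and may differ from the name the API expects; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_query_url(chrom, start, end, genome="hg19", track="gnomadMpc"):
    """Build a getData/track URL for a region.

    The default track name is an assumption taken from the HTML file name.
    """
    params = {"genome": genome, "track": track,
              "chrom": chrom, "start": start, "end": end}
    return f"{API_BASE}?{urlencode(params)}"


def fetch_region(chrom, start, end):
    """Fetch and decode the JSON response (requires network access)."""
    with urlopen(track_query_url(chrom, start, end)) as response:
        return json.load(response)
```

Calling `fetch_region("chr1", 1000000, 1100000)` would return the decoded JSON for that window, provided the assumed track name is correct.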

Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information.

More information about using and understanding the gnomAD data can be found on the gnomAD FAQ site.

Credits

Thanks to gnomAD for releasing this data and to Luis Nassar for finding it.

References

Chao KR, Wang L, Panchal R, Liao C, Abderrazzaq H, Ye R, Schultz P, Compitello J, Grant RH, Kosmicki JA et al. The landscape of regional missense mutational intolerance quantified from 125,748 exomes. bioRxiv. 2024 May 3;. PMID: 38645134; PMC: PMC11030311