f30798ae5d11e88e0ab7eb2bcab634e253fd0675 max Thu Apr 23 10:36:40 2026 -0700

Add gnomAD MPC v4.1.1 track to hg38.

New composite track under the gnomAD container showing per-variant MPC (Missense deleteriousness Prediction by Constraint) scores from gnomAD v4.1.1. Four bigWigs provide per-base scores (one per ALT nucleotide); a companion bigBed carries the ~250K multi-transcript variants with a per-transcript breakdown. Included via 'alpha' for QA review.

refs #37434
Co-Authored-By: Claude Opus 4.7 (1M context)

diff --git src/hg/makeDb/trackDb/human/hg38/gnomadMpc.html src/hg/makeDb/trackDb/human/hg38/gnomadMpc.html
new file mode 100644
index 00000000000..c43a66760dc
--- /dev/null
+++ src/hg/makeDb/trackDb/human/hg38/gnomadMpc.html
@@ -0,0 +1,175 @@
+

Description

+

+Missense variants change a single amino acid in a protein and are a common +source of variants of uncertain significance (VUS): about 90% of missense +variants in ClinVar are VUS. The MPC score ("Missense deleteriousness +Prediction by Constraint") is a machine-learning score that flags missense +variants likely to be deleterious by combining three lines of evidence: +(i) regional missense constraint (how depleted the surrounding sub-genic +region is of rare missense variation in the general population), (ii) the +biochemical severity of the specific amino-acid substitution as captured by +PolyPhen-2, and (iii) cross-species conservation (phyloP). The model is +trained to separate pathogenic from benign missense variation under strong +heterozygous selection; higher scores indicate greater predicted +deleteriousness. The authors report that MPC ≥ 2.5 is strongly +enriched for de novo variants in individuals with severe developmental +disorders relative to their unaffected siblings, with MPC 2–2.5 showing +intermediate enrichment and MPC < 2 little enrichment. +
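As a rough illustration of the three score bands reported above, the following sketch maps an MPC value to a band label. The function name and the label strings are hypothetical and chosen for this example only; the cut-offs are the ones quoted in the paragraph, and the bands describe enrichment in the authors' analysis, not a clinical classification.

```python
def mpc_enrichment_band(mpc: float) -> str:
    """Return an illustrative label for the enrichment band of an MPC score.

    Cut-offs follow the description text: MPC >= 2.5, MPC 2-2.5, MPC < 2.
    The label strings are assumptions made for this sketch.
    """
    if mpc >= 2.5:
        return "strong"
    if mpc >= 2.0:
        return "intermediate"
    return "little"


if __name__ == "__main__":
    for score in (0.7, 2.1, 3.4):
        print(score, mpc_enrichment_band(score))
```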

+ +

+This track shows MPC v4.1.1, computed by the Broad Institute gnomAD team from +the gnomAD v4.1.1 release of 730,947 exomes aligned to GRCh38. Scores +are provided for every possible single-nucleotide missense variant in 17,841 +MANE Select or canonical protein-coding transcripts that passed gnomAD QC, as +well as for an additional 1,534 transcripts that failed QC (the authors note +that scores may be less accurate in the latter). +

+ +

Display Conventions and Configuration

+

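+Two views of the same underlying data are available: four bigWig subtracks +(one per alternate nucleotide) that give per-base scores, and a companion +bigBed that carries the multi-transcript variants with a per-transcript +breakdown. +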

+ +

+Across the roughly 250,000 multi-transcript variants, per-transcript MPC scores +typically agree within 0.5 units; only a few percent differ by more than +0.5. The bigBed view is the authoritative source for the full +transcript-level detail in those cases; the bigWig view collapses +transcripts by showing the maximum MPC at each position. +

+ +

Methods

+

+Regional missense constraint (MCR). For each of 17,841 QC-passing +MANE Select or canonical coding transcripts, the authors tallied the +observed rare missense variants (allele count > 0, allele frequency +< 0.1%, %AN ≥ 20, QC PASS) in gnomAD v4.1.1 against the expected +count under a position- and coverage-adjusted mutational model. A recursive +likelihood-ratio test (Poisson model, p-value threshold 0.001, minimum +16 expected missense variants per sub-region) identifies change-points at +which the transcript-wide observed/expected (OE) ratio deviates +significantly; each resulting segment is a missense constraint region +(MCR). 36% of transcripts (6,361/17,841) harbor two or more MCRs. MCR +missense OE was calibrated against ClinVar P/LP vs. B/LB missense variants +following ClinGen recommendations for the ACMG/AMP guidelines: OE ≤ 0.36 +meets moderate evidence for pathogenicity, OE ≤ 0.59 meets +supporting evidence for pathogenicity, OE > 0.97 and OE > 1.23 +meet supporting and moderate evidence for benignity, respectively. +
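The calibrated OE thresholds quoted above can be read as a simple lookup. The sketch below is only an illustration of that reading: the function name and the returned label strings are hypothetical, and the "indeterminate" label for OE values between the pathogenic and benign thresholds is an assumption of this example, not a term from the authors.

```python
def mcr_oe_evidence(oe: float) -> str:
    """Map an MCR missense OE value to the evidence level quoted in the text.

    Thresholds: OE <= 0.36 (moderate pathogenic), OE <= 0.59 (supporting
    pathogenic), OE > 0.97 (supporting benign), OE > 1.23 (moderate benign).
    Label strings are illustrative only.
    """
    if oe < 0:
        raise ValueError("OE ratio cannot be negative")
    if oe <= 0.36:
        return "pathogenic_moderate"
    if oe <= 0.59:
        return "pathogenic_supporting"
    if oe > 1.23:
        return "benign_moderate"
    if oe > 0.97:
        return "benign_supporting"
    # Assumption: values in (0.59, 0.97] meet no evidence threshold.
    return "indeterminate"


if __name__ == "__main__":
    for oe in (0.2, 0.5, 0.8, 1.0, 1.3):
        print(oe, mcr_oe_evidence(oe))
```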

+ +

+MPC score. MPC is an XGBoost gradient-boosted-tree classifier that +takes as input (1) MCR missense OE, (2) gene-level constraint, (3) a +per-substitution amino-acid severity feature, (4) the PolyPhen-2 +pathogenicity score, and (5) phyloP conservation. Training: 20,931 +"pathogenic" variants (high-quality ClinVar P/LP in 2,987 +haploinsufficient genes with pHaplo ≥ 0.86 or in 359 non-LoF DD genes +from Gene2Phenotype) vs. 93,638 "benign" variants (high-quality ClinVar +B/LB or gnomAD variants with AF > 0.1% in the same gene set). The model +is applied to all 70,313,598 possible exome-wide missense variants in the +Ensembl VEP table. For a variant i, MPC is +d_i = log10(M / m_i), +where M is the number of benign training variants and +m_i is the number of those with a fitted +pathogenicity probability lower than variant i's; when +m_i is 0 the score is capped at 6. Higher scores +indicate greater predicted deleteriousness. The authors caution that MPC is +best suited to modelling strong fitness effects (as expected given its +training set) and that naively taking the maximum of MPC and AlphaMissense +decreases case/control discrimination for de novo variants relative +to either score alone. +
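The final transformation from a benign-variant count to an MPC value can be sketched as follows. This is a minimal illustration of the formula as stated above, not the authors' code: the function and constant names are hypothetical, and the count m_i is taken as an input so that the sketch makes no assumption about how the classifier probabilities are compared.

```python
import math

MPC_CAP = 6.0  # cap stated in the text for the case m_i == 0


def mpc_from_count(m_i: int, n_benign: int) -> float:
    """Compute d_i = log10(M / m_i), capped at 6 when m_i is 0.

    m_i      -- count of benign training variants selected for variant i
    n_benign -- M, the total number of benign training variants
    """
    if n_benign <= 0:
        raise ValueError("n_benign must be positive")
    if not 0 <= m_i <= n_benign:
        raise ValueError("m_i must lie between 0 and n_benign")
    if m_i == 0:
        return MPC_CAP
    return math.log10(n_benign / m_i)


if __name__ == "__main__":
    M = 93_638  # benign training variants reported in the text
    for m in (0, 1, 936, M):
        print(m, round(mpc_from_count(m, M), 3))
```

With M = 93,638, the largest uncapped value is log10(93,638), about 4.97, so a score of 6 only arises from the cap.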

+ +

+At UCSC. The precomputed MPC score table was downloaded from the +gnomAD Broad public bucket at +gs://gcp-public-data--gnomad/papers/2026-rmc/gnomad_v4.1.1_mpc.tsv.bgz, +companion to the Hail-table release +gnomad_v4.1.1_mpc.ht in the same directory. The input TSV contains +one row per (locus, alleles, transcript) combination, for 70,313,598 rows +covering 70,047,670 unique (chrom, pos, alt) variants. Two Python scripts in +src/hg/makeDb/scripts/gnomadMpc +emit the UCSC files: gnomadMpcToWig.py writes four bigWig files +(one per alternate base, carrying the maximum MPC across transcripts at each +variant) and gnomadMpcToBed.py writes a BED file with one row per +(position, alternate allele) restricted to variants scored against more +than one transcript, with all per-transcript scores preserved as aligned +comma-separated lists. Build commands are documented +in the +hg38/gnomadMpc.txt +makeDoc file. +
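The collapsing step described above can be sketched in a few lines. This is not the code of gnomadMpcToWig.py or gnomadMpcToBed.py; it is a hedged illustration that assumes the input rows have already been parsed into (chrom, pos, alt, transcript, mpc) tuples, a layout invented for this example. The transcript identifiers in the usage section are placeholders.

```python
from collections import defaultdict


def collapse_rows(rows):
    """Group per-transcript rows by variant.

    Returns two dicts keyed by (chrom, pos, alt):
      max_scores -- maximum MPC across transcripts (the bigWig-style value)
      multi      -- for variants scored on more than one transcript, a pair
                    of aligned comma-separated strings (transcripts, scores)
    """
    per_variant = defaultdict(list)
    for chrom, pos, alt, transcript, mpc in rows:
        per_variant[(chrom, pos, alt)].append((transcript, mpc))

    max_scores = {
        key: max(score for _, score in pairs)
        for key, pairs in per_variant.items()
    }
    multi = {
        key: (
            ",".join(tx for tx, _ in pairs),
            ",".join(f"{score:g}" for _, score in pairs),
        )
        for key, pairs in per_variant.items()
        if len(pairs) > 1
    }
    return max_scores, multi


if __name__ == "__main__":
    example = [
        ("chr1", 100, "A", "TX_A", 1.2),  # placeholder transcript names
        ("chr1", 100, "A", "TX_B", 2.0),
        ("chr1", 101, "C", "TX_A", 0.5),
    ]
    max_scores, multi = collapse_rows(example)
    print(max_scores)
    print(multi)
```

Keeping the transcript and score lists in the same order is what makes them "aligned": the n-th score belongs to the n-th transcript.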

+ +

Data Access

+

+The raw data can be explored interactively with the +Table Browser or the +Data Integrator. For automated access, +this track is available via our +API. The underlying bigWig and +bigBed files are at +our download server +as a.bw, c.bw, g.bw, t.bw, and +mpcOverlaps.bb. Individual positions or whole chromosomes can be extracted +with bigWigToBedGraph / bigWigToWig (for the bigWigs) or +bigBedToBed (for the bigBed), for example: +

+
+bigWigToBedGraph -chrom=chr1 -start=100000 -end=100500 \
+    http://hgdownload.soe.ucsc.edu/gbdb/hg38/gnomAD/mpc/a.bw stdout
+
+
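To query all four alternate-base files for the same region, the command above can be generated once per file. The sketch below only builds the argument lists; it assumes that c.bw, g.bw and t.bw sit in the same directory as the a.bw URL shown in the example, which the text implies but does not state. The function name is hypothetical.

```python
# Directory taken from the a.bw example; the sibling files are assumed
# to live alongside it.
BASE_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/gnomAD/mpc"


def bigwig_commands(chrom: str, start: int, end: int) -> dict:
    """Build one bigWigToBedGraph argument list per alternate base."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    commands = {}
    for alt in "acgt":
        commands[alt.upper()] = [
            "bigWigToBedGraph",
            f"-chrom={chrom}",
            f"-start={start}",
            f"-end={end}",
            f"{BASE_URL}/{alt}.bw",
            "stdout",
        ]
    return commands


if __name__ == "__main__":
    for alt, argv in bigwig_commands("chr1", 100000, 100500).items():
        print(alt, " ".join(argv))
```

Each list can be passed to subprocess.run(argv, capture_output=True, text=True) on a machine where the UCSC bigWigToBedGraph utility is installed.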

+The original MPC table and the accompanying missense constraint regions can +be downloaded from the +gnomAD downloads page +or directly from the Broad Institute's public Google Cloud bucket at +gs://gcp-public-data--gnomad/papers/2026-rmc/. Code for +reproducing the MPC scores and MCRs is available at the +broadinstitute/regional_missense_constraint +GitHub repository. +

+ +

Credits

+

+Thanks to the gnomAD production team and the Samocha and MacArthur +laboratories for generating and releasing the MPC scores. +

+ +

References

+

+Wang L, Chao KR, Panchal R, Liao C, Abderrazzaq H, Ye R, Schultz P, +Compitello J, Grant RH, Kosmicki JA, Weisburd B, Phu W, Wilson MW, +Laricchia KM, Goodrich JK, Goldstein D, Goldstein JI, Vittal C, Poterba T, +Baxter S, Watts NA, Solomonson M, gnomAD consortium, Tiao G, Rehm HL, +Neale BM, Talkowski ME, MacArthur DG, O'Donnell-Luria A, Karczewski KJ, +Radivojac P, Daly MJ, Samocha KE. +The landscape of regional missense mutational intolerance quantified from 730,947 exomes. +bioRxiv 2024.04.11.588920, posted April 23, 2026; +doi: 10.1101/2024.04.11.588920. +