src/hg/makeDb/trackDb/human/hg19/gnomadMpc.html 971ae4b10f79d994d419c94d932e5b9163b72098

971ae4b10f79d994d419c94d932e5b9163b72098
gperez2
  Tue Dec 2 10:39:58 2025 -0800
Updating the Display Conventions and Configuration section for gnomAD MPC, refs #36531

diff --git src/hg/makeDb/trackDb/human/hg19/gnomadMpc.html src/hg/makeDb/trackDb/human/hg19/gnomadMpc.html
index aa07dfe5e08..44881937e1a 100644
--- src/hg/makeDb/trackDb/human/hg19/gnomadMpc.html
+++ src/hg/makeDb/trackDb/human/hg19/gnomadMpc.html
@@ -1,147 +1,150 @@
 <h2>Description</h2>
 <p>
 The <b>${longLabel}</b> track shows a score that tries to identify missense-depleted
 regions using the patterns of rare missense variation in 125,748 gnomAD v2.1.1 exomes,
 compared to a null mutational model. Missense-depleted regions are enriched for ClinVar pathogenic
 variants, de novo missense variants in individuals with neurodevelopmental disorders (NDDs), and
 complex trait heritability. The score's publication suggests that regions
 with less than 20% of their expected missense variation achieve moderate
 support for pathogenicity according to ACMG criteria.
 </p>
 
 <h2>Display Conventions and Configuration</h2>
 
 <p>
-Transcripts with constraint predictions are highlighted. Observed and expected
-number of missense mutations are shown on mouse overs, as well as their
-Observed/expected ratio (OE), and the Chi-square and P-value of the ratio.
-Regions are colored using the viridis palette, with yellow for the lowest OE
-values and dark blue for the highest values.
+Transcripts with constraint predictions are colored with the viridis palette, where yellow
+indicates the lowest OE values and dark blue-purple indicates the highest.
 </p>
 
 <p><strong>OE Constraint Legend</strong><br>
-Yellow = strongest constraint, Purple = weakest</p>
-
+Yellow = strongest constraint<br>
+Purple = weakest constraint
+</p>
 <table>
   <thead>
   <tr>
     <th style="border-bottom: 2px solid #6678B1;">Color</th>
     <th style="border-bottom: 2px solid #6678B1;">OE Range</th>
   </tr>
   </thead>
     <tr>
         <th bgcolor="#FDE724"></th>
         <th align="left">OE = 0.0066884</th>
     </tr>
     
     <tr>
         <th bgcolor="#74D054"></th>
         <th align="left">OE = 0.36229 </th>
     </tr>
     
     <tr>
         <th bgcolor="#22898d"></th>
         <th align="left">OE = 0.66993</th>
     </tr>
     
     <tr>
         <th bgcolor="#39558B"></th>
         <th align="left">OE = 0.93385 </th>
     </tr>
     
     <tr>
         <th bgcolor="#440154"></th>
         <th align="left">OE = 2.2429</th>
     </tr>
 </table>
 
+<p>Mouseovers on an item show the observed and expected number of missense mutations, the
+observed/expected (OE) ratio, and the associated Chi-square statistic and P-value.
+</p>
+
+
 <H2>Methods</H2>
 <p>
 The study analyzed only canonical, coding transcripts as defined by GENCODE v19/Ensembl v74. Some
 were excluded: transcripts that had outlier counts of variants expected under neutrality (zero
 expected pLoF, missense, or synonymous variants; too many observed pLoF, missense, or synonymous
 variants compared to expectation; or too few observed synonymous variants compared to expectation).
 In total, the study analyzed 18,629 transcripts.
 </p>
 
 <p>The analysis used 125,748 gnomAD v2.1.1 exomes on hg19.
 Median coverage was calculated on a random subset of the gnomAD exomes.
 The set of sites with possible missense variants was described using a
 synthetic Hail Table (HT) containing all possible single nucleotide variants in the exome. 
 This HT was annotated with Ensembl VEP against GENCODE version 19 and filtered to
 variants with the consequence "missense_variant" in the canonical, coding
 transcripts as defined above. Variants were then filtered by the
 following criteria: (1) allele count (AC) &gt; 0 and allele frequency (AF) &lt; 0.001, variant QC PASS, and median3
 coverage &gt; 0 in gnomAD v2.1.1 exomes; or (2) AC = 0, i.e. variants not seen in gnomAD v2.1.1
 exomes.
 </p>
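 <p>
 The two filter criteria above can be sketched as a small predicate. This is only an illustration,
 not the study's Hail code: the <code>Variant</code> class and its field names are hypothetical, and
 it assumes the three conditions listed under (1) must all hold together.
 </p>

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record for one possible missense variant."""
    ac: int                 # allele count in gnomAD v2.1.1 exomes
    af: float               # allele frequency
    qc_pass: bool           # True if variant QC status is PASS
    median_coverage: float  # median exome coverage at the site


def keep_variant(v: Variant) -> bool:
    """Return True if the variant satisfies criterion (1) or criterion (2)."""
    if v.ac == 0:
        # Criterion (2): variant not seen in the exomes.
        return True
    # Criterion (1): observed, rare, passing QC, and covered.
    return v.af < 0.001 and v.qc_pass and v.median_coverage > 0
```

 <p>
 For example, <code>keep_variant(Variant(ac=3, af=0.0005, qc_pass=True, median_coverage=30))</code>
 keeps a rare, well-covered variant, while the same variant with an AF of 0.01 is dropped.
 </p>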
 
 <p>
 A likelihood ratio test was applied to assess whether the missense observed/expected (OE) ratio
 was uniform along each transcript or if distinct regions of missense constraint were present.
 Observed and expected missense counts were modeled using a Poisson distribution, with the null
 hypothesis assuming no regional variability in missense depletion and the alternative allowing for
 subsections with differing OE ratios. Chi-square statistics (p = 0.001) were used to identify
 significant breakpoints dividing transcripts into two or more sections, requiring at least 16
 expected missense variants per subsection. Transcripts lacking a single significant breakpoint were
 further analyzed for two simultaneous breakpoints, with all significant results merged across
 search types. Recursive testing was then performed, treating each identified subsection as an
 independent transcript until no additional significant breakpoints were detected. To focus on
 missense depletion, subsections with observed counts exceeding expectations were capped at an OE
 of 1, and subsections with zero expected variants were assigned an expected count of 10<sup>-9</sup>
 to avoid nonfinite OE values.
 </p>
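 <p>
 As a rough illustration of the single-breakpoint search described above, the following sketch
 compares a one-OE Poisson model against a two-segment model at every candidate position. It is
 not the study's implementation: the function names are hypothetical, the critical value assumes
 one degree of freedom, and the two-breakpoint search, the recursion, and the OE cap are omitted.
 </p>

```python
import math

# Chi-square critical value for p = 0.001, assuming 1 degree of freedom.
CHI2_CRIT = 10.828
# Minimum expected missense variants required in each subsection.
MIN_EXPECTED = 16.0


def seg_loglik(obs: float, exp: float) -> float:
    """Poisson log-likelihood of a segment at its best-fit OE = obs/exp.

    Terms that are identical under the null and the alternative are dropped.
    """
    if obs == 0:
        return 0.0  # limit of obs*log(OE) - OE*exp as OE goes to 0
    return obs * math.log(obs / exp) - obs


def best_single_breakpoint(obs, exp):
    """Return (index, statistic) of the most significant breakpoint, or None.

    obs and exp hold per-position observed and expected missense counts.
    The returned index is the first position of the right-hand segment.
    """
    total_o, total_e = float(sum(obs)), float(sum(exp))
    if total_e < 2 * MIN_EXPECTED:
        return None  # cannot form two subsections of sufficient size
    null_ll = seg_loglik(total_o, total_e)
    best = None
    left_o = left_e = 0.0
    for i in range(len(obs) - 1):
        left_o += obs[i]
        left_e += exp[i]
        right_o, right_e = total_o - left_o, total_e - left_e
        if left_e < MIN_EXPECTED or right_e < MIN_EXPECTED:
            continue
        alt_ll = seg_loglik(left_o, left_e) + seg_loglik(right_o, right_e)
        stat = 2.0 * (alt_ll - null_ll)  # likelihood ratio statistic
        if stat >= CHI2_CRIT and (best is None or stat > best[1]):
            best = (i + 1, stat)
    return best
```

 <p>
 On a toy transcript with 20 fully depleted positions followed by 20 positions carrying twice the
 expected count, the sketch places the breakpoint at position 20. A transcript with a uniform OE
 yields no breakpoint.
 </p>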
 
 <p>
 The Obs/Exp genome annotation data were downloaded and reformatted at UCSC to bigBed with a script
 (<a target="_blank"
 href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/gnomad/gnomadMpcToBed.py"
 >gnomadMpcToBed.py</a>) available in our
 <a target="_blank" href="https://github.com/ucscGenomeBrowser">GitHub repo</a>.
 Like all our tracks, the file
 <a target="_blank"
 href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg19.txt#L35943"
 >makeDb/doc/hg19.txt</a>
 in our GitHub repo describes the commands for the entire download and conversion.
 </p>
 
 <h2>Data Access</h2>
 <p>
 The raw data can be explored interactively with the <a href="../cgi-bin/hgTables">Table Browser</a>
 or the <a href="../cgi-bin/hgIntegrator">Data Integrator</a>. For automated access, this track, like
 all others, is available via our <a href="../goldenPath/help/api.html">API</a>.
 Our command-line tool bigBedToBed can convert the bigBed file on our server
 directly to a tab-separated text file.
 The data can also be found on the <a target="_blank"
 href="https://gnomad.broadinstitute.org/downloads">gnomAD 2.1.1 downloads page</a>.
 </p>
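 <p>
 For automated access, a request URL for the API's <code>getData/track</code> endpoint can be
 assembled as in the sketch below. The track name <code>gnomadMpc</code> and the coordinates are
 assumptions made for illustration. Check the track's actual name before use. The code only
 builds the URL and does not contact the server.
 </p>

```python
from urllib.parse import quote

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_url(genome: str, track: str, chrom: str, start: int, end: int) -> str:
    """Build a getData/track request URL for one genomic region."""
    params = {
        "genome": genome,
        "track": track,   # assumed track name, not confirmed by this page
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    # Parameters are joined with ';' as in the API help page examples.
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params.items())
    return f"{API_BASE}?{query}"


# Hypothetical region on hg19, chosen only to show the URL shape.
example_url = track_url("hg19", "gnomadMpc", "chr1", 1000000, 1100000)
```

 <p>
 The resulting URL can be fetched with any HTTP client and returns JSON.
 </p>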
 
 <p>
 Please refer to our
 <A HREF="https://groups.google.com/a/soe.ucsc.edu/forum/?hl=en&fromgroups#!search/gnomAD"
 target="_blank">mailing list archives</a>
 for questions and example queries, or our
 <a HREF="../FAQ/FAQdownloads.html#download36" target="_blank">Data Access FAQ</a>
 for more information.</p>
 
 <p>
 More information about using and understanding the gnomAD data can be found on the
 <a target="_blank" href="https://gnomad.broadinstitute.org/faq">gnomAD FAQ</a> site.
 </p>
 
 <h2>Credits</h2>
 <p>
 Thanks to gnomAD for releasing this data and to Luis Nassar for finding it.
 </p>
 
 <h2>References</h2>
 <p>
 Chao KR, Wang L, Panchal R, Liao C, Abderrazzaq H, Ye R, Schultz P, Compitello J, Grant RH, Kosmicki
 JA <em>et al</em>.
 <a href="https://doi.org/10.1101/2024.04.11.588920" target="_blank">
 The landscape of regional missense mutational intolerance quantified from 125,748 exomes</a>.
 <em>bioRxiv</em>. 2024 May 3;.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/38645134" target="_blank">38645134</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11030311/" target="_blank">PMC11030311</a>
 </p>