src/hg/makeDb/trackDb/human/hg38/gnomadPLI.html 198c9b8daecc44fbda6a6494c566c723920f030a

198c9b8daecc44fbda6a6494c566c723920f030a
lrnassar
  Wed Mar 11 18:25:21 2026 -0700
Fixing a few hundred clear typos with the help of Claude. Some are less important in code comments, but majority of them are in user-facing places. I manually approved 60%+ of the changes and didn't see any that were an incorrect suggestion, at worst it was potentially uncessesary, like a code comment having cant instead of can't. No RM.

diff --git src/hg/makeDb/trackDb/human/hg38/gnomadPLI.html src/hg/makeDb/trackDb/human/hg38/gnomadPLI.html
index b1f4a41885c..3a3581d274f 100644
--- src/hg/makeDb/trackDb/human/hg38/gnomadPLI.html
+++ src/hg/makeDb/trackDb/human/hg38/gnomadPLI.html
@@ -1,277 +1,277 @@
 <H2>Description</H2>
 <P>
 The <b>Genome Aggregation Database (gnomAD) - Predicted Constraint Metrics</b> track set contains
 metrics of pathogenicity per-gene as predicted for gnomAD v2.1.1, v4.0, or v4.1 and identifies genes subject to
 strong selection against various classes of mutation.
 </p>
 
 <p>
 This track includes several subtracks of constraint metrics calculated at gene (canonical
 transcript) and transcript level. For more information see the following
 <a href="https://macarthurlab.org/2018/10/17/gnomad-v2-1">blog post</a>.
 The metrics include:
 <ul>
   <li>Observed and expected variant counts per transcript/gene
   <li>Observed/Expected  ratio (O/E)
   <li>Z-scores of the observed counts compared to expected
   <li>Probability of loss of function intolerance (pLI), for predicted loss-of-function (pLoF) variation only
 </ul>
 </p>
 
 <h2>Display Conventions and Configuration</h2>
 <p>
 There are two &quot;groups&quot; of tracks in this set, and three gnomAD versions (v2.1.1, v4.0, and v4.1):
 <ol>
   <li><b>Gene/Transcript LoF Constraint tracks</b>: Predicted constraint metrics at the whole gene
     level or whole transcript level for three different types of variation: missense, synonymous,
     and predicted loss of function. The Gene Constraint track displays metrics for a canonical 
     transcript per gene defined as the longest isoform. The Transcript Constraint track displays 
     metrics for all transcript isoforms. Items on both tracks are shaded according to the pLI score,
     with outlier items shaded in grey.
   <br><img src="../images/gnomAD_LOEUFkey.png"  width='33%'alt="LOEUF score legend"><br>
     Please note there is no gene-level track available for v4.0 and v4.1.
   <li><b>Gene/Transcript Missense Constraint tracks</b>: The missense constraint tracks are built
     similarly to the LoF constraint tracks, however the items displayed are based on 
     <a href="https://gnomad.broadinstitute.org/help/constraint" target="_blank">missense Z scores</a>.
     All items are colored black, and individual Z scores can be seen on mouseover. 
 </ol>
 All tracks follow the general configuration settings for bigBed tracks. Mouseover on the 
 Gene/Transcript Constraint tracks shows the <b>pLI score</b> and the loss of function 
 observed/expected upper bound fraction <b>(LOEUF)</b>, while mouseover on the Regional
 Constraint track shows only the missense <b>O/E ratio</b>. Clicking on items in any track brings
 up a table of constraint metrics.
 </p>
 
 <p>
 Clicking the grey box to the left of the track, or right-clicking and choosing the Configure option,
 brings up the interface for filtering items based on their pLI score, or labeling the items
 based on their Ensembl identifier and/or Gene Name.
 </p>
 
 <H2>Methods</H2>
 <p>
 Please see the <a href="https://gnomad.broadinstitute.org/help/constraint">gnomAD browser help page</a> and <a 
 href="https://gnomad.broadinstitute.org/faq">FAQ</a> for further explanation of the topics below.</p>
 
 <h3>Observed and Expected Variant Counts</h3>
 <p>
 <em>Observed count</em>: The number of unique single-nucleotide variants in each transcript/gene
 with 123 or fewer alternative alleles (MAF &lt; 0.1%).
 </p>
 <p>
 <em>Expected count</em>: A depth-corrected probability prediction model that takes into account
 sequence context, coverage, and methylation was used to predict expected
 variant counts. For more information please see Lek et al., 2016.
 </p>
 <p>
 Variants found in exons with a median depth &lt; 1 were removed from both counts.
 <p>
 The O/E constraint score is the ratio of the observed/expected variants in that gene. Each item in
 this track shows the O/E ratio for three different types of variation: missense, synonymous, and
 loss-of-function. The O/E ratio is a continuous measurement of how tolerant a gene or
 transcript is to a certain class of variation. <b>When a gene has a low O/E value, it is under stronger
 selection for that class of variation than a gene with a higher O/E value.</b> Because Counts depend on
 gene size and sample size, the precision of the values varies a lot from one gene to the next. 
 Therefore, the 90% confidence interval (CI) is also displayed along with the O/E ratio to better
 assist interpretation of the scores.
 <p>
 When evaluating how constrained a gene is, <b>it is essential to consider the CI when using O/E</b>. In 
 research and clinical interpretation of Mendelian cases, <b>pLI > 0.9</b> has been widely used for 
 filtering. Accordingly, the Gnomad team suggests using the upper bound of the O/E confidence interval
 <b>LOEUF &lt; 0.35</b> as a threshold if needed.
 <p>
 Please see the Methods section below for more information about how the scores were calculated.
 </p>
 
 <h3>pLI and Z-scores</h3>
 <p>
 The pLI and Z-scores of the deviation of observed variant counts relative to the expected number 
 are intended to measure how constrained or intolerant a gene or transcript is to a specific type of
 variation. Genes or transcripts that are particularly depleted of a specific class of variation
 (as observed in the gnomAD data set) are considered intolerant of that specific type of variation.
-Z-scores are available for the missense and synonynmous categories and pLI scores are available for
+Z-scores are available for the missense and synonymous categories and pLI scores are available for
 the loss-of-function variation.
 </p>
 <p>
 <em>Missense and Synonymous</em>: Positive Z-scores indicate more constraint (fewer observed 
 variants than expected), and negative scores indicate less constraint (more observed variants than
 expected). <b>A greater Z-score indicates more intolerance to the class of variation.</b> Z-scores
 were generated by a sequence-context-based mutational model that predicted the number of expected
 rare (&lt; 1% MAF) variants per transcript. The square root of the chi-squared value of the 
 deviation of observed counts from expected counts was multiplied by -1 if the observed count was
 greater than the expected and vice versa. For the synonymous score, each Z-score was corrected by
 dividing by the standard deviation of all synonymous Z-scores between -5 and 5. For the missense
 scores, a mirrored distribution of all Z-scores between -5 and 0 was created, and then all missense
 Z-scores were corrected by dividing by the standard deviation of the Z-score of the mirror
 distribution.
 </p>
 <p>
 <em>Loss-of-function</em>: <b>pLI closer to 1 indicates that the gene or transcript cannot tolerate
 protein truncating variation</b> (nonsense, splice acceptor and splice donor variation). The gnomAD
 team recommends transcripts with a <b>pLI &gt;= 0.9</b> for the set of transcripts extremely intolerant
 to truncating variants. pLI is based on the idea that transcripts can be classified into three
 categories:
 <ul>
   <li>null: heterozygous or homozygous protein truncating variation is completely tolerated
   <li>recessive: heterozygous variants are tolerated but homozygous variants are not
   <li>haploinsufficient: heterozygous variants are not tolerated
 </ul>
 An expectation-maximization algorithm was then used to assign a probability of belonging in each
 class to each gene or transcript. pLI is the probability of belonging in the haploinsufficient class.
 </p>
 
 <p>
 Please see Samocha et al., 2014 and Lek et al., 2016 for further discussion of these metrics.
 </p>
 
 <h3>Transcripts Included</h3>
 <p>
 For version 2.1.1 only, the GENCODE transcripts were filtered according to the following criteria:
 <ul>
   <li>Must have methionine at start of coding sequence
   <li>Must have stop codon at end of coding sequence
   <li>Must be divisible by 3
   <li>Must have at least one observed variant when removing exons with median depth &lt; 1
   <li>Must have reasonable number of missense and synonymous variants as determined by a Z-score cutoff
 </ul>
 </p>
 <p>
 For version v2.1.1, the gnomAD gene/transcript data is based on hg19. In order to map transcripts and genes to the hg38 genome the following steps were taken:
 <ul>
 <li><b>Transcript track:</b> The gnomAD ENST identifiers were attempted to be matched to all GENCODE versions
 between V20 and V44, giving coordinate priorities to the most recent models. In total 74550/80950 
 transcripts were mapped.</li>
 <li><b>Genes track:</b> The gnomAD file ENSG identifiers were attempted to be matched to all GENCODE versions
 between V20 and V44, giving coordinate priorities to the most recent models. This mapped 19221/19704
 genes. The remainder of the genes were attempted to be mapped using the same strategy, but matching
 on gene symbols instead of ENSG identifiers. In total 19567/19704 genes were mapped.</li>
 </ul>
 </p>
 
 <p>
 For version v4.0 and v4.1, the gnomAD transcript data is based on hg38. In order to map the
 transcripts to hg38, the transcript version numbers in the gnomAD download file were joined with
 GENCODE V39 and NCBI RefSeq coordinates available at UCSC.
 </p>
 
 <h3>UCSC Track Methods</h3>
 <h4>Version based on gnomAD v2.1.1</h4>
 <h5>Gene and Transcript Constraint tracks</h5>
 <p>
 Per gene and per transcript data were downloaded from the gnomAD Google Storage bucket:
 <pre>
 gs://gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz
 gs://gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_transcript.txt.bgz
 </pre>
 These data were then joined to the Gencode set of genes/transcripts available at the UCSC
 Genome Browser (see previous section) and then transformed into a bigBed 12+5. For the full list of commands used to
 make this track please see the
 <a href="https://raw.githubusercontent.com/ucscGenomeBrowser/kent/master/src/hg/makeDb/doc/hg38/gnomadPLI.txt">makedoc</a>.
 </p>
 
 <h4>Version based on gnomAD v4.0</h4>
 <h5>Gene and Transcript Constraint tracks</h5>
 <p>
 Per gene and per transcript data were downloaded from the gnomAD Google Storage bucket:
 <pre>
 https://storage.googleapis.com/gcp-public-data--gnomad/release/4.0/constraint/gnomad.v4.0.constraint_metrics.tsv
 </pre>
 These data were then joined to the Gencode/NCBI set of genes/transcripts available at the UCSC
 Genome Browser and then transformed into a bigBed 12+5. For the full list of commands used to
 make this track please see the
 <a href="https://raw.githubusercontent.com/ucscGenomeBrowser/kent/master/src/hg/makeDb/doc/hg38/gnomad.txt">makedoc</a>.
 </p>
 
 <h4>Version based on gnomAD v4.1</h4>
 <h5>Gene and Transcript Constraint tracks</h5>
 <p>
 Per gene and per transcript data were downloaded from the gnomAD Google Storage bucket:
 <pre>
 https://storage.googleapis.com/gcp-public-data--gnomad/release/4.1/constraint/gnomad.v4.1.constraint_metrics.tsv
 </pre>
 These data were then joined to the Gencode/NCBI set of genes/transcripts available at the UCSC
 Genome Browser and then transformed into a bigBed 12+5. For the full list of commands used to
 make this track please see the
 <a href="https://raw.githubusercontent.com/ucscGenomeBrowser/kent/master/src/hg/makeDb/doc/hg38/gnomad.txt">makedoc</a>.
 </p>
 
 <h2>Data Access</h2>
 <p>
 The raw data can be explored interactively with the <a href="../hgTables">Table Browser</a>, or
 the <a href="../hgIntegrator">Data Integrator</a>. For automated access, this track, like all 
 others, is available via our <a href="../goldenPath/help/api.html">API</a>.  However, for bulk 
 processing, it is recommended to download the dataset. The genome annotation is stored in a bigBed 
 file that can be downloaded from the
 <a href="http://hgdownload.soe.ucsc.edu/gbdb/$db/gnomAD/pLI/">download server</a>. The exact
 filenames can be found in the track configuration file. Annotations can be converted to ASCII text
 by our tool <code>bigBedToBed</code> which can be compiled from the source code or downloaded as
 a precompiled binary for your system. Instructions for downloading source code and binaries can be
 found <a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>. The tool
 can also be used to obtain only features within a given range, for example:</p>
 <pre>
 bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/gnomAD/pLI/pliByTranscript.bb -chrom=chr6 -start=0 -end=1000000 stdout
 </pre>
 <p>
 Please refer to our
 <A HREF="https://groups.google.com/a/soe.ucsc.edu/forum/?hl=en&fromgroups#!search/gnomAD"
 target=_blank>mailing list archives</a>
 for questions and example queries, or our
 <a HREF="../FAQ/FAQdownloads.html#download36" target=_blank>Data Access FAQ</a>
 for more information.
 </p>
 
 <p>
 More information about using and understanding the gnomAD data can be found in the
 <a target="_blank" href="https://gnomad.broadinstitute.org/faq">gnomAD FAQ</a> site.
 </p>
 
 <h2>Credits</h2>
 <p>
 Thanks to the <a href="https://gnomad.broadinstitute.org/about" target="_blank">Genome Aggregation
 Database Consortium</a> for making these data available. The data are released under the <a
 href="https://opendatacommons.org/licenses/odbl/1.0/" target="_blank">ODC Open Database License
 (ODbL)</a> as described <a href="https://gnomad.broadinstitute.org/terms" target="_blank">here</a>.
 </p>
 
 <H2>References</H2>
 
 <p>
 Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill
 AJ, Cummings BB <em>et al</em>.
 <a href="https://www.ncbi.nlm.nih.gov/pubmed/27535533" target="_blank">
 Analysis of protein-coding genetic variation in 60,706 humans</a>.
 <em>Nature</em>. 2016 Aug 18;536(7616):285-91.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/27535533" target="_blank">27535533</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5018207/" target="_blank">PMC5018207</a>
 </p>
 
 <p>
 Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alf&#246;ldi J, Wang Q, Collins RL, Laricchia KM,
 Ganna A, Birnbaum DP <em>et al</em>.
 <a href="https://www.ncbi.nlm.nih.gov/pubmed/32461654" target="_blank">
 The mutational constraint spectrum quantified from variation in 141,456 humans</a>.
 <em>Nature</em>. 2020 May;581(7809):434-443.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/32461654" target="_blank">32461654</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7334197/" target="_blank">PMC7334197</a>
 </p>
 
 <p>
 Collins RL, Brand H, Karczewski KJ, Zhao X, Alf&#246;ldi J, Francioli LC, Khera AV, Lowther C,
 Gauthier LD, Wang H <em>et al</em>.
 <a href="https://www.ncbi.nlm.nih.gov/pubmed/32461652" target="_blank">
 A structural variation reference for medical and population genetics</a>.
 <em>Nature</em>. 2020 May;581(7809):444-451.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/32461652" target="_blank">32461652</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7334194/" target="_blank">PMC7334194</a>
 </p>
 
 <p>
 Cummings BB, Karczewski KJ, Kosmicki JA, Seaby EG, Watts NA, Singer-Berk M, Mudge JM, Karjalainen J,
 Satterstrom FK, O'Donnell-Luria AH <em>et al</em>.
 <a href="https://www.ncbi.nlm.nih.gov/pubmed/32461655" target="_blank">
 Transcript expression-aware annotation improves rare variant interpretation</a>.
 <em>Nature</em>. 2020 May;581(7809):452-458.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/32461655" target="_blank">32461655</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7334198/" target="_blank">PMC7334198</a>
 </p>