198c9b8daecc44fbda6a6494c566c723920f030a lrnassar Wed Mar 11 18:25:21 2026 -0700 Fixing a few hundred clear typos with the help of Claude. Some are less important in code comments, but majority of them are in user-facing places. I manually approved 60%+ of the changes and didn't see any that were an incorrect suggestion, at worst it was potentially uncessesary, like a code comment having cant instead of can't. No RM. diff --git src/hg/makeDb/trackDb/human/hg19/gnomadMpc.html src/hg/makeDb/trackDb/human/hg19/gnomadMpc.html index 7e5f3fa4688..1e2944ed57a 100644 --- src/hg/makeDb/trackDb/human/hg19/gnomadMpc.html +++ src/hg/makeDb/trackDb/human/hg19/gnomadMpc.html @@ -55,60 +55,60 @@

Mouseovers on an item show the observed and expected number of missense mutations, the observed/expected (OE) ratio, and the associated Chi-square statistic and P-value.

Methods

The study analyzed only canonical, coding transcripts as defined by GENCODE v19/Ensembl v74. Transcripts with outlier variant counts relative to the expectation under neutrality were excluded (zero expected pLoF, missense, or synonymous variants; too many observed pLoF, missense, or synonymous variants compared to expectation; or too few observed synonymous variants compared to expectation). In total, the study analyzed 18,629 transcripts.

-

125,748 gnomAD v2.1.1 exomes were used on hg19. +

125,748 gnomAD v2.1.1 exomes were used on hg19. Median coverage was calculated on a random subset of the gnomAD exomes. The set of sites with possible missense variants was described using a synthetic Hail Table (HT) containing all possible single nucleotide variants in the exome. This HT was annotated with Ensembl VEP against GENCODE version 19 and filtered to variants with the consequence "missense_variant" in the canonical, coding transcripts as defined above. Variants were then filtered by the following criteria: (1) allele count (AC) > 0 and AF < 0.001, variant QC PASS, and median3 coverage > 0 in gnomAD v2.1.1 exomes; or (2) AC = 0, i.e. variants not seen in gnomAD v2.1.1 exomes.
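As a rough illustration of the two filter criteria above, the sketch below applies them to plain Python dictionaries. The field names (ac, af, qc_pass, median_coverage) and the helper name keep_variant are hypothetical and chosen for this example only; the study itself worked on Hail Tables, and this is not its code.

```python
# Minimal sketch of the variant filter described above.
# Assumption: each variant is a dict with hypothetical keys
# "ac", "af", "qc_pass", and "median_coverage".

MAX_AF = 0.001  # AF cutoff stated in the text


def keep_variant(variant):
    """Return True if the variant meets criterion (1) or criterion (2)."""
    if variant["ac"] == 0:
        # Criterion (2): not seen in the gnomAD v2.1.1 exomes.
        return True
    # Criterion (1): seen, rare, passing variant QC, and covered.
    return (
        variant["ac"] > 0
        and variant["af"] < MAX_AF
        and variant["qc_pass"]
        and variant["median_coverage"] > 0
    )


variants = [
    {"ac": 0, "af": 0.0, "qc_pass": False, "median_coverage": 0},     # kept by (2)
    {"ac": 3, "af": 0.0001, "qc_pass": True, "median_coverage": 30},  # kept by (1)
    {"ac": 500, "af": 0.02, "qc_pass": True, "median_coverage": 30},  # too common
    {"ac": 2, "af": 0.0001, "qc_pass": False, "median_coverage": 30}, # fails QC
]
kept = [v for v in variants if keep_variant(v)]
```

Note that criterion (2) keeps unobserved variants regardless of QC or coverage, which is how the sketch reads the "or" in the text.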

A likelihood ratio test was applied to assess whether the missense observed/expected (OE) ratio was uniform along each transcript or if distinct regions of missense constraint were present. Observed and expected missense counts were modeled using a Poisson distribution, with the null hypothesis assuming no regional variability in missense depletion and the alternative allowing for subsections with differing OE ratios. Chi-square statistics (p = 0.001) were used to identify significant breakpoints dividing transcripts into two or more sections, requiring at least 16 expected missense variants per subsection. Transcripts lacking a single significant breakpoint were further analyzed for two simultaneous breakpoints, with all significant results merged across search types. Recursive testing was then performed, treating each identified subsection as an independent transcript until no additional significant breakpoints were detected. To focus on missense depletion, subsections with observed counts exceeding expectations were capped at an OE of 1, and subsections with zero expected variants were assigned an expected count of 10^-9 to avoid nonfinite OE values.
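The single-breakpoint step of this test can be sketched as follows. This is a simplified illustration, not the study's implementation: the function names are hypothetical, the inputs are assumed to be per-position observed and expected counts, and the cutoff of 10.828 assumes a chi-square distribution with 1 degree of freedom at p = 0.001, which the text does not state. The two-breakpoint search and the recursion are omitted.

```python
import math

# Assumption: chi-square critical value for p = 0.001 with 1 degree of freedom.
CHI2_CUTOFF = 10.828
# Minimum expected missense count per subsection, as stated in the text.
MIN_EXPECTED = 16.0


def section_loglik(obs, exp):
    """Poisson log-likelihood of one section at its best OE, capped at 1.

    Terms that do not depend on OE are dropped because they cancel
    when the null and alternative models are compared.
    """
    if exp <= 0:
        exp = 1e-9  # avoid a nonfinite OE, as described in the text
    if obs == 0:
        return 0.0  # OE = 0, so both remaining terms vanish
    oe = min(obs / exp, 1.0)
    return obs * math.log(oe) - oe * exp


def best_single_breakpoint(obs, exp):
    """Return (index, chi2) of the most significant split, or None.

    A split at index k compares positions [0, k) with positions [k, n).
    """
    total_obs, total_exp = sum(obs), sum(exp)
    null_ll = section_loglik(total_obs, total_exp)
    best = None
    left_obs = left_exp = 0.0
    for k in range(1, len(obs)):
        left_obs += obs[k - 1]
        left_exp += exp[k - 1]
        right_obs = total_obs - left_obs
        right_exp = total_exp - left_exp
        if left_exp < MIN_EXPECTED or right_exp < MIN_EXPECTED:
            continue
        alt_ll = section_loglik(left_obs, left_exp) + section_loglik(right_obs, right_exp)
        chi2 = 2.0 * (alt_ll - null_ll)
        if chi2 >= CHI2_CUTOFF and (best is None or chi2 > best[1]):
            best = (k, chi2)
    return best


# Toy transcript: the first 20 positions are fully depleted, the last 20 are not.
toy_obs = [0] * 20 + [1] * 20
toy_exp = [1.0] * 40
result = best_single_breakpoint(toy_obs, toy_exp)
```

Because the OE cap applies to both the null and the alternative model, the statistic in this sketch never goes negative: the capped null is always one of the options available to the capped alternative.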

-Obs/Exp annotation genome annotation data was downloaded and reformatted at UCSC to bigBed with a script
+Obs/Exp genome annotation data was downloaded and reformatted at UCSC to bigBed with a script
(mpcToBed.py) available in our Github repo. Like all our tracks, the file makeDb/doc/hg19.txt in our Github repo describes the commands for the entire download and conversion.

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated access, this track, like