0151d00a4a1d73a78c35f6158c6c936ff338faeb
max
  Fri Apr 24 10:37:34 2026 -0700
NMD Escape: MANE subtrack, Rule 1 bug fix, transcript filter. refs #33737

- Add nmdEscMane subtrack (MANE Select Plus Clinical 1.5), built from
/gbdb/hg38/mane/mane.bb. Reuses nmdEscTranscripts.html.
- Fix Rule 1: measure 50 bp upstream of the transcript's last splice
junction (including 3'UTR introns) rather than stripping 3'UTR from
the exon list first. The old logic painted the entire last CDS exon
as NMD-escape whenever the transcript had only one CDS exon, even
when a 3'UTR intron sat far past the stop codon (e.g. NBDY: 207 bp
of CDS over-painted for a junction 2.6 kb past the stop).
- Add --rule1-mode {cds,mrna} (default cds): cds counts only CDS bp
on the walk-back (paints up to 50 bp of CDS, matching the rule label
literally); mrna counts mRNA bp and clips the result to CDS (matching
the 50-55 bp rule as described in the literature). Documented in makeDoc.
- Rule 4: when a 3'UTR intron exists, the last CDS-containing exon
has a downstream EJC and is now eligible for the long-exon rule.
- Mouseover lists contributing transcript accessions when 1-3 items
collapse into a region; falls back to a count above that.
- Add filterText/filterType/filterLabel on all three escape subtracks
so a user can narrow the display to one transcript.
- genePredNmdEsc: --gene-sym-field (default 17 for Gencode; pass 18
for MANE, whose HGNC symbol lives in bigGenePred geneName2).
- Add findShortTxLongUtrIntron.py helper for finding MANE transcripts
with long UTR introns (used to pick NMD edge-case test cases).
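
Illustrative sketch of the two Rule 1 modes as described above. This is
not the genePredNmdEsc code: the function name, the mRNA-coordinate
representation (0-based, half-open) and the example numbers are
assumptions made for illustration, and the sketch ignores strand and
genomic coordinates.

```python
from typing import List, Optional, Tuple

WINDOW = 50  # bp, the Rule 1 distance


def rule1_region(exon_lengths: List[int], cds_start: int, cds_end: int,
                 mode: str = "cds") -> Optional[Tuple[int, int]]:
    """Return the Rule 1 escape interval in mRNA coordinates, or None.

    exon_lengths lists all exons in transcript order, including 3'UTR
    exons, so the last junction may sit downstream of the stop codon.
    """
    if mode not in ("cds", "mrna"):
        raise ValueError("mode must be 'cds' or 'mrna'")
    if len(exon_lengths) < 2:
        return None  # no splice junction at all, Rule 1 does not apply

    # mRNA position of the last splice junction
    junction = sum(exon_lengths[:-1])

    if mode == "mrna":
        # Count every mRNA base on the walk-back, then clip to CDS.
        start = max(cds_start, junction - WINDOW)
    else:
        # Count only CDS bases: UTR between stop and junction is skipped.
        start = max(cds_start, min(junction, cds_end) - WINDOW)

    if start >= cds_end:
        return None  # window never reaches the CDS
    return (start, cds_end)
```

With a hypothetical transcript whose last junction lies about 3 kb past
the stop, rule1_region([500, 3000, 200], 100, 307, "mrna") returns None,
while "cds" mode returns (257, 307), i.e. the last 50 bp of CDS.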

Post-fix collapsed-region counts (--rule1-mode=cds):
MANE 1.5:        67,752
Gencode V49:    233,375
RefSeq Curated: 112,356
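
For reference, a minimal sketch of how per-transcript regions might be
collapsed into the items counted above, and how the mouseover could
switch from accessions to a count. The tuple layout, function names and
label wording are hypothetical, not taken from the track build scripts.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int, int, str]  # chrom, start, end, rule, transcript


def collapse_regions(regions: Iterable[Region]) -> List[Tuple[str, int, int, int, str]]:
    """Merge regions with identical coordinates and the same rule.

    The contributing transcript IDs are kept as a comma-separated list.
    """
    grouped = defaultdict(list)
    for chrom, start, end, rule, tx in regions:
        grouped[(chrom, start, end, rule)].append(tx)
    return [key + (",".join(sorted(set(txs))),)
            for key, txs in sorted(grouped.items())]


def mouseover(rule: int, tx_list: str, max_listed: int = 3) -> str:
    """List accessions for 1-3 transcripts, otherwise show a count."""
    txs = tx_list.split(",")
    if len(txs) <= max_listed:
        return f"Rule {rule}: {', '.join(txs)}"
    return f"Rule {rule}: {len(txs)} transcripts"
```

Regions that share coordinates but come from different rules stay as
separate items, since the rule is part of the grouping key.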

diff --git src/hg/makeDb/trackDb/human/hg38/nmdEscTranscripts.html src/hg/makeDb/trackDb/human/hg38/nmdEscTranscripts.html
index b9207fc0d6f..cc74d2318c6 100644
--- src/hg/makeDb/trackDb/human/hg38/nmdEscTranscripts.html
+++ src/hg/makeDb/trackDb/human/hg38/nmdEscTranscripts.html
@@ -1,173 +1,180 @@
 <h2>Description</h2>
 <p>
 The <b>NMD escape ruleset</b> tracks show predicted regions where a premature termination
 codon (PTC) or frameshift variant is likely to cause the transcript to
 <em>escape</em> nonsense-mediated decay (NMD), leading to the production of an
 aberrant truncated protein rather than degradation of the mRNA.
 </p>
 
 <p>
 The following rules were applied to transcript annotations to define predicted
 NMD escape regions (Nagy and Maquat, Trends Biochem Sci 1998 and Lindeboom et al., Nat Genet 2016):
 </p>
 
 <ol>
-  <li><b>50 bp rule</b>: The entire last coding exon plus the last 50 bp of
-    the penultimate coding exon. A PTC here has no downstream exon-exon
-    junction (or is too close to the last one) for NMD to be triggered.
-    Non-protein-coding 3' exons are not counted when identifying the last
-    coding junction. Note: when the penultimate coding exon is shorter than
-    50 bp, the annotated region extends only to the upstream junction of
-    that exon and does not walk further upstream. A small number of
-    transcripts with unusually short penultimate coding exons are affected.</li>
+  <li><b>50 bp rule</b>: Coding positions within 50 bp (mRNA distance)
+    upstream of the transcript's last splice junction, plus any coding
+    sequence downstream of that junction. A PTC in this window has no
+    downstream exon-exon junction (or is too close to the last one) for
+    NMD to be triggered. The last junction is determined from all exons
+    of the transcript, including 3&#8242;UTR introns, since those introns
+    deposit EJCs that can trigger NMD. For transcripts with no 3&#8242;UTR
+    intron (the common case), this reduces to the entire last coding exon
+    plus the last 50 bp of the penultimate coding exon. For transcripts
+    with a 3&#8242;UTR intron (~4.5% of MANE transcripts), the last
+    junction sits downstream of the stop codon; the escape region is only
+    the stretch of CDS within 50 bp (mRNA distance) of that junction, so
+    if the junction is more than 50 bp past the stop codon no CDS position
+    escapes via this rule.</li>
   <li><b>No downstream EJC rule</b>: Transcripts with a single coding exon and
     no 3&#8242;UTR intron. No exon-exon junction exists downstream of the stop
     codon, so no EJC is deposited that could trigger NMD at a PTC. This
     covers truly intronless transcripts as well as transcripts whose only
     introns are in the 5&#8242;UTR (where EJCs are cleared by the scanning 40S
     ribosomal subunit or sit upstream of the stop and are never encountered by
     the terminating ribosome). Transcripts with a single coding exon but a
     3&#8242;UTR intron are excluded, because that intron deposits an EJC
     downstream of the stop codon that can trigger NMD.</li>
   <li><b>Start-proximal region</b>: The first 100 bp of coding nucleotides.
     PTCs in this region do not lead to NMD, a phenomenon known as start-proximal
     NMD insensitivity. One proposed mechanism, supported by experimental
     evidence, is re-initiation of translation at a downstream AUG codon.</li>
   <li><b>Long exon rule</b>: Coding exons longer than 400 bp (excluding the last
     coding exon, which is already covered by the 50 bp rule). Lindeboom et al.
     2016 showed a marked drop in NMD efficiency (61% vs. 98%) for PTCs in exons
     longer than 400 nt, likely because the large distance between the stalled
     ribosome and the downstream EJC reduces UPF1-EJC contact.</li>
 </ol>
 
 <p>
 Non-coding transcripts (where CDS start equals CDS end) are excluded.
 Overlapping regions from multiple transcripts with identical coordinates and
 the same rule are collapsed into a single item, with the contributing
 transcript IDs stored as a comma-separated list.
 </p>
 
 <p>
 Two versions of this track are available, based on different transcript annotation sets:
 </p>
 <ul>
   <li><b><a href="hgTrackUi?g=nmdEscGencode">NMD escape Gencode</a></b>:
     Derived from GENCODE V49 transcript annotations.</li>
   <li><b><a href="hgTrackUi?g=nmdEscNcbiRefSeq">NMD escape NCBI RefSeq</a></b>:
     Derived from NCBI RefSeq Curated transcript annotations (NM_ and NR_
     accessions; predicted XM_/XR_ models are excluded).</li>
 </ul>
 
 <h2>Background</h2>
 <p>
 NMD escape regions were predicted based on the Exon Junction Complex
 (EJC)-dependent model of NMD. During normal translation, EJCs are deposited at
 exon-exon junctions after splicing. As the ribosome translates the mRNA, it
 displaces each EJC it encounters. When a PTC causes the ribosome to stall
 prematurely, any remaining downstream EJCs recruit surveillance factors
 (notably UPF1) that trigger mRNA degradation via NMD.
 </p>
 
 <p>
 However, PTCs located in the last coding exon or within approximately 50 bp
 upstream of the last exon-exon junction are too close to the final EJC (or
 have no downstream EJC at all) for NMD to be triggered&mdash;the transcript
 escapes degradation. Conversely, PTCs located more than 50&ndash;55 bp
 upstream of the last exon-exon junction are predicted to elicit NMD.
 </p>
 
 <p>
 Additional escape mechanisms, supported by Lindeboom et al. 2016 and other
 studies, are captured by three further rules:
 </p>
 <ul>
   <li><b>Transcripts with no EJC downstream of the stop codon</b> (single coding
     exon and no 3&#8242;UTR intron) cannot trigger NMD, so any PTC in the coding
     sequence escapes. 5&#8242;UTR introns are tolerated because their EJCs are
     upstream of the stop.</li>
   <li><b>Start-proximal PTCs</b> (within the first 100 bp of coding sequence)
     escape NMD, likely through translation re-initiation at a downstream AUG
     codon.</li>
   <li><b>PTCs in long coding exons</b> (&gt;400 bp) show reduced NMD
     efficiency (61% vs. 98% for shorter exons in Lindeboom et al. 2016),
     likely because the large distance between the stalled ribosome and the
     downstream EJC reduces UPF1-EJC contact.</li>
 </ul>
 
 <h2>Display Conventions and Configuration</h2>
 <p>
 Regions from overlapping transcripts with the same coordinates are collapsed into
 a single item. The gene symbol is shown as the item name. Mouseover displays the
 NMD escape rule and the number of transcripts. The details page lists all
 contributing transcript IDs.
 </p>
 
 <p>
 Items are colored by the NMD escape rule that applies:
 </p>
 <ul>
-  <li><font color="#FF0000"><b>Red</b></font> &ndash; Rule 1: Last 50 bp
-    of the last coding exon-exon junction. A PTC here is too close to the
-    last exon junction complex (EJC) for NMD to be triggered.</li>
+  <li><font color="#FF0000"><b>Red</b></font> &ndash; Rule 1: CDS within
+    50 bp (mRNA distance) upstream of the last splice junction, plus any
+    CDS downstream of it. A PTC here is too close to the last exon junction
+    complex (EJC), or has no EJC downstream of it, for NMD to be triggered.</li>
   <li><font color="#FF8C00"><b>Orange</b></font> &ndash; Rule 2: Single coding
     exon and no 3&#8242;UTR intron. No EJC is deposited downstream of the stop
     codon, so all PTCs in the coding sequence escape NMD.</li>
   <li><font color="#8B0000"><b>Dark red</b></font> &ndash; Rule 3: First 100 bp
     of coding nucleotides. PTCs in this start-proximal region are insensitive
     to NMD, possibly due to translation re-initiation at a downstream AUG codon.</li>
   <li><font color="#FFD700"><b>Gold</b></font> &ndash; Rule 4: Coding exons
     longer than 400 bp (excluding the last coding exon). NMD efficiency is
     reduced in these long exons because the PTC is far from the downstream
     exon-exon junction.</li>
 </ul>
 
 <h2>Data Access</h2>
 <p>
 The data underlying this track can be explored interactively with the
 <a href="../cgi-bin/hgTables">Table Browser</a> or the
 <a href="../cgi-bin/hgIntegrator">Data Integrator</a>. For automated analysis,
 the data may be queried from our
 <a href="/goldenPath/help/api.html">REST API</a>. Please refer to our
 <a href="https://groups.google.com/a/soe.ucsc.edu/forum/#!forum/genome"
 target="_blank">mailing list archives</a> for questions, or our
 <a href="../FAQ/FAQdownloads.html#download36">Data Access FAQ</a> for more
 information.
 </p>
 
 <h2>Credits</h2>
 <p>
 Thanks to Guido Neidhardt for suggesting this track at HUGO VEPTC 2025 and Andreas Lahner
 for feedback. Thanks to the Decipher Genome Browser team for introducing the idea of such a
 track.
 </p>
 
 <h2>References</h2>
 
 <p>
 Kurosaki T, Popp MW, Maquat LE.
 <a href="https://doi.org/10.1038/s41580-019-0126-2" target="_blank">
 Quality and quantity control of gene expression by nonsense-mediated mRNA decay</a>.
 <em>Nat Rev Mol Cell Biol</em>. 2019 Jul;20(7):406-420.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/30992545" target="_blank">30992545</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6855384/" target="_blank">PMC6855384</a>
 </p>
 
 <p>
 Lindeboom RGH, Supek F, Lehner B.
 <a href="https://doi.org/10.1038/ng.3664" target="_blank">
 The rules and impact of nonsense-mediated mRNA decay in human cancers</a>.
 <em>Nat Genet</em>. 2016 Oct;48(10):1112-8.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/27618451" target="_blank">27618451</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5045715/" target="_blank">PMC5045715</a>
 </p>
 
 <p>
 Nagy E, Maquat LE.
 <a href="https://linkinghub.elsevier.com/retrieve/pii/S0968-0004(98)01208-0" target="_blank">
 A rule for termination-codon position within intron-containing genes: when nonsense affects RNA
 abundance</a>.
 <em>Trends Biochem Sci</em>. 1998 Jun;23(6):198-9.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/9644970" target="_blank">9644970</a>
 </p>