34d2eee845f5f45e571d1e153c632683b8a93f75
lrnassar
  Tue Apr 21 16:17:53 2026 -0700
Refine NMD Escape Rule 2 gate to "single coding exon and no 3'UTR intron". refs #33737

Previously Rule 2 required exonCount==1 (truly intronless). This
overcorrected for single-CDS-exon transcripts whose only introns are in
the 5'UTR: biologically these have no EJC downstream of the stop codon
(5'UTR EJCs are cleared by the scanning 40S or sit upstream of the
terminating ribosome) and are NMD-immune, but the code pushed them to
Rules 1/3 under a less accurate "last coding exon" label.

New gate: len(cdsExons) == 1 AND no exon-exon junction strictly
downstream of the stop codon (strand-aware). Transcripts with a single
coding exon but a 3'UTR intron correctly stay in Rules 1/3 because that
intron deposits an EJC that can trigger NMD.
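
For reference, the gate described above can be sketched roughly as follows.
This is an illustration only, not the code in the build script: the function
name is_rule2, the half-open (start, end) exon tuples, and the genePred-style
reading of cds_start/cds_end (stop codon at cds_end on "+", at cds_start on
"-") are all assumptions made for the example.

```python
def is_rule2(exons, cds_start, cds_end, strand):
    """Return True if a transcript passes the Rule 2 gate (hypothetical helper).

    exons: list of (start, end) half-open genomic intervals, non-overlapping.
    cds_start, cds_end: genomic CDS bounds (assumed genePred-style, half-open).
    strand: "+" or "-".
    """
    # Non-coding transcripts (CDS start equals CDS end) never qualify.
    if cds_start >= cds_end:
        return False

    # Coding exons are the exons that overlap the CDS interval.
    cds_exons = [(s, e) for s, e in exons if s < cds_end and e > cds_start]
    if len(cds_exons) != 1:
        return False

    # With a single coding exon, every other exon lies wholly in a UTR.
    # An exon on the 3' side of the stop codon implies an exon-exon
    # junction downstream of the stop, i.e. a 3'UTR intron.
    if strand == "+":
        has_utr3_junction = any(s >= cds_end for s, _ in exons)
    else:
        has_utr3_junction = any(e <= cds_start for _, e in exons)

    return not has_utr3_junction
```

Under these assumptions, a transcript with exons (100,150),(200,600) and CDS
250-500 passes on the "+" strand (5'UTR intron only) but fails on the "-"
strand, where the same intron falls in the 3'UTR.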

3,113 RefSeq Curated and 10,790 Gencode V49 transcripts move into Rule
2. 140 RefSeq and 1,135 Gencode single-CDS-exon transcripts with 3'UTR
introns correctly remain in Rules 1/3. Description page and makedoc
updated.

diff --git src/hg/makeDb/trackDb/human/hg38/nmdEscTranscripts.html src/hg/makeDb/trackDb/human/hg38/nmdEscTranscripts.html
index 2a7bc848ee2..b9207fc0d6f 100644
--- src/hg/makeDb/trackDb/human/hg38/nmdEscTranscripts.html
+++ src/hg/makeDb/trackDb/human/hg38/nmdEscTranscripts.html
@@ -8,33 +8,39 @@
 
 <p>
 The following rules were applied to transcript annotations to define predicted
 NMD escape regions (Nagy and Maquat, Trends Biochem Sci 1998 and Lindeboom et al., Nat Genet 2016):
 </p>
 
 <ol>
   <li><b>50 bp rule</b>: The entire last coding exon plus the last 50 bp of
     the penultimate coding exon. A PTC here has no downstream exon-exon
     junction (or is too close to the last one) for NMD to be triggered.
     Non-protein-coding 3' exons are not counted when identifying the last
     coding junction. Note: when the penultimate coding exon is shorter than
     50 bp, the annotated region extends only to the upstream junction of
     that exon and does not walk further upstream. A small number of
     transcripts with unusually short penultimate coding exons are affected.</li>
-  <li><b>Intronless transcripts</b>: Transcripts with a single exon. Since no
-    EJCs are deposited on single-exon transcripts, all PTCs are predicted to
-    escape NMD.</li>
+  <li><b>No downstream EJC rule</b>: Transcripts with a single coding exon and
+    no 3&#8242;UTR intron. No exon-exon junction exists downstream of the stop
+    codon, so no EJC is deposited that could trigger NMD at a PTC. This
+    covers truly intronless transcripts as well as transcripts whose only
+    introns are in the 5&#8242;UTR (where EJCs are cleared by the scanning 40S
+    ribosomal subunit or sit upstream of the stop and are never encountered by
+    the terminating ribosome). Transcripts with a single coding exon but a
+    3&#8242;UTR intron are excluded, because that intron deposits an EJC
+    downstream of the stop codon that can trigger NMD.</li>
   <li><b>Start-proximal region</b>: The first 100 bp of coding nucleotides.
     PTCs in this region do not lead to NMD, a phenomenon known as start-proximal
     NMD insensitivity. One proposed mechanism, supported by experimental
     evidence, is re-initiation of translation at a downstream AUG codon.</li>
   <li><b>Long exon rule</b>: Coding exons longer than 400 bp (excluding the last
     coding exon, which is already covered by the 50 bp rule). Lindeboom et al.
     2016 showed a marked drop in NMD efficiency (61% vs. 98%) for PTCs in exons
     longer than 400 nt, likely because the large distance between the stalled
     ribosome and the downstream EJC reduces UPF1-EJC contact.</li>
 </ol>
 
 <p>
 Non-coding transcripts (where CDS start equals CDS end) are excluded.
 Overlapping regions from multiple transcripts with identical coordinates and
 the same rule are collapsed into a single item, with the contributing
@@ -63,58 +69,61 @@
 </p>
 
 <p>
 However, PTCs located in the last coding exon or within approximately 50 bp
 upstream of the last exon-exon junction are too close to the final EJC (or
 have no downstream EJC at all) for NMD to be triggered&mdash;the transcript
 escapes degradation. Conversely, PTCs located more than 50&ndash;55 bp
 upstream of the last exon-exon junction are predicted to elicit NMD.
 </p>
 
 <p>
 Additional escape mechanisms, supported by Lindeboom et al. 2016 and other
 studies, are captured by three further rules:
 </p>
 <ul>
-  <li><b>Intronless transcripts</b> deposit no EJCs during splicing, so any
-    PTC escapes NMD.</li>
+  <li><b>Transcripts with no EJC downstream of the stop codon</b> (single coding
+    exon and no 3&#8242;UTR intron) cannot trigger NMD, so any PTC in the coding
+    sequence escapes. 5&#8242;UTR introns are tolerated because their EJCs are
+    upstream of the stop.</li>
   <li><b>Start-proximal PTCs</b> (within the first 100 bp of coding sequence)
     escape NMD, likely through translation re-initiation at a downstream AUG
     codon.</li>
   <li><b>PTCs in long coding exons</b> (&gt;400 bp) show reduced NMD
     efficiency (61% vs. 98% for shorter exons in Lindeboom et al. 2016),
     likely because the large distance between the stalled ribosome and the
     downstream EJC reduces UPF1-EJC contact.</li>
 </ul>
 
 <h2>Display Conventions and Configuration</h2>
 <p>
 Regions from overlapping transcripts with the same coordinates are collapsed into
 a single item. The gene symbol is shown as the item name. Mouseover displays the
 NMD escape rule and the number of transcripts. The details page lists all
 contributing transcript IDs.
 </p>
 
 <p>
 Items are colored by the NMD escape rule that applies:
 </p>
 <ul>
   <li><font color="#FF0000"><b>Red</b></font> &ndash; Rule 1: The last coding exon
     plus the last 50 bp of the penultimate coding exon. A PTC here has no downstream
     exon junction complex (EJC), or is too close to the last one, for NMD to be triggered.</li>
-  <li><font color="#FF8C00"><b>Orange</b></font> &ndash; Rule 2: Intronless
-    (single-exon) transcript. No EJCs are deposited, so all PTCs escape NMD.</li>
+  <li><font color="#FF8C00"><b>Orange</b></font> &ndash; Rule 2: Single coding
+    exon and no 3&#8242;UTR intron. No EJC is deposited downstream of the stop
+    codon, so all PTCs in the coding sequence escape NMD.</li>
   <li><font color="#8B0000"><b>Dark red</b></font> &ndash; Rule 3: First 100 bp
     of coding nucleotides. PTCs in this start-proximal region are insensitive
     to NMD, possibly due to translation re-initiation at a downstream AUG codon.</li>
   <li><font color="#FFD700"><b>Gold</b></font> &ndash; Rule 4: Coding exons
     longer than 400 bp (excluding the last coding exon). NMD efficiency is
     reduced in these long exons because the PTC is far from the downstream
     exon-exon junction.</li>
 </ul>
 
 <h2>Data Access</h2>
 <p>
 The data underlying this track can be explored interactively with the
 <a href="../cgi-bin/hgTables">Table Browser</a> or the
 <a href="../cgi-bin/hgIntegrator">Data Integrator</a>. For automated analysis,
 the data may be queried from our