4bd316f5f1ca47328bd3f9a181214b788055f0bc
lrnassar
  Tue Apr 21 13:29:26 2026 -0700
NMD Escape QA round 3: switch RefSeq to curated, fix Rule 2 misclassification. refs #33737

Switched the NMD Escape RefSeq subtrack input from hg38.ncbiRefSeq.txt.gz (all)
to hg38.ncbiRefSeqCurated.txt.gz (NM_/NR_ only, no XM_/XR_ predicted models)
per Max's feedback. longLabel updated to "NCBI RefSeq Curated transcripts".

Fixed Rule 2 in genePredNmdEsc to test rec["exonCount"]==1 instead of
len(cdsExons)==1. The old test misclassified multi-exon transcripts with a
single CDS exon (introns only in the UTRs) as "intronless", and the if/else
short-circuit then silently suppressed their Rule 1/3/4 assignments. 3,253
RefSeq curated and ~2,000 Gencode transcripts were reassigned from Rule 2 to
Rules 1/3. Rebuilt both tracks.
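
For illustration only, a minimal sketch of the difference between the two
Rule 2 tests. The record layout (genePred-style exonStarts/exonEnds/cdsStart/
cdsEnd keys) and the helper cds_exons() are assumptions made for this example
and are not copied from genePredNmdEsc.

```python
def cds_exons(rec):
    """Hypothetical helper: exon intervals clipped to the CDS, empty ones dropped."""
    clipped = []
    for start, end in zip(rec["exonStarts"], rec["exonEnds"]):
        s, e = max(start, rec["cdsStart"]), min(end, rec["cdsEnd"])
        if s < e:
            clipped.append((s, e))
    return clipped


def is_intronless_old(rec):
    # Old test: a single CDS exon, which is also true when introns sit only in UTRs.
    return len(cds_exons(rec)) == 1


def is_intronless_new(rec):
    # New test: the transcript itself has exactly one exon.
    return rec["exonCount"] == 1


# Two-exon transcript whose only intron lies in the 5' UTR (made-up coordinates).
utr_intron = {"exonCount": 2, "exonStarts": [100, 500], "exonEnds": [200, 900],
              "cdsStart": 550, "cdsEnd": 850}
# Genuinely single-exon transcript (made-up coordinates).
single_exon = {"exonCount": 1, "exonStarts": [100], "exonEnds": [900],
               "cdsStart": 200, "cdsEnd": 800}
```

With these toy records, the old test flags both transcripts as intronless,
while the new test flags only single_exon.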

Added Rule 1 caveat to nmdEscTranscripts.html for transcripts with a
penultimate coding exon shorter than 50 bp.
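
A rough sketch of the behavior the caveat describes, assuming plus-strand
transcripts and CDS-clipped exons given in transcript order. The function
name rule1_regions and its signature are hypothetical, not the actual
genePredNmdEsc code.

```python
def rule1_regions(cds_exons, tail=50):
    """Return the Rule 1 regions for a plus-strand transcript.

    cds_exons: list of (start, end) CDS-clipped exons in transcript order.
    The region is the last coding exon plus up to `tail` bp at the 3' end of
    the penultimate coding exon. When that exon is shorter than `tail`, the
    region is clipped to the exon and does not continue into earlier exons.
    """
    if len(cds_exons) < 2:
        raise ValueError("sketch covers transcripts with at least two coding exons")
    pen_start, pen_end = cds_exons[-2]
    tail_start = max(pen_start, pen_end - tail)  # clip at the upstream junction
    return [(tail_start, pen_end), cds_exons[-1]]
```

For example, a 30 bp penultimate coding exon at (400, 430) contributes only
(400, 430), while a 300 bp one at (400, 700) contributes (650, 700).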

Added reciprocal relatedTracks.ra entries for nmd <-> mane and nmd <-> ncbiRefSeq.

QA cleanups: non-ASCII prime char replaced with &#8242;, mailing list links
given target="_blank" across all three HTML pages, dead commented nmdGencode
block removed from nmd.ra, AutoSQL field comments updated to cover Rule 4
color and the gene-symbol-to-transcript-ID fallback.

Makedoc updated with the full Gencode + RefSeq pipeline and /gbdb symlinks.

diff --git src/hg/makeDb/trackDb/human/hg38/nmdEscTranscripts.html src/hg/makeDb/trackDb/human/hg38/nmdEscTranscripts.html
index 8398645cc67..2a7bc848ee2 100644
--- src/hg/makeDb/trackDb/human/hg38/nmdEscTranscripts.html
+++ src/hg/makeDb/trackDb/human/hg38/nmdEscTranscripts.html
@@ -1,160 +1,164 @@
 <h2>Description</h2>
 <p>
 The <b>NMD escape ruleset</b> tracks show predicted regions where a premature termination
 codon (PTC) or frameshift variant is likely to cause the transcript to
 <em>escape</em> nonsense-mediated decay (NMD), leading to the production of an
 aberrant truncated protein rather than degradation of the mRNA.
 </p>
 
 <p>
 The following rules were applied to transcript annotations to define predicted
 NMD escape regions (Nagy et al, Trends Biochem Sci 1998 and Lindeboom et al, Nat Genet 2016):
 </p>
 
 <ol>
   <li><b>50 bp rule</b>: The entire last coding exon plus the last 50 bp of
     the penultimate coding exon. A PTC here has no downstream exon-exon
     junction (or is too close to the last one) for NMD to be triggered.
     Non-protein-coding 3' exons are not counted when identifying the last
-    coding junction.</li>
+    coding junction. Note: when the penultimate coding exon is shorter than
+    50 bp, the annotated region extends only to the upstream junction of
+    that exon and does not walk further upstream. A small number of
+    transcripts with unusually short penultimate coding exons are affected.</li>
   <li><b>Intronless transcripts</b>: Transcripts with a single exon. Since no
     EJCs are deposited on single-exon transcripts, all PTCs are predicted to
     escape NMD.</li>
   <li><b>Start-proximal region</b>: The first 100 bp of coding nucleotides.
     PTCs in this region do not lead to NMD, a phenomenon known as start-proximal
     NMD insensitivity. One proposed mechanism, supported by experimental
     evidence, is re-initiation of translation at a downstream AUG codon.</li>
   <li><b>Long exon rule</b>: Coding exons longer than 400 bp (excluding the last
     coding exon, which is already covered by the 50 bp rule). Lindeboom et al.
     2016 showed a marked drop in NMD efficiency (61% vs. 98%) for PTCs in exons
     longer than 400 nt, likely because the large distance between the stalled
     ribosome and the downstream EJC reduces UPF1-EJC contact.</li>
 </ol>
 
 <p>
 Non-coding transcripts (where CDS start equals CDS end) are excluded.
 Overlapping regions from multiple transcripts with identical coordinates and
 the same rule are collapsed into a single item, with the contributing
 transcript IDs stored as a comma-separated list.
 </p>
 
 <p>
 Two versions of this track are available, based on different transcript annotation sets:
 </p>
 <ul>
   <li><b><a href="hgTrackUi?g=nmdEscGencode">NMD escape Gencode</a></b>:
     Derived from GENCODE V49 transcript annotations.</li>
   <li><b><a href="hgTrackUi?g=nmdEscNcbiRefSeq">NMD escape NCBI RefSeq</a></b>:
-    Derived from NCBI RefSeq transcript annotations.</li>
+    Derived from NCBI RefSeq Curated transcript annotations (NM_ and NR_
+    accessions; predicted XM_/XR_ models are excluded).</li>
 </ul>
 
 <h2>Background</h2>
 <p>
 NMD escape regions were predicted based on the Exon Junction Complex
 (EJC)-dependent model of NMD. During normal translation, EJCs are deposited at
 exon-exon junctions after splicing. As the ribosome translates the mRNA, it
 displaces each EJC it encounters. When a PTC causes the ribosome to stall
 prematurely, any remaining downstream EJCs recruit surveillance factors
 (notably UPF1) that trigger mRNA degradation via NMD.
 </p>
 
 <p>
 However, PTCs located in the last coding exon or within approximately 50 bp
 upstream of the last exon-exon junction are too close to the final EJC (or
 have no downstream EJC at all) for NMD to be triggered&mdash;the transcript
 escapes degradation. Conversely, PTCs located more than 50&ndash;55 bp
 upstream of the last exon-exon junction are predicted to elicit NMD.
 </p>
 
 <p>
 Additional escape mechanisms, supported by Lindeboom et al. 2016 and other
 studies, are captured by three further rules:
 </p>
 <ul>
   <li><b>Intronless transcripts</b> deposit no EJCs during splicing, so any
     PTC escapes NMD.</li>
   <li><b>Start-proximal PTCs</b> (within the first 100 bp of coding sequence)
     escape NMD, likely through translation re-initiation at a downstream AUG
     codon.</li>
   <li><b>PTCs in long coding exons</b> (&gt;400 bp) show reduced NMD
     efficiency (61% vs. 98% for shorter exons in Lindeboom et al. 2016),
     likely because the large distance between the stalled ribosome and the
     downstream EJC reduces UPF1-EJC contact.</li>
 </ul>
 
 <h2>Display Conventions and Configuration</h2>
 <p>
 Regions from overlapping transcripts with the same coordinates are collapsed into
 a single item. The gene symbol is shown as the item name. Mouseover displays the
 NMD escape rule and the number of transcripts. The details page lists all
 contributing transcript IDs.
 </p>
 
 <p>
 Items are colored by the NMD escape rule that applies:
 </p>
 <ul>
   <li><font color="#FF0000"><b>Red</b></font> &ndash; Rule 1: Last 50 bp
     of the last coding exon-exon junction. A PTC here is too close to the
     last exon junction complex (EJC) for NMD to be triggered.</li>
   <li><font color="#FF8C00"><b>Orange</b></font> &ndash; Rule 2: Intronless
     (single-exon) transcript. No EJCs are deposited, so all PTCs escape NMD.</li>
   <li><font color="#8B0000"><b>Dark red</b></font> &ndash; Rule 3: First 100 bp
     of coding nucleotides. PTCs in this start-proximal region are insensitive
     to NMD, possibly due to translation re-initiation at a downstream AUG codon.</li>
   <li><font color="#FFD700"><b>Gold</b></font> &ndash; Rule 4: Coding exons
     longer than 400 bp (excluding the last coding exon). NMD efficiency is
     reduced in these long exons because the PTC is far from the downstream
     exon-exon junction.</li>
 </ul>
 
 <h2>Data Access</h2>
 <p>
 The data underlying this track can be explored interactively with the
 <a href="../cgi-bin/hgTables">Table Browser</a> or the
 <a href="../cgi-bin/hgIntegrator">Data Integrator</a>. For automated analysis,
 the data may be queried from our
 <a href="/goldenPath/help/api.html">REST API</a>. Please refer to our
-<a href="https://groups.google.com/a/soe.ucsc.edu/forum/#!forum/genome">mailing
-list archives</a> for questions, or our
+<a href="https://groups.google.com/a/soe.ucsc.edu/forum/#!forum/genome"
+target="_blank">mailing list archives</a> for questions, or our
 <a href="../FAQ/FAQdownloads.html#download36">Data Access FAQ</a> for more
 information.
 </p>
 
 <h2>Credits</h2>
 <p>
 Thanks to Guido Neidhardt for suggesting this track at HUGO VEPTC 2025 and Andreas Lahner
 for feedback. Thanks to the Decipher Genome Browser team for introducing the idea of a
 track.
 </p>
 
 <h2>References</h2>
 
 <p>
 Kurosaki T, Popp MW, Maquat LE.
 <a href="https://doi.org/10.1038/s41580-019-0126-2" target="_blank">
 Quality and quantity control of gene expression by nonsense-mediated mRNA decay</a>.
 <em>Nat Rev Mol Cell Biol</em>. 2019 Jul;20(7):406-420.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/30992545" target="_blank">30992545</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6855384/" target="_blank">PMC6855384</a>
 </p>
 
 <p>
 Lindeboom RGH, Supek F, Lehner B.
 <a href="https://doi.org/10.1038/ng.3664" target="_blank">
 The rules and impact of nonsense-mediated mRNA decay in human cancers</a>.
 <em>Nat Genet</em>. 2016 Oct;48(10):1112-8.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/27618451" target="_blank">27618451</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5045715/" target="_blank">PMC5045715</a>
 </p>
 
 <p>
 Nagy E, Maquat LE.
 <a href="https://linkinghub.elsevier.com/retrieve/pii/S0968-0004(98)01208-0" target="_blank">
 A rule for termination-codon position within intron-containing genes: when nonsense affects RNA
 abundance</a>.
 <em>Trends Biochem Sci</em>. 1998 Jun;23(6):198-9.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/9644970" target="_blank">9644970</a>
 </p>