4bd316f5f1ca47328bd3f9a181214b788055f0bc lrnassar Tue Apr 21 13:29:26 2026 -0700
NMD Escape QA round 3: switch RefSeq to curated, fix Rule 2 misclassification. refs #33737

Switched the NMD Escape RefSeq subtrack input from hg38.ncbiRefSeq.txt.gz (all)
to hg38.ncbiRefSeqCurated.txt.gz (NM_/NR_ only, no XM_/XR_ predicted models)
per Max's feedback. longLabel updated to "NCBI RefSeq Curated transcripts".

Fixed Rule 2 in genePredNmdEsc to test rec["exonCount"]==1 instead of
len(cdsExons)==1. The old test misclassified multi-exon transcripts with a
single CDS exon (UTR introns) as "intronless" and silently suppressed their
Rule 1/3/4 assignments via the if/else short-circuit. 3,253 RefSeq curated and
~2,000 Gencode transcripts reassigned from Rule 2 to Rules 1/3. Rebuilt both
tracks.

Added Rule 1 caveat to nmdEscTranscripts.html for transcripts with a
penultimate coding exon shorter than 50 bp.

Added reciprocal relatedTracks.ra entries for nmd <-> mane and
nmd <-> ncbiRefSeq.

QA cleanups: non-ASCII prime char replaced with ′, mailing list links given
target="_blank" across all three HTML pages, dead commented nmdGencode block
removed from nmd.ra, AutoSQL field comments updated to cover Rule 4 color and
the gene-symbol-to-transcript-ID fallback.

Makedoc updated with the full Gencode + RefSeq pipeline and /gbdb symlinks.

diff --git src/hg/makeDb/trackDb/human/hg38/nmdEscTranscripts.html src/hg/makeDb/trackDb/human/hg38/nmdEscTranscripts.html
index 8398645cc67..2a7bc848ee2 100644
--- src/hg/makeDb/trackDb/human/hg38/nmdEscTranscripts.html
+++ src/hg/makeDb/trackDb/human/hg38/nmdEscTranscripts.html
@@ -4,60 +4,64 @@
 codon (PTC) or frameshift variant is likely to cause the transcript to
 <em>escape</em> nonsense-mediated decay (NMD), leading to the production of an
 aberrant truncated protein rather than degradation of the mRNA.
 </p>
 <p>
 The following rules were applied to transcript annotations to define predicted
 NMD escape regions (Nagy et al, Trends Biochem Sci 1998 and Lindeboom et al,
 Nat Genet 2016):
 </p>
 <ol>
 <li><b>50 bp rule</b>: The entire last coding exon plus the last 50 bp of the
 penultimate coding exon. A PTC here has no downstream exon-exon junction (or
 is too close to the last one) for NMD to be triggered. Non-protein-coding
 3' exons are not counted when identifying the last
- coding junction.</li>
+ coding junction. Note: when the penultimate coding exon is shorter than
+ 50 bp, the annotated region extends only to the upstream junction of
+ that exon and does not walk further upstream. A small number of
+ transcripts with unusually short penultimate coding exons are affected.</li>
 <li><b>Intronless transcripts</b>: Transcripts with a single exon. Since no
 EJCs are deposited on single-exon transcripts, all PTCs are predicted to
 escape NMD.</li>
 <li><b>Start-proximal region</b>: The first 100 bp of coding nucleotides.
 PTCs in this region do not lead to NMD, a phenomenon known as start-proximal
 NMD insensitivity. One proposed mechanism, supported by experimental
 evidence, is re-initiation of translation at a downstream AUG codon.</li>
 <li><b>Long exon rule</b>: Coding exons longer than 400 bp (excluding the
 last coding exon, which is already covered by the 50 bp rule). Lindeboom
 et al. 2016 showed a marked drop in NMD efficiency (61% vs. 98%) for PTCs
 in exons longer than 400 nt, likely because the large distance between the
 stalled ribosome and the downstream EJC reduces UPF1-EJC contact.</li>
 </ol>
 <p>
 Non-coding transcripts (where CDS start equals CDS end) are excluded.
 Overlapping regions from multiple transcripts with identical coordinates and
 the same rule are collapsed into a single item, with the contributing
 transcript IDs stored as a comma-separated list.
 </p>
 <p>
 Two versions of this track are available, based on different transcript
 annotation sets:
 </p>
 <ul>
 <li><b><a href="hgTrackUi?g=nmdEscGencode">NMD escape Gencode</a></b>:
 Derived from GENCODE V49 transcript annotations.</li>
 <li><b><a href="hgTrackUi?g=nmdEscNcbiRefSeq">NMD escape NCBI RefSeq</a></b>:
- Derived from NCBI RefSeq transcript annotations.</li>
+ Derived from NCBI RefSeq Curated transcript annotations (NM_ and NR_
+ accessions; predicted XM_/XR_ models are excluded).</li>
 </ul>
 <h2>Background</h2>
 <p>
 NMD escape regions were predicted based on the Exon Junction Complex
 (EJC)-dependent model of NMD. During normal translation, EJCs are deposited at
 exon-exon junctions after splicing. As the ribosome translates the mRNA, it
 displaces each EJC it encounters. When a PTC causes the ribosome to stall
 prematurely, any remaining downstream EJCs recruit surveillance factors
 (notably UPF1) that trigger mRNA degradation via NMD.
 </p>
 <p>
 However, PTCs located in the last coding exon or within approximately 50 bp
 upstream of the last exon-exon junction are too close to the final EJC (or
@@ -103,32 +107,32 @@
 of coding nucleotides. PTCs in this start-proximal region are insensitive to
 NMD, possibly due to translation re-initiation at a downstream AUG codon.</li>
 <li><font color="#FFD700"><b>Gold</b></font> – Rule 4: Coding exons
 longer than 400 bp (excluding the last coding exon). NMD efficiency is reduced
 in these long exons because the PTC is far from the downstream exon-exon
 junction.</li>
 </ul>
 <h2>Data Access</h2>
 <p>
 The data underlying this track can be explored interactively with the
 <a href="../cgi-bin/hgTables">Table Browser</a> or the
 <a href="../cgi-bin/hgIntegrator">Data Integrator</a>.
 For automated analysis, the data may be queried from our
 <a href="/goldenPath/help/api.html">REST API</a>.
 Please refer to our
-<a href="https://groups.google.com/a/soe.ucsc.edu/forum/#!forum/genome">mailing
-list archives</a> for questions, or our
+<a href="https://groups.google.com/a/soe.ucsc.edu/forum/#!forum/genome"
+target="_blank">mailing list archives</a> for questions, or our
 <a href="../FAQ/FAQdownloads.html#download36">Data Access FAQ</a> for more
 information.
 </p>
 <h2>Credits</h2>
 <p>
 Thanks to Guido Neidhardt for suggesting this track at HUGO VEPTC 2025 and
 Andreas Lahner for feedback. Thanks to the Decipher Genome Browser team for
 introducing the idea of a track.
 </p>
 <h2>References</h2>
 <p>
 Kurosaki T, Popp MW, Maquat LE.