4bd316f5f1ca47328bd3f9a181214b788055f0bc
lrnassar
  Tue Apr 21 13:29:26 2026 -0700
NMD Escape QA round 3: switch RefSeq to curated, fix Rule 2 misclassification. refs #33737

Switched the NMD Escape RefSeq subtrack input from hg38.ncbiRefSeq.txt.gz (all)
to hg38.ncbiRefSeqCurated.txt.gz (NM_/NR_ only, no XM_/XR_ predicted models)
per Max's feedback. longLabel updated to "NCBI RefSeq Curated transcripts".

Fixed Rule 2 in genePredNmdEsc to test rec["exonCount"]==1 instead of
len(cdsExons)==1. The old test misclassified multi-exon transcripts whose CDS
falls within a single exon (introns only in the UTRs) as "intronless" and
silently suppressed their Rule 1/3/4 assignments via the if/else short-circuit.
3,253 RefSeq curated and ~2,000 Gencode transcripts were reassigned from Rule 2
to Rules 1/3. Rebuilt both tracks.
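
For illustration only, a minimal sketch of the difference between the two
tests. This is not the genePredNmdEsc code: the helper names and the toy
records are hypothetical, and the record is assumed to be a dict with
genePred-style fields (exonCount, exonStarts, exonEnds, cdsStart, cdsEnd)
already parsed to integers.

```python
def cds_exons(rec):
    """Return the exon intervals clipped to the CDS (hypothetical helper)."""
    clipped = []
    for start, end in zip(rec["exonStarts"], rec["exonEnds"]):
        s, e = max(start, rec["cdsStart"]), min(end, rec["cdsEnd"])
        if s < e:  # keep only exons that overlap the CDS
            clipped.append((s, e))
    return clipped


def is_intronless_old(rec):
    # Old test: a single CDS exon, which is also true when introns sit in the UTRs.
    return len(cds_exons(rec)) == 1


def is_intronless_new(rec):
    # New test: the transcript itself has exactly one exon.
    return rec["exonCount"] == 1


# Toy transcript: three exons, CDS entirely inside the middle exon.
utr_intron_tx = {
    "exonCount": 3,
    "exonStarts": [100, 300, 700],
    "exonEnds": [200, 600, 800],
    "cdsStart": 350,
    "cdsEnd": 550,
}

# Toy transcript: genuinely intronless.
single_exon_tx = {
    "exonCount": 1,
    "exonStarts": [100],
    "exonEnds": [500],
    "cdsStart": 150,
    "cdsEnd": 450,
}

for name, tx in [("utr_intron_tx", utr_intron_tx), ("single_exon_tx", single_exon_tx)]:
    print(name, "old:", is_intronless_old(tx), "new:", is_intronless_new(tx))
```

In this sketch only the old test flags the three-exon transcript as
intronless, which is the case that previously short-circuited past the
other rules.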

Added Rule 1 caveat to nmdEscTranscripts.html for transcripts with a
penultimate coding exon shorter than 50 bp.

Added reciprocal relatedTracks.ra entries for nmd <-> mane and nmd <-> ncbiRefSeq.

QA cleanups: non-ASCII prime char replaced with &#8242;, mailing list links
given target="_blank" across all three HTML pages, dead commented nmdGencode
block removed from nmd.ra, AutoSQL field comments updated to cover Rule 4
color and the gene-symbol-to-transcript-ID fallback.

Makedoc updated with the full Gencode + RefSeq pipeline and /gbdb symlinks.

diff --git src/hg/makeDb/trackDb/human/hg38/nmd.html src/hg/makeDb/trackDb/human/hg38/nmd.html
index 53369d39588..8cf085ee766 100644
--- src/hg/makeDb/trackDb/human/hg38/nmd.html
+++ src/hg/makeDb/trackDb/human/hg38/nmd.html
@@ -11,31 +11,32 @@
 close to the first or last splice junction, within unusually long coding exons,
 or in transcripts without any junction.
 </p>
 
 <h2>Subtracks</h2>
 
 <h3>NMD escape regions</h3>
 <p>
 Rule-based predictions of NMD escape regions, computed from transcript
 annotations. Two transcript sets are provided:
 </p>
 <ul>
   <li><b><a href="hgTrackUi?g=nmdEscGencode">NMD escape Gencode</a></b>:
     NMD escape regions derived from GENCODE V49 transcripts.</li>
   <li><b><a href="hgTrackUi?g=nmdEscNcbiRefSeq">NMD escape NCBI RefSeq</a></b>:
-    NMD escape regions derived from NCBI RefSeq transcripts.</li>
+    NMD escape regions derived from NCBI RefSeq Curated transcripts
+    (NM_ and NR_ accessions only).</li>
 </ul>
 <p>
 Click either of the links to the track details here or above to show the four rules
 that were used (50bp, intronless, 100bp, long exon &gt;400nt).
 </p>
 
 <h3>NMDetective scores</h3>
 <p>
 Machine-learning predictions of NMD efficiency from
 <a href="https://www.ncbi.nlm.nih.gov/pubmed/27618451" target="_blank">Lindeboom
 et al. 2016</a>. Two models (A = random forest, B = decision tree)
 predict whether a PTC at each position will trigger NMD or allow escape.
 Positive scores indicate predicted NMD triggering; negative scores indicate
 predicted escape.
 </p>
@@ -44,43 +45,43 @@
     Random forest model for all possible PTCs from nonsense variants.</li>
   <li><b><a href="hgTrackUi?g=nmdDetectiveB">NMDetective-B</a></b>:
     Decision tree model for all possible PTCs from nonsense variants.</li>
   <li><b><a href="hgTrackUi?g=nmdDetectiveA_ptc">NMDetective-A PTC</a></b>:
     Random forest model for the first out-of-frame PTC from frameshifting indels.</li>
   <li><b><a href="hgTrackUi?g=nmdDetectiveB_ptc">NMDetective-B PTC</a></b>:
     Decision tree model for the first out-of-frame PTC from frameshifting indels.</li>
 </ul>
 
 <h2>Background</h2>
 <p>
 The ACMG guidelines say under PVS1:
 </p>
 <p>
 <i>
-(ii) One must also be cautious when interpreting truncating variants downstream of the most 3′ truncating variant established as pathogenic in the literature. This is especially true if the predicted stop codon occurs in the last exon or in the last 50 base pairs of the penultimate exon, such that nonsense-mediated decay would not be predicted, and there is a higher likelihood of an expressed protein.
+(ii) One must also be cautious when interpreting truncating variants downstream of the most 3&#8242; truncating variant established as pathogenic in the literature. This is especially true if the predicted stop codon occurs in the last exon or in the last 50 base pairs of the penultimate exon, such that nonsense-mediated decay would not be predicted, and there is a higher likelihood of an expressed protein.
 </i>
 </p>
 
 <h2>Data Access</h2>
 <p>
 The data underlying these tracks can be explored interactively with the
 <a href="../cgi-bin/hgTables">Table Browser</a> or the
 <a href="../cgi-bin/hgIntegrator">Data Integrator</a>. For automated analysis,
 the data may be queried from our
 <a href="/goldenPath/help/api.html">REST API</a>. Please refer to our
-<a href="https://groups.google.com/a/soe.ucsc.edu/forum/#!forum/genome">mailing
-list archives</a> for questions, or our
+<a href="https://groups.google.com/a/soe.ucsc.edu/forum/#!forum/genome"
+target="_blank">mailing list archives</a> for questions, or our
 <a href="../FAQ/FAQdownloads.html#download36">Data Access FAQ</a> for more
 information.
 </p>
 
 <h2>Credits</h2>
 <p>
 Thanks to Guido Neidhardt for suggesting this track at HUGO VEPTC 2025 and Andreas Lahner
 for feedback. Thanks to the Decipher Genome Browser team for introducing the idea of a
 track. Thanks to Rik Lindeboom for providing custom tracks.
 </p>
 
 <h2>References</h2>
 <p>
 Kurosaki T, Popp MW, Maquat LE.
 <a href="https://doi.org/10.1038/s41580-019-0126-2" target="_blank">