4bd316f5f1ca47328bd3f9a181214b788055f0bc
lrnassar
  Tue Apr 21 13:29:26 2026 -0700
NMD Escape QA round 3: switch RefSeq to curated, fix Rule 2 misclassification. refs #33737

Switched the NMD Escape RefSeq subtrack input from hg38.ncbiRefSeq.txt.gz (all)
to hg38.ncbiRefSeqCurated.txt.gz (NM_/NR_ only, no XM_/XR_ predicted models)
per Max's feedback. longLabel updated to "NCBI RefSeq Curated transcripts".

Fixed Rule 2 in genePredNmdEsc to test rec["exonCount"]==1 instead of
len(cdsExons)==1. The old test misclassified multi-exon transcripts whose
CDS lies in a single exon (introns only in the UTRs) as "intronless", and
the if/else short-circuit silently suppressed their Rule 1/3/4 assignments.
3,253 RefSeq curated and ~2,000 Gencode transcripts were reassigned from
Rule 2 to Rules 1/3. Rebuilt both tracks.
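
A minimal sketch of the Rule 2 distinction, for illustration only. It is not
the genePredNmdEsc code: the helper names, the dict layout of rec, and the
example coordinates are all assumptions. Only the two tests being compared
(len(cdsExons)==1 versus rec["exonCount"]==1) come from this commit.

```python
def cds_exons(rec):
    """Return the part of each exon that overlaps the CDS (hypothetical helper)."""
    out = []
    for start, end in zip(rec["exonStarts"], rec["exonEnds"]):
        s = max(start, rec["cdsStart"])
        e = min(end, rec["cdsEnd"])
        if s < e:  # keep only exons that contain coding sequence
            out.append((s, e))
    return out


def is_intronless_old(rec):
    # Old test: a single coding exon was treated as "intronless".
    return len(cds_exons(rec)) == 1


def is_intronless_new(rec):
    # New test: the transcript as a whole must have exactly one exon.
    return rec["exonCount"] == 1


# Made-up transcript: three exons, CDS entirely inside the middle exon,
# so both introns fall in the UTRs.
utr_intron_tx = {
    "exonCount": 3,
    "exonStarts": [100, 300, 500],
    "exonEnds": [200, 400, 600],
    "cdsStart": 320,
    "cdsEnd": 380,
}

# Made-up transcript with a single exon.
single_exon_tx = {
    "exonCount": 1,
    "exonStarts": [100],
    "exonEnds": [600],
    "cdsStart": 150,
    "cdsEnd": 550,
}

for label, tx in (("UTR-intron", utr_intron_tx), ("single-exon", single_exon_tx)):
    print(label, "old:", is_intronless_old(tx), "new:", is_intronless_new(tx))
```

Under these assumptions the two tests agree on the single-exon transcript
and disagree only on the UTR-intron transcript, which is the case that was
being sent to Rule 2 by mistake.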

Added Rule 1 caveat to nmdEscTranscripts.html for transcripts with a
penultimate coding exon shorter than 50 bp.

Added reciprocal relatedTracks.ra entries for nmd <-> mane and nmd <-> ncbiRefSeq.

QA cleanups: non-ASCII prime char replaced with &#8242;, mailing list links
given target="_blank" across all three HTML pages, dead commented nmdGencode
block removed from nmd.ra, AutoSQL field comments updated to cover Rule 4
color and the gene-symbol-to-transcript-ID fallback.

Makedoc updated with the full Gencode + RefSeq pipeline and /gbdb symlinks.

diff --git src/hg/makeDb/scripts/nmd/nmdEscCollapsed.as src/hg/makeDb/scripts/nmd/nmdEscCollapsed.as
index 8ac61908579..54c95a12c08 100644
--- src/hg/makeDb/scripts/nmd/nmdEscCollapsed.as
+++ src/hg/makeDb/scripts/nmd/nmdEscCollapsed.as
@@ -1,15 +1,15 @@
 table nmdEscCollapsed
 "NMD escape regions collapsed across overlapping transcripts"
     (
     string chrom;      "Chromosome (or contig, scaffold, etc.)"
     uint   chromStart; "Start position in chromosome"
     uint   chromEnd;   "End position in chromosome"
-    string name;       "Gene symbol"
+    string name;       "Gene symbol (falls back to transcript ID if no gene symbol is available)"
     uint   score;      "Score from 0-1000"
     char[1] strand;    "+ or -"
     uint thickStart;   "Start of where display should be thick"
     uint thickEnd;     "End of where display should be thick"
-    uint color;        "RGB color: red=rule 1, orange=rule 2, dark red=rule 3"
+    uint color;        "RGB color: red=rule 1, orange=rule 2, dark red=rule 3, gold=rule 4"
     string mouseover;  "Rule description and transcript count"
     lstring transcripts; "Comma-separated list of transcript IDs from which this region was derived"
     )