3a62ea7e9a8cb3503586a0a78570331308c9bc58
max
  Mon Apr 27 02:23:00 2026 -0700
NMD Escape MANE: expose NM_ accession via labelFields. refs #33737

Per QA, the MANE subtrack now shows the NCBI RefSeq accession by default
instead of the HGNC gene symbol, with the ENST and gene symbol still
selectable via labelFields.

- genePredNmdEsc: new --ncbi-id-field N option (default -1 = unused).
When set, bigGenePred column N is captured per transcript and
written into a new ncbiIds output column. For MANE, pass 21.
- genePredNmdEsc: new --no-collapse option. By default, regions with
identical (chrom, start, end, rule) from multiple transcripts collapse
into one row with comma-separated lists. With --no-collapse the script
emits one row per (transcript, region). Used for MANE so each
label-field column holds a single value: the 74 MANE Plus Clinical
genes (e.g. LMNA) get two rows per region instead of one row with a
two-element list.
- nmdEscCollapsed.as: add lstring ncbiIds column. Schema is now bed9+3.
- nmd.ra (nmdEscMane only): labelFields ncbiIds,name,transcripts;
defaultLabelFields ncbiIds; labelSeparator " / ". Gencode and RefSeq
subtracks unchanged - they default to the gene symbol (name column)
and have an empty ncbiIds column.
- doc/hg38/nmd.txt: bump all three bedToBigBed invocations to bed9+3
and document the --ncbi-id-field 21 + --no-collapse invocation for
MANE.

Counts: MANE 68,028 (--no-collapse); Gencode 233,375; RefSeq 112,356.
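
Illustration only, not the genePredNmdEsc code: a minimal Python sketch of
the collapse vs. --no-collapse behavior described above. The function name
emit_rows, the tuple layout, the coordinates and the ENST_A/NM_A style IDs
are all hypothetical placeholders chosen for this example.

```python
from collections import OrderedDict


def emit_rows(regions, collapse=True):
    """Hypothetical sketch of collapsed vs. per-transcript output rows.

    regions: iterable of
        (chrom, start, end, rule, gene, transcript_id, ncbi_id)
    Returns a list of
        (chrom, start, end, rule, name, transcripts, ncbiIds)
    """
    if not collapse:
        # --no-collapse: one row per (transcript, region), so every
        # label-field column holds exactly one value.
        return [tuple(r) for r in regions]

    # Default: merge rows that share (chrom, start, end, rule) and join
    # the per-transcript values into comma-separated lists.
    groups = OrderedDict()
    for chrom, start, end, rule, gene, tx, ncbi in regions:
        key = (chrom, start, end, rule)
        grp = groups.setdefault(key, {"gene": gene, "tx": [], "ncbi": []})
        grp["tx"].append(tx)
        if ncbi:  # empty when no NCBI ID column was requested
            grp["ncbi"].append(ncbi)

    return [
        key + (grp["gene"], ",".join(grp["tx"]), ",".join(grp["ncbi"]))
        for key, grp in groups.items()
    ]


# Placeholder data: one region shared by two transcripts of the same gene,
# as happens for a MANE Plus Clinical gene.
example = [
    ("chr1", 100, 200, 1, "LMNA", "ENST_A", "NM_A"),
    ("chr1", 100, 200, 1, "LMNA", "ENST_B", "NM_B"),
]
collapsed = emit_rows(example)                   # one row, two-element lists
expanded = emit_rows(example, collapse=False)    # two rows, single values
```

With the placeholder input, the default mode yields a single row whose
transcripts and ncbiIds fields are two-element lists, while the
no-collapse mode yields two rows, which is what lets labelFields show one
accession per item.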

diff --git src/hg/makeDb/scripts/nmd/nmdEscCollapsed.as src/hg/makeDb/scripts/nmd/nmdEscCollapsed.as
index 54c95a12c08..53837ff1e3e 100644
--- src/hg/makeDb/scripts/nmd/nmdEscCollapsed.as
+++ src/hg/makeDb/scripts/nmd/nmdEscCollapsed.as
@@ -1,15 +1,16 @@
 table nmdEscCollapsed
 "NMD escape regions collapsed across overlapping transcripts"
     (
     string chrom;      "Chromosome (or contig, scaffold, etc.)"
     uint   chromStart; "Start position in chromosome"
     uint   chromEnd;   "End position in chromosome"
     string name;       "Gene symbol (falls back to transcript ID if no gene symbol is available)"
     uint   score;      "Score from 0-1000"
     char[1] strand;    "+ or -"
     uint thickStart;   "Start of where display should be thick"
     uint thickEnd;     "End of where display should be thick"
     uint color;        "RGB color: red=rule 1, orange=rule 2, dark red=rule 3, gold=rule 4"
     string mouseover;  "Rule description and transcript count"
     lstring transcripts; "Comma-separated list of transcript IDs from which this region was derived"
+    lstring ncbiIds;   "Comma-separated list of NCBI RefSeq accessions (NM_/NR_); populated for MANE only"
     )