f9a89b0e1ce3c937b4fbb879736c1619c35c271f
lrnassar
  Tue Apr 21 12:11:02 2026 -0700
QA fixes for PromoterAI track. refs #37278

Description page: replaced the wrong reference (Gao et al. 2023, the PrimateAI-3D
paper) with the actual PromoterAI citation (Jaganathan et al. Science 2025, PMID
40440429), corrected the score-direction wording (negative = under-expression,
positive = over-expression, not "tolerated vs disruptive"), fixed the Data Access
source link (Illumina BaseSpace, not the GitHub repo), and corrected the mouseover
blurb to match mouseOverFunction noAverage behavior.

Converter and AS: the overlap bigBed now carries the real per-transcript strand
from the source TSV (was hardcoded '+'), with a new strands column in the AS, and
the name field concatenates unique gene symbols so items at bidirectional
promoters read as "HES4,ISG15", etc. BED score is now |PromoterAI|*1000 so
scoreFilter is meaningful. Rewrote the converter to stream over sorted input,
which cuts peak memory from ~40 GB to a few MB.
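A minimal sketch of the streaming grouping step follows. It makes several assumptions. The TSV is sorted by chrom, pos and alt. The column names (chrom, pos, alt, gene, transcript_id, strand, promoterAI) are hypothetical. Positions are 1-based. Unique gene symbols keep first-seen order, and an item is emitted only where transcript scores differ, following the AS table title. The real converter's input format and edge cases may differ. Only one position's rows are held in memory at a time, which is what keeps peak memory flat.

```python
import csv
from itertools import groupby


def _position_key(row):
    # Rows for the same variant must be adjacent, so the input must be sorted by this key.
    return (row["chrom"], int(row["pos"]), row["alt"])


def overlap_records(lines):
    """Yield one overlap record per position where transcripts disagree (sketch)."""
    reader = csv.DictReader(lines, delimiter="\t")
    for (chrom, pos, alt), group in groupby(reader, key=_position_key):
        rows = list(group)  # only this position's rows are in memory
        scores = [float(r["promoterAI"]) for r in rows]
        if max(scores) == min(scores):
            continue  # assumed criterion: skip positions where all transcripts agree
        strands = [r["strand"] for r in rows]
        top = max(scores, key=abs)  # largest-magnitude PromoterAI score
        yield {
            "chrom": chrom,
            "chromStart": pos - 1,  # assumes 1-based input positions
            "chromEnd": pos,
            # unique gene symbols, first-seen order, e.g. "HES4,ISG15"
            "name": ",".join(dict.fromkeys(r["gene"] for r in rows)),
            "score": min(1000, round(abs(top) * 1000)),  # |PromoterAI| * 1000
            "strand": strands[0] if len(set(strands)) == 1 else ".",  # consensus
            "alt": alt,
            "transcripts": ",".join(r["transcript_id"] for r in rows),
            "scores": ",".join(r["promoterAI"] for r in rows),
            "strands": ",".join(strands),  # aligned with transcripts
            "scoreDiff": max(scores) - min(scores),
        }
```

Because `itertools.groupby` only merges adjacent rows, unsorted input would silently split a position into several items. The sort requirement is therefore part of the contract, not an optimization.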

trackDb: added filterLabel/filterLimits on scoreDiff (the filter was unusable
without labels), scoreFilter + scoreLabel, alwaysZero and autoScale off on the
bigWig subtracks, color 200,0,0 / altColor 0,0,200 so signed bigWig bars draw
red (over-expression) above zero and blue (under-expression) below, matching
the overlap track itemRgb. Added maxWindowToDraw and maxItems on the overlap
subtrack.
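The shared red/blue convention can be sketched as follows. This assumes the overlap item's color is taken from its largest-magnitude transcript score, the same score that drives the BED score. `item_rgb` is a hypothetical helper, not a function from the converter.

```python
def item_rgb(scores):
    """Map transcript PromoterAI scores to an itemRgb string (sketch).

    Positive (over-expression) -> red 200,0,0, matching the bigWig color setting.
    Negative (under-expression) -> blue 0,0,200, matching altColor.
    A position where transcripts disagree always has a nonzero
    largest-magnitude score, so the zero case does not arise in practice.
    """
    top = max(scores, key=abs)
    return "200,0,0" if top > 0 else "0,0,200"
```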

Makedoc updated to describe the streaming pipeline, the new strands column,
and the rebuild workflow.

diff --git src/hg/makeDb/scripts/promoterAiOverlaps.as src/hg/makeDb/scripts/promoterAiOverlaps.as
index b2859d35438..7dcb08c28a2 100644
--- src/hg/makeDb/scripts/promoterAiOverlaps.as
+++ src/hg/makeDb/scripts/promoterAiOverlaps.as
@@ -1,18 +1,19 @@
 table promoterAiOverlaps
 "PromoterAI overlap positions where transcripts disagree on score"
     (
     string chrom;        "Chromosome"
     uint chromStart;     "Start position (0-based)"
     uint chromEnd;       "End position"
-    string name;         "Gene name"
-    uint score;          "Score scaled 0-1000"
-    char[1] strand;      "Strand"
+    string name;         "Gene name(s) (comma-separated if multiple)"
+    uint score;          "|PromoterAI| * 1000 (0-1000), based on largest-magnitude score"
+    char[1] strand;      "Consensus transcript strand ('.' if transcripts disagree)"
     uint thickStart;     "Thick start"
     uint thickEnd;       "Thick end"
-    uint reserved;       "Item RGB color"
+    uint reserved;       "Item RGB color (red=over-expression, blue=under-expression)"
     string alt;          "Alternate allele"
     lstring transcripts; "Ensembl transcript IDs (comma-separated)"
-    lstring scores;      "PromoterAI scores (comma-separated)"
+    lstring scores;      "PromoterAI scores per transcript (comma-separated)"
+    lstring strands;     "Transcript strands (comma-separated, aligned with transcripts)"
     float scoreDiff;     "Maximum score difference across transcripts"
     string _mouseOver;   "Mouse over text"
     )