f9a89b0e1ce3c937b4fbb879736c1619c35c271f lrnassar Tue Apr 21 12:11:02 2026 -0700
QA fixes for PromoterAI track. refs #37278

Description page: replaced the wrong reference (Gao et al. 2023, the PrimateAI-3D paper) with the actual PromoterAI citation (Jaganathan et al. Science 2025, PMID 40440429), corrected the score-direction wording (negative = under-expression, positive = over-expression, not "tolerated vs disruptive"), fixed the Data Access source link (Illumina BaseSpace, not the GitHub repo), and corrected the mouseover blurb to match mouseOverFunction noAverage behavior.

Converter and AS: the overlap bigBed now carries the real per-transcript strand from the source TSV (was hardcoded '+'), with a new strands column in the AS, and the name field concatenates unique gene symbols so bidirectional-promoter items read as "HES4,ISG15" etc. BED score is now |PromoterAI|*1000 so scoreFilter is meaningful. Rewrote the converter to stream (sorted input), which drops peak memory from ~40 GB to a few MB.

trackDb: added filterLabel/filterLimits on scoreDiff (the filter was unusable without labels), scoreFilter + scoreLabel, alwaysZero and autoScale off on the bigWig subtracks, color 200,0,0 / altColor 0,0,200 so signed bigWig bars draw red (over-expression) above zero and blue (under-expression) below, matching the overlap track itemRgb. Added maxWindowToDraw and maxItems on the overlap subtrack.

Makedoc updated to describe the streaming pipeline, the new strands column, and the rebuild workflow.
diff --git src/hg/makeDb/scripts/promoterAiOverlaps.as src/hg/makeDb/scripts/promoterAiOverlaps.as
index b2859d35438..7dcb08c28a2 100644
--- src/hg/makeDb/scripts/promoterAiOverlaps.as
+++ src/hg/makeDb/scripts/promoterAiOverlaps.as
@@ -1,18 +1,19 @@
 table promoterAiOverlaps
 "PromoterAI overlap positions where transcripts disagree on score"
     (
     string chrom; "Chromosome"
     uint chromStart; "Start position (0-based)"
     uint chromEnd; "End position"
-    string name; "Gene name"
-    uint score; "Score scaled 0-1000"
-    char[1] strand; "Strand"
+    string name; "Gene name(s) (comma-separated if multiple)"
+    uint score; "|PromoterAI| * 1000 (0-1000), based on largest-magnitude score"
+    char[1] strand; "Consensus transcript strand ('.' if transcripts disagree)"
     uint thickStart; "Thick start"
     uint thickEnd; "Thick end"
-    uint reserved; "Item RGB color"
+    uint reserved; "Item RGB color (red=over-expression, blue=under-expression)"
     string alt; "Alternate allele"
     lstring transcripts; "Ensembl transcript IDs (comma-separated)"
-    lstring scores; "PromoterAI scores (comma-separated)"
+    lstring scores; "PromoterAI scores per transcript (comma-separated)"
+    lstring strands; "Transcript strands (comma-separated, aligned with transcripts)"
     float scoreDiff; "Maximum score difference across transcripts"
     string _mouseOver; "Mouse over text"
     )
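
The field semantics in the revised AS (comma-joined unique gene names, BED score from the largest-magnitude PromoterAI score, consensus strand, aligned per-transcript lists) can be illustrated with a minimal Python sketch of a streaming converter. This is not the converter from the commit. The `Row` type, the `overlap_items` function, the grouping key (chrom, position, alt), the first-seen ordering of gene names, and the exact itemRgb strings are all assumptions made for illustration. The sketch relies on the input already being sorted so that rows for one position and allele are adjacent, which is what lets it hold only one group in memory at a time.

```python
import itertools
from typing import Iterable, Iterator, NamedTuple


class Row(NamedTuple):
    """One source-TSV record (hypothetical layout, for illustration only)."""
    chrom: str
    pos: int          # 0-based start
    alt: str
    gene: str
    transcript: str
    strand: str       # '+' or '-'
    score: float      # signed PromoterAI score, assumed to lie in [-1, 1]


def overlap_items(rows: Iterable[Row]) -> Iterator[dict]:
    """Stream sorted rows and yield one item per position/allele
    where the transcripts disagree on score."""
    key = lambda r: (r.chrom, r.pos, r.alt)
    # groupby only merges adjacent rows, so the input must be sorted by key.
    for (chrom, pos, alt), group in itertools.groupby(rows, key=key):
        group = list(group)
        scores = [r.score for r in group]
        if len(set(scores)) < 2:
            continue  # single transcript or identical scores: no disagreement
        strands = [r.strand for r in group]
        top = max(scores, key=abs)  # largest-magnitude score drives score/color
        yield {
            "chrom": chrom,
            "chromStart": pos,
            "chromEnd": pos + 1,
            # Unique gene symbols in first-seen order (ordering is an assumption).
            "name": ",".join(dict.fromkeys(r.gene for r in group)),
            "score": min(1000, round(abs(top) * 1000)),
            "strand": strands[0] if len(set(strands)) == 1 else ".",
            # Red for over-expression, blue for under-expression (assumed values).
            "itemRgb": "200,0,0" if top > 0 else "0,0,200",
            "alt": alt,
            "transcripts": ",".join(r.transcript for r in group),
            "scores": ",".join(str(s) for s in scores),
            "strands": ",".join(strands),
            "scoreDiff": max(scores) - min(scores),
        }
```

Because `itertools.groupby` consumes the input lazily, passing a generator that reads the TSV line by line keeps memory proportional to the largest single group rather than to the whole file.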