6c567fd9a03e87610681a43d2183ebb43547d1ad
lrnassar
  Fri Apr 24 17:58:57 2026 -0700
PromoterAI: review followups. refs #37278

Move /gbdb/hg38/promoterAi/ to /gbdb/hg38/_promoterAi/ to match the
underscore-prefix exclusion rule for hgdownload sync (same pattern as
PrimateAI-3D under refs #37274). bigDataUrls and the makedoc updated.

Bump bigWig maxHeightPixels from 128:20:8 to 128:40:8 -- the peer-track
default of 20 is too cramped for a signed -1..+1 score.

Description page: drop the wrong primateai3d.basespace.illumina.com link
in Data Access. PromoterAI is not on BaseSpace; it is distributed via the
license agreement on the GitHub page (a download link is emailed after
submission). Reword Data Access and Methods accordingly.

Description page: add Illumina's recommended interpretation thresholds
(|score| >= 0.1, >= 0.2, >= 0.5) from the PromoterAI GitHub README, with
a note that higher cutoffs select smaller, higher-confidence sets.
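
For illustration only, a minimal sketch of how the three cutoffs above nest:
a score that meets a higher cutoff also meets every lower one, which is why
higher cutoffs select smaller sets. The function name and the return
convention are hypothetical and are not part of the track code or of the
PromoterAI README; only the cutoff values come from the text above.

```python
# Cutoffs quoted in this commit message (applied to |score|).
CUTOFFS = (0.1, 0.2, 0.5)

def strictest_cutoff_met(score):
    """Return the largest cutoff c with |score| >= c, or None if none is met.

    Hypothetical helper: the sign of the score is ignored here because the
    thresholds are stated on the absolute value.
    """
    met = [c for c in CUTOFFS if abs(score) >= c]
    return max(met) if met else None

# Example scores in the signed -1..+1 range.
for s in (0.05, -0.15, 0.2, -0.73):
    print(s, strictest_cutoff_met(s))
```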

diff --git src/hg/makeDb/doc/hg38/promoterAi.txt src/hg/makeDb/doc/hg38/promoterAi.txt
index 2f608686fb0..193022a2da4 100644
--- src/hg/makeDb/doc/hg38/promoterAi.txt
+++ src/hg/makeDb/doc/hg38/promoterAi.txt
@@ -1,47 +1,49 @@
 # PromoterAI, Claude max, Mar 20 2026
 # Updated Apr 21 2026 (RM #37278 QA): streaming converter, transcript strand
 # carried through, per-transcript gene aggregation, AS gains a strands field,
 # bigBed score field now stores |PromoterAI|*1000 (impact magnitude).
 
-# Source: promoterAI_tss500.tsv.gz from https://primateai3d.basespace.illumina.com/
-# (license-gated download; linked from https://github.com/Illumina/PromoterAI)
+# Source: promoterAI_tss500.tsv.gz from Illumina, obtained by completing the
+# license agreement linked from https://github.com/Illumina/PromoterAI
+# (academic / non-commercial use; download link emailed after submission).
 # 262M rows, 118.6M unique variants, 39.5M unique positions, scores within 500bp of TSS
 # Input fields (1-based): chrom, pos, ref, alt, gene, gene_id, transcript_id,
 # strand (1 or -1), tss_pos, promoterAI
 
 cd /hive/data/genomes/hg38/bed/promoterai
-# download promoterAI_tss500.tsv.gz from Illumina BaseSpace (requires registration)
+# download promoterAI_tss500.tsv.gz from Illumina (license agreement on GitHub;
+# Illumina emails a download link after submission)
 
 # convert to 4 bedGraph files (one per alt allele) + overlap BED.
 # Streaming: reads input row-by-row assuming input is sorted by (chrom, pos),
 # so memory use is proportional to the number of transcripts at a single
 # position, not the whole file. Safe on a 4 GB node.
 # Picks max absolute score when transcripts overlap; overlap BED has all
 # per-transcript scores + strands, tagged with the consensus strand (or '.'
 # when transcripts disagree on strand, i.e. bidirectional promoters).
 python3 ~/kent/src/hg/makeDb/scripts/promoterAiToBigWig.py
 
 # sort bedGraphs and convert to bigWig
 for alt in A C G T; do
     sort -k1,1 -k2,2n promoterAi_${alt}.bedGraph > promoterAi_${alt}.sorted.bedGraph
     bedGraphToBigWig promoterAi_${alt}.sorted.bedGraph /hive/data/genomes/hg38/chrom.sizes promoterAi_${alt}.bw
     rm promoterAi_${alt}.bedGraph promoterAi_${alt}.sorted.bedGraph
 done
 
 # sort overlap BED and convert to bigBed (bed9+6 -- see promoterAiOverlaps.as)
 sort -S 2G -k1,1 -k2,2n promoterAi_overlaps.bed > promoterAi_overlaps.sorted.bed
 bedToBigBed -type=bed9+ -as=$HOME/kent/src/hg/makeDb/scripts/promoterAiOverlaps.as -tab \
     promoterAi_overlaps.sorted.bed /hive/data/genomes/hg38/chrom.sizes promoterAi_overlaps.bb
 rm promoterAi_overlaps.bed promoterAi_overlaps.sorted.bed
 
 # symlinks
-mkdir -p /gbdb/hg38/promoterAi
-ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_A.bw /gbdb/hg38/promoterAi/a.bw
-ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_C.bw /gbdb/hg38/promoterAi/c.bw
-ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_G.bw /gbdb/hg38/promoterAi/g.bw
-ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_T.bw /gbdb/hg38/promoterAi/t.bw
-ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_overlaps.bb /gbdb/hg38/promoterAi/overlaps.bb
+mkdir -p /gbdb/hg38/_promoterAi
+ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_A.bw /gbdb/hg38/_promoterAi/a.bw
+ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_C.bw /gbdb/hg38/_promoterAi/c.bw
+ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_G.bw /gbdb/hg38/_promoterAi/g.bw
+ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_T.bw /gbdb/hg38/_promoterAi/t.bw
+ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_overlaps.bb /gbdb/hg38/_promoterAi/overlaps.bb
 
 # Rebuild notes (Apr 21 2026): only the overlap bigBed needed regenerating
 # because the bigWig best-score logic is unchanged. The existing bigWigs were
 # left in place; only promoterAi_overlaps.bb was swapped (old kept as .bak).