6c567fd9a03e87610681a43d2183ebb43547d1ad lrnassar Fri Apr 24 17:58:57 2026 -0700

    PromoterAI: review followups. refs #37278

    Move /gbdb/hg38/promoterAi/ to /gbdb/hg38/_promoterAi/ to match the
    underscore-prefix exclusion rule for hgdownload sync (same pattern as
    PrimateAI-3D under refs #37274). bigDataUrls and the makedoc updated.

    Bump bigWig maxHeightPixels from 128:20:8 to 128:40:8 -- the peer-track
    default of 20 is too cramped for a signed -1..+1 score.

    Description page: drop the wrong primateai3d.basespace.illumina.com link
    in Data Access; PromoterAI is not on BaseSpace, it's distributed via the
    license agreement on the GitHub page (a download link is emailed after
    submission). Reword Data Access and Methods accordingly.

    Description page: add Illumina's recommended interpretation thresholds
    (|score| >= 0.1, >= 0.2, >= 0.5) from the PromoterAI GitHub README, with
    a note that higher cutoffs select smaller, higher-confidence sets.

diff --git src/hg/makeDb/doc/hg38/promoterAi.txt src/hg/makeDb/doc/hg38/promoterAi.txt
index 2f608686fb0..193022a2da4 100644
--- src/hg/makeDb/doc/hg38/promoterAi.txt
+++ src/hg/makeDb/doc/hg38/promoterAi.txt
@@ -1,47 +1,49 @@
 # PromoterAI, Claude max, Mar 20 2026
 # Updated Apr 21 2026 (RM #37278 QA): streaming converter, transcript strand
 # carried through, per-transcript gene aggregation, AS gains a strands field,
 # bigBed score field now stores |PromoterAI|*1000 (impact magnitude).
-# Source: promoterAI_tss500.tsv.gz from https://primateai3d.basespace.illumina.com/
-# (license-gated download; linked from https://github.com/Illumina/PromoterAI)
+# Source: promoterAI_tss500.tsv.gz from Illumina, obtained by completing the
+# license agreement linked from https://github.com/Illumina/PromoterAI
+# (academic / non-commercial use; download link emailed after submission).
 # 262M rows, 118.6M unique variants, 39.5M unique positions, scores within 500bp of TSS
 # Input fields (1-based): chrom, pos, ref, alt, gene, gene_id, transcript_id,
 # strand (1 or -1), tss_pos, promoterAI

 cd /hive/data/genomes/hg38/bed/promoterai

-# download promoterAI_tss500.tsv.gz from Illumina BaseSpace (requires registration)
+# download promoterAI_tss500.tsv.gz from Illumina (license agreement on GitHub;
+# Illumina emails a download link after submission)

 # convert to 4 bedGraph files (one per alt allele) + overlap BED.
 # Streaming: reads input row-by-row assuming input is sorted by (chrom, pos),
 # so memory use is proportional to the number of transcripts at a single
 # position, not the whole file. Safe on a 4 GB node.
 # Picks max absolute score when transcripts overlap; overlap BED has all
 # per-transcript scores + strands, tagged with the consensus strand (or '.'
 # when transcripts disagree on strand, i.e. bidirectional promoters).
 python3 ~/kent/src/hg/makeDb/scripts/promoterAiToBigWig.py

 # sort bedGraphs and convert to bigWig
 for alt in A C G T; do
     sort -k1,1 -k2,2n promoterAi_${alt}.bedGraph > promoterAi_${alt}.sorted.bedGraph
     bedGraphToBigWig promoterAi_${alt}.sorted.bedGraph /hive/data/genomes/hg38/chrom.sizes promoterAi_${alt}.bw
     rm promoterAi_${alt}.bedGraph promoterAi_${alt}.sorted.bedGraph
 done

 # sort overlap BED and convert to bigBed (bed9+6 -- see promoterAiOverlaps.as)
 sort -S 2G -k1,1 -k2,2n promoterAi_overlaps.bed > promoterAi_overlaps.sorted.bed
 bedToBigBed -type=bed9+ -as=$HOME/kent/src/hg/makeDb/scripts/promoterAiOverlaps.as -tab \
     promoterAi_overlaps.sorted.bed /hive/data/genomes/hg38/chrom.sizes promoterAi_overlaps.bb
 rm promoterAi_overlaps.bed promoterAi_overlaps.sorted.bed

 # symlinks
-mkdir -p /gbdb/hg38/promoterAi
-ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_A.bw /gbdb/hg38/promoterAi/a.bw
-ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_C.bw /gbdb/hg38/promoterAi/c.bw
-ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_G.bw /gbdb/hg38/promoterAi/g.bw
-ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_T.bw /gbdb/hg38/promoterAi/t.bw
-ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_overlaps.bb /gbdb/hg38/promoterAi/overlaps.bb
+mkdir -p /gbdb/hg38/_promoterAi
+ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_A.bw /gbdb/hg38/_promoterAi/a.bw
+ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_C.bw /gbdb/hg38/_promoterAi/c.bw
+ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_G.bw /gbdb/hg38/_promoterAi/g.bw
+ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_T.bw /gbdb/hg38/_promoterAi/t.bw
+ln -s /hive/data/genomes/hg38/bed/promoterai/promoterAi_overlaps.bb /gbdb/hg38/_promoterAi/overlaps.bb

 # Rebuild notes (Apr 21 2026): only the overlap bigBed needed regenerating
 # because the bigWig best-score logic is unchanged. The existing bigWigs were
 # left in place; only promoterAi_overlaps.bb was swapped (old kept as .bak).