4e986673a37800eacf1d36309b0cd38564a4bb1f max Wed Mar 25 07:55:59 2026 -0700
PromoterAI track scripts, docs, and makeDoc; remove unused primateAiToBigBed.py, refs #37278
Co-Authored-By: Claude Opus 4.6
diff --git src/hg/makeDb/trackDb/human/promoterAi.html src/hg/makeDb/trackDb/human/promoterAi.html
new file mode 100644
index 00000000000..5868dd75570
--- /dev/null
+++ src/hg/makeDb/trackDb/human/promoterAi.html
@@ -0,0 +1,72 @@
+

Description

+

+PromoterAI is a deep learning model from Illumina that predicts the
+impact of single nucleotide variants in gene promoter regions. It scores all possible
+substitutions within 500 bp of annotated transcription start sites (TSS), covering
+approximately 39.5 million genomic positions across all protein-coding genes.
+

+ +
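The position and variant counts in this track are related by a simple rule: every scorable base has three possible alternate alleles. The sketch below is a hypothetical helper, not part of the PromoterAI release, that enumerates every single nucleotide substitution in a short sequence; the sequence and start coordinate are made up for illustration.

```python
BASES = "ACGT"

def substitutions(seq, start):
    """Yield (position, ref, alt) for every possible SNV in seq.

    start is the 0-based genomic coordinate of seq[0]; both the sequence
    and the coordinate used below are hypothetical.
    """
    for offset, ref in enumerate(seq.upper()):
        if ref not in BASES:
            continue  # ambiguous bases such as N get no substitutions
        for alt in BASES:
            if alt != ref:
                yield start + offset, ref, alt

variants = list(substitutions("ACGTN", 1000))
print(len(variants))  # 4 scorable bases x 3 alternates = 12

# Scaling the same rule: ~39.5M positions x 3 alternates
print(f"{3 * 39.5:.1f}M variants")  # 118.5M, close to the 118.6M given in Methods
```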

+Scores range from -1 to 1. Positive scores indicate predicted disruption of promoter
+function; negative scores indicate that the variant is predicted to be tolerated. The model
+was trained using primate conservation and promoter sequence features, similar in approach
+to the related PrimateAI-3D model for coding variants.
+

+ +

Display Conventions

+

+This track is a composite with four bigWig subtracks, one for each possible alternate
+allele (A, C, G, T). When zoomed in, the exact score for each possible mutation is shown
+on mouseover. When zoomed out, the display shows an average across the visible window;
+this average is indicated by a "~" prefix in the mouseover.
+

+ +
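To illustrate the subtrack layout, the following sketch groups per-variant scores into one track per alternate allele and then averages each track over fixed-size bins, roughly what a zoomed-out summary does. The record layout (position, alt, score), the example scores, and the bin size are assumptions; the browser's own bigWig summarization is not reproduced here.

```python
from collections import defaultdict

def split_by_alt(variants):
    """Group (pos, alt, score) records into one sorted track per alternate allele."""
    tracks = {alt: [] for alt in "ACGT"}
    for pos, alt, score in variants:
        tracks[alt].append((pos, score))
    for rows in tracks.values():
        rows.sort()
    return tracks

def binned_mean(rows, bin_size):
    """Average (pos, score) rows in fixed-size bins, keyed by bin start coordinate."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, score in rows:
        b = pos // bin_size
        sums[b] += score
        counts[b] += 1
    return {b * bin_size: sums[b] / counts[b] for b in sorted(sums)}

# Hypothetical scores for a few variants
variants = [(100, "G", 0.5), (101, "G", -0.1), (100, "C", 0.2), (105, "G", 0.3)]
tracks = split_by_alt(variants)
print(tracks["G"])                  # [(100, 0.5), (101, -0.1), (105, 0.3)]
print(binned_mean(tracks["G"], 4))  # {100: 0.2, 104: 0.3}
```

Positions with no score for a given allele (here, every position in the A and T tracks) simply have no data in that subtrack.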

+A fifth subtrack ("PromoterAI overlaps") shows positions where overlapping
+transcripts produce different scores for the same variant. At these positions, the bigWig
+subtracks show the score with the largest absolute value, while the overlap track lists
+each transcript and its score. About 3.8% of variants have overlapping transcripts with
+differing scores. For more than 60% of these, the scores differ by less than 0.01, so a
+filter, active by default, hides all annotations in this track where the difference is
+below this cutoff. The filter can be switched off on the track configuration page.
+

+ +
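The default filter on the overlap subtrack can be pictured as a threshold on how far apart the per-transcript scores of one variant are. In this sketch the "difference" is assumed to be the spread (maximum minus minimum) across all overlapping transcripts, the record layout is hypothetical, and the transcript names are placeholders; the track's actual filter implementation is not shown here.

```python
def score_spread(tx_scores):
    """Return max - min over a {transcript: score} mapping (assumed definition of 'difference')."""
    scores = tx_scores.values()
    return max(scores) - min(scores)

def visible(records, cutoff=0.01):
    """Keep only variants whose per-transcript scores differ by at least cutoff."""
    return [r for r in records if score_spread(r["scores"]) >= cutoff]

# Hypothetical overlap records; "tx1"/"tx2"/"tx3" are placeholder transcript names
records = [
    {"variant": "chr1:1000:G", "scores": {"tx1": 0.120, "tx2": 0.125}},
    {"variant": "chr1:1001:T", "scores": {"tx1": 0.30, "tx2": -0.05, "tx3": 0.28}},
]
shown = visible(records)
print([r["variant"] for r in shown])  # ['chr1:1001:T']

# In this sketch, switching the filter off is modeled as cutoff=0
print(len(visible(records, cutoff=0)))  # 2
```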

Data Access

+

+Due to the data license, this track is not available for bulk download from UCSC.
+The source data can be downloaded from the
+PromoterAI
+GitHub page.
+

+ +

Methods

+

+The PromoterAI hg38 TSS-500 file was downloaded. The file
+contains pre-computed scores for all possible single nucleotide substitutions within
+500 bp of annotated TSS positions. For positions covered by multiple transcripts,
+the score with the largest absolute value was used for the bigWig tracks. Variants
+where transcripts produced different scores (4.45M of 118.6M unique variants, 3.8%)
+were additionally written to a bigBed overlap track with per-transcript detail.
+A conversion script is available from
+our GitHub.
+

+ +
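The collapsing step described in Methods can be sketched as below. This is not the conversion script linked above; the input tuple layout (chrom, pos, alt, transcript, score) and the example values are hypothetical. Each variant gets the score with the largest absolute value for the bigWig, and variants whose transcripts disagree are also kept, with their per-transcript detail, for the overlap track.

```python
from collections import defaultdict

def collapse(records):
    """Split per-transcript records into bigWig values and overlap entries.

    records: iterable of (chrom, pos, alt, transcript, score) tuples
    (hypothetical layout, not the actual PromoterAI file format).
    """
    by_variant = defaultdict(list)
    for chrom, pos, alt, tx, score in records:
        by_variant[(chrom, pos, alt)].append((tx, score))

    bigwig_values = {}
    overlaps = {}
    for key, tx_scores in by_variant.items():
        scores = [score for _, score in tx_scores]
        # Largest absolute value wins; on an exact tie (e.g. 0.3 vs -0.3)
        # max() keeps the first score encountered.
        bigwig_values[key] = max(scores, key=abs)
        if len(set(scores)) > 1:
            overlaps[key] = tx_scores  # transcripts disagree: keep all scores
    return bigwig_values, overlaps

records = [
    ("chr1", 1000, "G", "tx1", 0.20),
    ("chr1", 1000, "G", "tx2", -0.45),
    ("chr1", 1001, "T", "tx1", 0.10),
    ("chr1", 1001, "T", "tx2", 0.10),
]
values, overlaps = collapse(records)
print(values[("chr1", 1000, "G")])  # -0.45
print(list(overlaps))               # [('chr1', 1000, 'G')]

# Share of variants with differing scores, using the counts from Methods
print(f"{4.45 / 118.6:.1%}")        # 3.8%
```

Because the bigWig keeps only one number per variant, the sign of the displayed score follows whichever transcript has the most extreme prediction; the overlap track is what preserves the disagreement.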

Credits

+

+Thanks to Illumina for making PromoterAI predictions publicly available. +

+ +

References

+

+Gao H, Hamp T, Ede J, Schraiber JG, McRae J, Singer-Berk M, Yang Y, Dietrich ASD, +Fiziev PP, Kuderna LFK et al. + +The landscape of tolerated genetic variation in humans and primates. +Science. 2023 Jun 2;380(6648):eabn8197. +PMID: 37262156; PMC: PMC10187174 +