4e986673a37800eacf1d36309b0cd38564a4bb1f
max
Wed Mar 25 07:55:59 2026 -0700
PromoterAI track scripts, docs, and makeDoc; remove unused primateAiToBigBed.py, refs #37278

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/promoterAi.html src/hg/makeDb/trackDb/human/promoterAi.html
new file mode 100644
index 00000000000..5868dd75570
--- /dev/null
+++ src/hg/makeDb/trackDb/human/promoterAi.html
@@ -0,0 +1,72 @@
+<h2>Description</h2>
+<p>
+<a href="https://github.com/Illumina/PromoterAI"
+target="_blank">PromoterAI</a> is a deep learning model from Illumina that predicts the
+impact of single nucleotide variants in gene promoter regions. It scores all possible
+substitutions within 500 bp of annotated transcription start sites (TSS), covering
+approximately 39.5 million genomic positions across all protein-coding genes.
+</p>
+
+<p>
+Scores range from -1 to 1. Positive scores indicate predicted disruption of promoter
+function; negative scores indicate that the variant is predicted to be tolerated. The model
+was trained using primate conservation and promoter sequence features, similar in approach
+to the related PrimateAI-3D model for coding variants.
+</p>
+
+<h2>Display Conventions</h2>
+<p>
+This track is a composite with four bigWig subtracks, one for each possible alternate
+allele (A, C, G, T). When zoomed in, the exact score for each possible mutation is shown
+on mouseover. When zoomed out, the display shows an average across the visible window;
+this average is indicated by a "~" prefix in the mouseover.
+</p>
+
+<p>
+A fifth subtrack ("PromoterAI overlaps") shows positions where overlapping
+transcripts produce different scores for the same variant. At these positions, the bigWig
+shows the score with the largest absolute value, while the overlap track shows all
+per-transcript scores. About 3.8% of positions have overlapping transcripts with
+differing scores.
The overlap track lists the transcripts and their scores for these positions.
+For more than 60% of these positions, the difference is smaller than 0.01,
+so the overlap track has a filter, active by default, that hides all annotations
+where the difference is smaller than this cutoff. The filter can be switched off
+on the track configuration page.
+</p>
+
+<h2>Data Access</h2>
+<p>
+Due to the data license, this track is not available for bulk download from UCSC.
+The source data can be downloaded from the
+<a href="https://github.com/Illumina/PromoterAI" target="_blank">PromoterAI
+GitHub page</a>.
+</p>
+
+<h2>Methods</h2>
+<p>
+The PromoterAI hg38 TSS-500 file was downloaded. The file
+contains pre-computed scores for all possible single nucleotide substitutions within
+500 bp of annotated TSS positions. For positions covered by multiple transcripts,
+the score with the largest absolute value was used for the bigWig tracks. Positions
+where transcripts produced different scores (4.45M of 118.6M unique variants, 3.8%)
+were additionally written to a bigBed overlap track with per-transcript detail.
+A conversion script is available from
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/promoterAiToBigWig.py"
+target="_blank">our GitHub</a>.
+</p>
+
+<h2>Credits</h2>
+<p>
+Thanks to Illumina for making PromoterAI predictions publicly available.
+</p>
+
+<h2>References</h2>
+<p>
+Gao H, Hamp T, Ede J, Schraiber JG, McRae J, Singer-Berk M, Yang Y, Dietrich ASD,
+Fiziev PP, Kuderna LFK <em>et al</em>.
+<a href="https://doi.org/10.1126/science.abn8197" target="_blank">
+The landscape of tolerated genetic variation in humans and primates</a>.
+<em>Science</em>. 2023 Jun 2;380(6648):eabn8197.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/37262156" target="_blank">37262156</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10187174/" target="_blank">PMC10187174</a>
+</p>
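
The selection rule described in the Methods section of promoterAi.html (take the score with the largest absolute value per variant, and record variants whose transcripts disagree) can be sketched roughly as follows. This is a minimal illustration, not the code in promoterAiToBigWig.py: the function name `summarize_variants`, the input tuple layout, and the returned dictionary keys are all hypothetical, and the default-filter comparison (`diff >= min_diff`) is an assumption based on the wording "hides all annotations where the difference is smaller than this cutoff".

```python
from collections import defaultdict


def summarize_variants(records, min_diff=0.01):
    """Collapse per-transcript scores to one score per variant.

    records: iterable of (chrom, pos, ref, alt, transcript_id, score) tuples.
    This tuple layout is hypothetical, not the real source file format.
    """
    by_variant = defaultdict(list)
    for chrom, pos, ref, alt, transcript_id, score in records:
        by_variant[(chrom, pos, ref, alt)].append((transcript_id, score))

    best = {}      # variant -> value that would go into the bigWig
    overlaps = {}  # variant -> per-transcript detail, only where scores differ
    for variant, scored in by_variant.items():
        scores = [score for _, score in scored]
        # Keep the score with the largest absolute value.
        # On an exact tie in absolute value, max() keeps the first one seen.
        best[variant] = max(scores, key=abs)
        diff = max(scores) - min(scores)
        if diff > 0:
            overlaps[variant] = {
                "per_transcript": scored,
                "diff": diff,
                # Assumed filter semantics: hidden when diff < min_diff.
                "shown_by_default": diff >= min_diff,
            }
    return best, overlaps


# Made-up example data, for illustration only.
records = [
    ("chr1", 100, "A", "C", "tx1", 0.5),
    ("chr1", 100, "A", "C", "tx2", -0.7),
    ("chr1", 101, "G", "T", "tx1", 0.25),
    ("chr1", 101, "G", "T", "tx2", 0.255),
    ("chr1", 102, "T", "A", "tx1", 0.1),
]
best, overlaps = summarize_variants(records)
print(best[("chr1", 100, "A", "C")])
```

In this made-up input, the first variant has two clearly different scores and would appear in the overlap track, the second differs by less than the 0.01 cutoff and would be hidden by the default filter, and the third is covered by a single transcript and has no overlap entry.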