50466766840ded6cb8bd5cb868bdf2ff3f613bc0
lrnassar
Tue Apr 21 11:17:15 2026 -0700

QA fixes for PrimateAI-3D track.

Config (primateAi.ra):
- Fix broken Ensembl transcript linkout: urls $S expanded to chromosome name; switch to the Ensembl transcript page with $$
- Add numeric filters on percentile and raw score (label notes the paper's 0.821 clinical threshold)
- Add maxWindowToDraw 2000000

Data (primateAiToBigBed.py):
- Change hardcoded strand '+' to '.': the source file has no strand column
- Accept input/output paths as CLI args (previously hardcoded the hg38 input path)
- Handle variable field count: ~2.4M rows in the hg19 source are missing the refseq column

Description (primateAi.html):
- Fix two broken hgTrackUi&... internal links to the Zoonomia 447-way track
- Regenerate the first reference via getTrackReferences (wrong article number and wrong PMC ID in the previous text)
- Fix the GitHub URL for the conversion script in Methods
- Move the Zoonomia 447-way mention out of Description; rephrase the license note to describe precisely what is disabled

relatedTracks.ra:
- Add reciprocal cross-links for primateAi <-> alphaMissense (hg38), primateAi <-> revel (hg38 + hg19), and primateAi <-> promoterAi (hg38). Also includes promoterAi <-> alphaMissense cross-links.

refs #37274 #37279

diff --git src/hg/makeDb/trackDb/human/primateAi.html src/hg/makeDb/trackDb/human/primateAi.html
index a3ac78c3b1f..afd3c3ed330 100644
--- src/hg/makeDb/trackDb/human/primateAi.html
+++ src/hg/makeDb/trackDb/human/primateAi.html
@@ -1,86 +1,86 @@
PrimateAI-3D is a semi-supervised 3D convolutional neural network that predicts the pathogenicity of all possible missense variants in the human genome. It was trained on 4.5 million benign missense variants: 4.3 million common variants from 809 non-human primate individuals across 233 species, plus common human variants (>0.1% allele frequency) from gnomAD, TOPMed, and UK Biobank. These represent about 6% of all possible human missense variants.
-Activate the Zoonomia 447 way Mammal/Primate alignment
-track to show these variants.
The model operates on voxelized protein structures at 2 Å resolution (from AlphaFold or homology models) combined with multiple sequence alignments from 592 species. It uses three complementary loss functions: benign variant classification, 3D fill-in-the-blank prediction on masked amino acids, and a language model ranking component. This track shows 70.7 million scored variants across all protein-coding genes.
Each variant is colored blue (benign) or red (pathogenic) based on the raw score. The score field (0-1000) represents the percentile rank of the raw PrimateAI-3D score, where higher values indicate greater predicted pathogenicity. Mouseover shows the nucleotide change, amino acid change, raw score, percentile, and prediction. Items can be filtered by prediction (benign/pathogenic) and by percentile score.
Score interpretation: raw scores range from 0 to 1, with higher values indicating greater predicted pathogenicity. The authors suggest a clinical threshold of 0.821 for distinguishing pathogenic from benign missense variants. The percentile field shows where a variant's score ranks relative to all other scored variants. 75% of variants are classified as benign, 25% as pathogenic.
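The score interpretation above can be illustrated with a minimal Python sketch. It is not taken from the track's conversion script. The function names are hypothetical, and two details are assumptions: that the 0.821 threshold is inclusive, and that the percentile is stored as a fraction between 0 and 1 before being scaled to the 0-1000 score field.

```python
# Clinical threshold suggested by the authors, as quoted in the description.
CLINICAL_THRESHOLD = 0.821


def classify(raw_score, threshold=CLINICAL_THRESHOLD):
    """Label a raw PrimateAI-3D score (0 to 1) as benign or pathogenic.

    Assumption: a score exactly at the threshold counts as pathogenic.
    """
    if not 0.0 <= raw_score <= 1.0:
        raise ValueError(f"raw score out of range: {raw_score}")
    return "pathogenic" if raw_score >= threshold else "benign"


def percentile_to_bed_score(percentile):
    """Scale a percentile to the integer 0-1000 range of a BED score field.

    Assumption: the percentile is a fraction in [0, 1].
    """
    scaled = round(percentile * 1000)
    # Clamp so slightly out-of-range input cannot produce an invalid score.
    return max(0, min(1000, scaled))
```

For example, `classify(0.9)` returns `"pathogenic"` and `percentile_to_bed_score(0.25)` returns `250`.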
-Due to the data license, this track is not available for bulk download from UCSC and the API, the Table Browser
-and the "Download track data" button do not work. However, the source data can be downloaded from the
+Due to the data license, the Table Browser, Data Integrator, and the REST API's
+getData endpoint are disabled for this track. The source data can be
+downloaded from the
PrimateAI-3D website
(requires registration). The primate variant database is available at
PrimAD.
-Note that our Zoonomia 447 way alignment
-track includes the primate variants.
+Our Zoonomia 447-way Mammal/Primate alignment
+track displays the primate variants used in training PrimateAI-3D.
The PrimateAI-3D hg38 site list was downloaded from the Illumina BaseSpace website. The tab-separated file contains pre-computed scores for all possible single nucleotide missense variants. Positions were formatted as bigBed. The percentile score was put into the track score field (scaled to 0-1000). No filtering was applied; all 70.7 million scored variants are included. A conversion script is available from
-our Github.
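The conversion step described above could look roughly like the following Python sketch. It is not the actual primateAiToBigBed.py. The column order (chrom, 1-based pos, ref, alt, raw score, percentile, optional refseq), the output field layout, and the function name `row_to_bed` are all hypothetical. The sketch only illustrates three points from the commit message and Methods: the strand is written as '.', the percentile is scaled to 0-1000, and rows lacking the refseq column are tolerated.

```python
def row_to_bed(line):
    """Convert one tab-separated source row into a list of BED-style fields.

    Hypothetical input columns:
        chrom, pos (1-based), ref, alt, raw score, percentile, [refseq]
    The refseq column is optional because some source rows omit it.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 7:
        refseq = fields[6]
    elif len(fields) == 6:
        refseq = ""  # row is missing the refseq column
    else:
        raise ValueError(f"unexpected field count {len(fields)}: {line!r}")

    chrom, pos, ref, alt, raw, pct = fields[:6]
    start = int(pos) - 1       # BED coordinates are 0-based, half-open
    end = start + len(ref)
    # Assumption: the percentile is a fraction in [0, 1]; scale and clamp it.
    bed_score = max(0, min(1000, round(float(pct) * 1000)))
    name = f"{ref}>{alt}"
    strand = "."               # the source file has no strand column
    return [chrom, str(start), str(end), name, str(bed_score), strand,
            raw, pct, refseq]
```

The resulting rows would still need to be sorted and passed to the bedToBigBed utility with a matching autoSql file to produce the bigBed.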
Thanks to Illumina, in particular Gao Hong, for making PrimateAI-3D predictions publicly available.
-Gao H, Hamp T, Ede J, Schraiber JG, McRae J, Singer-Berk M, Yang Y, Dietrich ASD,
-Fiziev PP, Kuderna LFK et al.
-
+Gao H, Hamp T, Ede J, Schraiber JG, McRae J, Singer-Berk M, Yang Y, Dietrich ASD, Fiziev PP, Kuderna
+LFK et al.
+
The landscape of tolerated genetic variation in humans and primates.
-Science. 2023 Jun 2;380(6648):eabn8197.
+Science. 2023 Jun 2;380(6648):eabn8153.
-PMID: 37262156; PMC: PMC10187174
+PMID: 37262156; PMC: PMC10713091
-Sundaram L, Gao H, Padigepati SR, McRae JF, Li Y, Kosmicki JA, Fritzilas N, Hakenberg J,
-Dutta A, Shon J et al.
+Sundaram L, Gao H, Padigepati SR, McRae JF, Li Y, Kosmicki JA, Fritzilas N, Hakenberg J, Dutta A,
+Shon J et al.
Predicting the clinical impact of human mutation with deep neural networks.
Nat Genet. 2018 Aug;50(8):1161-1170.
PMID: 30038395; PMC: PMC6237276