e82f973dc7a5a814f0fd23999aa27222fa1260da max Fri Mar 20 09:56:15 2026 -0700 primateAI track, refs #37274 diff --git src/hg/makeDb/trackDb/human/primateAi.html src/hg/makeDb/trackDb/human/primateAi.html new file mode 100644 index 00000000000..a3ac78c3b1f --- /dev/null +++ src/hg/makeDb/trackDb/human/primateAi.html @@ -0,0 +1,86 @@ +
+PrimateAI-3D is a +semi-supervised 3D convolutional neural network that predicts the pathogenicity of all +possible missense variants in the human genome. It was trained on 4.5 million benign +missense variants: 4.3 million common variants from 809 non-human primate individuals +across 233 species, plus common human variants (>0.1% allele frequency) from gnomAD, +TOPMed, and UK Biobank. These represent about 6% of all possible human missense variants. +Activate the Zoonomia 447 way Mammal/Primate alignment +track to show these variants. +
+ ++The model operates on voxelized protein structures at 2 Å resolution (from +AlphaFold or homology models) combined with multiple sequence alignments from 592 species. +It uses three complementary loss functions: benign variant classification, 3D +fill-in-the-blank prediction on masked amino acids, and a language model ranking component. +This track shows 70.7 million scored variants across all protein-coding genes. +
+ ++Each variant is colored blue (benign) or +red (pathogenic) based on the raw score. +The score field (0-1000) represents the percentile rank of the raw PrimateAI-3D score, +where higher values indicate greater predicted pathogenicity. +Mouseover shows the nucleotide change, amino acid change, raw score, percentile, and prediction. +Items can be filtered by prediction (benign/pathogenic) and by percentile score. +
+ ++Score interpretation: raw scores range from 0 to 1, with higher values indicating greater +predicted pathogenicity. The authors suggest a clinical threshold of 0.821 for +distinguishing pathogenic from benign missense variants. The percentile field shows +where a variant's score ranks relative to all other scored variants. 75% of variants +are classified as benign, 25% as pathogenic. +
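The score interpretation above can be sketched in a few lines of Python. This is only an illustration, not the code used to build the track: the function names, the RGB strings, and the choice to treat a raw score exactly equal to 0.821 as pathogenic are assumptions. The text also does not state whether the track's own benign/pathogenic prediction field uses this exact cutoff.

```python
# Threshold suggested by the PrimateAI-3D authors, as quoted in the text above.
CLINICAL_THRESHOLD = 0.821

# Hypothetical RGB strings for the two display colors (blue and red).
COLOR_BENIGN = "0,0,255"
COLOR_PATHOGENIC = "255,0,0"


def classify(raw_score: float, threshold: float = CLINICAL_THRESHOLD) -> str:
    """Return 'pathogenic' or 'benign' for a raw score in the range 0 to 1."""
    if not 0.0 <= raw_score <= 1.0:
        raise ValueError(f"raw score out of range: {raw_score}")
    # Assumption: a score equal to the threshold counts as pathogenic.
    return "pathogenic" if raw_score >= threshold else "benign"


def color_for(raw_score: float) -> str:
    """Map a raw score to the (hypothetical) display color of its class."""
    if classify(raw_score) == "pathogenic":
        return COLOR_PATHOGENIC
    return COLOR_BENIGN
```

For example, `classify(0.9)` gives `"pathogenic"` and `classify(0.3)` gives `"benign"` under these assumptions.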
+ ++Due to the data license, this track is not available for bulk download from UCSC: the API, the Table Browser, +and the "Download track data" button do not work for it. However, the source data can be downloaded from the +PrimateAI-3D website +(requires registration). The primate variant database is available at +PrimAD. +Note that our Zoonomia 447 way alignment +track includes the primate variants. +
+ ++The PrimateAI-3D hg38 site list was downloaded from the Illumina BaseSpace website. +The tab-separated file contains pre-computed scores for all possible single-nucleotide +missense variants. The variant positions were converted to bigBed format, and the percentile score was stored in +the track score field (scaled to 0-1000). No filtering was applied; all 70.7 million +scored variants are included. +A conversion script is available from +our GitHub. +
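The conversion described above can be sketched roughly as follows. This is not the conversion script referenced in the text. The input column names (`chrom`, `pos`, `ref`, `alt`, `percentile`), the 1-based `pos` convention, the percentile being a fraction between 0 and 1, and the `ref>alt` item name are all assumptions made for illustration. The resulting BED rows would then typically be sorted and passed to UCSC's `bedToBigBed` tool.

```python
import csv
import io


def tsv_to_bed_rows(tsv_text):
    """Yield (chrom, start, end, name, score) tuples from a tab-separated table.

    Assumed (hypothetical) input columns: chrom, pos (1-based), ref, alt,
    and percentile (a fraction between 0 and 1).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        pos = int(row["pos"])
        start = pos - 1  # BED start coordinates are 0-based
        end = pos        # BED end coordinates are exclusive
        name = f'{row["ref"]}>{row["alt"]}'
        # Scale the percentile to the 0-1000 range of the BED score field.
        score = round(float(row["percentile"]) * 1000)
        yield (row["chrom"], start, end, name, score)


# Made-up example record, only to show the shape of the input.
example = "chrom\tpos\tref\talt\tpercentile\nchr1\t69094\tG\tA\t0.25\n"
bed_rows = list(tsv_to_bed_rows(example))
```

No rows are dropped in this sketch, matching the statement that no filtering was applied.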
+ ++Thanks to Illumina, in particular Gao Hong, for making PrimateAI-3D predictions publicly available. +
+ ++Gao H, Hamp T, Ede J, Schraiber JG, McRae J, Singer-Berk M, Yang Y, Dietrich ASD, +Fiziev PP, Kuderna LFK et al. + +The landscape of tolerated genetic variation in humans and primates. +Science. 2023 Jun 2;380(6648):eabn8197. +PMID: 37262156; PMC: PMC10187174 +
+ ++Sundaram L, Gao H, Padigepati SR, McRae JF, Li Y, Kosmicki JA, Fritzilas N, Hakenberg J, +Dutta A, Shon J et al. + +Predicting the clinical impact of human mutation with deep neural networks. +Nat Genet. 2018 Aug;50(8):1161-1170. +PMID: 30038395; PMC: PMC6237276 +