de2ccf6d827865f11d3c8edd9ceeb1b6394a7380 lrnassar Tue Apr 21 18:22:59 2026 -0700

PrimateAI-3D: label items by nucleotide change, add aaChange field and HTML mouseover.

Variant analysts typically work at the nucleotide level, and the current item label (amino acid change) collapses distinguishable variants: ~17% of items share their (chrom, pos, AA-change) tuple with another item because of codon degeneracy (e.g. three C>A, C>G, C>T at the same position can all appear as "M>I"). Labeling by nucleotide change makes every item uniquely distinguishable (0.0% collisions on hg38, 0.1% on hg19 from overlapping transcripts).

- primateAi.as: field 4 (name) is now "Nucleotide change (e.g. T>C)"; new field aaChange (placed before ref/alt) holds the amino acid change.
- primateAiToBigBed.py: write name = "{ref}>{alt}", new aaChange column, and an HTML mouseover with terse labels (Var/AA/Score/Perc/Pred) and a colored prediction string.
- primateAi.ra: add labelFields name,aaChange and defaultLabelFields name so users can toggle the on-feature label between nt change (default) and AA change.
- primateAi.html: expand Display Conventions with the label-convention rationale and a legend for each mouseover field.

refs #37274

diff --git src/hg/makeDb/trackDb/human/primateAi.html src/hg/makeDb/trackDb/human/primateAi.html
index afd3c3ed330..efbefed0947 100644
--- src/hg/makeDb/trackDb/human/primateAi.html
+++ src/hg/makeDb/trackDb/human/primateAi.html
@@ -1,86 +1,104 @@
PrimateAI-3D is a semi-supervised 3D convolutional neural network that predicts the pathogenicity of all possible missense variants in the human genome. It was trained on 4.5 million benign missense variants: 4.3 million common variants from 809 non-human primate individuals across 233 species, plus common human variants (>0.1% allele frequency) from gnomAD, TOPMed, and UK Biobank. These represent about 6% of all possible human missense variants.
The model operates on voxelized protein structures at 2 Å resolution (from AlphaFold or homology models) combined with multiple sequence alignments from 592 species. It uses three complementary loss functions: benign variant classification, 3D fill-in-the-blank prediction on masked amino acids, and a language model ranking component. This track shows 70.7 million scored variants across all protein-coding genes.
Each variant is colored blue (benign) or
-red (pathogenic) based on the raw score.
-The score field (0-1000) represents the percentile rank of the raw PrimateAI-3D score,
-where higher values indicate greater predicted pathogenicity.
-Mouseover shows the nucleotide change, amino acid change, raw score, percentile, and prediction.
-Items can be filtered by prediction (benign/pathogenic) and by percentile score.
+red (pathogenic) based on the Illumina-provided
+Prediction field. Because the three possible alternate bases at a given
+position sometimes produce the same amino acid change (codon degeneracy),
+each item is labeled by default with its nucleotide change (e.g. C>T)
+rather than its amino acid change. The label can be switched to the amino acid
+change via the "Label fields" control in the Track Settings.
-Score interpretation: raw scores range from 0 to 1, with higher values indicating greater
-predicted pathogenicity. The authors suggest a clinical threshold of 0.821 for
-distinguishing pathogenic from benign missense variants. The percentile field shows
-where a variant's score ranks relative to all other scored variants. 75% of variants
-are classified as benign, 25% as pathogenic.
+Hovering over a variant shows:
+
+
+Items can be filtered by prediction (benign/pathogenic), by raw PrimateAI-3D
+score, or by percentile.
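The codon-degeneracy rationale for the nucleotide label can be illustrated with a short sketch. It is not part of this commit: the helper name `labels_for_position` is hypothetical, the codon table is the standard genetic code, and the codon is written on the coding strand (for a minus-strand gene the same three changes would read as C>A, C>G, C>T on the genome). It shows that all three substitutions at the third base of the methionine codon ATG produce the same amino acid label, while the nucleotide labels stay distinct.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def labels_for_position(codon, offset):
    """Map each nucleotide label at codon[offset] to its amino acid label.

    Hypothetical helper for illustration only; codon is on the coding strand.
    """
    ref = codon[offset]
    labels = {}
    for alt in BASES:
        if alt == ref:
            continue
        mutated = codon[:offset] + alt + codon[offset + 1:]
        labels[f"{ref}>{alt}"] = f"{CODON_TABLE[codon]}>{CODON_TABLE[mutated]}"
    return labels


# Third base of ATG (Met): three distinct nucleotide labels, one shared AA label.
labels = labels_for_position("ATG", 2)
for nt_label, aa_label in labels.items():
    print(nt_label, aa_label)
```

Labeling by amino acid change would draw these three items identically at the same position; labeling by nucleotide change keeps them apart.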
Due to the data license, the Table Browser, Data Integrator, and the REST API's
getData endpoint are disabled for this track. The source data can be
downloaded from the
PrimateAI-3D website
(requires registration). The primate variant database is available at
PrimAD.
Our Zoonomia 447-way Mammal/Primate alignment
track displays the primate variants used in training PrimateAI-3D.
The PrimateAI-3D hg38 site list was downloaded from the Illumina BaseSpace website. The tab-separated file contains pre-computed scores for all possible single nucleotide missense variants. Positions were formatted as bigBed. The percentile score was put into the track score field (scaled to 0-1000). No filtering was applied; all 70.7 million scored variants are included. A conversion script is available from our GitHub.
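As a rough illustration of the conversion described above, the sketch below builds one BED-style row. It is not the actual primateAiToBigBed.py: the function name `make_bed_fields`, the input argument names, the assumption that positions are 1-based and that the percentile arrives as a fraction between 0 and 1, and the output column order are all hypothetical. Only the `{ref}>{alt}` name format, the separate amino acid change column, and the 0-1000 score scaling come from the commit message and the text above.

```python
def make_bed_fields(chrom, pos, ref, alt, aa_ref, aa_alt, percentile):
    """Build one illustrative BED-style row for a single-nucleotide variant.

    Assumptions (not confirmed by the source): pos is 1-based and
    percentile is a fraction in [0, 1].
    """
    start = pos - 1                      # BED intervals are 0-based, half-open
    end = pos
    name = f"{ref}>{alt}"                # nucleotide change used as item label
    aa_change = f"{aa_ref}>{aa_alt}"     # amino acid change kept as its own column
    score = int(round(percentile * 1000))
    score = max(0, min(1000, score))     # clamp to the valid BED score range
    return [chrom, start, end, name, score, aa_change]


row = make_bed_fields("chr1", 100, "C", "T", "M", "I", 0.5)
print("\t".join(str(field) for field in row))
```

Rows of this kind would then be sorted and passed to the bedToBigBed utility together with the primateAi.as schema; the real column layout is defined by that schema, not by this sketch.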
Thanks to Illumina, in particular Gao Hong, for making PrimateAI-3D predictions publicly available.
Gao H, Hamp T, Ede J, Schraiber JG, McRae J, Singer-Berk M, Yang Y, Dietrich ASD, Fiziev PP, Kuderna LFK et al. The landscape of tolerated genetic variation in humans and primates. Science. 2023 Jun 2;380(6648):eabn8153. PMID: 37262156; PMC: PMC10713091
Sundaram L, Gao H, Padigepati SR, McRae JF, Li Y, Kosmicki JA, Fritzilas N, Hakenberg J, Dutta A, Shon J et al. Predicting the clinical impact of human mutation with deep neural networks. Nat Genet. 2018 Aug;50(8):1161-1170. PMID: 30038395; PMC: PMC6237276