
Description

PrimateAI-3D is a semi-supervised 3D convolutional neural network that predicts the pathogenicity of all possible missense variants in the human genome. It was trained on 4.5 million benign missense variants: 4.3 million common variants from 809 non-human primate individuals across 233 species, plus common human variants (>0.1% allele frequency) from gnomAD, TOPMed, and UK Biobank. These represent about 6% of all possible human missense variants.

The model operates on voxelized protein structures at 2 Å resolution (from AlphaFold or homology models) combined with multiple sequence alignments from 592 species. It uses three complementary loss functions: benign variant classification, 3D fill-in-the-blank prediction on masked amino acids, and a language model ranking component. This track shows 70.7 million scored variants across all protein-coding genes.
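To make "voxelized at 2 Å resolution" concrete, here is a minimal sketch that bins atom coordinates into a cubic grid of 2 Å cells centred on a residue and counts the atoms in each cell. The grid size (`grid_dim`), the use of plain atom counts as the voxel feature, and the function names are illustrative assumptions. They are not PrimateAI-3D's actual featurization, whose per-voxel features differ.

```python
import math

VOXEL_SIZE = 2.0  # Angstroms, matching the 2 Å resolution described above


def voxel_index(coord, origin, voxel_size=VOXEL_SIZE):
    """Map a 3D coordinate (in Å) to integer voxel indices relative to `origin`."""
    return tuple(math.floor((c - o) / voxel_size) for c, o in zip(coord, origin))


def voxelize(atoms, center, grid_dim=7, voxel_size=VOXEL_SIZE):
    """Count atoms per voxel in a grid_dim^3 cube centred on `center`.

    Hypothetical illustration only: returns {(i, j, k): atom_count} and
    silently drops atoms that fall outside the cube.
    """
    half = grid_dim * voxel_size / 2.0
    origin = tuple(c - half for c in center)  # corner of the cube
    counts = {}
    for atom in atoms:
        idx = voxel_index(atom, origin, voxel_size)
        if all(0 <= i < grid_dim for i in idx):
            counts[idx] = counts.get(idx, 0) + 1
    return counts


# Two atoms near the centre share a voxel; the atom 10 Å away is outside the 14 Å cube.
print(voxelize([(0, 0, 0), (0.1, 0.1, 0.1), (0.5, -0.5, 1.9), (10, 0, 0)], (0, 0, 0)))
```

With a 7-voxel grid and 2 Å cells, the sketch covers a 14 Å cube. Changing `voxel_size` trades spatial detail against grid size.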

Display Conventions

Each variant is colored blue (benign) or red (pathogenic) based on the Illumina-provided Prediction field. Because two of the three possible alternate bases at a position can produce the same amino acid change (codon degeneracy), an amino acid label does not always identify a single variant. Each item is therefore labeled by default with its nucleotide change (e.g. C>T), which is unique at a given position. The label can be switched to the amino acid change via the "Label fields" control in the Track Settings.
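The codon degeneracy point can be checked with a small sketch. It enumerates the alternate bases at one position of a codon and keeps only the missense outcomes. At the third position of the aspartate codon GAT, both T>A and T>G give the same D>E change, so only the nucleotide label tells them apart. The sketch assumes the standard genetic code and a codon read on the coding strand; for genes on the minus strand the genomic alleles are the complements. The function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def missense_changes(codon, pos):
    """Return {nucleotide label: amino acid label} for each alternate base at
    0-based position `pos` of a coding-strand codon that yields a missense change."""
    ref_base = codon[pos]
    ref_aa = CODON_TABLE[codon]
    changes = {}
    for alt_base in "ACGT":
        if alt_base == ref_base:
            continue
        alt_codon = codon[:pos] + alt_base + codon[pos + 1:]
        alt_aa = CODON_TABLE[alt_codon]
        # Skip synonymous and stop-gain outcomes; only missense changes are kept.
        if alt_aa != ref_aa and alt_aa != "*":
            changes[f"{ref_base}>{alt_base}"] = f"{ref_aa}>{alt_aa}"
    return changes


print(missense_changes("GAT", 2))  # {'T>A': 'D>E', 'T>G': 'D>E'}
```

The third alternate, T>C, gives GAC, which still encodes aspartate. That synonymous change is not reported.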

Hovering over a variant shows:

Items can be filtered by prediction (benign/pathogenic), by raw PrimateAI-3D score, or by percentile.
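If you work with downloaded records outside the browser, the filters above can be mimicked by a sketch like the following. It assumes that all set filters must hold at the same time and that the numeric filters act as minimum thresholds. The `Variant` record, its field names, the 0-1 percentile scale, and the example values are hypothetical. They are not the track's schema or real PrimateAI-3D data.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical record layout; field names are illustrative only.
    name: str          # nucleotide change label, e.g. "C>T"
    prediction: str    # "benign" or "pathogenic"
    score: float       # raw PrimateAI-3D score
    percentile: float  # score percentile, assumed here to be on a 0-1 scale


def passes_filters(v, prediction=None, min_score=None, min_percentile=None):
    """Return True if `v` satisfies every filter that is set (None = no filter)."""
    if prediction is not None and v.prediction != prediction:
        return False
    if min_score is not None and v.score < min_score:
        return False
    if min_percentile is not None and v.percentile < min_percentile:
        return False
    return True


# Made-up example records, for illustration only.
variants = [
    Variant("C>T", "pathogenic", 0.9, 0.95),
    Variant("C>A", "benign", 0.3, 0.20),
]
print([v.name for v in variants if passes_filters(v, prediction="pathogenic")])
```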

Data Access

Due to the data license, the Table Browser, Data Integrator, and the REST API's getData endpoint are disabled for this track. The source data can be downloaded from the PrimateAI-3D website (requires registration). The primate variant database is available at PrimAD. Our Zoonomia 447-way Mammal/Primate alignment track displays the primate variants used in training PrimateAI-3D.

Methods

The PrimateAI-3D hg38 site list was downloaded from the Illumina BaseSpace website. This tab-separated file contains pre-computed scores for all possible single-nucleotide missense variants and was converted to bigBed format. The percentile score was stored in the bigBed score field, scaled to 0-1000. No filtering was applied; all 70.7 million scored variants are included. The conversion script is available from our GitHub repository.
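As a rough illustration of the conversion step, the sketch below turns one row into a BED9+2 line. It shifts the position to BED's 0-based, half-open coordinates, scales the percentile to the 0-1000 score range, and sets the item color from the prediction. This is not the actual conversion script. The argument names, the 1-based input position, the 0-1 percentile scale, and the exact RGB values are assumptions, not the documented Illumina file format.

```python
def to_bed_line(chrom, pos_1based, ref, alt, score, percentile, prediction):
    """Build one hypothetical BED9+2 line for a single scored variant."""
    start = pos_1based - 1  # BED coordinates are 0-based, half-open
    end = pos_1based
    # Scale a 0-1 percentile to the 0-1000 BED score range, clamped for safety.
    bed_score = max(0, min(1000, round(percentile * 1000)))
    # itemRgb: red for pathogenic, blue otherwise (exact RGB values are assumed).
    color = "255,0,0" if prediction == "pathogenic" else "0,0,255"
    fields = [
        chrom, str(start), str(end), f"{ref}>{alt}", str(bed_score), ".",
        str(start), str(end), color,
        f"{score:g}", prediction,  # two extra columns: raw score, prediction
    ]
    return "\t".join(fields)


print(to_bed_line("chr1", 100, "C", "T", 0.5, 0.9234, "pathogenic"))
```

In a pipeline like this, the lines would then be sorted (e.g. `sort -k1,1 -k2,2n`) and converted with the UCSC `bedToBigBed` utility using `-type=bed9+2` and an AutoSql `-as` file that describes the two extra columns.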

Credits

Thanks to Illumina, in particular Hong Gao, for making PrimateAI-3D predictions publicly available.

References

Gao H, Hamp T, Ede J, Schraiber JG, McRae J, Singer-Berk M, Yang Y, Dietrich ASD, Fiziev PP, Kuderna LFK et al. The landscape of tolerated genetic variation in humans and primates. Science. 2023 Jun 2;380(6648):eabn8153. PMID: 37262156; PMC: PMC10713091

Sundaram L, Gao H, Padigepati SR, McRae JF, Li Y, Kosmicki JA, Fritzilas N, Hakenberg J, Dutta A, Shon J et al. Predicting the clinical impact of human mutation with deep neural networks. Nat Genet. 2018 Aug;50(8):1161-1170. PMID: 30038395; PMC: PMC6237276