6e61d3349b36cbcc01500c1483cc7bfbc141d9ea lrnassar Wed Apr 22 13:47:33 2026 -0700

PrimateAI-3D: tighten 0.821 threshold wording per the paper. refs #37274

Confirmed against Gao 2023 (PMC10713091): the calibration cohort is the Deciphering Developmental Disorders (DDD) neurodevelopmental cohort, not ClinVar. The cutoff was chosen so that the count of pathogenic calls (n=7,238) matched the excess of de novo missense mutations above the trinucleotide background expectation in that cohort.

diff --git src/hg/makeDb/trackDb/human/primateAi.html src/hg/makeDb/trackDb/human/primateAi.html
index 7f4ea570c65..67f715c352b 100644
--- src/hg/makeDb/trackDb/human/primateAi.html
+++ src/hg/makeDb/trackDb/human/primateAi.html
@@ -1,111 +1,113 @@

Description

PrimateAI-3D is a semi-supervised 3D convolutional neural network that predicts the pathogenicity of all possible missense variants in the human genome. It was trained on 4.5 million benign missense variants: 4.3 million common variants from 809 non-human primate individuals across 233 species, plus common human variants (>0.1% allele frequency) from gnomAD, TOPMed, and UK Biobank. These represent about 6% of all possible human missense variants.

The model operates on voxelized protein structures at 2 Å resolution (from AlphaFold or homology models) combined with multiple sequence alignments from 592 species. It uses three complementary loss functions: benign variant classification, 3D fill-in-the-blank prediction on masked amino acids, and a language model ranking component. This track shows 70.7 million scored variants across all protein-coding genes.
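To make "voxelized at 2 Å resolution" concrete, here is a minimal Python sketch that bins 3D atom coordinates into 2 Å cubes and counts the atoms in each cube. It is a generic occupancy-grid illustration only: the function names, the grid origin, and the example coordinates are assumptions made for this sketch, and it does not reproduce the actual PrimateAI-3D featurization described in Gao et al. 2023.

```python
import math
from collections import Counter

VOXEL_SIZE = 2.0  # edge length in angstroms, matching the 2 Å resolution above


def voxel_index(coord, origin=(0.0, 0.0, 0.0), size=VOXEL_SIZE):
    """Map an (x, y, z) coordinate in angstroms to integer voxel indices."""
    return tuple(math.floor((c - o) / size) for c, o in zip(coord, origin))


def voxelize(coords, origin=(0.0, 0.0, 0.0), size=VOXEL_SIZE):
    """Count how many atoms fall into each voxel (hypothetical helper)."""
    return Counter(voxel_index(c, origin, size) for c in coords)


# Made-up atom coordinates, for illustration only.
atoms = [(0.5, 0.5, 0.5), (1.9, 0.1, 1.0), (3.1, 0.0, -0.5)]
grid = voxelize(atoms)
print(dict(grid))
```

The first two atoms share a voxel because all of their coordinates lie within the same 2 Å cube; the third lands in a neighboring one.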

Display Conventions

Each variant is colored blue (benign) or red (pathogenic) based on the Illumina-provided Prediction field. Because the three possible alternate bases at a given position sometimes produce the same amino acid change (codon degeneracy), each item is labeled by default with its nucleotide change (e.g. C>T) rather than its amino acid change. The label can be switched to the amino acid change via the "Label fields" control in the Track Settings.
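The codon degeneracy described above can be illustrated with a short Python sketch based on the standard genetic code. The function name `missense_changes` and the example codons are illustrative choices, not part of the track's build pipeline.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def missense_changes(codon, pos):
    """Return {nucleotide change: new amino acid} for the missense
    substitutions at 0-based position `pos` of `codon`.
    Synonymous and stop-gain substitutions are skipped."""
    ref_base = codon[pos]
    ref_aa = CODON_TABLE[codon]
    changes = {}
    for alt in "ACGT":
        if alt == ref_base:
            continue
        alt_aa = CODON_TABLE[codon[:pos] + alt + codon[pos + 1:]]
        if alt_aa != ref_aa and alt_aa != "*":
            changes[f"{ref_base}>{alt}"] = alt_aa
    return changes


# Third base of ATG (Met): all three alternate bases yield isoleucine.
print(missense_changes("ATG", 2))
```

For the methionine codon ATG, the substitutions G>A, G>C, and G>T at the third base all produce the same Met-to-Ile change, so only the nucleotide label tells the three items apart.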

Hovering over a variant shows:

Items can be filtered by prediction (benign/pathogenic), by raw PrimateAI-3D score, or by percentile.

Data Access

Due to the data license, the Table Browser, Data Integrator, and the REST API's getData endpoint are disabled for this track. The source data can be downloaded from the PrimateAI-3D website (requires registration). The primate variant database is available at PrimAD. Our Zoonomia 447-way Mammal/Primate alignment track displays the primate variants used in training PrimateAI-3D.

Methods

The PrimateAI-3D hg38 site list was downloaded from the Illumina BaseSpace website. The tab-separated file contains precomputed scores for all possible single-nucleotide missense variants. The variants were converted to bigBed format, with the percentile scaled to 0-1000 and stored in the track score field. No filtering was applied; all 70.7 million scored variants are included. A conversion script is available from our GitHub.
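As a rough illustration of the score scaling, the sketch below maps a percentile to the 0-1000 integer range used by the bigBed score field. It assumes the percentile is supplied as a fraction between 0 and 1; the function name `percentile_to_bed_score` is hypothetical, and the actual conversion script may handle input ranges and rounding differently.

```python
def percentile_to_bed_score(percentile):
    """Scale a percentile in [0, 1] (assumed input range) to an
    integer score in [0, 1000]."""
    if not 0.0 <= percentile <= 1.0:
        raise ValueError(f"percentile out of range: {percentile}")
    return int(round(percentile * 1000))


print(percentile_to_bed_score(0.5))
```

Rejecting out-of-range input rather than clamping it is a choice made for this sketch, so that a malformed row fails loudly instead of silently receiving a score of 0 or 1000.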

Credits

Thanks to Illumina, in particular Gao Hong, for making PrimateAI-3D predictions publicly available.

References

Gao H, Hamp T, Ede J, Schraiber JG, McRae J, Singer-Berk M, Yang Y, Dietrich ASD, Fiziev PP, Kuderna LFK et al. The landscape of tolerated genetic variation in humans and primates. Science. 2023 Jun 2;380(6648):eabn8153. PMID: 37262156; PMC: PMC10713091

Sundaram L, Gao H, Padigepati SR, McRae JF, Li Y, Kosmicki JA, Fritzilas N, Hakenberg J, Dutta A, Shon J et al. Predicting the clinical impact of human mutation with deep neural networks. Nat Genet. 2018 Aug;50(8):1161-1170. PMID: 30038395; PMC: PMC6237276