de2ccf6d827865f11d3c8edd9ceeb1b6394a7380 lrnassar Tue Apr 21 18:22:59 2026 -0700

PrimateAI-3D: label items by nucleotide change, add aaChange field and HTML mouseover.

Variant analysts typically work at the nucleotide level, and the current item label (amino acid change) collapses distinguishable variants: ~17% of items share their (chrom, pos, AA-change) tuple with another item because of codon degeneracy (e.g. the three changes C>A, C>G, and C>T at the same position can all appear as "M>I"). Labeling by nucleotide change makes every item uniquely distinguishable (0.0% collisions on hg38, 0.1% on hg19 from overlapping transcripts).

- primateAi.as: field 4 (name) is now "Nucleotide change (e.g. T>C)"; new field aaChange (placed before ref/alt) holds the amino acid change.
- primateAiToBigBed.py: write name = "{ref}>{alt}", new aaChange column, and an HTML mouseover with terse labels (Var/AA/Score/Perc/Pred) and a colored prediction string.
- primateAi.ra: add labelFields name,aaChange and defaultLabelFields name so users can toggle the on-feature label between nt change (default) and AA change.
- primateAi.html: expand Display Conventions with the label-convention rationale and a legend for each mouseover field.

refs #37274

diff --git src/hg/makeDb/trackDb/human/primateAi.html src/hg/makeDb/trackDb/human/primateAi.html
index afd3c3ed330..efbefed0947 100644
--- src/hg/makeDb/trackDb/human/primateAi.html
+++ src/hg/makeDb/trackDb/human/primateAi.html
@@ -1,86 +1,104 @@

Description

PrimateAI-3D is a semi-supervised 3D convolutional neural network that predicts the pathogenicity of all possible missense variants in the human genome. It was trained on 4.5 million benign missense variants: 4.3 million common variants from 809 non-human primate individuals across 233 species, plus common human variants (>0.1% allele frequency) from gnomAD, TOPMed, and UK Biobank. These represent about 6% of all possible human missense variants.

The model operates on voxelized protein structures at 2 Å resolution (from AlphaFold or homology models) combined with multiple sequence alignments from 592 species. It uses three complementary loss functions: benign variant classification, 3D fill-in-the-blank prediction on masked amino acids, and a language model ranking component. This track shows 70.7 million scored variants across all protein-coding genes.

Display Conventions

Each variant is colored blue (benign) or
-red (pathogenic) based on the raw score.
-The score field (0-1000) represents the percentile rank of the raw PrimateAI-3D score,
-where higher values indicate greater predicted pathogenicity.
-Mouseover shows the nucleotide change, amino acid change, raw score, percentile, and prediction.
-Items can be filtered by prediction (benign/pathogenic) and by percentile score.
+red (pathogenic) based on the Illumina-provided
+Prediction field. Because the three possible alternate bases at a given
+position sometimes produce the same amino acid change (codon degeneracy),
+each item is labeled by default with its nucleotide change (e.g. C>T)
+rather than its amino acid change. The label can be switched to the amino acid
+change via the "Label fields" control in the Track Settings.
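The codon-degeneracy argument can be illustrated with a small sketch. It is not part of the track's pipeline: the function name labels_for_position and the four-codon table are hypothetical, chosen only to cover the methionine codon ATG. For a gene on the minus strand the same three changes appear on the genome as C>T, C>G, and C>A, which matches the "M>I" example in the commit message.

```python
# Illustrative only: a four-codon slice of the standard genetic code,
# enough to cover every change at the third base of ATG (Met).
CODON_TO_AA = {"ATG": "M", "ATA": "I", "ATC": "I", "ATT": "I"}


def labels_for_position(ref_codon, offset):
    """Return (nucleotide label, amino acid label) for each alternate base.

    ref_codon is the reference codon and offset (0-2) is the position
    within the codon that is being changed.
    """
    ref_base = ref_codon[offset]
    ref_aa = CODON_TO_AA[ref_codon]
    labels = []
    for alt in "ACGT":
        if alt == ref_base:
            continue  # not a variant
        alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
        alt_aa = CODON_TO_AA[alt_codon]
        labels.append((f"{ref_base}>{alt}", f"{ref_aa}>{alt_aa}"))
    return labels


labels = labels_for_position("ATG", 2)
nt_labels = [nt for nt, _ in labels]
aa_labels = [aa for _, aa in labels]

# Three distinct nucleotide labels, but only one amino acid label.
print(nt_labels, set(aa_labels))
```

All three items at this position would carry the same amino acid label, while the nucleotide labels stay distinct.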

-Score interpretation: raw scores range from 0 to 1, with higher values indicating greater
-predicted pathogenicity. The authors suggest a clinical threshold of 0.821 for
-distinguishing pathogenic from benign missense variants. The percentile field shows
-where a variant's score ranks relative to all other scored variants. 75% of variants
-are classified as benign, 25% as pathogenic.
+Hovering over a variant shows:
+

+ + +

+Items can be filtered by prediction (benign/pathogenic), by raw PrimateAI-3D
+score, or by percentile.
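As a rough sketch of how a mouseover with the terse labels named in the commit message (Var/AA/Score/Perc/Pred) and a colored prediction string could be assembled: the function name make_mouseover, the color names, the number formatting, and the use of a span with <br> separators are all assumptions for illustration, and the actual primateAiToBigBed.py may do this differently.

```python
from html import escape

# Hypothetical color names; the track's real values may differ.
PRED_COLORS = {"benign": "blue", "pathogenic": "red"}


def make_mouseover(nt_change, aa_change, raw_score, percentile, prediction):
    """Build an HTML mouseover string with one terse label per field."""
    color = PRED_COLORS.get(prediction, "black")
    pred_html = f'<span style="color:{color}">{escape(prediction)}</span>'
    parts = [
        f"Var: {escape(nt_change)}",    # ">" must be escaped inside HTML
        f"AA: {escape(aa_change)}",
        f"Score: {raw_score:.3f}",      # formatting precision is an assumption
        f"Perc: {percentile:.1f}",      # percentile units are an assumption
        f"Pred: {pred_html}",
    ]
    return "<br>".join(parts)


print(make_mouseover("C>T", "M>I", 0.9, 95.0, "pathogenic"))
```

Escaping matters here because both labels contain ">", which would otherwise be read as markup.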

Data Access

Due to the data license, the Table Browser, Data Integrator, and the REST API's getData endpoint are disabled for this track. The source data can be downloaded from the PrimateAI-3D website (requires registration). The primate variant database is available at PrimAD. Our Zoonomia 447-way Mammal/Primate alignment track displays the primate variants used in training PrimateAI-3D.

Methods

The PrimateAI-3D hg38 site list was downloaded from the Illumina BaseSpace website. The tab-separated file contains pre-computed scores for all possible single nucleotide missense variants. Positions were formatted as bigBed. The percentile score was put into the track score field (scaled to 0-1000). No filtering was applied; all 70.7 million scored variants are included. A conversion script is available from our GitHub.
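The conversion step can be sketched as follows. This is not the conversion script itself. It assumes a 1-based input position, a percentile expressed as a fraction between 0 and 1, a strand of ".", and the RGB values shown. The function names and the exact column order are hypothetical, apart from the points stated in the commit message: name is "{ref}>{alt}" and aaChange precedes ref/alt.

```python
def percentile_to_bed_score(percentile):
    """Scale a percentile in [0, 1] (assumed units) to an integer 0-1000 score."""
    if not 0.0 <= percentile <= 1.0:
        raise ValueError(f"percentile out of range: {percentile}")
    return round(percentile * 1000)


def make_bed_row(chrom, pos, ref, alt, aa_change, percentile, prediction):
    """Build one BED-style row for a single-nucleotide variant.

    pos is assumed to be 1-based; BED coordinates are 0-based, half-open.
    """
    start = pos - 1
    # Hypothetical RGB values for red (pathogenic) and blue (benign).
    rgb = "255,0,0" if prediction == "pathogenic" else "0,0,255"
    return [
        chrom, start, pos,
        f"{ref}>{alt}",                        # name: nucleotide change
        percentile_to_bed_score(percentile),   # score: 0-1000
        ".",                                   # strand (assumed unstranded)
        start, pos,                            # thickStart, thickEnd
        rgb,
        aa_change, ref, alt,                   # aaChange placed before ref/alt
    ]


row = make_bed_row("chr1", 69094, "G", "A", "V>M", 0.25, "benign")
print("\t".join(str(field) for field in row))
```

The example coordinates and amino acid change are made up for illustration and do not correspond to a row in the source file.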

Credits

Thanks to Illumina, in particular Gao Hong, for making PrimateAI-3D predictions publicly available.

References

Gao H, Hamp T, Ede J, Schraiber JG, McRae J, Singer-Berk M, Yang Y, Dietrich ASD, Fiziev PP, Kuderna LFK et al. The landscape of tolerated genetic variation in humans and primates. Science. 2023 Jun 2;380(6648):eabn8153. PMID: 37262156; PMC: PMC10713091

Sundaram L, Gao H, Padigepati SR, McRae JF, Li Y, Kosmicki JA, Fritzilas N, Hakenberg J, Dutta A, Shon J et al. Predicting the clinical impact of human mutation with deep neural networks. Nat Genet. 2018 Aug;50(8):1161-1170. PMID: 30038395; PMC: PMC6237276