de2ccf6d827865f11d3c8edd9ceeb1b6394a7380 lrnassar Tue Apr 21 18:22:59 2026 -0700

PrimateAI-3D: label items by nucleotide change, add aaChange field and HTML mouseover.

Variant analysts typically work at the nucleotide level, and the current item label (amino acid change) collapses distinguishable variants: ~17% of items share their (chrom, pos, AA-change) tuple with another item because of codon degeneracy (e.g. three C>A, C>G, C>T at the same position can all appear as "M>I"). Labeling by nucleotide change makes every item uniquely distinguishable (0.0% collisions on hg38, 0.1% on hg19 from overlapping transcripts).

- primateAi.as: field 4 (name) is now "Nucleotide change (e.g. T>C)"; new field aaChange (placed before ref/alt) holds the amino acid change.
- primateAiToBigBed.py: write name = "{ref}>{alt}", new aaChange column, and an HTML mouseover with terse labels (Var/AA/Score/Perc/Pred) and a colored prediction string.
- primateAi.ra: add labelFields name,aaChange and defaultLabelFields name so users can toggle the on-feature label between nt change (default) and AA change.
- primateAi.html: expand Display Conventions with the label-convention rationale and a legend for each mouseover field.

refs #37274

diff --git src/hg/makeDb/trackDb/human/primateAi.html src/hg/makeDb/trackDb/human/primateAi.html
index afd3c3ed330..efbefed0947 100644
--- src/hg/makeDb/trackDb/human/primateAi.html
+++ src/hg/makeDb/trackDb/human/primateAi.html
@@ -7,43 +7,61 @@
across 233 species, plus common human variants (>0.1% allele frequency) from gnomAD, TOPMed, and UK Biobank. These represent about 6% of all possible human missense variants.

The model operates on voxelized protein structures at 2 Å resolution (from AlphaFold or homology models) combined with multiple sequence alignments from 592 species. It uses three complementary loss functions: benign variant classification, 3D fill-in-the-blank prediction on masked amino acids, and a language model ranking component. This track shows 70.7 million scored variants across all protein-coding genes.

Display Conventions

Each variant is colored blue (benign) or
-red (pathogenic) based on the raw score.
-The score field (0-1000) represents the percentile rank of the raw PrimateAI-3D score,
-where higher values indicate greater predicted pathogenicity.
-Mouseover shows the nucleotide change, amino acid change, raw score, percentile, and prediction.
-Items can be filtered by prediction (benign/pathogenic) and by percentile score.
+red (pathogenic) based on the Illumina-provided
+Prediction field. Because the three possible alternate bases at a given
+position sometimes produce the same amino acid change (codon degeneracy),
+each item is labeled by default with its nucleotide change (e.g. C>T)
+rather than its amino acid change. The label can be switched to the amino acid
+change via the "Label fields" control in the Track Settings.
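Reviewer note: the codon-degeneracy point above can be illustrated with a small sketch. It is not taken from primateAiToBigBed.py; the function name `label_alternates` is hypothetical, and for simplicity it works on the coding strand using the standard genetic code, whereas the track's ref/alt strand convention is not stated in this diff.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(
        ((x, y, z) for x in BASES for y in BASES for z in BASES), AMINO
    )
}


def label_alternates(codon, offset):
    """Return (nucleotide label, amino acid label) for each alternate base.

    codon  -- three-letter reference codon on the coding strand (assumption)
    offset -- 0-based position within the codon that is mutated
    """
    ref_base = codon[offset]
    ref_aa = CODON_TABLE[codon]
    rows = []
    for alt_base in BASES:
        if alt_base == ref_base:
            continue
        alt_codon = codon[:offset] + alt_base + codon[offset + 1:]
        alt_aa = CODON_TABLE[alt_codon]
        rows.append((f"{ref_base}>{alt_base}", f"{ref_aa}>{alt_aa}"))
    return rows


# Third position of the Met codon: three distinct nucleotide changes,
# one shared amino acid change.
for nt_label, aa_label in label_alternates("ATG", 2):
    print(nt_label, aa_label)
```

All three alternates at the third base of ATG yield isoleucine, so an amino-acid label cannot tell them apart while the nucleotide label can.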

-Score interpretation: raw scores range from 0 to 1, with higher values indicating greater
-predicted pathogenicity. The authors suggest a clinical threshold of 0.821 for
-distinguishing pathogenic from benign missense variants. The percentile field shows
-where a variant's score ranks relative to all other scored variants. 75% of variants
-are classified as benign, 25% as pathogenic.
+Hovering over a variant shows:
+

+ + +

+Items can be filtered by prediction (benign/pathogenic), by raw PrimateAI-3D
+score, or by percentile.
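Reviewer note: the mouseover described in the commit message (terse labels Var/AA/Score/Perc/Pred plus a colored prediction string) might be assembled roughly as follows. This is a sketch under assumptions, not the code in primateAiToBigBed.py: the function name `build_mouseover`, the line-break markup, the three-decimal score formatting, and the exact color values are all hypothetical. Only the label names and the blue/red convention come from the text above.

```python
def build_mouseover(ref, alt, aa_change, score, percentile, prediction):
    """Build an HTML mouseover string for one variant (hypothetical layout)."""
    # Blue for benign, red for pathogenic, following the track's color convention.
    color = "red" if prediction == "pathogenic" else "blue"
    parts = [
        f"Var: {ref}>{alt}",
        f"AA: {aa_change}",
        f"Score: {score:.3f}",
        f"Perc: {percentile}",
        f'Pred: <span style="color:{color}">{prediction}</span>',
    ]
    return "<br>".join(parts)


print(build_mouseover("C", "T", "M>I", 0.9, 95, "pathogenic"))
```

Keeping the nucleotide change first mirrors the default item label, so the hover text and the on-feature label agree.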

Data Access

Due to the data license, the Table Browser, Data Integrator, and the REST API's getData endpoint are disabled for this track. The source data can be downloaded from the PrimateAI-3D website (requires registration). The primate variant database is available at PrimAD. Our Zoonomia 447-way Mammal/Primate alignment track displays the primate variants used in training PrimateAI-3D.

Methods