6e61d3349b36cbcc01500c1483cc7bfbc141d9ea lrnassar Wed Apr 22 13:47:33 2026 -0700

    PrimateAI-3D: tighten 0.821 threshold wording per the paper. refs #37274

    Confirmed against Gao 2023 (PMC10713091): the calibration cohort is the
    Deciphering Developmental Disorders (DDD) neurodevelopmental cohort, not
    ClinVar. The cutoff was chosen so that the count of pathogenic calls
    (n=7,238) matched the excess of de novo missense mutations above the
    trinucleotide background expectation in that cohort.

diff --git src/hg/makeDb/trackDb/human/primateAi.html src/hg/makeDb/trackDb/human/primateAi.html
index 7f4ea570c65..67f715c352b 100644
--- src/hg/makeDb/trackDb/human/primateAi.html
+++ src/hg/makeDb/trackDb/human/primateAi.html
@@ -1,111 +1,113 @@
 <h2>Description</h2>
 <p>
 <a href="https://primateai3d.basespace.illumina.com/" target="_blank">PrimateAI-3D</a> is a semi-supervised 3D convolutional neural network that predicts the pathogenicity of all possible missense variants in the human genome. It was trained on 4.5 million benign missense variants: 4.3 million common variants from 809 non-human primate individuals across 233 species, plus common human variants (>0.1% allele frequency) from gnomAD, TOPMed, and UK Biobank. These represent about 6% of all possible human missense variants.
 </p>
 <p>
 The model operates on voxelized protein structures at 2 Å resolution (from AlphaFold or homology models) combined with multiple sequence alignments from 592 species. It uses three complementary loss functions: benign variant classification, 3D fill-in-the-blank prediction on masked amino acids, and a language model ranking component. This track shows 70.7 million scored variants across all protein-coding genes.
 </p>
 <h2>Display Conventions</h2>
 <p>
 Each variant is colored <span style="color:blue">blue (benign)</span> or <span style="color:red">red (pathogenic)</span> based on the Illumina-provided <b>Prediction</b> field.
 Because the three possible alternate bases at a given position sometimes produce the same amino acid change (codon degeneracy), each item is labeled by default with its <b>nucleotide change</b> (e.g. <code>C>T</code>) rather than its amino acid change. The label can be switched to the amino acid change via the "Label fields" control in the Track Settings.
 </p>
 <p>
 Hovering over a variant shows:
 </p>
 <ul>
 <li><b>Var</b> — the nucleotide substitution on the + strand (reference > alternate)</li>
 <li><b>AA</b> — the resulting amino acid change (single-letter reference > alternate)</li>
 <li><b>Score</b> — the raw PrimateAI-3D pathogenicity score (0–1). The authors suggest a clinical threshold of <b>0.821</b> for
- distinguishing pathogenic from benign missense variants. This
- threshold was calibrated against a subset of annotated mutations
- in Gao et al. 2023 (Fig. 5A), chosen so that the number of
- PrimateAI-3D pathogenic calls matched the observed excess of de
- novo missense mutations in a clinical cohort (n = 7,238).</li>
+ distinguishing pathogenic from benign missense variants. In Gao
+ et al. 2023 (Fig. 5A) this threshold was derived from the
+ Deciphering Developmental Disorders (DDD) neurodevelopmental
+ cohort: the cutoff was chosen so that the number of variants
+ scored as pathogenic (n = 7,238) matched the observed
+ excess of de novo missense mutations above the trinucleotide
+ background expectation in that cohort.</li>
 <li><b>Perc</b> — the percentile rank of the raw score across all scored variants (0–1). The track score field (0–1000) is this value scaled by 1000.</li>
 <li><b>Pred</b> — Illumina's binary call: <span style="color:#0000c8">benign</span> or <span style="color:#c80000">pathogenic</span>, as provided in the source file. About 75% of variants in the track are benign and 25% pathogenic.
 Note that this call is <em>not</em> a simple application of the 0.821 raw-score threshold — some variants with raw scores below 0.821 are labeled pathogenic and vice versa.</li>
 </ul>
 <p>
 Items can be filtered by prediction (benign/pathogenic), by raw PrimateAI-3D score, or by percentile.
 </p>
 <h2>Data Access</h2>
 <p>
 Due to the data license, the Table Browser, Data Integrator, and the REST API's <code>getData</code> endpoint are disabled for this track. The source data can be downloaded from the <a href="https://primateai3d.basespace.illumina.com/" target="_blank">PrimateAI-3D website</a> (requires registration). The primate variant database is available at <a href="https://primad.basespace.illumina.com/" target="_blank">PrimAD</a>. Our <a href="hgTrackUi?db=hg38&g=cons447way">Zoonomia 447-way Mammal/Primate</a> alignment track displays the primate variants used in training PrimateAI-3D.
 </p>
 <h2>Methods</h2>
 <p>
 The PrimateAI-3D hg38 site list was downloaded from the Illumina BaseSpace website. The tab-separated file contains pre-computed scores for all possible single nucleotide missense variants. Positions were formatted as bigBed. The percentile score was put into the track score field (scaled to 0-1000). No filtering was applied; all 70.7 million scored variants are included. A conversion script is available from <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/scripts/primateai/primateAiToBigBed.py" target="_blank">our Github</a>.
 </p>
 <h2>Credits</h2>
 <p>
 Thanks to Illumina, in particular Gao Hong, for making PrimateAI-3D predictions publicly available.
 </p>
 <h2>References</h2>
 <p>
 Gao H, Hamp T, Ede J, Schraiber JG, McRae J, Singer-Berk M, Yang Y, Dietrich ASD, Fiziev PP, Kuderna LFK <em>et al</em>.
 <a href="https://www.science.org/doi/10.1126/science.abn8197?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed" target="_blank">
 The landscape of tolerated genetic variation in humans and primates</a>.
 <em>Science</em>. 2023 Jun 2;380(6648):eabn8153.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/37262156" target="_blank">37262156</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10713091/" target="_blank">PMC10713091</a>
 </p>
 <p>
 Sundaram L, Gao H, Padigepati SR, McRae JF, Li Y, Kosmicki JA, Fritzilas N, Hakenberg J, Dutta A, Shon J <em>et al</em>.
 <a href="https://doi.org/10.1038/s41588-018-0167-z" target="_blank">
 Predicting the clinical impact of human mutation with deep neural networks</a>.
 <em>Nat Genet</em>. 2018 Aug;50(8):1161-1170.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/30038395" target="_blank">30038395</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6237276/" target="_blank">PMC6237276</a>
 </p>
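The commit message and the revised Score bullet describe how the 0.821 cutoff was calibrated: the cutoff was picked so that the number of pathogenic calls equals the excess of de novo missense mutations over the trinucleotide background expectation. The Python sketch below illustrates only that count-matching idea, under stated assumptions: the cohort's scores are a plain list of floats, a variant counts as pathogenic when its score is at or above the cutoff, and the excess count is computed elsewhere and passed in. The names `calibrate_threshold` and `count_pathogenic` are hypothetical; this is not code from Gao et al. 2023 and leaves out any detail of their procedure.

```python
"""Hypothetical sketch: choose a score cutoff so that the number of
variants called pathogenic equals a target count (for example, an excess
of de novo missense mutations over a background expectation).

Assumptions (not from the paper): scores are a plain list of floats, and
a variant is called pathogenic when score >= cutoff.
"""
from typing import Sequence


def calibrate_threshold(scores: Sequence[float], target_count: int) -> float:
    """Return the cutoff at which the target_count highest-scoring
    variants are called pathogenic.

    If several variants tie at the cutoff, more than target_count
    variants may pass it.
    """
    if not 0 < target_count <= len(scores):
        raise ValueError("target_count must be between 1 and len(scores)")
    ranked = sorted(scores, reverse=True)
    # The k-th highest score is the loosest cutoff that still calls
    # at least k variants pathogenic.
    return ranked[target_count - 1]


def count_pathogenic(scores: Sequence[float], cutoff: float) -> int:
    """Number of variants called pathogenic at a given cutoff."""
    return sum(1 for s in scores if s >= cutoff)


if __name__ == "__main__":
    # Toy scores for illustration only, not DDD data.
    toy_scores = [0.95, 0.40, 0.88, 0.82, 0.10, 0.67, 0.91]
    cutoff = calibrate_threshold(toy_scores, target_count=3)
    print(cutoff, count_pathogenic(toy_scores, cutoff))  # prints "0.88 3"
```

Because every variant tied at the k-th score passes the cutoff, the realized count can exceed the target; a real calibration would need an explicit rule for ties.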
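The Perc bullet and the Methods paragraph say the track score field is the 0 to 1 percentile scaled to 0 to 1000, and the Pred bullet notes that Illumina's call does not simply follow the 0.821 raw-score threshold. The Python sketch below illustrates that arithmetic and one way to list the disagreements. It assumes each variant is a `(raw_score, prediction)` pair and defines percentile rank as the fraction of all scores less than or equal to a given score; the source file already provides a percentile column, so this definition is an assumption. The helper names are hypothetical, and the code is not taken from `primateAiToBigBed.py`.

```python
"""Hypothetical sketch: percentile rank, 0-1000 BED score, and variants
whose binary call disagrees with a 0.821 raw-score cutoff.

Assumptions: percentile rank = fraction of all scores <= this score, and
a raw score exactly equal to 0.821 counts as pathogenic.
"""
from bisect import bisect_right
from typing import List, Sequence, Tuple

RAW_THRESHOLD = 0.821  # cutoff suggested by the PrimateAI-3D authors


def percentile_ranks(scores: Sequence[float]) -> List[float]:
    """Fraction of all scores that are <= each score, in (0, 1]."""
    ordered = sorted(scores)
    n = len(ordered)
    return [bisect_right(ordered, s) / n for s in scores]


def bed_score(percentile: float) -> int:
    """Scale a 0-1 percentile to the 0-1000 BED score range."""
    return int(round(percentile * 1000))


def threshold_disagreements(
    variants: Sequence[Tuple[float, str]],
) -> List[Tuple[float, str]]:
    """Return variants whose 'benign'/'pathogenic' call differs from
    the call obtained by applying RAW_THRESHOLD to the raw score."""
    out = []
    for raw, pred in variants:
        by_threshold = "pathogenic" if raw >= RAW_THRESHOLD else "benign"
        if by_threshold != pred:
            out.append((raw, pred))
    return out


if __name__ == "__main__":
    raw_scores = [0.2, 0.9, 0.5, 0.83]
    print([bed_score(p) for p in percentile_ranks(raw_scores)])  # [250, 1000, 500, 750]
    calls = [(0.80, "pathogenic"), (0.90, "pathogenic"), (0.85, "benign")]
    print(threshold_disagreements(calls))  # [(0.8, 'pathogenic'), (0.85, 'benign')]
```

Using `bisect_right` on a sorted copy keeps the percentile computation at O(n log n), which matters for a file with tens of millions of rows.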