src/hg/makeDb/trackDb/human/primateAi.html 30374e3fc3390902c35bb463510567f1b6f7a96e

30374e3fc3390902c35bb463510567f1b6f7a96e
lrnassar
  Wed Apr 22 13:44:44 2026 -0700
PrimateAI-3D: clarify origin of the 0.821 threshold per Max. refs #37274

Description previously juxtaposed the paper's 0.821 clinical threshold
with the 75/25 benign/pathogenic split in a way that implied the two
were related. Per Max on the ticket: the 0.821 threshold comes from
Gao et al. 2023 Fig. 5A (calibrated against de novo missense excess
in a clinical cohort, n=7,238 pathogenic calls), and the "prediction"
column values are Illumina's own calls — not a simple application of
the 0.821 threshold (some variants below it are labeled pathogenic and
vice versa).
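
The threshold mismatch described above can be checked directly. This is
a minimal sketch. It is not Illumina's method or the track's conversion
script. It assumes the raw score and the Illumina call have already been
read from the source file as (score, prediction) pairs. It also assumes
a score at or above 0.821 counts as pathogenic. The helper names
threshold_call, compare_calls and count_disagreements are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# Raw-score threshold suggested in Gao et al. 2023.
PAPER_THRESHOLD = 0.821


def threshold_call(score: float, threshold: float = PAPER_THRESHOLD) -> str:
    """Label a raw score by the threshold alone.

    Assumption: a score equal to the threshold counts as pathogenic.
    """
    return "pathogenic" if score >= threshold else "benign"


def compare_calls(rows: Iterable[Tuple[float, str]]) -> Counter:
    """Tally (Illumina call, threshold call) pairs.

    Each row is a hypothetical (raw score, Illumina prediction) pair.
    """
    return Counter((pred, threshold_call(score)) for score, pred in rows)


def count_disagreements(tally: Counter) -> int:
    """Sum the cells where the Illumina call and the threshold call differ."""
    return sum(n for (pred, called), n in tally.items() if pred != called)


if __name__ == "__main__":
    # Toy values for illustration only, not taken from the real file.
    toy = [
        (0.90, "pathogenic"),
        (0.80, "pathogenic"),  # below 0.821 but called pathogenic
        (0.85, "benign"),      # above 0.821 but called benign
        (0.10, "benign"),
    ]
    tally = compare_calls(toy)
    print(count_disagreements(tally))  # prints 2
```

In the toy rows, the 0.80 and 0.85 entries are the two kinds of
disagreement: below the threshold but called pathogenic, and above it
but called benign.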

diff --git src/hg/makeDb/trackDb/human/primateAi.html src/hg/makeDb/trackDb/human/primateAi.html
index efbefed0947..7f4ea570c65 100644
--- src/hg/makeDb/trackDb/human/primateAi.html
+++ src/hg/makeDb/trackDb/human/primateAi.html
@@ -1,104 +1,111 @@
 <h2>Description</h2>
 <p>
 <a href="https://primateai3d.basespace.illumina.com/" target="_blank">PrimateAI-3D</a> is a
 semi-supervised 3D convolutional neural network that predicts the pathogenicity of all
 possible missense variants in the human genome. It was trained on 4.5 million benign
 missense variants: 4.3 million common variants from 809 non-human primate individuals
 across 233 species, plus common human variants (&gt;0.1% allele frequency) from gnomAD,
 TOPMed, and UK Biobank. These represent about 6% of all possible human missense variants.
 </p>
 
 <p>
 The model operates on voxelized protein structures at 2 &Aring; resolution (from
 AlphaFold or homology models) combined with multiple sequence alignments from 592 species.
 It uses three complementary loss functions: benign variant classification, 3D
 fill-in-the-blank prediction on masked amino acids, and a language model ranking component.
 This track shows 70.7 million scored variants across all protein-coding genes.
 </p>
 
 <h2>Display Conventions</h2>
 <p>
 Each variant is colored <span style="color:blue">blue (benign)</span> or
 <span style="color:red">red (pathogenic)</span> based on the Illumina-provided
 <b>Prediction</b> field. Because the three possible alternate bases at a given
 position sometimes produce the same amino acid change (codon degeneracy),
 each item is labeled by default with its <b>nucleotide change</b> (e.g. <code>C&gt;T</code>)
 rather than its amino acid change. The label can be switched to the amino acid
 change via the &quot;Label fields&quot; control in the Track Settings.
 </p>
 
 <p>
 Hovering over a variant shows:
 </p>
 <ul>
   <li><b>Var</b> &mdash; the nucleotide substitution on the + strand
       (reference&nbsp;&gt;&nbsp;alternate)</li>
   <li><b>AA</b> &mdash; the resulting amino acid change
       (single-letter reference&nbsp;&gt;&nbsp;alternate)</li>
   <li><b>Score</b> &mdash; the raw PrimateAI-3D pathogenicity score (0&ndash;1).
-      The authors suggest a clinical threshold of 0.821 for distinguishing
-      pathogenic from benign missense variants.</li>
+      The authors suggest a clinical threshold of <b>0.821</b> for
+      distinguishing pathogenic from benign missense variants. Gao et al.
+      2023 (Fig. 5A) calibrated it on a subset of annotated mutations,
+      choosing the value at which the number of PrimateAI-3D pathogenic
+      calls (n&nbsp;=&nbsp;7,238) matched the observed excess of de novo
+      missense mutations in a clinical cohort.</li>
   <li><b>Perc</b> &mdash; the percentile rank of the raw score across all
       scored variants (0&ndash;1). The track score field (0&ndash;1000) is this
       value scaled by 1000.</li>
   <li><b>Pred</b> &mdash; Illumina&apos;s binary call:
       <span style="color:#0000c8">benign</span> or
-      <span style="color:#c80000">pathogenic</span>. In the track as
-      distributed, about 75% of variants are benign and 25% are pathogenic.</li>
+      <span style="color:#c80000">pathogenic</span>, as provided in the
+      source file. About 75% of variants in the track are benign and 25%
+      pathogenic. Note that this call is <em>not</em> a simple application
+      of the 0.821 raw-score threshold &mdash; some variants with raw
+      scores below 0.821 are labeled pathogenic and vice versa.</li>
 </ul>
 
 <p>
 Items can be filtered by prediction (benign/pathogenic), by raw PrimateAI-3D
 score, or by percentile.
 </p>
 
 <h2>Data Access</h2>
 <p>
 Due to the data license, the Table Browser, Data Integrator, and the REST API's
 <code>getData</code> endpoint are disabled for this track. The source data can be
 downloaded from the
 <a href="https://primateai3d.basespace.illumina.com/" target="_blank">PrimateAI-3D website</a>
 (requires registration). The primate variant database is available at
 <a href="https://primad.basespace.illumina.com/" target="_blank">PrimAD</a>.
 Our <a href="hgTrackUi?db=hg38&g=cons447way">Zoonomia 447-way Mammal/Primate</a> alignment
 track displays the primate variants used in training PrimateAI-3D.
 </p>
 
 <h2>Methods</h2>
 <p>
 The PrimateAI-3D hg38 site list was downloaded from the Illumina BaseSpace website.
 The tab-separated file contains pre-computed scores for all possible single nucleotide
 missense variants. Positions were formatted as bigBed. The percentile score was put into
 the track score field (scaled to 0-1000). No filtering was applied; all 70.7 million
 scored variants are included.
 A conversion script is available from
 <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/scripts/primateai/primateAiToBigBed.py"
 target="_blank">our Github</a>.
 </p>
 
 <h2>Credits</h2>
 <p>
 Thanks to Illumina, in particular Gao Hong, for making PrimateAI-3D predictions publicly available.
 </p>
 
 <h2>References</h2>
 <p>
 Gao H, Hamp T, Ede J, Schraiber JG, McRae J, Singer-Berk M, Yang Y, Dietrich ASD, Fiziev PP, Kuderna
 LFK <em>et al</em>.
 <a href="https://www.science.org/doi/10.1126/science.abn8197?url_ver=Z39.88-2003&amp;rfr_id=ori:rid:crossref.org&amp;rfr_dat=cr_pub%20%200pubmed"
 target="_blank">
 The landscape of tolerated genetic variation in humans and primates</a>.
 <em>Science</em>. 2023 Jun 2;380(6648):eabn8153.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/37262156" target="_blank">37262156</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10713091/" target="_blank">PMC10713091</a>
 </p>
 
 <p>
 Sundaram L, Gao H, Padigepati SR, McRae JF, Li Y, Kosmicki JA, Fritzilas N, Hakenberg J, Dutta A,
 Shon J <em>et al</em>.
 <a href="https://doi.org/10.1038/s41588-018-0167-z" target="_blank">
 Predicting the clinical impact of human mutation with deep neural networks</a>.
 <em>Nat Genet</em>. 2018 Aug;50(8):1161-1170.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/30038395" target="_blank">30038395</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6237276/" target="_blank">PMC6237276</a>
 </p>