de2ccf6d827865f11d3c8edd9ceeb1b6394a7380
lrnassar
  Tue Apr 21 18:22:59 2026 -0700
PrimateAI-3D: label items by nucleotide change, add aaChange field and HTML mouseover.

Variant analysts typically work at the nucleotide level, and the current
item label (amino acid change) collapses distinguishable variants: ~17%
of items share their (chrom, pos, AA-change) tuple with another item
because of codon degeneracy (e.g. C>A, C>G and C>T at the same position
can all appear as "M>I"). Labeling by nucleotide change makes items
effectively unique: 0.0% collisions on hg38 and 0.1% on hg19, the latter
from overlapping transcripts.
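To make the collision statistic concrete, here is a minimal Python sketch. It is not the script used to compute the quoted percentages. It measures the fraction of items whose key is shared with at least one other item, first keyed on the amino acid change and then on the nucleotide change. The helper name `collision_fraction` and the records are hypothetical.

```python
from collections import Counter

def collision_fraction(items, key):
    """Return the fraction of items whose key is shared with at least one other item."""
    if not items:
        return 0.0
    counts = Counter(key(it) for it in items)
    shared = sum(1 for it in items if counts[key(it)] > 1)
    return shared / len(items)

# Made-up records: (chrom, pos, ref, alt, aaChange). On a minus-strand gene,
# all three substitutions of the + strand C at the third base of a Met (ATG)
# codon yield Ile, so they share the AA label "M>I".
items = [
    ("chr1", 1000, "C", "A", "M>I"),
    ("chr1", 1000, "C", "G", "M>I"),
    ("chr1", 1000, "C", "T", "M>I"),
    ("chr1", 1004, "T", "C", "L>P"),
]

by_aa = collision_fraction(items, lambda it: (it[0], it[1], it[4]))
by_nt = collision_fraction(items, lambda it: (it[0], it[1], f"{it[2]}>{it[3]}"))
print(f"AA-change key: {by_aa:.0%}, nucleotide key: {by_nt:.0%}")
# prints "AA-change key: 75%, nucleotide key: 0%"
```

With the AA-change key, three of the four items collide. Switching to the nucleotide-change key separates them.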

- primateAi.as: field 4 (name) is now "Nucleotide change (e.g. T>C)";
new field aaChange (placed before ref/alt) holds the amino acid
change.
- primateAiToBigBed.py: write name = "{ref}>{alt}", new aaChange column,
and an HTML mouseover with terse labels (Var/AA/Score/Perc/Pred) and
a colored prediction string.
- primateAi.ra: add labelFields name,aaChange and defaultLabelFields
name so users can toggle the on-feature label between nt change
(default) and AA change.
- primateAi.html: expand Display Conventions with the label-convention
rationale and a legend for each mouseover field.
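The per-item fields described for primateAiToBigBed.py could be assembled roughly as in the Python sketch below. Several details come from this commit: the name format "{ref}>{alt}", the terse labels, the prediction colors from the primateAi.html legend, and the percentile-to-0-1000 score scaling. Everything else is an assumption and not the script's actual code. That includes the function name `make_item_fields`, its arguments, the rounding, the number formatting and the exact mouseover HTML layout.

```python
import html

# Prediction colors as listed in the primateAi.html mouseover legend.
PRED_COLORS = {"benign": "#0000c8", "pathogenic": "#c80000"}

def make_item_fields(ref, alt, aa_change, score, percentile, prediction):
    """Hypothetical helper: return (name, aaChange, bedScore, mouseOver) for one variant."""
    name = f"{ref}>{alt}"                      # on-feature label: nucleotide change
    bed_score = int(round(percentile * 1000))  # assumed: percentile (0-1) scaled to 0-1000
    color = PRED_COLORS.get(prediction, "#000000")
    # Escape values so ">" renders literally inside the HTML mouseover.
    mouse_over = (
        f"<b>Var:</b> {html.escape(name)}<br>"
        f"<b>AA:</b> {html.escape(aa_change)}<br>"
        f"<b>Score:</b> {score:.3f}<br>"
        f"<b>Perc:</b> {percentile:.3f}<br>"
        f"<b>Pred:</b> <span style='color:{color}'>{html.escape(prediction)}</span>"
    )
    return name, aa_change, bed_score, mouse_over

name, aa, bed_score, mo = make_item_fields("C", "T", "M>I", 0.8732, 0.912, "pathogenic")
print(name, aa, bed_score)
```

In this sketch, the label and aaChange columns stay as plain text for labelFields. Only the mouseover string carries HTML markup.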

refs #37274

diff --git src/hg/makeDb/trackDb/human/primateAi.html src/hg/makeDb/trackDb/human/primateAi.html
index afd3c3ed330..efbefed0947 100644
--- src/hg/makeDb/trackDb/human/primateAi.html
+++ src/hg/makeDb/trackDb/human/primateAi.html
@@ -7,43 +7,61 @@
 across 233 species, plus common human variants (>0.1% allele frequency) from gnomAD,
 TOPMed, and UK Biobank. These represent about 6% of all possible human missense variants.
 </p>
 
 <p>
 The model operates on voxelized protein structures at 2 &Aring; resolution (from
 AlphaFold or homology models) combined with multiple sequence alignments from 592 species.
 It uses three complementary loss functions: benign variant classification, 3D
 fill-in-the-blank prediction on masked amino acids, and a language model ranking component.
 This track shows 70.7 million scored variants across all protein-coding genes.
 </p>
 
 <h2>Display Conventions</h2>
 <p>
 Each variant is colored <span style="color:blue">blue (benign)</span> or
-<span style="color:red">red (pathogenic)</span> based on the raw score.
-The score field (0-1000) represents the percentile rank of the raw PrimateAI-3D score,
-where higher values indicate greater predicted pathogenicity.
-Mouseover shows the nucleotide change, amino acid change, raw score, percentile, and prediction.
-Items can be filtered by prediction (benign/pathogenic) and by percentile score.
+<span style="color:red">red (pathogenic)</span> based on the Illumina-provided
+<b>Prediction</b> field. Because different alternate bases at the same
+position can produce the same amino acid change (codon degeneracy),
+each item is labeled by default with its <b>nucleotide change</b> (e.g. <code>C&gt;T</code>)
+rather than its amino acid change. The label can be switched to the amino acid
+change via the &quot;Label fields&quot; control in the Track Settings.
 </p>
 
 <p>
-Score interpretation: raw scores range from 0 to 1, with higher values indicating greater
-predicted pathogenicity. The authors suggest a clinical threshold of 0.821 for
-distinguishing pathogenic from benign missense variants. The percentile field shows
-where a variant&apos;s score ranks relative to all other scored variants. 75% of variants
-are classified as benign, 25% as pathogenic.
+Hovering over a variant shows:
+</p>
+<ul>
+  <li><b>Var</b> &mdash; the nucleotide substitution on the + strand
+      (reference&nbsp;&gt;&nbsp;alternate)</li>
+  <li><b>AA</b> &mdash; the resulting amino acid change
+      (single-letter reference&nbsp;&gt;&nbsp;alternate)</li>
+  <li><b>Score</b> &mdash; the raw PrimateAI-3D pathogenicity score (0&ndash;1).
+      The authors suggest a clinical threshold of 0.821 for distinguishing
+      pathogenic from benign missense variants.</li>
+  <li><b>Perc</b> &mdash; the percentile rank of the raw score across all
+      scored variants (0&ndash;1). The track score field (0&ndash;1000) is this
+      value scaled by 1000.</li>
+  <li><b>Pred</b> &mdash; Illumina&apos;s binary call:
+      <span style="color:#0000c8">benign</span> or
+      <span style="color:#c80000">pathogenic</span>. In the track as
+      distributed, about 75% of variants are benign and 25% are pathogenic.</li>
+</ul>
+
+<p>
+Items can be filtered by prediction (benign/pathogenic), by raw PrimateAI-3D
+score, or by percentile.
 </p>
 
 <h2>Data Access</h2>
 <p>
 Due to the data license, the Table Browser, Data Integrator, and the REST API's
 <code>getData</code> endpoint are disabled for this track. The source data can be
 downloaded from the
 <a href="https://primateai3d.basespace.illumina.com/" target="_blank">PrimateAI-3D website</a>
 (requires registration). The primate variant database is available at
 <a href="https://primad.basespace.illumina.com/" target="_blank">PrimAD</a>.
 Our <a href="hgTrackUi?db=hg38&g=cons447way">Zoonomia 447-way Mammal/Primate</a> alignment
 track displays the primate variants used in training PrimateAI-3D.
 </p>
 
 <h2>Methods</h2>