de2ccf6d827865f11d3c8edd9ceeb1b6394a7380 lrnassar Tue Apr 21 18:22:59 2026 -0700

PrimateAI-3D: label items by nucleotide change, add aaChange field and HTML mouseover.

Variant analysts typically work at the nucleotide level, and the current
item label (amino acid change) collapses distinguishable variants: ~17% of
items share their (chrom, pos, AA-change) tuple with another item because
of codon degeneracy (e.g. the three substitutions C>A, C>G, and C>T at the
same position can all appear as "M>I"). Labeling by nucleotide change makes
nearly every item uniquely distinguishable (0.0% collisions on hg38, 0.1%
on hg19 from overlapping transcripts).

- primateAi.as: field 4 (name) is now "Nucleotide change (e.g. T>C)"; new
  field aaChange (placed before ref/alt) holds the amino acid change.
- primateAiToBigBed.py: write name = "{ref}>{alt}", new aaChange column,
  and an HTML mouseover with terse labels (Var/AA/Score/Perc/Pred) and a
  colored prediction string.
- primateAi.ra: add labelFields name,aaChange and defaultLabelFields name
  so users can toggle the on-feature label between nt change (default) and
  AA change.
- primateAi.html: expand Display Conventions with the label-convention
  rationale and a legend for each mouseover field.

refs #37274

diff --git src/hg/makeDb/trackDb/human/primateAi.html src/hg/makeDb/trackDb/human/primateAi.html
index afd3c3ed330..efbefed0947 100644
--- src/hg/makeDb/trackDb/human/primateAi.html
+++ src/hg/makeDb/trackDb/human/primateAi.html
@@ -7,43 +7,61 @@
 across 233 species, plus common human variants (>0.1% allele frequency) from gnomAD,
 TOPMed, and UK Biobank. These represent about 6% of all possible human missense variants.
 </p>
 <p>
 The model operates on voxelized protein structures at 2 Å resolution (from AlphaFold or
 homology models) combined with multiple sequence alignments from 592 species. It uses three
 complementary loss functions: benign variant classification, 3D fill-in-the-blank prediction
 on masked amino acids, and a language model ranking component.
 This track shows 70.7 million scored variants across all protein-coding genes.
 </p>
 <h2>Display Conventions</h2>
 <p>
 Each variant is colored <span style="color:blue">blue (benign)</span> or
-<span style="color:red">red (pathogenic)</span> based on the raw score.
-The score field (0-1000) represents the percentile rank of the raw PrimateAI-3D score,
-where higher values indicate greater predicted pathogenicity.
-Mouseover shows the nucleotide change, amino acid change, raw score, percentile, and prediction.
-Items can be filtered by prediction (benign/pathogenic) and by percentile score.
+<span style="color:red">red (pathogenic)</span> based on the Illumina-provided
+<b>Prediction</b> field. Because the three possible alternate bases at a given
+position sometimes produce the same amino acid change (codon degeneracy),
+each item is labeled by default with its <b>nucleotide change</b> (e.g. <code>C>T</code>)
+rather than its amino acid change. The label can be switched to the amino acid
+change via the "Label fields" control in the Track Settings.
 </p>
 <p>
-Score interpretation: raw scores range from 0 to 1, with higher values indicating greater
-predicted pathogenicity. The authors suggest a clinical threshold of 0.821 for
-distinguishing pathogenic from benign missense variants. The percentile field shows
-where a variant's score ranks relative to all other scored variants. 75% of variants
-are classified as benign, 25% as pathogenic.
+Hovering over a variant shows:
+</p>
+<ul>
+  <li><b>Var</b> — the nucleotide substitution on the + strand
+      (reference > alternate)</li>
+  <li><b>AA</b> — the resulting amino acid change
+      (single-letter reference > alternate)</li>
+  <li><b>Score</b> — the raw PrimateAI-3D pathogenicity score (0–1).
+      The authors suggest a clinical threshold of 0.821 for distinguishing
+      pathogenic from benign missense variants.</li>
+  <li><b>Perc</b> — the percentile rank of the raw score across all
+      scored variants (0–1). The track score field (0–1000) is this
+      value scaled by 1000.</li>
+  <li><b>Pred</b> — Illumina's binary call:
+      <span style="color:#0000c8">benign</span> or
+      <span style="color:#c80000">pathogenic</span>. In the track as
+      distributed, about 75% of variants are benign and 25% are pathogenic.</li>
+</ul>
+
+<p>
+Items can be filtered by prediction (benign/pathogenic), by raw PrimateAI-3D
+score, or by percentile.
 </p>
 <h2>Data Access</h2>
 <p>
 Due to the data license, the Table Browser, Data Integrator, and the REST API's
 <code>getData</code> endpoint are disabled for this track. The source data can be
 downloaded from the
 <a href="https://primateai3d.basespace.illumina.com/" target="_blank">PrimateAI-3D website</a>
 (requires registration). The primate variant database is available at
 <a href="https://primad.basespace.illumina.com/" target="_blank">PrimAD</a>.
 Our <a href="hgTrackUi?db=hg38&g=cons447way">Zoonomia 447-way Mammal/Primate</a>
 alignment track displays the primate variants used in training PrimateAI-3D.
 </p>
 <h2>Methods</h2>