b7b279e5f1240419fb3a408fff5c82a998a36e76 max Wed Jun 3 08:17:09 2026 -0700
EVE track QA fixes: fix .as field descriptions, clarify protein count in HTML
- thickStart/thickEnd .as descriptions corrected (whole protein span, not start/stop codon)
- reserved .as description corrected (always 0, not itemRgb)
- HTML Methods section: explain that 2,951 of 3,219 proteins have released VCF scores

Co-Authored-By: Claude Sonnet 4.6
diff --git src/hg/makeDb/trackDb/human/eve.html src/hg/makeDb/trackDb/human/eve.html
index 1ae02d9f474..294e9ab26c5 100644
--- src/hg/makeDb/trackDb/human/eve.html
+++ src/hg/makeDb/trackDb/human/eve.html
@@ -36,31 +36,32 @@

Hovering over a cell shows the wildtype amino acid, the protein position, the variant amino acid, the EVE score, and the Class25 classification (benign, uncertain, or pathogenic using a 25% uncertainty threshold).

For reverse-strand genes, protein positions are displayed left to right in genomic order (C-terminus to N-terminus on the screen), consistent with the standard genome browser orientation.

Methods

EVE trains a Bayesian variational autoencoder (VAE) separately for each of 3,219
-disease-associated human proteins. For each protein, a multiple sequence alignment is
+disease-associated human proteins; of these, 2,951 have missense VCF scores in the
+public bulk download (the remainder have sequence alignments but no released scores). For each protein, a multiple sequence alignment is
retrieved by searching roughly 250 million protein sequences from UniRef, and the VAE learns the distribution of amino acid sequences across species, capturing both per-position conservation and co-evolutionary dependencies between positions. An evolutionary index for each single amino acid variant is then computed as the approximate negative log-likelihood ratio of the variant versus the wildtype sequence, estimated by sampling from the VAE posterior (ensembled over five independently trained models). A global-local mixture of Gaussian mixture models, fit to the index distributions across all variants and all proteins, converts this continuous index to an EVE score between 0 (benign) and 1 (pathogenic) and assigns each variant to a benign, uncertain, or pathogenic class. The uncertainty of each classification reflects the predictive entropy of the mixture model, and a threshold on this entropy controls what fraction of variants is labeled uncertain. The Class25 field used in the track mouseovers classifies variants using a 25% uncertainty threshold, which the authors report yields approximately 90% accuracy on known ClinVar labels. See Frazer et al. 2021 for full details.
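To make the Class25 labels concrete, here is a minimal Python sketch of entropy-based three-way labeling. It assumes that the EVE score can be read as the mixture model's probability that a variant belongs to the pathogenic cluster, that uncertainty is the binary predictive entropy of that probability, and that the 25% of variants with the highest entropy are labeled uncertain. The function names are hypothetical; this is not the EVE authors' code, which fits the global-local mixture itself and may set its threshold differently.

```python
import math


def binary_entropy(p: float) -> float:
    """Predictive entropy (in nats) of a two-class prediction with P(pathogenic) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully confident prediction carries no uncertainty
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def classify(scores: list[float], uncertain_fraction: float = 0.25) -> list[str]:
    """Label each score benign, uncertain, or pathogenic.

    Assumption (not from the EVE code): the `uncertain_fraction` of variants
    with the highest entropy are labeled uncertain; the remaining variants
    are split at a score of 0.5.
    """
    n_uncertain = round(len(scores) * uncertain_fraction)
    # Indices ordered from most to least uncertain (highest entropy first).
    order = sorted(range(len(scores)),
                   key=lambda i: binary_entropy(scores[i]),
                   reverse=True)
    uncertain = set(order[:n_uncertain])
    labels = []
    for i, score in enumerate(scores):
        if i in uncertain:
            labels.append("uncertain")
        elif score >= 0.5:
            labels.append("pathogenic")
        else:
            labels.append("benign")
    return labels
```

Under these assumptions, `classify([0.02, 0.48, 0.55, 0.97])` labels one of the four variants uncertain (0.48, the score closest to 0.5) and returns `['benign', 'uncertain', 'pathogenic', 'pathogenic']`. Because binary entropy falls monotonically as a score moves away from 0.5, ranking by entropy is equivalent to ranking by |score - 0.5|.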

@@ -76,39 +77,39 @@ makedoc file, and the conversion script is available in our GitHub repository.
Two proteins (G6PT1, UniProt O43826; and MAFIP, Q8WZ33) were excluded because their VCF coordinates mapped to assembly scaffolds (chrCHR_HG2217_PATCH and chrGL000194.1) absent from the standard hg38 assembly. The remaining 2,949 proteins covering approximately 1.7 million amino acid positions are included in this track.
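The exclusion step can be sketched as a filter that drops VCF records whose CHROM value is not a sequence name in the target assembly. This is a minimal Python illustration with hypothetical function names, assuming the allowed names come from a UCSC chrom.sizes-style file (sequence name and size separated by a tab); the actual conversion script in the GitHub repository may work differently.

```python
def read_chrom_names(chrom_sizes_lines):
    """Return the sequence names from lines of a chrom.sizes-style file (name<TAB>size)."""
    return {line.split("\t")[0] for line in chrom_sizes_lines if line.strip()}


def split_by_assembly(vcf_lines, allowed_chroms):
    """Split VCF lines into (kept, dropped) by whether CHROM is in the assembly.

    Header lines (starting with '#') are always kept; data lines whose first
    column is not an allowed sequence name are returned in `dropped`.
    """
    kept, dropped = [], []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        chrom = line.split("\t", 1)[0]
        (kept if chrom in allowed_chroms else dropped).append(line)
    return kept, dropped
```

Returning the dropped records instead of silently discarding them makes it easy to report which proteins were lost, as was done for G6PT1 and MAFIP.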

Data Access

The data can be explored interactively in table format with the Table Browser or the Data Integrator and exported from there to spreadsheet or tab-separated tables. From scripts, the data can be accessed through our
-API, track=eve.
+API, track=eve.
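A minimal Python sketch of a scripted API query, assuming the public endpoint https://api.genome.ucsc.edu/getData/track with genome, track, chrom, start and end parameters. The helper names are hypothetical, and since this section does not describe the structure of the returned JSON, the sketch simply returns the decoded object.

```python
import json
import urllib.parse
import urllib.request

# Assumed REST endpoint for retrieving track items in a region.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def eve_track_url(chrom, start, end, genome="hg38", track="eve"):
    """Build the API URL for the items of `track` in chrom:start-end."""
    query = urllib.parse.urlencode(
        {"genome": genome, "track": track, "chrom": chrom, "start": start, "end": end}
    )
    return f"{API_BASE}?{query}"


def fetch_eve_region(chrom, start, end):
    """Download and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(eve_track_url(chrom, start, end)) as response:
        return json.load(response)
```

For example, `fetch_eve_region("chr17", 43000000, 43200000)` requests the same chr17 window used in the bigBedToBed command on this page.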

For automated download and analysis, the genome annotation is stored in a bigBed file that can be downloaded from our download server. The file for this track is called eve.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found
-here.
+here. The tool can also be used to obtain features within a given range, e.g. bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/eve/eve.bb -chrom=chr17 -start=43000000 -end=43200000 stdout
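The bigBedToBed range query can also be wrapped in a script. This Python sketch assumes bigBedToBed is installed on the PATH and passes only the command-line arguments shown in the example command; the helper names are hypothetical. Because this section does not list the columns of eve.bb beyond the basic BED fields, the parser keeps just chrom, start and end.

```python
import subprocess

EVE_BB = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/eve/eve.bb"


def parse_bed_intervals(bed_text):
    """Return (chrom, start, end) tuples from BED text; extra columns are ignored."""
    intervals = []
    for line in bed_text.splitlines():
        # Skip blank lines and comment/track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def eve_region(chrom, start, end):
    """Run bigBedToBed on eve.bb for one region and parse its output."""
    result = subprocess.run(
        ["bigBedToBed", EVE_BB, f"-chrom={chrom}", f"-start={start}",
         f"-end={end}", "stdout"],
        check=True, capture_output=True, text=True,
    )
    return parse_bed_intervals(result.stdout)
```

Keeping the parsing separate from the subprocess call lets the same parser handle a locally downloaded eve.bb or previously saved bigBedToBed output.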

The original annotation source data can be downloaded from https://evemodel.org/download/bulk.

Credits

Thanks to Jonathan Frazer, Pascal Notin, Mafalda Dias, and Debora S. Marks at Harvard Medical School and Yarin Gal at the University of Oxford for making the EVE scores publicly available at evemodel.org.

References