b7b279e5f1240419fb3a408fff5c82a998a36e76
max
Wed Jun 3 08:17:09 2026 -0700
EVE track QA fixes: fix .as field descriptions, clarify protein count in HTML
- thickStart/thickEnd .as descriptions corrected (whole protein span, not start/stop codon)
- reserved .as description corrected (always 0, not itemRgb)
- HTML Methods section: explain that 2,951 of 3,219 proteins have released VCF scores
Co-Authored-By: Claude Sonnet 4.6
Hovering over a cell shows the wildtype amino acid, the protein position, the variant amino
acid, the EVE score, and the Class25 classification (benign, uncertain, or pathogenic using
a 25% uncertainty threshold).
For reverse-strand genes, protein positions are displayed left to right in genomic
order (C-terminus to N-terminus on the screen), consistent with the standard genome
browser orientation.
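The Python sketch below illustrates this ordering. It maps each coding base of a transcript to the 1-based protein position it encodes. It then lists those positions in ascending genomic coordinate, which is the order they would appear on screen. The input format (half-open CDS intervals plus a strand character) and the function names are hypothetical and not part of the track. The sketch also assumes a CDS length that is a multiple of 3.

```python
def protein_positions(cds_blocks, strand):
    """Map each 0-based genomic coding base to the 1-based protein position it encodes.

    cds_blocks: hypothetical list of half-open (start, end) intervals covering the CDS.
    strand: "+" or "-".
    """
    # Coding bases in ascending genomic order, concatenated across exons.
    bases = [pos for start, end in sorted(cds_blocks) for pos in range(start, end)]
    if strand == "-":
        # On the reverse strand, translation starts at the highest coordinate.
        bases.reverse()
    return {pos: i // 3 + 1 for i, pos in enumerate(bases)}


def on_screen_order(cds_blocks, strand):
    """Protein positions as drawn left to right, i.e. in ascending genomic coordinate."""
    mapping = protein_positions(cds_blocks, strand)
    order = []
    for pos in sorted(mapping):
        # Collapse the three bases of each codon into a single protein position.
        if not order or order[-1] != mapping[pos]:
            order.append(mapping[pos])
    return order


# Toy example: two coding exons with 3 + 6 = 9 coding bases, i.e. three codons.
print(on_screen_order([(100, 103), (200, 206)], "+"))  # [1, 2, 3]
print(on_screen_order([(100, 103), (200, 206)], "-"))  # [3, 2, 1]
```

For a minus-strand CDS the list runs from the highest protein position down to 1. This matches the C-terminus-to-N-terminus layout described above.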
EVE trains a Bayesian variational autoencoder (VAE) separately for each of 3,219
-disease-associated human proteins. For each protein, a multiple sequence alignment is
+disease-associated human proteins; of these, 2,951 have missense VCF scores in the
+public bulk download (the remainder have sequence alignments but no released scores). For each protein, a multiple sequence alignment is
retrieved by searching roughly 250 million protein sequences from UniRef, and the VAE
learns the distribution of amino acid sequences across species, capturing both
per-position conservation and co-evolutionary dependencies between positions. An
evolutionary index for each single amino acid variant is then computed as the
approximate negative log-likelihood ratio of the variant versus the wildtype sequence,
estimated by sampling from the VAE posterior (ensembled over five independently trained
models). A global-local mixture of Gaussian mixture models, fit to the index distributions
across all variants and all proteins, converts this continuous index to an EVE score
between 0 (benign) and 1 (pathogenic) and assigns each variant to a benign, uncertain, or
pathogenic class. The uncertainty of each classification reflects the predictive entropy of
the mixture model, and a threshold on this entropy controls what fraction of variants is
labeled uncertain. The Class25 field used in the track mouseovers classifies variants using
a 25% uncertainty threshold, which the authors report yields approximately 90% accuracy on
known ClinVar labels. See Frazer et al. 2021 for full details.
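The scoring steps in this paragraph can be sketched in Python as follows. This is a simplified illustration, not the EVE implementation, and it rests on several assumptions:
- The ELBO values and Gaussian parameters are hypothetical inputs.
- The global-local combination is replaced by a single two-component Gaussian mixture with fixed parameters (no EM fitting).
- Variants outside the uncertain set are assumed to be called pathogenic when the score exceeds 0.5.

The evolutionary index relies on the ELBO being a lower bound on log p(x). So ELBO(wildtype) - ELBO(variant) approximates -log[p(variant)/p(wildtype)].

```python
import math
from statistics import mean


def evolutionary_index(elbo_wt, elbo_var):
    """Approximate -log[p(variant) / p(wildtype)] as the ELBO difference, averaged over models.

    elbo_wt, elbo_var: per-model ELBO estimates (hypothetical inputs; in EVE each
    is itself estimated by sampling from the VAE posterior).
    """
    return mean(wt - var for wt, var in zip(elbo_wt, elbo_var))


def log_normal_pdf(x, mu, sigma):
    """Log density of a univariate Gaussian."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def pathogenic_probability(index, benign, pathogenic, weight_pathogenic):
    """Posterior probability of the pathogenic component of a two-component 1-D GMM.

    benign, pathogenic: (mean, sd) tuples; weight_pathogenic: mixing weight in (0, 1).
    """
    # Log-odds of pathogenic versus benign component for this index value.
    d = (math.log(weight_pathogenic) + log_normal_pdf(index, *pathogenic)) - (
        math.log(1 - weight_pathogenic) + log_normal_pdf(index, *benign)
    )
    # Numerically stable logistic function of the log-odds.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def binary_entropy(p):
    """Predictive entropy (in bits) of a two-class posterior."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def classify(scores, uncertain_fraction=0.25):
    """Label the highest-entropy fraction 'uncertain' and the rest by score > 0.5."""
    n_uncertain = round(uncertain_fraction * len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: binary_entropy(scores[i]), reverse=True)
    uncertain = set(ranked[:n_uncertain])
    return [
        "uncertain" if i in uncertain else ("pathogenic" if s > 0.5 else "benign")
        for i, s in enumerate(scores)
    ]


print(evolutionary_index([-100.0, -102.0], [-110.0, -108.0]))  # 8.0
scores = [
    pathogenic_probability(x, benign=(0.0, 2.0), pathogenic=(8.0, 2.0), weight_pathogenic=0.5)
    for x in (-1.0, 4.0, 9.0, 5.0)
]
print(classify(scores))  # ['benign', 'uncertain', 'pathogenic', 'pathogenic']
```

Binary entropy peaks at a score of 0.5, so ranking by entropy amounts to ranking by distance from 0.5. The variants labeled uncertain are therefore those whose scores lie closest to the decision boundary. In EVE the uncertain fraction is set across all variants of all proteins, not within one list as in this toy example.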
Methods
The data can be explored interactively in table format with the Table Browser or the Data Integrator and exported from there to spreadsheet or tab-separated tables. From scripts, the data can be accessed through our
-API, track=eve.
+API, track=eve.
For automated download and analysis, the genome annotation is stored in a bigBed file that can be downloaded from our download server. The file for this track is called eve.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found
-here.
+here.
The tool can also be used to obtain features within a given range, e.g.
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/eve/eve.bb -chrom=chr17 -start=43000000 -end=43200000 stdout
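A minimal Python sketch for scripted access follows. It assumes bigBedToBed is installed and on PATH, runs the command shown above, and parses each output line into the nine standard BED columns (chrom through the reserved field). Any additional track-specific columns are kept as unparsed strings. The record type and function names are illustrative. Parsing score and reserved as integers assumes they follow the standard BED definitions. The meaning of the extra columns should be taken from the track's .as file rather than from this sketch.

```python
import subprocess
from typing import List, NamedTuple, Tuple


class BedRecord(NamedTuple):
    chrom: str
    chrom_start: int  # 0-based, inclusive
    chrom_end: int  # 0-based, exclusive
    name: str
    score: int
    strand: str
    thick_start: int
    thick_end: int
    reserved: int
    extra: Tuple[str, ...]  # track-specific columns, left unparsed


def parse_bed_line(line: str) -> BedRecord:
    """Split one tab-separated bigBedToBed output line into a BedRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        raise ValueError(f"expected at least 9 columns, got {len(fields)}")
    return BedRecord(
        fields[0], int(fields[1]), int(fields[2]), fields[3], int(fields[4]),
        fields[5], int(fields[6]), int(fields[7]), int(fields[8]), tuple(fields[9:]),
    )


def fetch_region(bigbed: str, chrom: str, start: int, end: int) -> List[BedRecord]:
    """Run bigBedToBed on one region (the tool must be on PATH) and parse its output."""
    result = subprocess.run(
        ["bigBedToBed", bigbed, f"-chrom={chrom}", f"-start={start}", f"-end={end}", "stdout"],
        check=True, capture_output=True, text=True,
    )
    return [parse_bed_line(line) for line in result.stdout.splitlines() if line]


# Example (requires network access and bigBedToBed):
# records = fetch_region("http://hgdownload.soe.ucsc.edu/gbdb/hg38/eve/eve.bb",
#                        "chr17", 43000000, 43200000)
```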
The original annotation source data can be downloaded from https://evemodel.org/download/bulk.
Thanks to Jonathan Frazer, Pascal Notin, Mafalda Dias, and Debora S. Marks at Harvard Medical School and Yarin Gal at the University of Oxford for making the EVE scores publicly available at evemodel.org.