3e99f46ae896bf28c1fe300b5176ded59a73632e
dschmelt
Thu Apr 21 16:53:24 2022 -0700
Minor edits to eva track, refs #29031

diff --git src/hg/makeDb/trackDb/evaSnp.html src/hg/makeDb/trackDb/evaSnp.html
index b9e784f..5f384b5 100644
--- src/hg/makeDb/trackDb/evaSnp.html
+++ src/hg/makeDb/trackDb/evaSnp.html
@@ -1,90 +1,92 @@

Description

This track contains mappings of single nucleotide variants and small insertions and deletions (indels) — collectively Simple Nucleotide Variants (SNVs) — from the European Variation Archive (EVA) Release 3 for the $organism $db genome. The dbSNP database at NCBI no longer hosts non-human variants. -

Interpreting and Configuring the Graphical Display

Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP variant corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a - block that spans two or more bases. + block that spans two or more bases. The display is set to collapse to dense visibility + when there are more than 100k variants in the window. When the window size is more than 1M bp, + the display is switched to density graph mode.

Searching, details and filtering

- -

Navigation to an individual variant can be accomplished by typing or copying the variant identifier (rsID) or the genomic coordinates into the Position/Search box on the Browser.

Searching, details, and filtering

+Navigation to an individual variant can be accomplished by typing or copying +the variant identifier (rsID) or the genomic coordinates into the Position/Search box on the +Browser.

A click on an item in the graphical display displays a page with data about that variant. Data fields include the Reference and Alternate Alleles, the class of the variant as reported by EVA, the source of the data, the amino acid change, if any, and the functional class as determined by UCSC's Variant Annotation Integrator.

Variants can be filtered using the track controls to show subsets of the data by either EVA SO term, UCSC-generated functional annotation, or -by color, which bins the UCSC functional annotations into general classes. -

Variants can be filtered using the track controls to show subsets of the +data by either EVA Sequence Ontology (SO) term, UCSC-generated functional effect, or +by color, which bins the UCSC functional effects into general classes.

Mouse-over

Mousing over an item shows the ucscClass, which is the consequence according to the Variant Annotation Integrator, and -the aaChange when one is available, which is the change in ammino acid in HGVS.p +the aaChange when one is available, which is the change in amino acid in HGVS.p terms. Items may have multiple ucscClasses, which will all be shown in the mouse-over in a comma-separated list. Likewise, multiple HGVS.p terms may be shown for each rsID separated by spaces describing all possible AA changes.

Multiple items may appear due to different variant predictions on multiple gene transcripts. For all organisms the gene models used were ncbiRefSeqCurated, except for mm39 which used ncbiRefSeqSelect.

Track colors

Variants are colored according to the most potentially deleterious functional effect prediction according to the Variant Annotation Integrator. Specific bins can be seen in the Methods section below.

Color	Variant Type
	Protein-altering variants and splice site variants
	Synonymous codon variants
	Non-coding transcript or Untranslated Region (UTR) variants
	Intergenic and intronic variants

Sequence ontology

Sequence ontology (SO)

Variants are classified by EVA into one of the following sequence ontology terms:

substitution — A single nucleotide in the reference is replaced by another, alternate allele
deletion — One or more nucleotides are deleted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g., a variant that is a deletion of an A may be represented as Ref = GA and Alt = G.
insertion — @@ -99,61 +101,64 @@
multipleNucleotideVariant — More than one nucleotide is substituted by an equal number of different nucleotides, e.g., Ref = AA, Alt = GC.
sequence alteration — A parent term meant to signify a deviation from another sequence. Can be assigned to variants that have not been characterized yet.
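
The Ref/Alt conventions above can be illustrated with a minimal Python sketch. This is not EVA's classifier: the function name `classify_variant` is hypothetical, and the insertion rule is an assumption that mirrors the deletion example (one shared leading nucleotide), because the insertion definition is cut off in this excerpt.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Guess a sequence ontology term from Ref and Alt alleles.

    Illustrative only; EVA assigns the real terms. The insertion rule
    is an assumption modeled on the deletion example (Ref = GA, Alt = G).
    """
    ref, alt = ref.upper(), alt.upper()

    # One nucleotide replaced by a different one.
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "substitution"

    # Equal-length replacement of more than one nucleotide, e.g. AA -> GC.
    if len(ref) == len(alt) and len(ref) > 1 and ref != alt:
        return "multipleNucleotideVariant"

    # Alt is a leading part of Ref: the trailing bases were deleted.
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"

    # Assumed mirror image of a deletion: Ref is a leading part of Alt.
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"

    # Parent term for anything not characterized by the rules above.
    return "sequence alteration"
```

For example, `classify_variant("GA", "G")` returns `"deletion"` and `classify_variant("AA", "GC")` returns `"multipleNucleotideVariant"`, matching the two examples in the list.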

Methods

Data were downloaded from the European Variation Archive (EVA) Release 3 (2022-02-24) current_ids.vcf.gz files for the corresponding assembly.

-Chromosome names were converted to UCSC-style, and the variants passed through the +Chromosome names were converted to UCSC-style, a few problematic variants were removed, +and the variants passed through the Variant Annotation Integrator in order to predict consequence. For every organism the ncbiRefSeqCurated gene models were used to predict the consequences, except for mm39 which used the ncbiRefSeqSelect models.

Variants were then colored according to their predicted consequence in the following fashion:

Protein-altering variants and splice site variants - exon_loss_variant, frameshift_variant, inframe_deletion, inframe_insertion, initiator_codon_variant, missense_variant, splice_acceptor_variant, splice_donor_variant, splice_region_variant, stop_gained, stop_lost, coding_sequence_variant, transcript_ablation
Synonymous codon variants - synonymous_variant, stop_retained_variant
Non-coding transcript or Untranslated Region (UTR) variants - 5_prime_UTR_variant, 3_prime_UTR_variant, complex_transcript_variant, non_coding_transcript_exon_variant
Intergenic and intronic variants - upstream_gene_variant, downstream_gene_variant, intron_variant, intergenic_variant, NMD_transcript_variant, no_sequence_alteration
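
The binning above can be sketched as a small lookup in Python. This is an illustration rather than the code used to build the track: the names `COLOR_BINS` and `most_deleterious_bin` are hypothetical, and treating the order of the four bins as most to least deleterious is an assumption based on the order in which they are listed.

```python
# Bins in the order listed above; earlier bins are assumed more deleterious.
COLOR_BINS = [
    ("Protein-altering variants and splice site variants", {
        "exon_loss_variant", "frameshift_variant", "inframe_deletion",
        "inframe_insertion", "initiator_codon_variant", "missense_variant",
        "splice_acceptor_variant", "splice_donor_variant",
        "splice_region_variant", "stop_gained", "stop_lost",
        "coding_sequence_variant", "transcript_ablation",
    }),
    ("Synonymous codon variants", {
        "synonymous_variant", "stop_retained_variant",
    }),
    ("Non-coding transcript or Untranslated Region (UTR) variants", {
        "5_prime_UTR_variant", "3_prime_UTR_variant",
        "complex_transcript_variant", "non_coding_transcript_exon_variant",
    }),
    ("Intergenic and intronic variants", {
        "upstream_gene_variant", "downstream_gene_variant", "intron_variant",
        "intergenic_variant", "NMD_transcript_variant",
        "no_sequence_alteration",
    }),
]


def most_deleterious_bin(ucsc_classes: str) -> str:
    """Return the bin for a comma-separated list of consequence terms.

    The first bin (in listed order) containing any of the terms wins.
    """
    terms = {t.strip() for t in ucsc_classes.split(",") if t.strip()}
    for bin_name, members in COLOR_BINS:
        if terms & members:
            return bin_name
    raise ValueError(f"no known consequence term in {ucsc_classes!r}")
```

A variant annotated as `intron_variant,missense_variant` on different transcripts would fall in the protein-altering bin under this assumption.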

Sequence Ontology ("SO:") -terms were converted to the variant classes, then the files were converted to BED +terms were converted to the variant classes, then the files were converted to BED, and then bigBed format.

No functional annotations were provided by the EVA (e.g., missense, nonsense, etc). These were computed using UCSC's Variant Annotation Integrator (Hinrichs, et al., 2016). -

+For complete documentation of the processing of these tracks, read the + +EVA Release 3 MakeDoc.

Data Access

The data can be explored interactively with the Table Browser, or the Data Integrator. For automated analysis, the data may be queried from our REST API. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.
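
As a rough sketch of an automated query, the Python below builds a request for the REST API's `/getData/track` endpoint and fetches the JSON response. The track name `evaSnp` is an assumption taken from the bigBed file name, the helper names `eva_snp_url` and `fetch_eva_snps` are hypothetical, and `mm39` is used only because it is an assembly mentioned on this page.

```python
import json
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu"


def eva_snp_url(genome: str, chrom: str, start: int, end: int,
                track: str = "evaSnp") -> str:
    """Build a /getData/track URL for a region (0-based start, as in BED)."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return (f"{API_BASE}/getData/track?genome={genome};track={track};"
            f"chrom={chrom};start={start};end={end}")


def fetch_eva_snps(genome: str, chrom: str, start: int, end: int) -> dict:
    """Fetch variants in a region and return the decoded JSON response."""
    url = eva_snp_url(genome, chrom, start, end)
    with urlopen(url, timeout=30) as response:
        return json.load(response)
```

A call such as `fetch_eva_snps("mm39", "chr1", 3000000, 3100000)` requires network access; the layout of the returned JSON should be checked against the REST API documentation rather than assumed.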

For automated download and analysis, this annotation is stored in a bigBed file that can be downloaded from our download server. The file for this track is called evaSnp.bb. Individual regions or the whole genome annotation can be obtained using our tool