| Color |
@@ -75,134 +75,134 @@
Sequence Ontology (SO)
Variants are classified by EVA into one of the following sequence ontology terms:
- substitution —
A single nucleotide in the reference is replaced by another, alternate allele.
- deletion —
One or more nucleotides are deleted. The representation in the database is to
display one additional nucleotide in both the Reference field (Ref) and the
- Alternate Allele field (Alt). E.g. a variant that is a deletion of an A
- maybe be represented as Ref = GA and Alt = G.
+ Alternate Allele field (Alt). E.g., a variant that is a deletion of an A
+ may be represented as Ref = GA and Alt = G.
- insertion —
One or more nucleotides are inserted. The representation in the database is to
display one additional nucleotide in both the Reference field (Ref) and the
- Alternate Allele field (Alt). E.g. a variant that is an insertion of a T may
+ Alternate Allele field (Alt). E.g., a variant that is an insertion of a T may
be represented as Ref = G and Alt = GT.
- delins —
Similar to a tandem repeat, in that the runs of Ref and Alt Alleles are of
different length, except that there is more than one type of nucleotide,
e.g., Ref = CCAAAAACAAAAACA, Alt = ACAAAAAC.
- multipleNucleotideVariant —
More than one nucleotide is substituted by an equal number of different
nucleotides, e.g., Ref = AA, Alt = GC.
- sequence alteration —
A parent term meant to signify a deviation from another sequence. Can be
assigned to variants that have not been characterized yet.
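The class definitions above can be illustrated with a minimal Python sketch that assigns a class from the Ref and Alt strings alone. This is not EVA's actual classification procedure: the function name `classify_variant` and the decision order are assumptions made for illustration, and only the single anchor-nucleotide representation described above is recognized as a deletion or insertion.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Return an illustrative variant class for a Ref/Alt pair.

    Hypothetical helper: the rules below only mirror the textual
    definitions in this section, not EVA's real pipeline.
    """
    ref, alt = ref.upper(), alt.upper()

    # Missing or identical alleles: fall back to the parent term.
    if not ref or not alt or ref == alt:
        return "sequence alteration"

    # One nucleotide replaced by another.
    if len(ref) == 1 and len(alt) == 1:
        return "substitution"

    # Same length, more than one nucleotide changed.
    if len(ref) == len(alt):
        return "multipleNucleotideVariant"

    # Deletion shown with one anchor nucleotide, e.g. Ref = GA, Alt = G.
    if len(alt) == 1 and ref.startswith(alt):
        return "deletion"

    # Insertion shown with one anchor nucleotide, e.g. Ref = G, Alt = GT.
    if len(ref) == 1 and alt.startswith(ref):
        return "insertion"

    # Any other length difference is treated as delins in this sketch,
    # including cases EVA might label differently (such as tandem repeats).
    return "delins"
```

Applied to the examples in the list, `classify_variant("GA", "G")` gives `deletion` and `classify_variant("AA", "GC")` gives `multipleNucleotideVariant`.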
Methods
Data were downloaded from the European Variation Archive (EVA)
current_ids.vcf.gz files corresponding to the proper assembly.
-Chromosome names were converted to UCSC-style
-and the variants passed through the
+Chromosome names were converted to UCSC-style,
+and the variants were passed through the
Variant Annotation Integrator to
-predict consequence. For every organism the NCBI RefSeq curated models were used when available,
+predict consequence. For every organism, the NCBI RefSeq curated models were used when available,
followed by Ensembl genes, and finally UCSC mapping of RefSeq when neither of the previous models
was available.
Variants were then colored according to their predicted consequence in the following fashion:
- Protein-altering variants and
splice site variants
- exon_loss_variant, frameshift_variant,
inframe_deletion, inframe_insertion, initiator_codon_variant, missense_variant,
splice_acceptor_variant, splice_donor_variant, splice_region_variant, stop_gained,
stop_lost, coding_sequence_variant, transcript_ablation
- Synonymous codon variants
- synonymous_variant, stop_retained_variant
- Non-coding transcript or
Untranslated Region (UTR) variants
- 5_prime_UTR_variant,
3_prime_UTR_variant, complex_transcript_variant, non_coding_transcript_exon_variant
- Intergenic and intronic variants - upstream_gene_variant, downstream_gene_variant,
intron_variant, intergenic_variant, NMD_transcript_variant, no_sequence_alteration
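The grouping above amounts to a lookup from a predicted consequence term to one of four categories. A minimal Python sketch of that lookup follows; the group keys (such as `protein_altering_or_splice`) and the `consequence_group` helper are hypothetical names chosen for illustration, and the actual display colors are not encoded here.

```python
# Consequence terms per category, copied from the list above.
# The dictionary keys are hypothetical labels, not UCSC identifiers.
CONSEQUENCE_GROUPS = {
    "protein_altering_or_splice": [
        "exon_loss_variant", "frameshift_variant", "inframe_deletion",
        "inframe_insertion", "initiator_codon_variant", "missense_variant",
        "splice_acceptor_variant", "splice_donor_variant",
        "splice_region_variant", "stop_gained", "stop_lost",
        "coding_sequence_variant", "transcript_ablation",
    ],
    "synonymous": ["synonymous_variant", "stop_retained_variant"],
    "noncoding_or_utr": [
        "5_prime_UTR_variant", "3_prime_UTR_variant",
        "complex_transcript_variant", "non_coding_transcript_exon_variant",
    ],
    "intergenic_or_intronic": [
        "upstream_gene_variant", "downstream_gene_variant", "intron_variant",
        "intergenic_variant", "NMD_transcript_variant",
        "no_sequence_alteration",
    ],
}

# Invert the table so each term maps directly to its category.
TERM_TO_GROUP = {
    term: group
    for group, terms in CONSEQUENCE_GROUPS.items()
    for term in terms
}


def consequence_group(term: str) -> str:
    """Return the category for a consequence term, or 'unknown'."""
    return TERM_TO_GROUP.get(term, "unknown")
```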
Sequence Ontology ("SO:")
terms were converted to the variant classes, then the files were converted to BED,
-and then bigBed format.
+and then to bigBed format.
No functional annotations were provided by the EVA (e.g., missense or nonsense).
These were computed using UCSC's Variant Annotation Integrator (Hinrichs, et al., 2016).
Amino-acid substitutions for missense variants are based
on RefSeq alignments of mRNA transcripts, which do not always match the amino acids
predicted from translating the genomic sequence. Therefore, in some instances, the
variant and the genomic nucleotide and associated amino acid may be reversed.
E.g., a Pro > Arg change from the perspective of the mRNA would be Arg > Pro from
the perspective of the genomic sequence. Also, in bosTau9, galGal5, rheMac10,
-and danRer11 the mitochondrial sequence was removed or renamed to match UCSC.
+and danRer11, the mitochondrial sequence was removed or renamed to match UCSC.
For complete documentation of the processing of these tracks, see the makedoc corresponding
to the version of interest. For example, the
EVA Release 8 MakeDoc.
Data Access
Note: It is not recommended to use LiftOver to convert SNPs between assemblies.
More information about how to convert SNPs between assemblies can be found in the following
FAQ entry.
-The data can be explored interactively with the Table Browser,
+The data can be explored interactively with the Table Browser
or the Data Integrator. For automated analysis, the data may be
queried from our REST API. Please refer to our
mailing list archives
-for questions, or our Data Access FAQ for more
+for questions or our Data Access FAQ for more
information.
For automated download and analysis, this annotation is stored in a bigBed file that
can be downloaded from our download server. Use the corresponding version number for the track
-of interest, e.g. evaSnp8.bb.
+of interest, e.g., evaSnp8.bb.
Individual regions or the whole genome annotation can be obtained using our tool
-bigBedToBed which can be compiled from the source code or downloaded as a precompiled
+bigBedToBed, which can be compiled from the source code or downloaded as a precompiled
binary for your system. Instructions for downloading source code and binaries can be found
here.
The tool can also be used to obtain only features within a given range, e.g.:
bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/$db/bbi/evaSnp8.bb -chrom=chr21 -start=0 -end=100000000 stdout
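For scripted use, the command above can be assembled per assembly and region. The following Python sketch only builds the argument list and does not run the tool; the helper name `bigbedtobed_command` and its parameters are hypothetical, and it assumes the `$db` placeholder is replaced by a UCSC assembly name for which the chosen release file exists.

```python
import shlex


def bigbedtobed_command(db, chrom=None, start=None, end=None,
                        track_file="evaSnp8.bb", out="stdout"):
    """Build a bigBedToBed argument list for one assembly and region.

    Hypothetical helper: it mirrors the example command in this section
    and does not check that the file exists for the given assembly.
    """
    url = f"https://hgdownload.soe.ucsc.edu/gbdb/{db}/bbi/{track_file}"
    cmd = ["bigBedToBed", url]
    # Region options are added only when requested; omitting all of
    # them asks for the whole-genome annotation.
    if chrom is not None:
        cmd.append(f"-chrom={chrom}")
    if start is not None:
        cmd.append(f"-start={start}")
    if end is not None:
        cmd.append(f"-end={end}")
    cmd.append(out)
    return cmd


if __name__ == "__main__":
    # bosTau9 is used purely as an example assembly name.
    cmd = bigbedtobed_command("bosTau9", chrom="chr21", start=0, end=100000000)
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` once bigBedToBed is installed and on the PATH.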
Credits
This track was produced from the European
Variation Archive release data. Consequences were predicted using UCSC's Variant Annotation
-Integrator and NCBI's RefSeq as well as ensembl gene models.
+Integrator and NCBI's RefSeq, as well as Ensembl gene models.
References
Cezard T, Cunningham F, Hunt SE, Koylass B, Kumar N, Saunders G, Shen A, Silva AF,
Tsukanov K, Venkataraman S et al. The European Variation Archive: a FAIR resource of genomic variation for all
species. Nucleic Acids Res. 2021 Oct 28:gkab960.
doi:10.1093/nar/gkab960.
Epub ahead of print. PMID: 34718739; PMC: PMC8728205.
Hinrichs AS, Raney BJ, Speir ML, Rhead B, Casper J, Karolchik D, Kuhn RM, Rosenbloom KR, Zweig AS,