711684a07793b68eabf1281c5a6cca8bd6fa2183 lrnassar Wed Jul 24 16:35:34 2024 -0700 Staging evaSnp6 track refs #34119 diff --git src/hg/makeDb/trackDb/evaSnpContainer.html src/hg/makeDb/trackDb/evaSnpContainer.html new file mode 100644 index 0000000..7d7add9 --- /dev/null +++ src/hg/makeDb/trackDb/evaSnpContainer.html @@ -0,0 +1,215 @@ +<h2>Description</h2> +<p> +These tracks contain mappings of single nucleotide variants +and small insertions and deletions (indels) +from the European Variation Archive +(<a href="https://www.ebi.ac.uk/eva/" target="_blank">EVA</a>) +for the $organism $db genome. The dbSNP database at NCBI no longer +hosts non-human variants. +</p> + +<h2>Interpreting and Configuring the Graphical Display</h2> +<p> +Variants are shown as single tick marks at most zoom levels. +When viewing the track at or near base-level resolution, the displayed +width of the variant corresponds to the width of the variant in the +reference sequence. Insertions are indicated by a single tick mark displayed +between two nucleotides, single nucleotide polymorphisms are displayed as the +width of a single base, and multiple nucleotide variants are represented by a +block that spans two or more bases. The display is set to automatically collapse to +dense visibility when there are more than 100k variants in the window. +When the window size is more than 250k bp, the display is switched to density graph mode. +</p> + +<h3>Searching, details, and filtering</h3> +<p> +To navigate to an individual variant, type or paste +the variant identifier (rsID) or the genomic coordinates into the Position/Search box on the +Browser.</p> + +<p> +Clicking an item in the graphical display opens a page with data about +that variant. 
Data fields include the Reference and Alternate Alleles, the +class of the variant as reported by EVA, the source of the data, the amino acid +change, if any, and the functional class as determined by UCSC's Variant Annotation +Integrator. +</p> + +<p>Variants can be filtered using the track controls to show subsets of the +data by EVA Sequence Ontology (SO) term, by UCSC-generated functional effect, or +by color, which bins the UCSC functional effects into general classes.</p> + +<h3>Mouse-over</h3> +<p> +Mousing over an item shows the ucscClass, which is the consequence according to the +<a target="_blank" href="/cgi-bin/hgVai">Variant Annotation Integrator</a>, and +the aaChange when one is available, which is the change in amino acid in HGVS.p +terms. Items may have multiple ucscClasses, which will all be shown in the mouse-over +in a comma-separated list. Likewise, multiple HGVS.p terms may be shown for each rsID, +separated by spaces, describing all possible amino acid changes.</p> +<p> +Multiple items may appear due to different variant predictions on multiple gene transcripts. +For all organisms, the NCBI RefSeq curated gene models were used when available, followed by +Ensembl genes, and finally UCSC mappings of RefSeq if neither of the previous models was available. +</p> + +<h3>Track colors</h3> + +<p> +Variants are colored according to the most potentially deleterious functional effect predicted +by the Variant Annotation Integrator. The specific bins are listed in the Methods section +below. 
+</p> + +<p> +<table cellpadding='2'> + <thead><tr> + <th style="border-bottom: 2px solid;">Color</th> + <th style="border-bottom: 2px solid;">Variant Type</th> + </tr></thead> + <tr><td style="background-color: red"></td><td>Protein-altering variants and splice site variants</td></tr> + <tr><td style="background-color: green"></td><td>Synonymous codon variants</td></tr> + <tr><td style="background-color: blue"></td><td>Non-coding transcript or Untranslated Region (UTR) variants</td></tr> + <tr><td style="background-color: black"></td><td>Intergenic and intronic variants</td></tr> +</table> +</p> + +<h3>Sequence ontology (SO)</h3> + +<p> +Variants are classified by EVA into one of the following <a target="_blank" +href="http://www.sequenceontology.org/">sequence ontology</a> terms: +</p> + +<ul> + <li> <b>substitution</b> &mdash; + A single nucleotide in the reference is replaced by another, alternate allele. + <li> <b>deletion</b> &mdash; + One or more nucleotides are deleted. The representation in the database is to + display one additional nucleotide in both the Reference field (Ref) and the + Alternate Allele field (Alt). For example, a variant that is a deletion of an A + may be represented as Ref = GA and Alt = G. + <li> <b>insertion</b> &mdash; + One or more nucleotides are inserted. The representation in the database is to + display one additional nucleotide in both the Reference field (Ref) and the + Alternate Allele field (Alt). For example, a variant that is an insertion of a T may + be represented as Ref = G and Alt = GT. + <li> <b>delins</b> &mdash; + Similar to tandemRepeat, in that the runs of Ref and Alt Alleles are of + different length, except that there is more than one type of nucleotide, + e.g., Ref = CCAAAAACAAAAACA, Alt = ACAAAAAC. + <li> <b>multipleNucleotideVariant</b> &mdash; + More than one nucleotide is substituted by an equal number of different + nucleotides, e.g., Ref = AA, Alt = GC. 
+ <li> <b>sequence alteration</b> &mdash; + A parent term meant to signify a deviation from another sequence. It can be + assigned to variants that have not been characterized yet. +</ul> +</p> + +<h2>Methods</h2> +<p> +Data were downloaded from the European Variation Archive (EVA) +<a href="https://ftp.ebi.ac.uk/pub/databases/eva/rs_releases/" +target="_blank">current_ids.vcf.gz</a> files corresponding to the proper assembly.</p> +<p> +Chromosome names were converted to UCSC-style names, +and the variants were passed through the +<a target="_blank" href="/cgi-bin/hgVai">Variant Annotation Integrator</a> to +predict consequence. For every organism, the NCBI RefSeq curated models were used when available, +followed by Ensembl genes, and finally UCSC mappings of RefSeq when neither of the previous models +was available.</p> +<p> +Variants were then colored according to their predicted consequence in the following fashion: +<ul> +<li><b><font color=red>Protein-altering variants</font></b> and + <b><font color=red> splice site variants</font></b> +- exon_loss_variant, frameshift_variant, +inframe_deletion, inframe_insertion, initiator_codon_variant, missense_variant, +splice_acceptor_variant, splice_donor_variant, splice_region_variant, stop_gained, +stop_lost, coding_sequence_variant, transcript_ablation</li> +<li><b><font color=green>Synonymous codon variants</font></b> +- synonymous_variant, stop_retained_variant</li> +<li><b><font color=blue>Non-coding transcript </font></b> or + <b><font color=blue>Untranslated Region (UTR) variants</font></b> +- 5_prime_UTR_variant, +3_prime_UTR_variant, complex_transcript_variant, non_coding_transcript_exon_variant</li> +<li><b>Intergenic and intronic variants</b> - upstream_gene_variant, downstream_gene_variant, +intron_variant, intergenic_variant, NMD_transcript_variant, no_sequence_alteration</li></ul> +</p> + +<p> +Sequence Ontology ("<a href="http://www.sequenceontology.org/browser/current_release" +target="_blank">SO</a>:") +terms were converted 
to the variant classes, then the files were converted to BED, +and then to bigBed format. +</p> +<p> +No functional annotations were provided by the EVA (e.g., missense, nonsense, etc.). +These were computed using UCSC's Variant Annotation Integrator (Hinrichs, et al., 2016). +Amino-acid substitutions for missense variants are based +on RefSeq alignments of mRNA transcripts, which do not always match the amino acids +predicted from translating the genomic sequence. Therefore, in some instances, the +variant and the genomic nucleotide and associated amino acid may be reversed. +E.g., a Pro &gt; Arg change from the perspective of the mRNA would be Arg &gt; Pro from +the perspective of the genomic sequence. Also, in bosTau9, galGal5, rheMac8, +danRer10 and danRer11 the mitochondrial sequence was removed or renamed to match UCSC. +For complete documentation of the processing of these tracks, see the makedoc corresponding +to the version of interest. For example, see the +<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/evaSnp6.txt"> +EVA Release 6 MakeDoc</a>.</p> + +<h2>Data Access</h2> +<p> +<b>Note:</b> It is not recommended to use LiftOver to convert SNPs between assemblies. +More information about how to convert SNPs between assemblies can be found in the following +<a href="/FAQ/FAQreleases.html#snpConversion">FAQ entry</a>.</p> +<p> +The data can be explored interactively with the <a href="/cgi-bin/hgTables">Table Browser</a>, +or the <a href="/cgi-bin/hgIntegrator">Data Integrator</a>. For automated analysis, the data may be +queried from our <a href="/goldenPath/help/api.html">REST API</a>. 
Please refer to our +<a href="https://groups.google.com/a/soe.ucsc.edu/forum/#!forum/genome">mailing list archives</a> +for questions, or our <a href="/FAQ/FAQdownloads.html#download36">Data Access FAQ</a> for more +information.</p> + +<p> +For automated download and analysis, this annotation is stored in a bigBed file that +can be downloaded from <a href="https://hgdownload.soe.ucsc.edu/gbdb/$db/bbi/" +target="_blank">our download server</a>. Use the corresponding version number for the track +of interest, e.g., <tt>evaSnp6.bb</tt>. +Individual regions or the whole genome annotation can be obtained using our tool +<tt>bigBedToBed</tt>, which can be compiled from the source code or downloaded as a precompiled +binary for your system. Instructions for downloading source code and binaries can be found +<a href="https://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>. +The tool can also be used to obtain only features within a given range, e.g., +<br><br> +<tt>bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/$db/bbi/evaSnp6.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt> +</p> + +<h2>Credits</h2> +<p> +This track was produced from the <a target="_blank" href="https://www.ebi.ac.uk/eva/">European +Variation Archive release</a> data. Consequences were predicted using UCSC's Variant Annotation +Integrator and NCBI's RefSeq as well as Ensembl gene models. +</p> + +<h2>References</h2> +<p> +Cezard T, Cunningham F, Hunt SE, Koylass B, Kumar N, Saunders G, Shen A, Silva AF, +Tsukanov K, Venkataraman S <em>et al.</em> <a href="https://doi.org/10.1093/nar/gkab960" +target="_blank">The European Variation Archive: a FAIR resource of genomic variation for all +species</a>. <em>Nucleic Acids Res.</em> 2021 Oct 28:gkab960. +<a href="https://doi.org/10.1093/nar/gkab960" target="_blank">doi:10.1093/nar/gkab960</a>. +Epub ahead of print. PMID: <a href="https://pubmed.ncbi.nlm.nih.gov/34718739/" +target="_blank">34718739</a>. 
PMC: <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/pmc8728205/" +target="_blank">PMC8728205</a>. +</p> +<p> +Hinrichs AS, Raney BJ, Speir ML, Rhead B, Casper J, Karolchik D, Kuhn RM, Rosenbloom KR, Zweig AS, +Haussler D, Kent WJ. +<a href="https://academic.oup.com/bioinformatics/article/32/9/1430/1744314/" +target="_blank">UCSC Data Integrator and Variant Annotation Integrator</a>. +<em>Bioinformatics</em>. 2016 May 1;32(9):1430-2. +PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/26740527" target="_blank">26740527</a>; PMC: +<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4848401/" target="_blank">PMC4848401</a>. +</p>