9a31f94233618b5b6c16814d550efddc226d5340 kuhn Thu Oct 28 16:29:22 2021 -0700
minor wording changes
diff --git src/hg/makeDb/trackDb/human/dbSnp153Composite.html src/hg/makeDb/trackDb/human/dbSnp153Composite.html
index 09a717b..5398446 100644
--- src/hg/makeDb/trackDb/human/dbSnp153Composite.html
+++ src/hg/makeDb/trackDb/human/dbSnp153Composite.html
@@ -1,34 +1,34 @@
This track shows short genetic variants (up to approximately 50 base pairs) from dbSNP build 153: single-nucleotide variants (SNVs),
-small insertions, deletions, and complex deletion/insertions,
+small insertions, deletions, and complex deletion/insertions (indels),
relative to the reference genome assembly. Most variants in dbSNP are rare, not true polymorphisms, and some variants are known to be pathogenic.
For hg38 (GRCh38), approximately 667 million distinct variants (RefSNP clusters with rs# ids)
-have been mapped to over 702 million genomic locations
+have been mapped to more than 702 million genomic locations
including alternate haplotype and fix patch sequences. dbSNP remapped variants from hg38 to hg19 (GRCh37); approximately 658 million distinct variants were mapped to
-over 683 million genomic locations
+more than 683 million genomic locations
including alternate haplotype and fix patch sequences (not all of which are included in UCSC's hg19).
This track includes four subtracks of variants:
SNVs and pure deletions are displayed as boxes covering the affected base(s). Pure insertions are drawn as single-pixel tickmarks between the base before and the base after the insertion.
Insertions and/or deletions in repetitive regions may be represented by a half-height box showing uncertainty in placement, followed by a full-height box showing the number of deleted bases, or a full-height tickmark to indicate an insertion. When an insertion or deletion falls in a repetitive region, the placement may be ambiguous. For example, if the reference genome contains "TAAAG" but some individuals have "TAAG" at the same location, then the variant is a deletion of a single A relative to the reference genome. However, which A was deleted? There is no way to tell whether the first, second or third A was removed. Different variant mapping tools may place the deletion at different bases in the reference genome.
-In order to reduce errors in merging variant calls made with different left vs. right biases,
+To reduce errors in merging variant calls made with different left vs. right biases,
dbSNP made a major change in its representation of deletion/insertion variants in build 152. Now, instead of assigning a single-base genomic location at one of the A's, dbSNP expands the coordinates to encompass the whole repetitive region, so the variant is represented as a deletion of 3 A's combined with an insertion of 2 A's. In the track display, there will be a half-height box covering the first two A's,
-followed by a full-height box covering the third A, in order to show a net loss of one base
+followed by a full-height box covering the third A, to show a net loss of one base
but an uncertain placement within the three A's.
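The coordinate expansion described above can be illustrated with a short Python sketch. This is not dbSNP's pipeline code; `expand_deletion` is a hypothetical helper, and the 0-based half-open coordinates are an assumption made for the example. It slides a deletion left and right through the repeat and reports the span covering every equivalent placement.

```python
def expand_deletion(ref: str, start: int, end: int):
    """Expand a deletion of ref[start:end] over all equivalent placements.

    Coordinates are 0-based, half-open (an assumption for this sketch).
    Returns (lo, hi, ref_allele, alt_allele), where ref[lo:hi] is the
    whole ambiguous region and alt_allele is what remains after the
    deletion.
    """
    if not (0 <= start < end <= len(ref)):
        raise ValueError("deletion must be a non-empty span inside ref")
    n = end - start  # number of deleted bases

    # Shift left while the base entering the window matches the base leaving it.
    lo = start
    while lo > 0 and ref[lo - 1] == ref[lo + n - 1]:
        lo -= 1

    # Shift right by the same rule.
    hi = end
    while hi < len(ref) and ref[hi - n] == ref[hi]:
        hi += 1

    ref_allele = ref[lo:hi]
    alt_allele = ref[lo + n:hi]  # region with n bases removed
    return lo, hi, ref_allele, alt_allele


if __name__ == "__main__":
    # Deleting the middle A of "TAAAG" expands to the full run of A's.
    print(expand_deletion("TAAAG", 2, 3))  # (1, 4, 'AAA', 'AA')
```

For the "TAAAG" example, the region is three A's and the remaining allele is two A's, matching the "deletion of 3 A's combined with an insertion of 2 A's" representation. The length of `alt_allele` corresponds to the half-height box (uncertain placement) and the number of deleted bases to the full-height box.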
Variants are colored according to functional effect on genes annotated by dbSNP:
Protein-altering variants and splice site variants are
red.
Synonymous codon variants are
green.
Non-coding transcript or Untranslated Region (UTR) variants are
blue.
@@ -106,31 +106,31 @@
major/minor alleles (when available) and minor allele frequency (when available). Allele frequencies are reported independently by twelve projects (some of which may have overlapping sets of samples):
dbSNP has collected genetic variant reports from researchers worldwide for
-over 20 years.
+more than 20 years.
Since the advent of next-generation sequencing methods and the population sequencing efforts that they enable, dbSNP has grown exponentially, requiring a new data schema, computational pipeline, web infrastructure, and download files. (Holmes et al.) The same challenges of exponential growth affected UCSC's presentation of dbSNP variants, so we have taken the opportunity to change our internal representation and import pipeline. Most notably, flanking sequences are no longer provided by dbSNP,
-since most submissions have been genomic variant calls in VCF format as opposed to
+because most submissions have been genomic variant calls in VCF format as opposed to
independent sequences.
-We downloaded dbSNP's JSON files available from
+We downloaded JSON files available from dbSNP at
ftp://ftp.ncbi.nlm.nih.gov/snp/archive/b153/JSON/, extracted a subset of the information about each variant, and collated it into a bigBed file using the bigDbSnp.as schema with the information necessary for filtering and displaying the variants, as well as a separate file containing more detailed information to be displayed on each variant's details page (dbSnpDetails.as schema).
Since dbSNP has grown to include approximately 700 million variants, the size of the All dbSNP (153)
@@ -498,31 +498,31 @@
Several utilities for working with bigBed-formatted binary files can be downloaded here.
-Run a utility with no arguments in order to see a brief description of the utility and its options.
+Run a utility with no arguments to see a brief description of the utility and its options.
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb -chrom=chr1 -start=200000 -end=200400 stdout
@@ -553,31 +553,31 @@
UCSC also has an API that can be used to retrieve values from a particular chromosome range.
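As an illustration of the API access mentioned above, here is a minimal Python sketch that builds a range query against the UCSC REST API's `getData/track` endpoint and, optionally, fetches it. The track name `dbSnp153` and the chr1:200000-200400 range are assumptions borrowed from the `bigBedToBed` example; the helper names `build_url` and `fetch_variants` are hypothetical, and the shape of the returned JSON is not assumed here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_url(genome: str, track: str, chrom: str, start: int, end: int) -> str:
    """Build a getData/track URL for one chromosome range."""
    params = {
        "genome": genome,
        "track": track,  # assumed track name, e.g. "dbSnp153"
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    # Parameters are joined with semicolons, as in the UCSC API examples.
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params.items())
    return f"{API_BASE}?{query}"


def fetch_variants(genome: str, track: str, chrom: str, start: int, end: int):
    """Fetch and parse the JSON response (requires network access)."""
    with urlopen(build_url(genome, track, chrom, start, end)) as response:
        return json.load(response)


if __name__ == "__main__":
    print(build_url("hg38", "dbSnp153", "chr1", 200000, 200400))
```

Only `build_url` runs without network access; `fetch_variants` is left for the reader to call and inspect, since the response fields should be checked against the API documentation rather than assumed.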
A list of rs# IDs can be pasted/uploaded in the Variant Annotation Integrator
-tool in order to find out which genes (if any) the variants are located in,
+tool to find out which genes (if any) the variants are located in,
as well as functional effect such as intron, coding-synonymous, missense, frameshift, etc.
Please refer to our searchable mailing list archives for more questions and example queries, or our Data Access FAQ for more information.