98f451794d0c9571ff8bd9433206041295ecfe2c
angie
Fri Mar 13 09:33:14 2020 -0700
Expand the Data Access section to provide more details and examples for frequent MLQs. refs #25095

diff --git src/hg/makeDb/trackDb/human/dbSnp153Composite.html src/hg/makeDb/trackDb/human/dbSnp153Composite.html
index 651ba9b..4565c43 100644
--- src/hg/makeDb/trackDb/human/dbSnp153Composite.html
+++ src/hg/makeDb/trackDb/human/dbSnp153Composite.html
@@ -421,44 +421,169 @@
 We downloaded dbSNP's JSON files available from
 <a href="ftp://ftp.ncbi.nlm.nih.gov/snp/archive/b153/JSON/"
 target=_blank>ftp://ftp.ncbi.nlm.nih.gov/snp/archive/b153/JSON/</a>,
 extracted a subset of the information about each variant, and collated it into a bigBed file using the
 <a href="https://genome-source.gi.ucsc.edu/gitlist/kent.git/blob/master/src/hg/lib/bigDbSnp.as"
 target=_blank>bigDbSnp.as</a> schema with the information necessary for filtering and displaying
 the variants, as well as a separate file containing more detailed information to be displayed on
 each variant's details page
 (<a href="https://genome-source.gi.ucsc.edu/gitlist/kent.git/blob/master/src/hg/lib/dbSnpDetails.as"
 target=_blank>dbSnpDetails.as</a> schema).
 <h2>Data Access</h2>
 <p>
-The raw data underlying the UCSC Genome Browser track can be explored interactively with the
-<a href="../../cgi-bin/hgTables" target=_blank>Table Browser</a> or
-<a href="../../cgi-bin/hgIntegrator" target=_blank>Data Integrator</a>.
+Since dbSNP has grown to include approximately 700 million variants, the size of the All dbSNP (153)
+subtrack can cause the
+<a href="../../cgi-bin/hgTables" target=_blank>Table Browser</a> and
+<a href="../../cgi-bin/hgIntegrator" target=_blank>Data Integrator</a>
+to time out, leading to a blank page or truncated output,
+unless queries are restricted to a chromosomal region or
+to a specific set of rs# IDs (which can be pasted/uploaded into the Table Browser),
+or to one of the subset tracks such as Common (~15 million variants) or ClinVar (~0.5M variants).
+</p><p>
 For automated analysis, the track data files can be downloaded from the downloads server for
-<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/" target=_blank>hg38</a> and
-<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/" target=_blank>hg19</a>
-(dbSnp153.bb); the detailed variant properties can be downloaded from
-<a href="http://hgdownload.soe.ucsc.edu/gbdb/hgFixed/dbSnp/" target=_blank>hgFixed</a>
-(dbSnp153Details.tab.gz).
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/" target=_blank>hg19</a> and
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/" target=_blank>hg38</a>.
+<table class="descTbl">
+  <tr>
+    <th colspan=3>file</th>
+    <th>format</th>
+    <th>subtrack</th>
+  </tr>
+  <tr>
+    <td>dbSnp153.bb</td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153.bb"
+           target=_blank>hg19</a></td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb"
+           target=_blank>hg38</a></td>
+    <td>bigDbSnp (bigBed4+13)</td>
+    <td>All dbSNP (153)</td>
+  </tr>
+  <tr>
+    <td>dbSnp153ClinVar.bb</td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153ClinVar.bb"
+           target=_blank>hg19</a></td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153ClinVar.bb"
+           target=_blank>hg38</a></td>
+    <td>bigDbSnp (bigBed4+13)</td>
+    <td>ClinVar dbSNP (153)</td>
+  </tr>
+  <tr>
+    <td>dbSnp153Common.bb</td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153Common.bb"
+           target=_blank>hg19</a></td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153Common.bb"
+           target=_blank>hg38</a></td>
+    <td>bigDbSnp (bigBed4+13)</td>
+    <td>Common dbSNP (153)</td>
+  </tr>
+  <tr>
+    <td>dbSnp153Mult.bb</td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153Mult.bb"
+           target=_blank>hg19</a></td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153Mult.bb"
+           target=_blank>hg38</a></td>
+    <td>bigDbSnp (bigBed4+13)</td>
+    <td>Mult. dbSNP (153)</td>
+  </tr>
+  <tr>
+    <td>dbSnp153BadCoords.bb</td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153BadCoords.bb"
+           target=_blank>hg19</a></td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153BadCoords.bb"
+           target=_blank>hg38</a></td>
+    <td>bigBed4</td>
+    <td>Map Err (153)</td>
+  </tr>
+  <tr>
+    <td colspan=3>
+      <a href="http://hgdownload.soe.ucsc.edu/gbdb/hgFixed/dbSnp/dbSnp153Details.tab.gz"
+         target=_blank>dbSnp153Details.tab.gz</a>
+    </td>
+    <td>gzip-compressed tab-separated text</td>
+    <td>Detailed variant properties, independent of genome assembly version</td>
+  </tr>
+</table>
+</p>
+<p>
+Several utilities for working with bigBed-formatted binary files can be downloaded
+<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads"
+   target=_blank>here</a>.
+Run a utility with no arguments in order to see a brief description of the utility and its options.
+<ul>
+  <li><b>bigBedInfo</b> provides summary statistics about a bigBed file including the number of
+    items in the file. With the <b>-as</b> option, the output includes an
+    autoSql
+    definition of data columns, useful for interpreting the column values.</li>
+  <li><b>bigBedToBed</b> converts the binary bigBed data to tab-separated text.
+    Output can be restricted to a particular region by using the -chrom, -start
+    and -end options.</li>
+  <li><b>bigBedNamedItems</b> extracts rows for one or more rs# IDs.</li>
+</ul>
+</p>
+
+<h4>Example: retrieve all variants in the region chr1:200001-200400</h4>
+
+<pre><tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb -chrom=chr1 -start=200000 -end=200400 stdout</tt></pre>
+
+<h4>Example: retrieve variant rs6657048</h4>
+
+<pre><tt>bigBedNamedItems dbSnp153.bb rs6657048 stdout</tt></pre>
+
+<h4>Example: retrieve all variants with rs# IDs in file myIds.txt</h4>
+
+<pre><tt>bigBedNamedItems -nameFile dbSnp153.bb myIds.txt dbSnp153.myIds.bed</tt></pre>
+
+<p>
+The columns in the bigDbSnp/bigBed files and dbSnp153Details.tab.gz file are described in
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/lib/bigDbSnp.as"
+   target=_blank>bigDbSnp.as</a> and
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/lib/dbSnpDetails.as"
+   target=_blank>dbSnpDetails.as</a> respectively.
+For columns that contain lists of allele frequency data, the order of projects
+providing the data listed is as follows:
+<ol>
+  <li><a href="https://www.internationalgenome.org/" target=_blank>1000Genomes</a></li>
+  <li><a href="https://gnomad.broadinstitute.org/" target=_blank>GnomAD exomes</a></li>
+  <li><a href="https://www.nhlbiwgs.org/" target=_blank>TOPMED</a></li>
+  <li><a href="http://exac.broadinstitute.org/" target=_blank>ExAC</a></li>
+  <li><a href="https://www.pagestudy.org/" target=_blank>PAGE STUDY</a></li>
+  <li><a href="https://gnomad.broadinstitute.org/" target=_blank>GnomAD genomes</a></li>
+  <li><a href="https://esp.gs.washington.edu/" target=_blank>GoESP</a></li>
+  <li><a href="https://www.geenivaramu.ee/en" target=_blank>Estonian</a></li>
+  <li><a href="http://www.bris.ac.uk/alspac/participants/genome/" target=_blank>ALSPAC</a></li>
+  <li><a href="https://twinsuk.ac.uk/" target=_blank>TWINSUK</a></li>
+  <li><a href="https://swefreq.nbis.se/dataset/SweGen" target=_blank>NorthernSweden</a></li>
+  <li><a href="https://genomes.vn" target=_blank>Vietnamese</a></li>
+</ol>
+</p><p>
+UCSC also has an
+<a href="../goldenPath/help/api.html"
+   target=_blank>API</a>
+that can be used to retrieve values from a particular chromosome range.
+</p><p>
+A list of rs# IDs can be pasted/uploaded in the
+<a href="hgVai" target=_blank>Variant Annotation Integrator</a>
+tool in order to find out which genes (if any) the variants are located in,
+as well as functional effect such as intron, coding-synonymous, missense, frameshift, etc.
 </p><p>
-Please refer to our
+Please refer to our searchable
 <A HREF="https://groups.google.com/a/soe.ucsc.edu/forum/?hl=en&fromgroups#!search/download+snps"
 target=_blank>mailing list archives</a>
-for questions and example queries, or our
+for more questions and example queries, or our
 <a HREF="../FAQ/FAQdownloads.html#download36"
 target=_blank>Data Access FAQ</a> for more information.
 </p>
 <h2>References</h2>
 <p>
 Holmes JB, Moyer E, Phan L, Maglott D, Kattman B.
 <a href="https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btz856" target="_blank">
 SPDI: Data Model for Variants and Applications at NCBI</a>.
 <em>Bioinformatics</em>. 2019 Nov 18;.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/31738401" target="_blank">31738401</a>
 </p>
 <p>