8f68c0d267b2581bf16895528ab8064ce1b377cc lrnassar Thu Jun 25 11:56:12 2020 -0700 Adding new SNP download section to link from hgTables refs #25775 diff --git src/hg/htdocs/FAQ/FAQdownloads.html src/hg/htdocs/FAQ/FAQdownloads.html index e0cb451..3e1ca8e 100755 --- src/hg/htdocs/FAQ/FAQdownloads.html +++ src/hg/htdocs/FAQ/FAQdownloads.html @@ -33,30 +33,31 @@ <li><a href="#download25">Mapping chimp chromosome numbers to human chromosome numbers</a></li> <li><a href="#download28">Converting genome coordinates between assemblies</a></li> <li><a href="#download33">Linking gene name with accession number</a></li> <li><a href="#download31">Obtaining a list of Known Genes</a></li> <li><a href="#download16">Repeat-masking data</a></li> <li><a href="#download17">Availability of repeat-masked data</a></li> <li><a href="#download24">RepeatMasker version differences - UCSC vs. RepeatMasker website</a></li> <li><a href="#download18">Obtaining promoter sequence</a></li> <li><a href="#download19">Data from Evolutionary Conservation Score tracks</a></li> <li><a href="#download20">Minus strand coordinates - axtNet files</a></li> <li><a href="#download21">Mapping UCSC STS marker IDs to those of other groups</a></li> <li><a href="#download22">deCODE map data</a></li> <li><a href="#download29">Direct MariaDB (MySQL) access to data</a></li> <li><a href="#download34">Name of fourth column in BED output</a></li> <li><a href="#download36">Track data access</a></li> +<li><a href="#snp">How do I download dbSNP data?</a></li> <li><a href="#download37">Known issues with Table Browser GTF output</a></li> <li><a href="#download38">Table Browser output file not ordered</a></li> <li><a href="#download39">'Permission denied' error when trying to use command-line utilities</a></li> <li><a href="#download40">Restricted Track Data</a></li> <li><a href="#downloadAnalysis">What is the genome analysis set?</a></li> </ul> <hr> <p> <a href="index.html">Return to FAQ Table of Contents</a></p> <a 
name="download1"></a> <h2>Downloading sequence and annotation data</h2> <h6>How do I obtain the sequence and/or annotation data for a release?</h6> <p> Sequence and annotation data downloads are usually made available within the first week of the @@ -890,36 +891,157 @@ for downloading source code and binaries can be found <a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>. The tool can also be used to obtain only features within a given range using one of the hgdownload servers, example:</p> <ul> <li> North American server: <pre><code>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/path/to/file/bigBedfile.bb -chrom=chr21 -start=0 -end=1000000 stdout </code></pre> </li> <li> European server: <pre><code>bigBedToBed http://hgdownload-euro.soe.ucsc.edu/gbdb/path/to/file/bigBedfile.bb -chrom=chr21 -start=0 -end=1000000 stdout </code></pre> </li> </ul> <p> -<strong>SNP data:</strong> If queries against the SNP table on one of our public MariaDB servers or on your -own MariaDB installation are slow, then they can be sped up by using the "bin" field; you -can <a href="../contacts.html">contact us</a> for more information.</p> +<a name="snp"></a> +<h2>How do I download dbSNP data?</h2> +<p> +For versions dbSNP153 and above, the data is formatted in bigBed files. Previous versions are MySQL +tables. For help with versions before dbSNP153, see <a href="#download29">accessing MySQL data</a>. 
+This FAQ entry pertains to versions dbSNP153 and above.</p> +<p> +Since dbSNP has grown to include over 700 million variants, the size of the All dbSNP (153+) +subtrack can cause the +<a href="/cgi-bin/hgTables" target=_blank>Table Browser</a> and +<a href="/cgi-bin/hgIntegrator" target=_blank>Data Integrator</a> +to time out, leading to a blank page or truncated output, +unless queries are restricted to a chromosomal region, +to a specific set of rs# IDs (which can be pasted/uploaded into the Table Browser), +or to one of the subset tracks such as Common or ClinVar. +</p><p> +For automated analysis, the track data files can be downloaded from the downloads server for +<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/" target=_blank>hg19</a> and +<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/" target=_blank>hg38</a>. Below +are specific examples for <b>dbSNP153</b>; however, the same methods and directories +will work if a more recent dbSNP release is substituted. +<table class="descTbl"> + <tr> + <th colspan=3>file</th> + <th>format</th> + <th>subtrack</th> + </tr> + <tr> + <td>dbSnp153.bb</td> + <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153.bb" + target=_blank>hg19</a></td> + <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb" + target=_blank>hg38</a></td> + <td>bigDbSnp (bigBed4+13)</td> + <td>All dbSNP (153)</td> + </tr> + <tr> + <td>dbSnp153ClinVar.bb</td> + <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153ClinVar.bb" + target=_blank>hg19</a></td> + <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153ClinVar.bb" + target=_blank>hg38</a></td> + <td>bigDbSnp (bigBed4+13)</td> + <td>ClinVar dbSNP (153)</td> + </tr> + <tr> + <td>dbSnp153Common.bb</td> + <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153Common.bb" + target=_blank>hg19</a></td> + <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153Common.bb" + target=_blank>hg38</a></td> + <td>bigDbSnp 
(bigBed4+13)</td> + <td>Common dbSNP (153)</td> + </tr> + <tr> + <td>dbSnp153Mult.bb</td> + <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153Mult.bb" + target=_blank>hg19</a></td> + <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153Mult.bb" + target=_blank>hg38</a></td> + <td>bigDbSnp (bigBed4+13)</td> + <td>Mult. dbSNP (153)</td> + </tr> + <tr> + <td>dbSnp153BadCoords.bb</td> + <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153BadCoords.bb" + target=_blank>hg19</a></td> + <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153BadCoords.bb" + target=_blank>hg38</a></td> + <td>bigBed4</td> + <td>Map Err (153)</td> + </tr> + <tr> + <td colspan=3> + <a href="http://hgdownload.soe.ucsc.edu/gbdb/hgFixed/dbSnp/dbSnp153Details.tab.gz" + target=_blank>dbSnp153Details.tab.gz</a> + </td> + <td>gzip-compressed tab-separated text</td> + <td>Detailed variant properties, independent of genome assembly version</td> + </tr> +</table> +</p> +<p> +Several utilities for working with bigBed-formatted binary files can be downloaded +<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads" + target=_blank>here</a>. +Run a utility with no arguments in order to see a brief description of the utility and its options. +<ul> + <li><b>bigBedInfo</b> provides summary statistics about a bigBed file including the number of + items in the file. With the <b>-as</b> option, the output includes an + autoSql + definition of data columns, useful for interpreting the column values.</li> + <li><b>bigBedToBed</b> converts the binary bigBed data to tab-separated text. 
+ Output can be restricted to a particular region by using the -chrom, -start + and -end options.</li> + <li><b>bigBedNamedItems</b> extracts rows for one or more rs# IDs.</li> +</ul> +</p> + +<p><b>Example:</b> retrieve all variants in the region chr1:200001-200400</p> +<pre><tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb -chrom=chr1 -start=200000 -end=200400 stdout</tt></pre> +<p><b>Example:</b> retrieve variant rs6657048 from a downloaded copy of dbSnp153.bb</p> +<pre><tt>bigBedNamedItems dbSnp153.bb rs6657048 stdout</tt></pre> +<p><b>Example:</b> retrieve all variants with rs# IDs in file myIds.txt</p> +<pre><tt>bigBedNamedItems -nameFile dbSnp153.bb myIds.txt dbSnp153.myIds.bed</tt></pre> <p> -Read more in <a href="http://genome.ucsc.edu/blog/"> our blog</a> about +The columns in the bigDbSnp/bigBed files and dbSnp153Details.tab.gz file are described in +<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/lib/bigDbSnp.as" + target=_blank>bigDbSnp.as</a> and +<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/lib/dbSnpDetails.as" + target=_blank>dbSnpDetails.as</a> respectively. +</p><p> +UCSC has an +<a href="/goldenPath/help/api.html" + target=_blank>API</a> +that can be used to retrieve values from a particular chromosome range. +A list of rs# IDs can also be pasted/uploaded in the +<a href="/cgi-bin/hgVai" target=_blank>Variant Annotation Integrator</a> +tool in order to find out which genes (if any) the variants are located in, +as well as their functional effects, such as intron, coding-synonymous, missense, frameshift, etc. +</p><p> +See our searchable +<A HREF="https://groups.google.com/a/soe.ucsc.edu/forum/?hl=en&fromgroups#!search/download+snps" +target=_blank>mailing list archives</a> +for more information and example queries. We also have information on +<a href="http://genome.ucsc.edu/blog/">our blog</a> about <a href="http://genome.ucsc.edu/blog/?s=programmatic"> Accessing the Genome Browser Programmatically</a> to acquire data.
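The region example above passes -start=200000 for the browser position chr1:200001-200400 because bigBed files use 0-based, half-open coordinates, while the Genome Browser displays 1-based, fully closed positions. The shell function below sketches that conversion. It is a hypothetical helper (position_to_bigbed_args is not a UCSC utility) and assumes a position string of the form chrom:start-end, optionally with comma separators.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a UCSC tool): convert a 1-based, fully closed
# browser position such as "chr1:200,001-200,400" into the 0-based,
# half-open -chrom/-start/-end options that bigBedToBed expects.
position_to_bigbed_args() {
  local pos=${1//,/}        # drop thousands separators
  local chrom=${pos%%:*}    # text before the colon
  local range=${pos#*:}     # text after the colon
  local start=${range%-*}   # 1-based first base
  local end=${range#*-}     # 1-based last base (inclusive)
  # Only the start shifts: a 1-based inclusive start becomes a 0-based start,
  # while a 1-based inclusive end equals the 0-based exclusive end.
  printf -- '-chrom=%s -start=%d -end=%d\n' "$chrom" "$((start - 1))" "$end"
}

# Print the options for the region used in the example above.
position_to_bigbed_args "chr1:200,001-200,400"
```

Because the function prints all three options on one line, its unquoted output can be spliced into a command, for example <code>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb $(position_to_bigbed_args chr1:200001-200400) stdout</code>, which is equivalent to the region example above.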
</p> <a name="download37"></a> <h2>Obtaining GTF (Gene Transfer Format)</h2> <h6>What is the best method for obtaining GTF output?</h6> <p> Currently, the <a href="../cgi-bin/hgTables">Table Browser</a> option to return data in <a href="../FAQ/FAQformat.html#format4">GTF format</a> is limited as explained below. To convert custom GenePred format data into GTF, the best method is to use the command-line format conversion utility, <code>genePredToGtf</code>. This utility can optionally be set up to connect automatically to the UCSC public SQL database and return GTF files in a few minutes, using <a href="http://genomewiki.ucsc.edu/index.php/Genes_in_gtf_or_gff_format#Using_kent_commands_with_the_public_database_server"> this short guide</a>.</p>
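The genomewiki guide linked above configures the kent utilities through a <code>~/.hg.conf</code> file. The function below is a minimal sketch of that setup. It assumes that the three db.* settings shown (the public server's read-only genomep account) are all genePredToGtf needs. write_public_hg_conf is a hypothetical helper, and the guide should be consulted for the complete recommended configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a UCSC tool): write a minimal kent-tools config
# that points at UCSC's public MariaDB server. Assumption: the db.* keys
# below are sufficient for genePredToGtf; see the genomewiki guide.
write_public_hg_conf() {
  local conf=${1:-"$HOME/.hg.conf"}   # kent tools read ~/.hg.conf by default
  cat > "$conf" <<'EOF'
db.host=genome-mysql.soe.ucsc.edu
db.user=genomep
db.password=password
EOF
  # Keep the file private; the kent tools may refuse a config file
  # that other users can read.
  chmod 600 "$conf"
}

# Usage (commented out because it needs network access and the binary):
#   write_public_hg_conf
#   genePredToGtf hg38 knownGene knownGene.gtf   # table on the public server
#   genePredToGtf file myGenes.gp myGenes.gtf    # hypothetical local genePred file
```

With the config in place, genePredToGtf can read a table such as knownGene directly from the public server. Passing <code>file</code> as the database name makes it read a local genePred file instead, which needs no database configuration.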