8f68c0d267b2581bf16895528ab8064ce1b377cc
lrnassar
  Thu Jun 25 11:56:12 2020 -0700
Adding new SNP download section to link from hgTables refs #25775

diff --git src/hg/htdocs/FAQ/FAQdownloads.html src/hg/htdocs/FAQ/FAQdownloads.html
index e0cb451..3e1ca8e 100755
--- src/hg/htdocs/FAQ/FAQdownloads.html
+++ src/hg/htdocs/FAQ/FAQdownloads.html
@@ -33,30 +33,31 @@
 <li><a href="#download25">Mapping chimp chromosome numbers to human chromosomes numbers</a></li>
 <li><a href="#download28">Converting genome coordinates between assemblies</a></li>
 <li><a href="#download33">Linking gene name with accession number</a></li>
 <li><a href="#download31">Obtaining a list of Known Genes</a></li>
 <li><a href="#download16">Repeat-masking data</a></li>
 <li><a href="#download17">Availability of repeat-masked data</a></li>
 <li><a href="#download24">RepeatMasker version differences - UCSC vs. Repeatmasker website</a></li> 
 <li><a href="#download18">Obtaining promoter sequence</a></li>
 <li><a href="#download19">Data from Evolutionary Conservation Score tracks</a></li>
 <li><a href="#download20">Minus strand coordinates - axtNet files</a></li>
 <li><a href="#download21">Mapping UCSC STS marker IDS to those of other groups</a></li>
 <li><a href="#download22">deCODE map data</a></li>
 <li><a href="#download29">Direct MariaDB (MySQL) access to data</a></li>
 <li><a href="#download34">Name of fourth column in BED output</a></li>
 <li><a href="#download36">Track data access</a></li>
+<li><a href="#snp">How do I download dbSNP data?</a></li>
 <li><a href="#download37">Known issues with Table Browser GTF output</a></li>
 <li><a href="#download38">Table Browser output file not ordered</a></li>
 <li><a href="#download39">'Permisssion denied' error when trying to use command-line utilities</a></li>
 <li><a href="#download40">Restricted Track Data</a></li>
 <li><a href="#downloadAnalysis">What is the genome analysis set?</a></li>
 </ul>
 <hr>
 <p>
 <a href="index.html">Return to FAQ Table of Contents</a></p>
 
 <a name="download1"></a>
 <h2>Downloading sequence and annotation data</h2>
 <h6>How do I obtain the sequence and/or annotation data for a release?</h6>
 <p> 
 Sequence and annotation data downloads are usually made available within the first week of the 
@@ -890,36 +891,157 @@
 for downloading source code and binaries can be found 
 <a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>. The tool can 
 also be used to obtain only features within a given range using one of the hgdownload servers,
 example:</p> 
 <ul>
   <li>
     North American server:
     <pre><code>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/path/to/file/bigBedfile.bb -chrom=chr21 -start=0 -end=1000000 stdout </code></pre> 
   </li>
   <li>
     European server:
     <pre><code>bigBedToBed http://hgdownload-euro.soe.ucsc.edu/gbdb/path/to/file/bigBedfile.bb -chrom=chr21 -start=0 -end=1000000 stdout </code></pre> 
   </li>
 </ul>
 <p> 
-<strong>SNP data:</strong> If queries against the SNP table on one of our public MariaDB servers or on your
-own MariaDB installation are slow, then they can be sped up by using the &quot;bin&quot; field; you 
-can <a href="../contacts.html">contact us</a> for more information.</p>
+<a name="snp"></a>
+<h2>Downloading dbSNP data</h2>
+<h6>How do I download dbSNP data?</h6>
+<p>
+Starting with dbSNP153, the data is distributed as bigBed files; earlier versions are stored as
+MariaDB (MySQL) tables. For versions before dbSNP153, see <a href="#download29">direct MariaDB (MySQL) access to data</a>.
+This FAQ entry covers dbSNP153 and later versions.</p>
+<p>
+Because dbSNP now contains more than 700 million variants, queries on the All dbSNP (153+)
+subtrack can cause the
+<a href="/cgi-bin/hgTables" target=_blank>Table Browser</a> and
+<a href="/cgi-bin/hgIntegrator" target=_blank>Data Integrator</a>
+to time out, resulting in a blank page or truncated output.
+To avoid this, restrict the query to a chromosomal region,
+to a specific set of rs# IDs (which can be pasted or uploaded into the Table Browser),
+or to one of the subset tracks such as Common or ClinVar.
+</p><p>
+For automated analysis, download the track data files directly from the downloads server for
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/" target=_blank>hg19</a> and
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/" target=_blank>hg38</a>. The examples
+below use <b>dbSNP153</b>; the same methods and directories should work for a more recent
+dbSNP release by substituting its version number in the file names.
+<table class="descTbl">
+  <tr>
+    <th>file</th>
+    <th colspan=2>download</th>
+    <th>format</th>
+    <th>subtrack or contents</th>
+  </tr>
+  <tr>
+    <td>dbSnp153.bb</td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153.bb"
+           target=_blank>hg19</a></td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb"
+           target=_blank>hg38</a></td>
+    <td>bigDbSnp (bigBed4+13)</td>
+    <td>All dbSNP (153)</td>
+  </tr>
+  <tr>
+    <td>dbSnp153ClinVar.bb</td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153ClinVar.bb"
+           target=_blank>hg19</a></td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153ClinVar.bb"
+           target=_blank>hg38</a></td>
+    <td>bigDbSnp (bigBed4+13)</td>
+    <td>ClinVar dbSNP (153)</td>
+  </tr>
+  <tr>
+    <td>dbSnp153Common.bb</td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153Common.bb"
+           target=_blank>hg19</a></td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153Common.bb"
+           target=_blank>hg38</a></td>
+    <td>bigDbSnp (bigBed4+13)</td>
+    <td>Common dbSNP (153)</td>
+  </tr>
+  <tr>
+    <td>dbSnp153Mult.bb</td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153Mult.bb"
+           target=_blank>hg19</a></td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153Mult.bb"
+           target=_blank>hg38</a></td>
+    <td>bigDbSnp (bigBed4+13)</td>
+    <td>Mult. dbSNP (153)</td>
+  </tr>
+  <tr>
+    <td>dbSnp153BadCoords.bb</td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153BadCoords.bb"
+           target=_blank>hg19</a></td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153BadCoords.bb"
+           target=_blank>hg38</a></td>
+    <td>bigBed4</td>
+    <td>Map Err (153)</td>
+  </tr>
+  <tr>
+    <td colspan=3>
+      <a href="http://hgdownload.soe.ucsc.edu/gbdb/hgFixed/dbSnp/dbSnp153Details.tab.gz"
+         target=_blank>dbSnp153Details.tab.gz</a>
+    </td>
+    <td>gzip-compressed tab-separated text</td>
+    <td>Detailed variant properties, independent of genome assembly version</td>
+  </tr>
+</table>
+</p>
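+<p>
+The file names in the table follow a regular pattern, so download URLs can be generated by a script.
+The following minimal Python sketch assumes that newer releases keep the same naming
+(for example <code>dbSnp155Common.bb</code>); the helper <code>dbsnp_bigbed_url</code> is hypothetical
+and is not part of any UCSC toolkit.
+</p>

```python
# Hypothetical helper, not a UCSC tool: builds the hgdownload URL of a
# dbSNP bigBed file. Assumes releases after dbSNP153 keep the naming
# pattern dbSnp<version><subset>.bb in /gbdb/<assembly>/snp/.
BASE_URL = "http://hgdownload.soe.ucsc.edu/gbdb"

# File-name suffixes of the dbSNP153 subtracks listed in the table.
SUBSET_SUFFIXES = {
    "all": "",
    "clinvar": "ClinVar",
    "common": "Common",
    "mult": "Mult",
    "badcoords": "BadCoords",
}


def dbsnp_bigbed_url(assembly: str, version: int = 153, subset: str = "all") -> str:
    """Return the download URL for one dbSNP bigBed file."""
    if assembly not in ("hg19", "hg38"):
        raise ValueError(f"no dbSNP bigBed directory listed for {assembly!r}")
    suffix = SUBSET_SUFFIXES.get(subset.lower())
    if suffix is None:
        raise ValueError(f"unknown subset {subset!r}")
    return f"{BASE_URL}/{assembly}/snp/dbSnp{version}{suffix}.bb"


if __name__ == "__main__":
    # Print the URLs of all dbSNP153 bigBed files for hg38.
    for name in SUBSET_SUFFIXES:
        print(dbsnp_bigbed_url("hg38", 153, name))
```

+<p>
+The resulting URLs can be passed to <b>bigBedToBed</b>, which reads remote bigBed files directly,
+or downloaded with a tool such as <code>wget</code>.
+</p>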
+<p>
+Several utilities for working with bigBed-formatted binary files can be downloaded
+<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads"
+   target=_blank>here</a>.
+Run a utility with no arguments to see a brief description of its usage and options.
+<ul>
+  <li><b>bigBedInfo</b> prints summary statistics about a bigBed file, including the number of
+    items it contains. With the <b>-as</b> option, the output also includes the autoSql
+    definition of the data columns, which is useful for interpreting column values.</li>
+  <li><b>bigBedToBed</b> converts the binary bigBed data to tab-separated text.
+    Output can be restricted to a particular region with the <b>-chrom</b>, <b>-start</b>
+    and <b>-end</b> options.</li>
+  <li><b>bigBedNamedItems</b> extracts the rows for one or more rs# IDs; with the
+    <b>-nameFile</b> option, the IDs are read from a file.</li>
+</ul>
+</p>
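+<p>
+Because bigBedToBed writes plain tab-separated text, its output is easy to process in a script.
+The following Python sketch (the names <code>read_bed</code> and <code>BedItem</code> are hypothetical) parses
+only the four standard BED columns and keeps the remaining bigDbSnp columns unparsed; their meaning
+is given by the autoSql definition that <b>bigBedInfo -as</b> prints.
+</p>

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class BedItem(NamedTuple):
    chrom: str
    start: int               # 0-based start, as stored in the bigBed file
    end: int                 # end of the half-open interval
    name: str                # the rs# ID in bigDbSnp files
    extra: Tuple[str, ...]   # remaining columns; see the autoSql definition


def read_bed(lines: Iterable[str]) -> Iterator[BedItem]:
    """Parse tab-separated bigBedToBed output, skipping blank and '#' lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            raise ValueError(f"expected at least 4 columns, got: {line!r}")
        yield BedItem(fields[0], int(fields[1]), int(fields[2]),
                      fields[3], tuple(fields[4:]))
```

+<p>
+Iterating over <code>read_bed(open("dbSnp153.myIds.bed"))</code> yields one item per variant; for
+single-base variants, <code>item.start + 1</code> is the 1-based position shown in the Genome Browser.
+</p>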
+
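+<p><b>Example:</b> retrieve all variants in the region chr1:200001-200400 (bigBed files use
+0-based start coordinates, so the 1-based start position 200001 becomes <b>-start=200000</b>)</p>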
+<pre><tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb -chrom=chr1 -start=200000 -end=200400 stdout</tt></pre>
+<p><b>Example:</b> retrieve variant rs6657048 from a downloaded copy of dbSnp153.bb</p>
+<pre><tt>bigBedNamedItems dbSnp153.bb rs6657048 stdout</tt></pre>
+<p><b>Example:</b> retrieve all variants whose rs# IDs are listed in the file myIds.txt, writing them to dbSnp153.myIds.bed</p>
+<pre><tt>bigBedNamedItems -nameFile dbSnp153.bb myIds.txt dbSnp153.myIds.bed</tt></pre>
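+<p>
+The region example above relies on the conversion from the 1-based positions shown in the Genome
+Browser to the 0-based start coordinates stored in bigBed files. A minimal Python sketch of that
+conversion follows; the function names are hypothetical, and the code only assembles the
+bigBedToBed command line rather than running it.
+</p>

```python
import re
from typing import List, Tuple

# Matches a Genome Browser position such as "chr1:200,001-200,400".
_POSITION = re.compile(r"^(\w+):([\d,]+)-([\d,]+)$")


def browser_position_to_bed(position: str) -> Tuple[str, int, int]:
    """Convert a 1-based, closed browser position to 0-based, half-open BED values."""
    match = _POSITION.match(position.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {position!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {position!r}")
    # Only the start shifts by one; the end coordinate is the same in both systems.
    return chrom, start - 1, end


def bigbed_to_bed_command(bigbed: str, position: str, output: str = "stdout") -> List[str]:
    """Build a bigBedToBed argument list restricted to one region."""
    chrom, start, end = browser_position_to_bed(position)
    return ["bigBedToBed", bigbed, f"-chrom={chrom}",
            f"-start={start}", f"-end={end}", output]
```

+<p>
+With <b>bigBedToBed</b> installed, the list returned by <code>bigbed_to_bed_command</code> can be passed to
+Python's <code>subprocess.run</code>.
+</p>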
 
 <p>
-Read more in <a href="http://genome.ucsc.edu/blog/"> our blog</a> about
+The columns of the bigDbSnp bigBed files and of the dbSnp153Details.tab.gz file are described in
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/lib/bigDbSnp.as"
+   target=_blank>bigDbSnp.as</a> and
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/lib/dbSnpDetails.as"
+   target=_blank>dbSnpDetails.as</a>, respectively.
+</p><p>
+UCSC also provides an
+<a href="/goldenPath/help/api.html"
+   target=_blank>API</a>
+that can retrieve track data for a particular chromosome range.
+A list of rs# IDs can also be pasted or uploaded into the
+<a href="/cgi-bin/hgVai" target=_blank>Variant Annotation Integrator</a>
+to find out which genes (if any) the variants fall in,
+as well as their predicted functional effects, such as intronic, coding-synonymous, missense or frameshift.
+</p><p>
+See our searchable
+<a href="https://groups.google.com/a/soe.ucsc.edu/forum/?hl=en&fromgroups#!search/download+snps"
+target=_blank>mailing list archives</a>
+for more information and example queries. We also have information on
+<a href="http://genome.ucsc.edu/blog/">our blog</a> about
 <a href="http://genome.ucsc.edu/blog/?s=programmatic"> Accessing the Genome Browser Programmatically</a>
 to acquire data.
 </p>
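+<p>
+For range queries without installing any utilities, the UCSC API can also be called from a
+script. This Python sketch uses only the standard library; the endpoint <code>getData/track</code>,
+its parameter names and the track name <code>dbSnp153</code> are assumptions to check against the
+API help page, and the helper names are hypothetical.
+</p>

```python
import json
import urllib.request

# Assumed base URL of the UCSC API's track-data endpoint (see the API help page).
API_URL = "https://api.genome.ucsc.edu/getData/track"


def track_query_url(genome: str, track: str, chrom: str, start: int, end: int) -> str:
    """Build a getData/track request for one chromosome range."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    params = [("genome", genome), ("track", track), ("chrom", chrom),
              ("start", str(start)), ("end", str(end))]
    # Parameters are joined with ';' as in the examples on the API help page.
    return API_URL + "?" + ";".join(f"{key}={value}" for key, value in params)


def fetch_json(url: str) -> dict:
    """Download a URL and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

+<p>
+For example, <code>fetch_json(track_query_url("hg38", "dbSnp153", "chr1", 200000, 200400))</code>
+should request the same region as the bigBedToBed example, assuming the API uses the same 0-based
+start convention; the layout of the returned JSON is described on the API help page.
+</p>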
 
 <a name="download37"></a>
 <h2>Obtaining GTF (Gene Transfer Format)</h2>
 <h6>What is the best method for obtaining GTF output?</h6>
 <p>
 Currently, the <a href="../cgi-bin/hgTables">Table Browser</a> option return data in
 <a href="../FAQ/FAQformat.html#format4">GTF format</a> is limited as explained below.
 To convert custom GenePred format data into GTF, the best method is to use the 
 command-line format conversion utility, <code>genePredToGtf</code>. This can optionally be set up 
 to automatically connect to the UCSC public SQL database and return GTF files in a few minutes using 
 <a href="http://genomewiki.ucsc.edu/index.php/Genes_in_gtf_or_gff_format#Using_kent_commands_with_the_public_database_server">
 this short guide</a>.</p>