98f451794d0c9571ff8bd9433206041295ecfe2c
angie
  Fri Mar 13 09:33:14 2020 -0700
Expand the Data Access section to provide more details and examples for frequent MLQs.  refs #25095

diff --git src/hg/makeDb/trackDb/human/dbSnp153Composite.html src/hg/makeDb/trackDb/human/dbSnp153Composite.html
index 651ba9b..4565c43 100644
--- src/hg/makeDb/trackDb/human/dbSnp153Composite.html
+++ src/hg/makeDb/trackDb/human/dbSnp153Composite.html
@@ -421,44 +421,169 @@
 We downloaded dbSNP's JSON files available from
 <a href="ftp://ftp.ncbi.nlm.nih.gov/snp/archive/b153/JSON/"
 target=_blank>ftp://ftp.ncbi.nlm.nih.gov/snp/archive/b153/JSON/</a>,
 extracted a subset of the information about each variant, and collated
 it into a bigBed file using the
 <a href="https://genome-source.gi.ucsc.edu/gitlist/kent.git/blob/master/src/hg/lib/bigDbSnp.as"
 target=_blank>bigDbSnp.as</a> schema with the information
 necessary for filtering and displaying the variants,
 as well as a separate file containing more detailed information to be
 displayed on each variant's details page
 (<a href="https://genome-source.gi.ucsc.edu/gitlist/kent.git/blob/master/src/hg/lib/dbSnpDetails.as"
 target=_blank>dbSnpDetails.as</a> schema).
 
 <h2>Data Access</h2>
 <p>
-The raw data underlying the UCSC Genome Browser track can be explored interactively with the
-<a href="../../cgi-bin/hgTables" target=_blank>Table Browser</a> or
-<a href="../../cgi-bin/hgIntegrator" target=_blank>Data Integrator</a>.
+Since dbSNP has grown to include approximately 700 million variants, the size of the All dbSNP (153)
+subtrack can cause the
+<a href="../../cgi-bin/hgTables" target=_blank>Table Browser</a> and
+<a href="../../cgi-bin/hgIntegrator" target=_blank>Data Integrator</a>
+to time out, leading to a blank page or truncated output,
+unless queries are restricted to a chromosomal region,
+to a specific set of rs# IDs (which can be pasted/uploaded into the Table Browser),
+or to one of the subset tracks such as Common (~15 million variants) or ClinVar (~0.5 million variants).
+</p><p>
 For automated analysis, the track data files can be downloaded from the downloads server for
-<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/" target=_blank>hg38</a> and
-<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/" target=_blank>hg19</a>
-(dbSnp153.bb); the detailed variant properties can be downloaded from
-<a href="http://hgdownload.soe.ucsc.edu/gbdb/hgFixed/dbSnp/" target=_blank>hgFixed</a>
-(dbSnp153Details.tab.gz).
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/" target=_blank>hg19</a> and
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/" target=_blank>hg38</a>.
+<table class="descTbl">
+  <tr>
+    <th colspan=3>file</th>
+    <th>format</th>
+    <th>subtrack</th>
+  </tr>
+  <tr>
+    <td>dbSnp153.bb</td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153.bb"
+           target=_blank>hg19</a></td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb"
+           target=_blank>hg38</a></td>
+    <td>bigDbSnp (bigBed4+13)</td>
+    <td>All dbSNP (153)</td>
+  </tr>
+  <tr>
+    <td>dbSnp153ClinVar.bb</td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153ClinVar.bb"
+           target=_blank>hg19</a></td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153ClinVar.bb"
+           target=_blank>hg38</a></td>
+    <td>bigDbSnp (bigBed4+13)</td>
+    <td>ClinVar dbSNP (153)</td>
+  </tr>
+  <tr>
+    <td>dbSnp153Common.bb</td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153Common.bb"
+           target=_blank>hg19</a></td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153Common.bb"
+           target=_blank>hg38</a></td>
+    <td>bigDbSnp (bigBed4+13)</td>
+    <td>Common dbSNP (153)</td>
+  </tr>
+  <tr>
+    <td>dbSnp153Mult.bb</td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153Mult.bb"
+           target=_blank>hg19</a></td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153Mult.bb"
+           target=_blank>hg38</a></td>
+    <td>bigDbSnp (bigBed4+13)</td>
+    <td>Mult. dbSNP (153)</td>
+  </tr>
+  <tr>
+    <td>dbSnp153BadCoords.bb</td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153BadCoords.bb"
+           target=_blank>hg19</a></td>
+    <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153BadCoords.bb"
+           target=_blank>hg38</a></td>
+    <td>bigBed4</td>
+    <td>Map Err (153)</td>
+  </tr>
+  <tr>
+    <td colspan=3>
+      <a href="http://hgdownload.soe.ucsc.edu/gbdb/hgFixed/dbSnp/dbSnp153Details.tab.gz"
+         target=_blank>dbSnp153Details.tab.gz</a>
+    </td>
+    <td>gzip-compressed tab-separated text</td>
+    <td>Detailed variant properties, independent of genome assembly version</td>
+  </tr>
+</table>
+</p>
+<p>
+Several utilities for working with bigBed-formatted binary files can be downloaded
+<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads"
+   target=_blank>here</a>.
+Run a utility with no arguments to see a brief description of the utility and its options.
+<ul>
+  <li><b>bigBedInfo</b> provides summary statistics about a bigBed file including the number of
+    items in the file.  With the <b>-as</b> option, the output includes an
+    autoSql
+    definition of data columns, useful for interpreting the column values.</li>
+  <li><b>bigBedToBed</b> converts the binary bigBed data to tab-separated text.
+    Output can be restricted to a particular region by using the -chrom, -start
+    and -end options.</li>
+  <li><b>bigBedNamedItems</b> extracts rows for one or more rs# IDs.</li>
+</ul>
+</p>
+
+<h4>Example: retrieve all variants in the region chr1:200001-200400</h4>
+
+<pre><tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb -chrom=chr1 -start=200000 -end=200400 stdout</tt></pre>
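The example above passes -start=200000 for a region that begins at position 200001, because Genome Browser position strings are 1-based while the -start option follows the 0-based BED convention. A minimal Python sketch of that conversion follows; the function name `region_to_bigbed_args` is hypothetical and not part of any UCSC utility.

```python
import re


def region_to_bigbed_args(position):
    """Turn a 1-based 'chrom:start-end' string into bigBedToBed options.

    Hypothetical helper for illustration only. It subtracts 1 from the
    start (0-based, half-open BED convention) and leaves the end as is.
    """
    match = re.fullmatch(r"([^:\s]+):([\d,]+)-([\d,]+)", position.strip())
    if match is None:
        raise ValueError(f"unrecognized position: {position!r}")
    chrom = match.group(1)
    # Positions copied from the browser may contain thousands separators.
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {position!r}")
    return [f"-chrom={chrom}", f"-start={start - 1}", f"-end={end}"]


if __name__ == "__main__":
    args = region_to_bigbed_args("chr1:200001-200400")
    print(" ".join(["bigBedToBed", "dbSnp153.bb", *args, "stdout"]))
```

The returned options can be appended to a bigBedToBed command line, either by hand or through `subprocess.run`.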
+
+<h4>Example: retrieve variant rs6657048</h4>
+
+<pre><tt>bigBedNamedItems dbSnp153.bb rs6657048 stdout</tt></pre>
+
+<h4>Example: retrieve all variants with rs# IDs in file myIds.txt</h4>
+
+<pre><tt>bigBedNamedItems -nameFile dbSnp153.bb myIds.txt dbSnp153.myIds.bed</tt></pre>
+
+<p>
+The columns in the bigDbSnp/bigBed files and dbSnp153Details.tab.gz file are described in
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/lib/bigDbSnp.as"
+   target=_blank>bigDbSnp.as</a> and
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/lib/dbSnpDetails.as"
+   target=_blank>dbSnpDetails.as</a> respectively.
+For columns that contain lists of allele frequency data, the projects
+providing the data are listed in the following order:
+<ol>
+  <li><a href="https://www.internationalgenome.org/" target=_blank>1000Genomes</a></li>
+  <li><a href="https://gnomad.broadinstitute.org/" target=_blank>GnomAD exomes</a></li>
+  <li><a href="https://www.nhlbiwgs.org/" target=_blank>TOPMED</a></li>
+  <li><a href="http://exac.broadinstitute.org/" target=_blank>ExAC</a></li>
+  <li><a href="https://www.pagestudy.org/" target=_blank>PAGE STUDY</a></li>
+  <li><a href="https://gnomad.broadinstitute.org/" target=_blank>GnomAD genomes</a></li>
+  <li><a href="https://esp.gs.washington.edu/" target=_blank>GoESP</a></li>
+  <li><a href="https://www.geenivaramu.ee/en" target=_blank>Estonian</a></li>
+  <li><a href="http://www.bris.ac.uk/alspac/participants/genome/" target=_blank>ALSPAC</a></li>
+  <li><a href="https://twinsuk.ac.uk/" target=_blank>TWINSUK</a></li>
+  <li><a href="https://swefreq.nbis.se/dataset/SweGen" target=_blank>NorthernSweden</a></li>
+  <li><a href="https://genomes.vn" target=_blank>Vietnamese</a></li>
+</ol>
+</p><p>
+UCSC also has an
+<a href="../goldenPath/help/api.html"
+   target=_blank>API</a>
+that can be used to retrieve values from a particular chromosome range.
+</p><p>
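As a rough sketch of how the API mentioned above might be queried for a chromosome range, the Python below builds a request URL and optionally fetches it. The host `api.genome.ucsc.edu`, the `/getData/track` endpoint, the semicolon-separated parameters, and the 0-based start are assumptions based on the public API help page and are not stated in this commit; the function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed public API host; check the API help page before relying on it.
API_BASE = "https://api.genome.ucsc.edu"


def track_data_url(genome, track, chrom, start, end):
    """Build a /getData/track URL for one chromosome range.

    Hypothetical helper. start is assumed to be 0-based and end
    exclusive, matching the BED convention.
    """
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", start),
        ("end", end),
    ]
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params)
    return f"{API_BASE}/getData/track?{query}"


def fetch_track_data(genome, track, chrom, start, end):
    """Fetch the range and decode the JSON response (needs network access)."""
    with urlopen(track_data_url(genome, track, chrom, start, end)) as response:
        return json.load(response)


if __name__ == "__main__":
    print(track_data_url("hg38", "dbSnp153Common", "chr1", 200000, 200400))
```

Querying one of the smaller subset tracks, such as dbSnp153Common, over a narrow range keeps responses small.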
+A list of rs# IDs can be pasted/uploaded in the
+<a href="hgVai" target=_blank>Variant Annotation Integrator</a>
+tool to find out which genes (if any) the variants are located in,
+as well as their functional effects, such as intron, coding-synonymous, missense, frameshift, etc.
 </p><p>
-Please refer to our
+Please refer to our searchable
 <A HREF="https://groups.google.com/a/soe.ucsc.edu/forum/?hl=en&fromgroups#!search/download+snps"
 target=_blank>mailing list archives</a>
-for questions and example queries, or our
+for more questions and example queries, or our
 <a HREF="../FAQ/FAQdownloads.html#download36" target=_blank>Data Access FAQ</a>
 for more information.
 </p>
 
 <h2>References</h2>
 
 <p>
 Holmes JB, Moyer E, Phan L, Maglott D, Kattman B.
 <a href="https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btz856"
 target="_blank">
 SPDI: Data Model for Variants and Applications at NCBI</a>.
 <em>Bioinformatics</em>. 2019 Nov 18;.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/31738401" target="_blank">31738401</a>
 </p>
 <p>