a9c610f57dfd949068316c822d7c5f4a2beb9c48
lrnassar
  Wed Sep 18 11:55:33 2019 -0700
Adding FAQ entry on analysis sets refs #24071

diff --git src/hg/htdocs/FAQ/FAQdownloads.html src/hg/htdocs/FAQ/FAQdownloads.html
index 3ac3023..e6f82b8 100755
--- src/hg/htdocs/FAQ/FAQdownloads.html
+++ src/hg/htdocs/FAQ/FAQdownloads.html
@@ -37,30 +37,31 @@
 <li><a href="#download16">Repeat-masking data</a></li>
 <li><a href="#download17">Availability of repeat-masked data</a></li>
 <li><a href="#download24">RepeatMasker version differences - UCSC vs. Repeatmasker website</a></li> 
 <li><a href="#download18">Obtaining promoter sequence</a></li>
 <li><a href="#download19">Data from Evolutionary Conservation Score tracks</a></li>
 <li><a href="#download20">Minus strand coordinates - axtNet files</a></li>
 <li><a href="#download21">Mapping UCSC STS marker IDS to those of other groups</a></li>
 <li><a href="#download22">deCODE map data</a></li>
 <li><a href="#download29">Direct MariaDB (MySQL) access to data</a></li>
 <li><a href="#download34">Name of fourth column in BED output</a></li>
 <li><a href="#download36">Track data access</a></li>
 <li><a href="#download37">Known issues with Table Browser GTF output</a></li>
 <li><a href="#download38">Table Browser output file not ordered</a></li>
 <li><a href="#download39">'Permisssion denied' error when trying to use command-line utilities</a></li>
 <li><a href="#download40">Restricted Track Data</a></li>
+<li><a href="#downloadAnalysis">What is the genome analysis set?</a></li>
 </ul>
 <hr>
 <p>
 <a href="index.html">Return to FAQ Table of Contents</a></p>
 
 <a name="download1"></a>
 <h2>Downloading sequence and annotation data</h2>
 <h6>How do I obtain the sequence and/or annotation data for a release?</h6>
 <p> 
 Sequence and annotation data downloads are usually made available within the first week of the 
 release of a new assembly. The download directories are automatically updated nightly to 
 incorporate additions and modifications to the data.</p> 
 <p>
 You can download sequence and annotation data <a href="../goldenPath/help/ftp.html">using our FTP 
 server</a>, but we recommend using rsync, which has the advantage of starting up where it left off 
@@ -976,16 +977,43 @@
 </p>
 
 <a name="download40"></a>
 <h2>Restricted Track Data</h2>
 <h6>Why can I not download some data in the Table Browser or find the download files?</h6>
 <p>
 Some data is provided by external groups and is not available for download or mirroring
 by any third party without the permission of the owners, such as the OMIM track data, which
 is the property of Johns Hopkins University. For some tools, such as attempting a getData fetch
 with our API of restricted tracks, a 403 'Forbidden' error will be returned. Please email our private internal
 <a href="mailto:&#103;&#101;&#110;&#111;me&#45;&#119;&#119;&#119;&#64;&#115;&#111;&#101;.uc&#115;&#99;.&#101;d&#117;"
 >&#103;&#101;&#110;&#111;me&#45;&#119;&#119;&#119;&#64;&#115;&#111;&#101;.uc&#115;&#99;.&#101;d&#117;</a>
 mailing list if you have any questions.
 </p>
 
+<a name="downloadAnalysis"></a>
+<h2>Analysis set</h2>
+<h6>Some genomes on the download server also reference an analysis set. What is the difference?</h6>
+<p>
+For certain genomes (GRCm38/mm10, GRCh37/hg19, GRCh38/hg38), NCBI provides an analysis set in 
+addition to the standard genome files. These are FASTA files with modified sequence identifiers, 
+plus index files, intended for use with next-generation sequencing (NGS) tools. They are 
+particularly helpful for NGS pipelines, including variant calling and RNA-Seq analysis.</p>
+
+<p>
+Though not all analysis sets contain the same information, features include:</p>
+<ul>
+<li>Removal of alternate and fix sequences, which can interfere with read alignment programs</li>
+<li>Hard masking of duplicate copies of the pseudo-autosomal regions (PARs) and centromeric 
+arrays</li>
+<li>Addition of &quot;decoy&quot; sequences</li>
+<li>Index files generated by BWA, Samtools, Bowtie and HISAT2</li></ul>
+
+<p>
+For more information on analysis sets, see the <a 
+href="https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/#seqsforalign" target="_blank">NCBI 
+FAQ</a>. Information on what is contained in each specific assembly analysis set can be 
+found in the README by clicking the <strong>Genome sequence files</strong> link for the 
+assembly of interest in our 
+<a href="http://hgdownload.soe.ucsc.edu/downloads.html">Downloads page</a>.
+</p>
+
 <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->