src/hg/htdocs/FAQ/FAQdownloads.html dd63f1db2fabe42eed12f9311b482f3c8fee4b94

dd63f1db2fabe42eed12f9311b482f3c8fee4b94
kuhn
  Sun Sep 26 13:20:43 2021 -0700
fixed single-char typo

diff --git src/hg/htdocs/FAQ/FAQdownloads.html src/hg/htdocs/FAQ/FAQdownloads.html
index 75d3994..819bcbf 100755
--- src/hg/htdocs/FAQ/FAQdownloads.html
+++ src/hg/htdocs/FAQ/FAQdownloads.html
@@ -1,1188 +1,1188 @@
 <!DOCTYPE html>
 <!--#set var="TITLE" value="Genome Browser FAQ" -->
 <!--#set var="ROOT" value=".." -->
 
 <!-- Relative paths to support mirror sites with non-standard GB docs install -->
 <!--#include virtual="$ROOT/inc/gbPageStart.html" -->
 
 <h1>Frequently Asked Questions: Data and Downloads</h1>
 
 <h2>Topics</h2>
 
 <ul>
 <li><a href="#download1">Downloading sequence and annotation data</a></li>
 <li><a href="#download35">Metadata tables for GenBank and RefSeq moved to hgFixed database</a></li>
 <li><a href="#download32">Extracting sequence in batch from an assembly</a></li>
 <li><a href="#download23">Downloading data from the UCSC DAS server</a></li>
 <li><a href="#download27">Downloading the UCSC Genome Browser source</a></li>
 <li><a href="#download2">Download restrictions</a></li>
 <li><a href="#download3">Opening .fa files</a></li>
 <li><a href="#download4">Data differences between downloaded data and browser display</a></li>
 <li><a href="#download5">Strange characters in FASTA file</a></li>
 <li><a href="#download6">Selection of GenBank ESTs</a></li>
 <li><a href="#download7">EST strand direction</a></li>
 <li><a href="#download8">Missing RefSeq ID</a></li>
 <li><a href="#download9">Finished vs. draft segments</a></li>
 <li><a href="#downloadAlt">chr_alt chromosomes</a></li>
 <li><a href="#downloadFix">chr_fix chromosomes</a></li>
 <li><a href="#download10">chrN_random tables</a></li>
 <li><a href="#download11">Chromosome Un</a></li>
 <li><a href="#download12">Chromosome M</a></li>
 <li><a href="#download13">N characters at beginning of human chr22</a></li>
 <li><a href="#download30">Erroneous duplicated chrY_random region on Mouse Build 34 (mm6)</a></li>
 <li><a href="#download25">Mapping chimp chromosome numbers to human chromosome numbers</a></li>
 <li><a href="#download28">Converting genome coordinates between assemblies</a></li>
 <li><a href="#download33">Linking gene name with accession number</a></li>
 <li><a href="#download31">Obtaining a list of Known Genes</a></li>
 <li><a href="#download16">Repeat-masking data</a></li>
 <li><a href="#download17">Availability of repeat-masked data</a></li>
 <li><a href="#download24">RepeatMasker version differences - UCSC vs. Repeatmasker website</a></li> 
 <li><a href="#download18">Obtaining promoter sequence</a></li>
 <li><a href="#download19">Data from Evolutionary Conservation Score tracks</a></li>
 <li><a href="#download20">Minus strand coordinates - axtNet files</a></li>
 <li><a href="#download21">Mapping UCSC STS marker IDs to those of other groups</a></li>
 <li><a href="#download22">deCODE map data</a></li>
 <li><a href="#download29">Direct MariaDB (MySQL) access to data</a></li>
 <li><a href="#download34">Name of fourth column in BED output</a></li>
 <li><a href="#download36">Track data access</a></li>
 <li><a href="#snp">How do I download dbSNP data?</a></li>
 <li><a href="#snpAlleles">Why doesn't this SNP have two alleles?</a></li>
 <li><a href="#download37">Known issues with Table Browser GTF output</a></li>
 <li><a href="#download38">Table Browser output file not ordered</a></li>
 <li><a href="#download39">'Permission denied' error when trying to use command-line utilities</a></li>
 <li><a href="#download40">Restricted Track Data</a></li>
 <li><a href="#downloadAnalysis">What is the genome analysis set?</a></li>
 </ul>
 <hr>
 <p>
 <a href="index.html">Return to FAQ Table of Contents</a></p>
 
 <a name="download1"></a>
 <h2>Downloading sequence and annotation data</h2>
 <h6>How do I obtain the sequence and/or annotation data for a release?</h6>
 <p> 
 Sequence and annotation data downloads are usually made available within the first week of the 
 release of a new assembly. The download directories are automatically updated nightly to 
 incorporate additions and modifications to the data.</p> 
 <p>
 You can download sequence and annotation data <a href="../goldenPath/help/ftp.html">using our FTP 
 server</a>, but we recommend using rsync, which has the advantage of starting up where it left off 
 after a failure, when run again. Please see the previous link for examples.</p> 
 <p>
 You can also download data from our 
 <a href="http://hgdownload.soe.ucsc.edu/downloads.html">Downloads</a> page or our 
 <a href="../cgi-bin/das/dsn" target="_blank">DAS server</a>. To download a specific subset of the 
 data or to configure the output format of the data, use the 
 <a href="../cgi-bin/hgTables">Table Browser</a>. For information on extracting a large set of 
 sequences from an assembly, see	<a href="#download32">Extracting sequence in batch from an 
 assembly</a>.</p> 
 <p>
 For more information on using the UCSC DAS server, see <a href="#download23">Downloading data from 
 the UCSC DAS server</a>.</p> 
 <p>
 Another option for querying sequence and annotation data is the <a href='../goldenPath/help/api.html' 
 target=_blank>REST API</a>. This interface allows for extraction of sequence and annotations from
 both UCSC assemblies and from hubs.</p>
 <p>
 <strong>To quickly download large volumes of data you can use UDR (UDT Enabled Rsync):</strong> UDR 
 provides users much faster download rates. Here is an example using UDR, once installed, to download
 all the mouse mm9 ENCODE information that amounts to several terabytes:</p>
 <pre><code>$ udr rsync -avP hgdownload.soe.ucsc.edu::goldenPath/mm9/encodeDCC/ /my/local/mm9/</code></pre>
 <pre><code>$ udr rsync -avP hgdownload-euro.soe.ucsc.edu::goldenPath/mm9/encodeDCC/ /my/local/mm9/</code></pre>
 <p>
 Optional: download from our secondary download server.</p>
 <pre><code>$ udr rsync -avP hgdownload2.soe.ucsc.edu::goldenPath/mm9/encodeDCC/ /my/local/mm9/</code></pre> 
 <p>
 Please read more about the new UDR method <a href="../../goldenPath/newsarch.html#030315" 
 target="_blank">here</a>.</p>
 
 <a name="download35"></a>
 <h2>Metadata tables for GenBank and RefSeq moved to hgFixed database</h2>
 <h6>I can no longer find metadata tables like gbCdnaInfo for an assembly.</h6>
 <p> 
 As of June 2016, the metadata tables that support the GenBank and RefSeq tracks 
 (RefSeq, Other RefSeq, mRNA, EST, etc.) have been moved from the directories of individual 
 assemblies to one global database, hgFixed.</p> 
 <p>
 The tables below (previously found per assembly) can now be downloaded from the 
 <a href="http://hgdownload.soe.ucsc.edu/goldenPath/hgFixed/database/">hgFixed database</a>:</p>
 
 <!-- 3-column layout -->
 <div class="row">
   <!-- Left column -->
   <div class="col-md-3">
   <ul> 
     <li>author</li>
     <li>cds</li>
     <li>cell</li>
     <li>description</li>
     <li>development</li>
     <li>gbCdnaInfo</li>
     <li>gbExtFile</li>
     <li>gbLoaded</li>
   </ul>
   </div>
   <!-- middle column -->
   <div class="col-md-3">
   <ul> 
     <li>gbMiscDiff</li>
     <li>gbSeq</li>
     <li>gbWarn</li>
     <li>geneName</li>
     <li>imageClone</li>
     <li>keyword</li>
     <li>library</li>
     <li>mrnaClone</li>
   </ul>
   </div>
   <!-- right column -->
   <div class="col-md-3">
   <ul>
     <li>organism</li>
     <li>productName</li>
     <li>refLink</li>
     <li>refSeqStatus</li>
     <li>refSeqSummary</li>
     <li>sex</li>
     <li>source</li>
     <li>tissue</li>
   </ul> 
   </div>
   <div class="col-md-3">
   </div>
 </div>
 <p>
 These tables are also accessible from: </p>
 <ul>
   <li> 
   The <a href="../cgi-bin/hgTables" >Table Browser</a>, as connected tables and joined fields 
   described when clicking the &quot;describe table schema&quot; button</li>
   <li>
   One of our two <a href="../goldenPath/help/mysql.html">public access MariaDB servers</a>
   in the US and Europe</li>
 </ul>
 
 <a name="download32"></a>
 <h2>Extracting sequence in batch from an assembly</h2>
 <h6>I have a lot of coordinates for an assembly and want to extract the corresponding sequences.
 What is the best way to proceed? </h6>
 <p> 
 There are two ways to extract genomic sequence in batch from an assembly:</p>
 <p>
 A. Download the appropriate fasta files from our 
 <a href="ftp://hgdownload.soe.ucsc.edu/goldenPath/">ftp server</a> and extract sequence data using 
 your own tools or the tools from our source tree. This is the recommended method when you have very 
 large sequence datasets or will be extracting data frequently. Sequence data for most assemblies is 
 located in the assembly's &quot;chromosomes&quot; subdirectory on the downloads server. For example,
 the sequence for human assembly hg17 can be found in 
 <a href="ftp://hgdownload.soe.ucsc.edu/goldenPath/hg17/chromosomes/">ftp://hgdownload.soe.ucsc.edu/goldenPath/hg17/chromosomes/</a>.  
 You'll find instructions for obtaining our source programs and utilities 
 <a href="FAQlicense.html#license3">here</a>. Some programs that you may find useful are nibFrag and 
 twoBitToFa, as well as other fa* programs. To obtain usage information for most programs, run the 
 program without arguments.</p> 
 <p> 
 B. Use the Table Browser to extract sequence. This is a convenient way to obtain small amounts of 
 sequence.</p>
 <ol>
   <li>
   Create a <a href="../goldenPath/help/hgTracksHelp.html#CustomTracks">custom track</a> of the 
   genomic coordinates in <a href="FAQformat.html#format1">BED format</a> and upload into the Genome 
   Browser. </li>
   <li>
   Select the custom track in the Table Browser, then select the &quot;sequence&quot; output format 
   to retrieve data. We recommend that you save the file locally as gzip. </li>
 </ol>
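 <p>
 As a minimal sketch of step 1, assuming your coordinates are written as Genome Browser position
 strings (1-based, for example chr22:22,310,674-22,332,143), the Python helper below converts each
 one to a four-column BED line. BED starts are 0-based, so the start is decremented by one. The
 function name <code>position_to_bed</code> is hypothetical and is not part of any UCSC tool.</p>

```python
def position_to_bed(position, name):
    """Convert a 1-based browser position string to a tab-separated BED line.

    position: e.g. "chr22:22,310,674-22,332,143" (commas are optional)
    name:     label for the BED name column (hypothetical helper argument)
    """
    chrom, span = position.replace(",", "").split(":")
    start, end = (int(value) for value in span.split("-"))
    # BED uses a 0-based start and an exclusive end, so only the start shifts.
    return f"{chrom}\t{start - 1}\t{end}\t{name}"


if __name__ == "__main__":
    print(position_to_bed("chr22:22,310,674-22,332,143", "BQ016549"))
```

 <p>
 The resulting lines can be saved to a text file and uploaded as a custom track.</p>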
 
 <a name="download23"></a>
 <h2>Downloading data from the UCSC DAS server</h2>
 <h6>How do I download data using the UCSC DAS server?</h6>
 <p> 
 The UCSC DAS server provides access to genome annotation data for all current assemblies featured in
 the Genome Browser. To view a list of the assemblies available from the DAS server and their base 
 URLs, see <a href="../cgi-bin/das/dsn">http://genome.ucsc.edu/cgi-bin/das/dsn</a>.</p> 
 <p>
 To construct a DAS query, combine an assembly's base URL with the sequence entry point and type 
 specifiers available for that assembly. The entry point specifies chromosome position, and the type 
 indicates the annotation table requested. You can view the lists of entry points and types available
 for an assembly with requests of the form:</p>
 <pre><code>http://genome.ucsc.edu/cgi-bin/das/[db_name]/entry_points
 http://genome.ucsc.edu/cgi-bin/das/[db_name]/types </code></pre>
 <p>
 where [db_name] is the UCSC name for the assembly, e.g. hg16, mm4.</p>
 <p>
 For example, here is a query that returns all the records in the refGene table for the chromosome 
 position chr1:1-100000 on the hg16 assembly:</p>
 <pre><a href="../cgi-bin/das/hg16/features?segment=1:1,100000;type=refGene">http://genome.ucsc.edu/cgi-bin/das/hg16/features?segment=1:1,100000;type=refGene</a></pre>
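 <p>
 The same kind of query URL can be assembled programmatically. The Python sketch below only builds
 the URL string and does not contact the server. The helper name <code>das_features_url</code> is
 hypothetical, and dropping the &quot;chr&quot; prefix from the segment is an assumption based on
 the example above, where chr1 is written as segment=1.</p>

```python
DAS_BASE = "http://genome.ucsc.edu/cgi-bin/das"


def das_features_url(db_name, chrom, start, end, table):
    """Build a DAS 'features' query URL for one annotation table and one range."""
    # Assumption taken from the example query: the segment omits the "chr" prefix.
    segment_chrom = chrom[3:] if chrom.startswith("chr") else chrom
    return (
        f"{DAS_BASE}/{db_name}/features"
        f"?segment={segment_chrom}:{start},{end};type={table}"
    )


if __name__ == "__main__":
    print(das_features_url("hg16", "chr1", 1, 100000, "refGene"))
```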
 <p>
 For more information on DAS, see the <a href="http://www.biodas.org" 
 target="_blank">Biodas website</a> and the <a href="http://www.biodas.org/documents/spec.html" 
 target="_blank">DAS specification</a>.</p>
 <p>
 A more recent alternative to the DAS server is the <a href='../goldenPath/help/api.html' 
 target=_blank>REST API</a>.</p>
 
 <a name="download27"></a>
 <h2>Downloading the UCSC Genome Browser source</h2>
 <h6>Where can I download the Genome Browser source code and executables?</h6>
 <p> 
 The Genome Browser source code and executables are freely available for academic, nonprofit, and 
 personal use (see <a href="FAQlicense.html#license2">Licensing the Genome Browser or Blat</a> for 
 commercial licensing requirements). The latest version of the source code may be downloaded 
 <a href="http://genome-store.ucsc.edu">here</a>.</p> 
 <p>
 See <a href="FAQblat.html#blat3">Downloading Blat source and documentation</a> for information on 
 Blat downloads.</p>
 
 <a name="download2"></a>
 <h2>Download restrictions</h2>
 <h6>Do you have restrictions on the amount of downloads one can do?</h6>
 <p>
 Generally, we'd prefer that you not hit our interactive site with programs, unless they are 
 themselves front ends for interactive sites. We can handle the traffic from all the clicks that 
 biologists are likely to generate, but not from programs. Program-driven use is limited to a 
 maximum of one hit every 15 seconds and no more than 5,000 hits per day.</p> 
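 <p>
 If you do write a program that queries the site, one way to stay within these limits is to
 throttle requests on the client side. The Python sketch below is illustrative only: the class
 name <code>PoliteThrottle</code> and its parameters are hypothetical, and it assumes the daily
 limit is counted over a rolling 24-hour window. Call <code>wait()</code> before each request.</p>

```python
import time
from collections import deque

SECONDS_PER_DAY = 86400


class PoliteThrottle:
    """Enforce a minimum gap between requests and a cap on requests per day."""

    def __init__(self, min_interval=15.0, max_per_day=5000,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.max_per_day = max_per_day
        self.clock = clock      # injectable so the logic can be tested without waiting
        self.sleep = sleep
        self.hits = deque()     # timestamps of requests made in the last 24 hours

    def wait(self):
        """Block until the next request is allowed, then record it."""
        now = self.clock()
        # Forget requests that are more than a day old.
        while self.hits and now - self.hits[0] >= SECONDS_PER_DAY:
            self.hits.popleft()
        if len(self.hits) >= self.max_per_day:
            raise RuntimeError("daily request budget exhausted")
        if self.hits:
            delay = self.hits[-1] + self.min_interval - now
            if delay > 0:
                self.sleep(delay)
                now = self.clock()
        self.hits.append(now)
```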
 <p>
 If you need to run batch Blat jobs, see <a href="FAQblat.html#blat3">Downloading Blat source and 
 documentation</a> for a copy of Blat you can run locally.</p>
 
 <a name="download3"></a>
 <h2>Opening .fa files</h2>
 <h6>I am trying to look at the final decoding of the human genome. How can I open the *.fa
 files?</h6>
 <p> 
 Microsoft Word or any program that can handle large text files will do. Some of the chromosomes 
 begin with long blocks of <em>N</em>s. You may want to search for an <em>A</em> to get past
 them.</p>
 <p>
 Unless you have a particular need to view or use the raw data files, you might find it more 
 interesting to look at the data using the Genome Browser. Type the name of a gene in which you're 
 interested into the position box (or use the default position), then click the submit button. In 
 the resulting Genome Browser display, click the DNA link on the menu bar at the top of the page. 
 Select the Extended case/color options button at the bottom of the next page. Now you can color the 
 DNA sequence to display which portions are repeats, known genes, genetic markers, etc.</p>
 
 <a name="download4"></a>
 <h2>Data differences between downloaded data and browser display</h2>
 <h6>I downloaded the genome annotations from your MariaDB database tables, but the mRNA locations 
 didn't match what was showing in the Genome Browser. Shouldn't they be in sync?</h6>
 <p> 
 Yes. The Genome Browser and Table Browser are both driven by the same underlying MariaDB database. 
 Check that your downloaded tables are from the same assembly version as the one you are viewing in 
 the Genome Browser. If the assembly dates don't match, the coordinates of the data within the 
 tables may differ. In a very rare instance, you could also be affected by the brief lag time between
 the update of the live databases underlying the Genome Browser and the time it takes for text dumps 
 of these databases to become available in the downloads directory.</p> 
 
 <a name="download5"></a>
 <h2>Strange characters in FASTA file</h2>
 <h6>I noticed several characters other than <em>A</em>, <em>C</em>, <em>G</em>, <em>T</em>, and 
 <em>N</em> in my fasta file, for example <em>y</em>, <em>k</em>, <em>s</em>, etc. Is the file 
 corrupted or are these characters valid?</h6>
 <p>
 The characters most commonly seen in sequence are <em>A</em>, <em>C</em>, <em>G</em>, <em>T</em>, 
 and <em>N</em>, but there are several other valid characters that are used in clones to indicate 
 ambiguity about the identity of certain bases in the sequence. It's not uncommon to see these 
 &quot;wobble&quot; codes at polymorphic positions in DNA sequences. The following chart (IUPAC-IUB 
 Symbols for Nucleotide Nomenclature: Cornish-Bowden (1985). <em>Nucl. Acids Res.</em> 13:3021-3030) 
 lists nucleotide symbols, including those used for ambiguity:</p>
 <PRE>--------------------------------------
 Symbol       Meaning      Nucleic Acid
 --------------------------------------
    A            A           Adenine
    C            C           Cytosine
    G            G           Guanine
    T            T           Thymine
    U            U           Uracil
    M          A or C
    R          A or G        Purine
    W          A or T
    S          C or G
    Y          C or T        Pyrimidine
    K          G or T
    V        A or C or G
    H        A or C or T
    D        A or G or T
    B        C or G or T
    X      G or A or T or C
    N      G or A or T or C </PRE>
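 <p>
 For scripts that need to check a sequence for these symbols, the chart can be written as a lookup
 table. The Python sketch below transcribes the chart above. The names <code>IUPAC_BASES</code>,
 <code>possible_bases</code>, and <code>is_valid_sequence</code> are hypothetical.</p>

```python
# Each symbol maps to the bases it can stand for, as listed in the chart.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "X": "ACGT", "N": "ACGT",
}


def possible_bases(symbol):
    """Return the bases a single symbol can represent (case-insensitive)."""
    return IUPAC_BASES[symbol.upper()]


def is_valid_sequence(sequence):
    """Return True if every character is a symbol from the chart."""
    return all(char.upper() in IUPAC_BASES for char in sequence)


if __name__ == "__main__":
    print(possible_bases("y"))
    print(is_valid_sequence("ACGTykNs"))
```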
 
 <a name="download6"></a>
 <h2>Selection of GenBank ESTs</h2>
 <h6>I am interested in ESTs. How do you select which ones from GenBank to display in the Genome 
 Browser?</h6>
 <p>
 All ESTs in GenBank on the date of the track data freeze for the given organism are used - none are 
 discarded. When two ESTs have identical sequences, both are retained because this can be significant
 corroboration of a splice site.</p> 
 <p>
 ESTs are aligned against the genome using the Blat program. When a single EST aligns in multiple 
 places, the alignment having the highest base identity is found. Only alignments that have a base 
 identity level within a selected percentage of the best are kept. Alignments must also have a 
 minimum base identity to be kept. For more information on the selection criteria specific to each 
 organism, consult the description page accompanying the EST track for that organism.</p> 
 <p>
 The maximum intron length allowed by Blat is 500,000 bases, which may eliminate some ESTs with very 
 long introns that might otherwise align. If an EST aligns non-contiguously (i.e. an intron has been 
 spliced out), it is also a candidate for the Spliced EST track, provided it meets various quality 
 controls for intron and exon length and match quality. Start and stop coordinates of each alignment 
 block are available from the appropriate table within the 
 <a href="../cgi-bin/hgTables">Table Browser</a>.</p> 
 <p>
 Note that only 250 EST tracks can be viewed at a time within the browser. If more than 250 tracks 
 exist for the selected region, the display defaults to a denser display mode to prevent the user's 
 web browser from being overloaded. You can restore the EST track display to a fuller display mode by
 zooming in on the chromosomal range or by using the EST track filter to restrict the number of 
 tracks displayed. </p> 
 <p>
 For tracks such as Non[Organism] ESTs and Non[Organism] mRNAs, some selection is done on the full 
 set at GenBank. If a sequence is too divergent from the organism's genome to generate a significant 
 Blat hit, it is not included in the track. </p>
 
 <a name="download7"></a>
 <h2>EST strand direction</h2>
 <h6>Could you help me with my interpretation of EST data? If the EST is taken from the minus (-)
 strand, does this always mean that the transcript is generated on the minus strand? Are two 
 corresponding ESTs that are assigned - and + always complementary?<br><br>
 I want to confirm the strand assignment for two human ESTs:
 <ul>
   <li>
   BQ016549 (chr22:22,310,674-22,332,143 on hg18): + strand in text and - strand in graphical 
   display</li> 
   <li>
   AA928010 (chr22:20,345,264-20,354,528 on hg18): - strand in text and + strand in graphical 
   display</li> 
 </ul>
 The graphical display goes with the orientation of the gene in that location.</h6>
 <p> 
 From the examples above, it can be seen that the strand to which an EST aligns is not necessarily 
 reflected in the direction of transcription shown by the arrows in the display. When UCSC downloads 
 mRNAs and ESTs from GenBank and aligns them to a genome assembly using Blat, each EST aligns to the 
 + or - strand (forward or reverse direction) of the genome, which we record as + or - in the strand 
 field of the corresponding database table, e.g. all_ests or chrN_est. The strand information (+/-) 
 therefore indicates the direction of the match between the EST and the matching genomic sequence. It
 bears no relationship to the direction of transcription of the RNA with which it might be 
 associated. Determining the direction of transcription for ESTs is not an easy task so we do some 
 calculations to make the best guess for the transcription direction.</p> 
 <p>
 ESTs are sequenced from either the 5' or the 3' end. When sequenced from the 5' end, the resulting 
 sequence is the same as that of the mRNA which it represents. With a 3' end read, the resulting 
 sequence matches the opposite strand of the cDNA clone. Therefore, it is the reverse complement of 
 the actual mRNA sequence.  A problem occurs if the EST contributor reverse-complements the 3'-read 
 sequence before depositing it into GenBank, with the idea that people will want the mRNA 
 (transcription-direction) sequence. It is not always possible to determine if this has been done. 
 Therefore, we do some calculations to try to determine the correct direction of transcription for 
 the EST sequence.</p> 
 <p>
 If an EST alignment produces canonical introns (with gt-ag splice-site pairs), this is used to 
 determine the transcription direction. For example, when an EST is aligned to the genome, a canonical
 intron would look like this:</p>
 <pre>NNNNexonNNNNgtnnnnintronnnnnnnnagNNNNexon </pre> 
 <p>
 Here, the two nucleotides on either end of the intron show the canonical gt-ag splice site pairs. 
 To find transcription direction, we use a method that relies on finding gt-ag canonical pairs in one
 direction more often than in the opposite direction. The calculation is:</p>
 <pre>gt/ag introns minus ct/ac introns = intronOrientation</pre> 
 <p>	
 The sign of this calculated intronOrientation field (stored in the estOrientInfo table) shows the 
 orientation of the transcript relative to the EST. Therefore, if intronOrientation is positive, 
 then the EST appears in the display with the arrows pointing in the same direction as the
 EST.</p>
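 <p>
 The calculation above can be sketched in Python as follows. This is a simplified illustration,
 not the code used to build the estOrientInfo table. It assumes each intron is supplied as the
 genomic sequence spanned by the gap in the EST alignment, read in the direction of the alignment,
 and the function name <code>intron_orientation</code> is hypothetical. An intron read from the
 opposite strand shows ct-ac, the reverse complement of gt-ag.</p>

```python
def intron_orientation(intron_sequences):
    """Return (number of gt/ag introns) minus (number of ct/ac introns)."""
    forward = 0   # introns with canonical gt...ag ends
    reverse = 0   # introns with ct...ac ends (gt...ag on the opposite strand)
    for sequence in intron_sequences:
        seq = sequence.lower()
        if seq.startswith("gt") and seq.endswith("ag"):
            forward += 1
        elif seq.startswith("ct") and seq.endswith("ac"):
            reverse += 1
        # Introns with other end pairs contribute nothing to the score.
    return forward - reverse


if __name__ == "__main__":
    # Two gt/ag introns and one ct/ac intron give a positive orientation.
    print(intron_orientation(["gtaaaag", "gtccag", "ctttac"]))
```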
 
 <a name="download8"></a>
 <h2>Missing RefSeq ID</h2>
 <h6>Why isn't my refseq ID in your database?</h6> 
 <p> 
 It may have been added after we last downloaded data from GenBank, or it may have been replaced or 
 removed. You can check the submission date and status of an accession on the 
 <a href="https://www.ncbi.nlm.nih.gov/nucleotide/" target="_blank">NCBI Entrez Nucleotide
 site</a>.</p>
 
 <a name="download9"></a>
 <h2>Finished vs. draft segments</h2>
 <h6>Do chr<em>N</em>.fa tables contain both finished and draft segments? If so, how do you 
 determine which segments are finished?</h6> 
 <p> 
 Yes, these tables contain both finished and draft segments. Use the corresponding 
 chr<em>N</em>_gold table to look them up. The quality of the draft varies. In general, the larger 
 the contig it is in, the better the quality. The quality of the last 500 bases on either end of a 
 contig tends to be lower than that of the rest of the contig.</p>
 <p> 
 How do you determine the accuracy? The base-calling program <a href="http://www.phrap.org/" 
 target="_blank">Phred</a> analyzes the traces from the sequencing machines and assigns a quality 
 score to these. These quality scores are used by the <a href="http://www.phrap.org/" 
 target="_blank">Phrap</a> assembly program, which gives quality scores for the bases on the assembly
 as well.</p>
 
 <a name="downloadAlt"></a>
 <h2>chr_alt chromosomes</h2>
 <h6>What is chr_alt?</h6>
 <p>
 The chr_alt chromosomes, such as <em>chr5_KI270794v1_alt</em>, are alternative sequences that differ
 from the reference genome. They are currently available for a few assemblies, including danRer11,
 mm10, hg19, and hg38. These are regions of the genome that exhibit sufficient variability to prevent adequate
 representation by a single sequence. UCSC labels these haplotype sequences by appending 
 &quot;_alt&quot; to their names. These alternative loci scaffolds (such as KI270794.1 in the hg38 
 assembly, referenced as chr5_KI270794v1_alt in the browser) are mapped to the genome and provide 
-suppemental genomic information on these variable locations. To find the regions these alternate 
+supplemental genomic information on these variable locations. To find the regions these alternate
 sequences correspond to in the genome you may use the 
 <a href="../cgi-bin/hgTrackUi?db=hg38&g=altSeqLiftOverPsl" target="_blank">Alt Haplotypes track</a> 
 if one is available. 
 </p>
 <p>
 Additional information on alternative loci can be found on our <a 
 href="http://genome.ucsc.edu/blog/patches/" target="_blank">hg38 patches blog post</a> 
 as well as the <a 
 href="http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/info/definitions.shtml#ALTERNATE"
 target="_blank">Genome Reference Consortium (GRC) website</a>.
 </p>
 
 <a name="downloadFix"></a>
 <h2>chr_fix chromosomes</h2>
 <h6>What is chr_fix?</h6>
 <p>
 The chr_fix chromosomes, such as <em>chr1_KN538361v1_fix</em>, are fix patches currently available
 for the mm10, hg19, and hg38 assemblies that represent changes to the existing sequence. These are
 generally error corrections (such as base changes, component replacements/updates, switch point
 updates or tiling path changes) or assembly improvements (such as extension of sequence into gaps). 
 These fix patch scaffold sequences are given chromosome context through alignments to the 
 corresponding chromosome regions. A list of all chromosomes including chr_fix sequences can be 
 found in the <a href="../cgi-bin/hgTracks?db=mm10&chromInfoPage=" target="_blank">mm10</a>,
 <a href="../cgi-bin/hgTracks?db=hg19&chromInfoPage=" target="_blank">hg19</a>, or
 <a href="../cgi-bin/hgTracks?db=hg38&chromInfoPage=" target="_blank">hg38</a> assembly sequences
 pages.
 </p>
 <p>
 More information on these patch sequences can be found on our 
 <a href="http://genome.ucsc.edu/blog/patches/" target="_blank">hg38 patches blog post</a> as well 
 as on the <a href="https://www.ncbi.nlm.nih.gov/grc/help/faq/#fix-patches" 
 target="_blank">Genome Reference Consortium (GRC) website</a>.
 </p>
 
 <a name="download10"></a>
 <h2>chrN_random tables</h2>
 <h6>What are the chr<em>N</em>_random_[table] files in the human assembly? Why are they called 
 random? Is there something biologically random about the sequence in these tables or are they just 
 not placed within their given chromosomes?</h6>
 <p> 
 In the past, these tables contained data related to sequence that is known to be in a particular 
 chromosome, but could not be reliably ordered within the current sequence.</p> 
 <p> 
 Starting with the Apr. 2003 human assembly, these tables also include data for sequence that is not 
 in a finished state, but whose location in the chromosome is known, in addition to the unordered 
 sequence.  Because this sequence is not quite finished, it could not be included in the main 
 &quot;finished&quot; ordered and oriented section of the chromosome.</p>  
 <p> 
 Also, in a very few cases in the Apr. 2003 assembly, the random files contain data related to 
 sequence for alternative haplotypes. This is present primarily in chr6, where we have included two 
 alternative versions of the MHC region in chr6_random. There are a few clones in other chromosomes 
 that also correspond to a different haplotype. Because the primary reference sequence can only 
 display a single haplotype, these alternatives were included in random files. In subsequent 
 assemblies, these regions have been moved into separate files (<em>e.g.</em> chr6_hla_hap1).</p>
 
 <a name="download11"></a>
 <h2>Chromosome Un</h2>
 <h6>What is chrUn?</h6> 
 <p> 
 ChrUn contains clone contigs that cannot be confidently placed on a specific chromosome. For the 
 chr<em>N</em>_random and chrUn_random files, we essentially just concatenate together all the 
 contigs into short pseudo-chromosomes. The coordinates of these are fairly arbitrary, although the 
 relative positions of the coordinates are good within a contig. You can find more information about 
 the data organization and format on the <a href="../goldenPath/datorg.html">Data Organization and 
 Format</a> page.</p>
 
 <a name="download12"></a>
 <h2>Chromosome M</h2>
 <h6>What is chromosome M (chrM)?</h6>
 <p>
 Mitochondrial DNA.</p>
 
 <a name="download13"></a>
 <h2>N characters at beginning of human chr22</h2>
 <h6>When I download human chr22 from your web site, the unzipped file contains only
 <em>N</em>s.</h6>
 <p>
 There is a large block of <em>N</em>s at the beginning and end of chr22. Search for an <em>A</em> 
 to bypass the initial group of <em>N</em>s.</p>
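 <p>
 Instead of searching by hand, a short script can report where the first real base occurs. The
 Python sketch below is an illustration, assuming an uncompressed FASTA file read line by line.
 The function name <code>first_non_n_offset</code> is hypothetical.</p>

```python
def first_non_n_offset(fasta_lines):
    """Return the 0-based offset of the first base that is not N, or None.

    fasta_lines: an iterable of lines from a single-sequence FASTA file.
    """
    offset = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue  # skip the header line
        sequence = line.strip()
        for index, base in enumerate(sequence):
            if base not in "Nn":
                return offset + index
        offset += len(sequence)
    return None


if __name__ == "__main__":
    example = [">chr22", "NNNN", "NNAC"]
    print(first_non_n_offset(example))
```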
 
 <a name="download30"></a>
 <h2>Erroneous duplicated chrY_random region on Mouse Build 34 (mm6)</h2>
 <h6>On the mm6 assembly, I've found duplicate contigs that are placed on both chrY and chrY_random. 
 Is this intentional?</h6>
 <p> 
 On the mm6 assembly, chrY_random erroneously contains a region duplicated from chrY. Because NCBI 
 discovered this assembly problem after the UCSC Genome Browser was processed, we were not able to 
 remove it from mm6 prior to the browser's release. The duplicated section occupies chrY:1-696,521 
 and chrY_random:29,615,053-30,311,573 (the end of the chromosome) and includes the following 
 repeated fragments:</p>
 <ul>
   <li>AC134433.3</li>
   <li>AC145392.2</li>
   <li>AC148319.2</li>
   <li>AC145571.3</li>
   <li>AC145393.4</li>
 </ul>
 <p>
 The fragments are assembled into the contig NT_111995 for chrY_random and also appear (under 
 different names) as regions on contigs MmY_110865_34, MmY_78990_34 and NT_078925.</p>
 
 <a name="download25"></a>
 <h2>Mapping chimp chromosome numbers to human chromosome numbers</h2>
 <h6>How do the chimp and human chromosome numbering schemes compare?</h6>
 <p> 
 The following table shows the mapping of chromosomes in the chimp draft assemblies to human 
 chromosomes. Starting with the panTro2 assembly, the numbering scheme was changed to reflect a new 
 standard that preserves orthology with human chromosomes. Initially proposed by E.H. McConkey in 
 2004, the new numbering convention was subsequently endorsed by the International Chimpanzee 
 Sequencing and Analysis Consortium. This standard assigns the identifiers "2a" and "2b" to the two 
 chimp chromosomes that fused in the human genome to form chromosome 2 and renumbers the other 
 chromosomes to more closely match their human counterparts. As a result, chromosomes 2 and 23 
 (present in the panTro1 assembly) do not exist in later versions.</p>
 
 <div class="row">
   <!-- Left column  - used to indent table -->
   <div class="col-md-1">
   </div>
   <!-- Middle column -->
   <div class="col-md-6">
     <!--chrom table-->
     <table class="gbsCenterText" border=1>
       <tr>
         <th>Human Chr</th> 
         <th>Chimp Chr (panTro1)</th>
         <th>Chimp Chr (panTro2)</th> 
       </tr> 
       <tr>
         <td>1</td><td>1</td><td>1</td>
       </tr>
       <tr>
         <td>2 (part)</td><td>12</td><td>2a</td>
       </tr>
       <tr>
         <td>2 (part)</td><td>13</td><td>2b</td>
       </tr>
       <tr>
         <td>3</td><td>2</td><td>3</td>
       </tr>
       <tr>
         <td>4</td><td>3</td><td>4</td>
       </tr>
       <tr>
         <td>5</td><td>4</td><td>5</td>
       </tr>
       <tr>
         <td>6</td><td>5</td><td>6</td>
       </tr>
       <tr>
         <td>7</td><td>6</td><td>7</td>
       </tr>
       <tr>
         <td>8</td><td>7</td><td>8</td> 
       </tr>
       <tr>
         <td>9</td><td>11</td><td>9</td>
       </tr>
       <tr>
         <td>10</td><td>8</td><td>10</td> 
       </tr>
       <tr>
         <td>11</td><td>9</td><td>11</td>
       </tr>
       <tr>
         <td>12</td><td>10</td><td>12</td>
       </tr>
       <tr>
         <td>13</td><td>14</td><td>13</td>
       </tr>
       <tr>
         <td>14</td><td>15</td><td>14</td>
       </tr>
       <tr>
         <td>15</td><td>16</td><td>15</td>
       </tr>
       <tr>
         <td>16</td><td>18</td><td>16</td>
       </tr>
       <tr>
         <td>17</td><td>19</td><td>17</td>
       </tr>
       <tr>
         <td>18</td><td>17</td><td>18</td>
       </tr>
       <tr>
         <td>19</td><td>20</td><td>19</td>
       </tr>
       <tr>
         <td>20</td><td>21</td><td>20</td>
       </tr>
       <tr>
         <td>21</td><td>22</td><td>21</td>
       </tr>
       <tr>
         <td>22</td><td>23</td><td>22</td> 
       </tr>
       <tr>
         <td>X</td><td>X</td><td>X</td>
       </tr>
       <tr>
         <td>Y</td><td>Y</td><td>Y</td>
       </tr>
     </table>
   </div>
 </div>
 
 <a name="download28"></a>
 <h2>Converting genome coordinates between assemblies</h2>
 <h6>I've been researching a specific area of the human genome on the current assembly, and now 
 you've just released a new version. Is there an easy way to locate my area of interest on the new 
 assembly?</h6>
 <p>
 You can migrate sequences from one assembly to another by using the <a href="../cgi-bin/hgBlat">Blat</a> 
 alignment tool or by converting assembly coordinates. There are two conversion tools available 
 on the Genome Browser web site: the Convert utility and the LiftOver tool. The Convert utility, 
 which is accessed from the View menu on the Genome Browser annotation tracks page, supports forward, 
 reverse, and cross-species conversions, but does not accept batch input. The 
 <a href="../cgi-bin/hgLiftOver">LiftOver</a> tool, accessed via the Tools link on the Genome 
 Browser home page, also supports forward, reverse, and cross-species conversions, as well as batch 
 conversions.</p> 
 <p>
 If you wish to update a large number of coordinates to a different assembly and have access to a 
 Linux platform, you may find it useful to try the command-line version of the LiftOver tool. The 
 executable file for this utility can be downloaded 
 <a href="https://genome-store.ucsc.edu" target="_blank">here</a>. LiftOver requires a 
 pre-generated <em>over.chain</em> file as input, available for selected 
 assemblies from the 
 <a href="http://hgdownload.soe.ucsc.edu/downloads.html#liftover">Downloads</a> page. If the desired 
 file is not available, send a request to the <a href="../contacts.html">genome mailing list</a> and 
 we may be able to  provide you with one.</p>
 <p>
 Here is an example of how to set up and run LiftOver from the command line:</p>
 <ol>
   <li>Download the LiftOver program for your computer's operating system 
 <a href="https://genome-store.ucsc.edu" target="_blank">here</a>.</li>
   <li>Change permissions on that file so that it can be executed
   <pre>
 chmod +x liftOver</pre></li>
   <li>Run the program with no arguments to see the usage statement
   <pre>
 ./liftOver</pre>
   <pre>
 liftOver - Move annotations from one assembly to another
 usage:
    liftOver oldFile map.chain newFile unMapped
 ...</pre></li>
   <li>Download your genome conversion chain file from the 
   <a href="http://hgdownload.soe.ucsc.edu/downloads.html">downloads directory</a>.
   For example, the human hg19 to hg38 conversion (hg19ToHg38) used in the command below can be downloaded like so:
   <pre>wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz</pre>
   </li>
   <li>Prepare your BED file input. Here are a few lines from a BED file that you can
   copy into a text file and save as &quot;preLift.bed&quot;.
   <pre>
 chr1	11166587	11191615	MTOR
 chr9	136130562	136150630	ABO
 chr12	25358179	25403854	KRAS
 chrX	151335633	151619831	GABRA3</pre></li>
   <li>You can now use the following command to lift over the annotations in &quot;preLift.bed&quot;,
   which are in the coordinates of your original genome. Successful conversions are written to
   &quot;conversions.bed&quot; and unsuccessful conversions to &quot;unMapped&quot;.
   <pre>
 ./liftOver preLift.bed hg19ToHg38.over.chain.gz conversions.bed unMapped</pre></li>
 </ol>
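 <p>
 After a run, it can be useful to check how many records were converted. The sketch below is 
 illustrative only: <code>summarize_lift</code> is a hypothetical helper, not part of LiftOver, and 
 it assumes that each failed record in the unMapped file sits on its own line, preceded by a comment 
 line starting with &quot;#&quot; that gives the reason.</p>

```shell
# Hypothetical helper (not part of LiftOver): count converted and unconverted
# records after a liftOver run.
# Assumption: each failed record in the unMapped file is preceded by a "#"
# comment line that states why the conversion failed.
summarize_lift() (
    converted_file=$1
    unmapped_file=$2
    # Count non-empty lines in the converted BED file.
    converted=$(grep -c -v '^$' "$converted_file" || true)
    # Count lines in the unMapped file that are neither comments nor empty.
    unmapped=$(grep -c -v -e '^#' -e '^$' "$unmapped_file" || true)
    echo "converted=${converted} unmapped=${unmapped}"
)
```

 <p>
 For the run above, the helper would be called as 
 <code>summarize_lift conversions.bed unMapped</code>.</p>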
 
 <a name="download33"></a>
 <h2>Linking gene name with accession number</h2>
 <h6>I have the accession number for a gene and would like to link it to the gene name. Is there 
 a table that shows both pieces of information?</h6>
 <p> 
 If you are looking at the RefSeq Genes, the <em>refFlat</em> table contains both the gene name 
 (usually a HUGO Gene Nomenclature Committee ID) and its accession number. For the Known Genes, use 
 the <em>kgAlias</em> table.</p>
 
 <a name="download31"></a>
 <h2>Obtaining a list of Known Genes</h2>
 <h6>How can I obtain a complete list of all the genes in the UCSC Known Genes table for a 
 particular organism?</h6>
 <p> 
 To obtain a complete copy of the entire Known Genes data set for an organism, open the Genome 
 Browser <a href="http://hgdownload.soe.ucsc.edu/downloads.html">Downloads page</a>, jump to the 
 section specific to the organism, click the Annotation database link in that section, then click the
 link for the <em>knownGene.txt.gz</em> table.</p> 
 <p>
 Data for a specific region or chromosome may be obtained from the Table Browser by selecting the 
 &quot;Genes and Gene Prediction Tracks&quot; group, the &quot;UCSC Genes&quot; track and the 
 &quot;knownGene&quot; table. Set the position to the region of interest, then click the 
 &quot;get output&quot; button.</p>
 
 <a name="download16"></a>
 <h2>Repeat-masking data</h2>
 <h6>What version of RepeatMasker do you use on your data? Which flags do you use?</h6>
 <p>
 UCSC uses the latest versions of RepeatMasker and repeat libraries available on the date when the 
 assembly data is processed. RepeatMasker version information can usually be found in the README text
 for the assembly's bigZips <a href="http://hgdownload.soe.ucsc.edu/downloads.html">downloads</a> 
 directory.</p> 
 <p>
 Masking is done using the RepeatMasker <em>-s</em> flag. For mouse repeats, we also use 
 <em>-m</em>. In addition to RepeatMasker, we use the Tandem Repeat Finder (trf) program, masking out
 repeats of period 12 or less. The repeats are just &quot;soft&quot; masked. Alignments are allowed 
 to extend through repeats, but not initiate in them.</p>
 
 <a name="download17"></a>
 <h2>Availability of repeat-masked data</h2>
 <h6>Are the repeat annotation files available for every chromosome?</h6>
 <p> 
 Yes, you can obtain the repeat-masked files via the Table Browser or from the organism's annotation 
 database downloads directory. The RepeatMasker annotation tables are named chr<em>N</em>_rmsk 
 (where <em>N</em> represents the chromosome number) and the Tandem Repeat Finder (TRF) tables are 
 named simpleRepeat.</p>
 
 <a name="download24"></a>
 <h2>RepeatMasker version differences - UCSC vs. RepeatMasker website</h2>
 <h6>When I run RepeatMasker independently from the RepeatMasker web server, my results vary from
 those of UCSC. What's the cause?</h6>
 <p> 
 UCSC occasionally uses updated versions of the RepeatMasker software and repeat libraries that are 
 not yet available on the RepeatMasker website (see <a href="#download16">Repeat-masking data</a> 
 for more information).</p>
 
 <a name="download18"></a>
 <h2>Obtaining promoter sequence</h2>
 <h6>How can I fetch promoter sequence upstream of a gene?</h6>
 <p> 
 The UCSC Genome Browser offers several ways to obtain this information, depending on your 
 requirements.</p> 
 <p>
 The Genome Browser <a href="http://hgdownload.soe.ucsc.edu/downloads.html">downloads site</a> 
 provides prepackaged downloads of 1000 bp, 2000 bp, and 5000 bp upstream sequence for RefSeq genes 
 that have a coding portion and annotated 5' and 3' UTRs. You can obtain these from the bigZips 
 downloads directory for the assembly of interest.</p> 
 <p> 
 To fetch the upstream sequence for a specific gene, use the <a href="../cgi-bin/hgTables">Table 
 Browser</a>. Select the genome and assembly, then select the knownGene table. Paste the gene name or 
 accession number in the identifier field. Choose sequence for the output format type, then click the
 get output button. On the next page, select genomic. On the final page, you will have the 
 opportunity to configure the amount of upstream promoter sequence to fetch, along with several 
 other options. Click Get Sequence when you've finished configuring the output.</p> 
 <p> 
 You can also use the Genome Browser to obtain sequence for a specific gene. Open the Genome Browser 
 window to display the gene in which you're interested. Click the entry for the gene in the RefSeq 
 or Known Genes track, then click the Genomic Sequence link. Alternatively, you can click the DNA 
 link in the top menu bar of the Genome Browser tracks window to access options for displaying the 
 sequence.</p> 
 <p>
 The Stanford Human Promoters track on the 
 <a href="../goldenPath/customTracks/custTracks.html">UCSC Custom Annotation Tracks page</a> shows 
 promoters for some of the human assemblies.</p>
 
 <a name="download19"></a>
 <h2>Data from Evolutionary Conservation Score tracks</h2>
 <h6>Where can I download the conservation score data from the Human/Mouse Evolutionary 
 Conservation Score track?</h6>
 <p> 
 The conservation score data are stored in a group of tables in the annotation database downloads 
 directory. The naming conventions of the tables vary among releases. In earlier assemblies, table 
 names are of the form chr<em>N</em>_humMusL, chr<em>N</em>_zoom1_humMusL, and/or 
 chr<em>N</em>_zoom2500_humMusL. In later releases, the tables are named using specific release 
 numbers, such as chr<em>N</em>_hg16Mm3. The tables within a given set differ by the number of 
 bases/score interval and are used to generate the browser displays at different zooming levels.</p>
 
 <a name="download20"></a>
 <h2>Minus strand coordinates - axtNet</h2>
 <h6>I downloaded the axtNet alignments between the latest human and mouse assemblies. I found 
 that some of the alignments listed in the axtNet files do not agree with what is shown in the
 browser.</h6>
 <p> 
 Is this alignment on the minus strand? Minus strand coordinates in axt files are handled differently
 from how they are handled in the Genome Browser. To convert axt minus strand coordinates to Genome 
 Browser coordinates, use:</p>
 <pre>start = chromSize + 1 - axtEnd
 end = chromSize + 1 - axtStart</pre>
 <p>
 See an explanation of coordinate transforms in the 
 <a href="http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms"
 target="_blank">genomeWiki</a>.</p>
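 <p>
 As a minimal sketch, the conversion above can be wrapped in a small shell function. The function 
 name <code>axt_minus_to_browser</code> and the numbers in the example call are made up for 
 illustration; only the two formulas come from this FAQ.</p>

```shell
# Hypothetical helper: convert minus-strand axt coordinates to Genome Browser
# coordinates using the two formulas given above.
# Arguments: chromSize axtStart axtEnd (all integers).
axt_minus_to_browser() (
    chrom_size=$1
    axt_start=$2
    axt_end=$3
    start=$((chrom_size + 1 - axt_end))
    end=$((chrom_size + 1 - axt_start))
    echo "${start} ${end}"
)

# Made-up example: a 1000-base chromosome with a minus-strand axt item at 101-200.
axt_minus_to_browser 1000 101 200   # prints "801 900"
```

 <p>
 Applying the same function to its own output returns the original axt coordinates, which is a 
 quick way to sanity-check a conversion.</p>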
 
 <a name="download21"></a>
 <h2>Mapping UCSC STS marker IDs to those of other groups</h2>
 <h6>How do I map the STS genetic marker IDs in the genome browser to the IDs assigned by other 
 groups?</h6>
 <p> 
 We assign our own IDs to each of the STS markers, but we also track the UniSTS IDs for each marker 
 in the downloadable stsInfo2 table. To determine the location of a specific marker, look up the 
 marker's name in the stsAlias table to find the UCSC ID assigned to it, then use this ID to look 
 up the marker's location in the stsMap table. For example, D10S249 has 
 UCSC ID 2880 and is located at chr10:240791-241019.</p> 
 
 <a name="download22"></a>
 <h2>deCODE map data</h2>
 <h6>Where can I get more information about the deCODE map?</h6>
 <p> 
 You can obtain this information from the combination of a couple of tables. The stsMap table 
 contains the physical position of all STS markers, including those on the deCODE map. This file 
 also contains information about the position on the genome-wide maps, including the deCODE map. A 
 second file, stsInfo2, contains additional information about each marker, including aliases, primer 
 sequence information, etc. This table is related to the first table by an ID (the identNo field in 
 both files).</p>
 
 <a name="download29"></a>
 <h2>Direct MariaDB (MySQL) access to data</h2>
 <h6>Is it possible to run SQL queries directly on the database rather than using the Table 
 Browser interface?</h6>
 <p> 
 Yes. See our documentation on <a href="../goldenPath/help/mysql.html">Downloading Data using 
 MariaDB (MySQL)</a>.</p> 
 <p>
 Connect to the US MariaDB server using the command:</p>
 <pre><code>mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A </code></pre>
 <p>Or to the European MariaDB server using the command:</p>
 <pre><code>mysql --user=genome --host=genome-euro-mysql.soe.ucsc.edu -A </code></pre>
 
 <a name="download34"></a>
 <h2>Name of fourth column in BED output</h2>
 <h6>When using the Table Browser to extract exons from a Gene track, what does the &quot;Name&quot; 
 column (fourth BED column) refer to?</h6>
 <p> 
 The fourth column of the BED output contains several fields separated by underscores. For 
 example:</p>
 <pre><code>uc009vjk.2_cds_1_0_chr1_324343_f </code></pre>
 <p>
 This information is represented as follows:</p>
 <pre><code>ucscId_sequenceType_sequenceTypeNumber_basesAdded_chromosome_positionOfFirstBaseOfItem_strand</code></pre>
 <ul>
   <li>
   UCSC ID: our identification for the transcripts in the UCSC Genes track.</li>
   <li>
   Sequence Type: exons, introns, cds, utr5, etc.</li>
   <li>
   Sequence Type Number: for every transcript, there will be a row for each sequence type (cds 
   or intron) and this identifies which is represented in this row; the first is denoted with 0. 
   So, if you requested exons, and a particular transcript has 10 exons, you will see a row for each 
   one and in this position they will be numbered 0-9.</li>
   <li>
   Bases Added: number of bases added to the regions requested.</li>
   <li>
   Chromosome: chromosome number the item is on.</li>
   <li>
   Position of First Base of Item: the 1-based start position of the feature itself, exactly as 
   displayed in the browser. This can differ from columns 2 and 3 of the output in two ways. First, 
   if you have specified bases added to the requested features (for example, exons plus 10 bases on 
   each end), columns 2 and 3 are not the exact coordinates of the exon; they start and end 10 bases 
   before/after the exon. Second, the coordinates in our tables almost always have 0-based starts 
   (as they do in columns 2 and 3 of this output) but display as 1-based in the browser (for more 
   info see this <a href="FAQtracks.html#tracks1">FAQ</a>). This field is therefore an easy way to 
   see where the actual feature starts.</li>
   <li>
   Strand: forward (f) or reverse (r) strand.</li>
 </ul>
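 <p>
 The field layout above can be unpacked with plain shell parameter expansion. This is a sketch under 
 stated assumptions: <code>parse_bed_name</code> is a hypothetical helper, and it assumes that the 
 transcript ID itself contains no underscore (true for UCSC Genes IDs such as uc009vjk.2, but not 
 for RefSeq accessions such as NM_ IDs). Chromosome names that contain underscores are handled by 
 peeling fields from both ends of the name.</p>

```shell
# Hypothetical helper: split the fourth BED column into its seven fields and
# print them tab-separated in the order
# ucscId, sequenceType, sequenceTypeNumber, basesAdded, chromosome,
# positionOfFirstBaseOfItem, strand.
# Assumption: the ID (first field) contains no underscore.
parse_bed_name() (
    name=$1
    # Peel the last two fields (strand, position) off the right-hand side.
    strand=${name##*_}
    rest=${name%_*}
    first_base=${rest##*_}
    rest=${rest%_*}
    # Peel the first four fields off the left-hand side.
    ucsc_id=${rest%%_*}
    rest=${rest#*_}
    seq_type=${rest%%_*}
    rest=${rest#*_}
    type_num=${rest%%_*}
    rest=${rest#*_}
    bases_added=${rest%%_*}
    # Whatever remains is the chromosome name, underscores included.
    chrom=${rest#*_}
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
        "$ucsc_id" "$seq_type" "$type_num" "$bases_added" \
        "$chrom" "$first_base" "$strand"
)

# The example name from this FAQ entry.
parse_bed_name uc009vjk.2_cds_1_0_chr1_324343_f
```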
 
 <a name="download36"></a>
 <h2>Track Data Access</h2>
 <h6>How do I access the data underlying a track?</h6> 
 <p>
 The raw data underlying a track can be explored interactively with the 
 <a href="../cgi-bin/hgTables">Table Browser</a>, <a href="../cgi-bin/hgIntegrator">Data 
 Integrator</a>, or <a href="../cgi-bin/hgVai">Variant Annotation Integrator</a>. For automated 
 analysis, the genome annotation can be downloaded from the 
 <a href="http://hgdownload.soe.ucsc.edu/">downloads server</a>, one of our two
 <a href="http://genome.ucsc.edu/goldenPath/help/mysql.html">public MariaDB servers</a>, or 
 using our <a href='../goldenPath/help/api.html' target=_blank>REST API</a>.</p>
 <p> 
 <strong>bigBed data:</strong> For <a href="FAQformat.html#format1.5">bigBed</a> files, individual 
 regions or the whole genome annotation can be obtained using our tool bigBedToBed which can be 
 compiled from the source code or downloaded as a precompiled binary for your system. Instructions 
 for downloading source code and binaries can be found 
 <a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>. The tool can 
 also be used to obtain only features within a given range from one of the hgdownload servers, for
 example:</p> 
 <ul>
   <li>
     North American server:
     <pre><code>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/path/to/file/bigBedfile.bb -chrom=chr21 -start=0 -end=1000000 stdout </code></pre> 
   </li>
   <li>
     European server:
     <pre><code>bigBedToBed http://hgdownload-euro.soe.ucsc.edu/gbdb/path/to/file/bigBedfile.bb -chrom=chr21 -start=0 -end=1000000 stdout </code></pre> 
   </li>
 </ul>
 <p>
 Read more in <a href="http://genome.ucsc.edu/blog/"> our blog</a> about
 <a href="http://genome.ucsc.edu/blog/?s=programmatic">Accessing the Genome Browser Programmatically</a>
 to acquire data.
 </p>
 <a name="snp"></a>
 <h2>How do I download dbSNP data?</h2>
 <p>
 For versions dbSNP153 and above, the data is formatted in bigBed files. Previous versions are MySQL
 tables. For help with versions before dbSNP153, see <a href="#download29">accessing MySQL data</a>.
 This FAQ entry pertains to versions dbSNP153 and above.</p>
 <p>
 Since dbSNP has grown to include over 700 million variants, the size of the All dbSNP (153+)
 subtrack can cause the
 <a href="/cgi-bin/hgTables" target=_blank>Table Browser</a> and
 <a href="/cgi-bin/hgIntegrator" target=_blank>Data Integrator</a>
 to time out, leading to a blank page or truncated output,
 unless queries are restricted to a chromosomal region or
 to a specific set of rs# IDs (which can be pasted/uploaded into the Table Browser),
 or to one of the subset tracks such as Common or ClinVar.
 </p><p>
 For automated analysis, the track data files can be downloaded from the downloads server for
 <a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/" target=_blank>hg19</a> and
 <a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/" target=_blank>hg38</a>. The examples
 below are specific to <b>dbSNP153</b>, but the same methods and directories
 work for more recent dbSNP releases if you substitute the release number.
 <table class="descTbl">
   <tr>
     <th colspan=3>file</th>
     <th>format</th>
     <th>subtrack</th>
   </tr>
   <tr>
     <td>dbSnp153.bb</td>
     <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153.bb"
            target=_blank>hg19</a></td>
     <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb"
            target=_blank>hg38</a></td>
     <td>bigDbSnp (bigBed4+13)</td>
     <td>All dbSNP (153)</td>
   </tr>
   <tr>
     <td>dbSnp153ClinVar.bb</td>
     <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153ClinVar.bb"
            target=_blank>hg19</a></td>
     <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153ClinVar.bb"
            target=_blank>hg38</a></td>
     <td>bigDbSnp (bigBed4+13)</td>
     <td>ClinVar dbSNP (153)</td>
   </tr>
   <tr>
     <td>dbSnp153Common.bb</td>
     <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153Common.bb"
            target=_blank>hg19</a></td>
     <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153Common.bb"
            target=_blank>hg38</a></td>
     <td>bigDbSnp (bigBed4+13)</td>
     <td>Common dbSNP (153)</td>
   </tr>
   <tr>
     <td>dbSnp153Mult.bb</td>
     <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153Mult.bb"
            target=_blank>hg19</a></td>
     <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153Mult.bb"
            target=_blank>hg38</a></td>
     <td>bigDbSnp (bigBed4+13)</td>
     <td>Mult. dbSNP (153)</td>
   </tr>
   <tr>
     <td>dbSnp153BadCoords.bb</td>
     <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153BadCoords.bb"
            target=_blank>hg19</a></td>
     <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153BadCoords.bb"
            target=_blank>hg38</a></td>
     <td>bigBed4</td>
     <td>Map Err (153)</td>
   </tr>
   <tr>
     <td colspan=3>
       <a href="http://hgdownload.soe.ucsc.edu/gbdb/hgFixed/dbSnp/dbSnp153Details.tab.gz"
          target=_blank>dbSnp153Details.tab.gz</a>
     </td>
     <td>gzip-compressed tab-separated text</td>
     <td>Detailed variant properties, independent of genome assembly version</td>
   </tr>
 </table>
 </p>
 <p>
 Several utilities for working with bigBed-formatted binary files can be downloaded
 <a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads"
    target=_blank>here</a>.
 Run a utility with no arguments in order to see a brief description of the utility and its options.
 <ul>
   <li><b>bigBedInfo</b> provides summary statistics about a bigBed file including the number of
     items in the file.  With the <b>-as</b> option, the output includes an
     autoSql
     definition of data columns, useful for interpreting the column values.</li>
   <li><b>bigBedToBed</b> converts the binary bigBed data to tab-separated text.
     Output can be restricted to a particular region by using the -chrom, -start
     and -end options.</li>
   <li><b>bigBedNamedItems</b> extracts rows for one or more rs# IDs.</li>
 </ul>
 </p>
 
 <p><b>Example:</b> retrieve all variants in the region chr1:200001-200400</p>
 <pre><tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb -chrom=chr1 -start=200000 -end=200400 stdout</tt></pre>
 <p><b>Example:</b> retrieve variant rs6657048</p>
 <pre><tt>bigBedNamedItems dbSnp153.bb rs6657048 stdout</tt></pre>
 <p><b>Example:</b> retrieve all variants with rs# IDs in file myIds.txt</p>
 <pre><tt>bigBedNamedItems -nameFile dbSnp153.bb myIds.txt dbSnp153.myIds.bed</tt></pre>
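 <p>
 Note that the first example asks for chr1:200001-200400 but passes <code>-start=200000</code>: 
 browser positions are 1-based, while bigBed and BED starts are 0-based. The sketch below shows one 
 way to do that conversion in the shell. <code>position_to_args</code> is a hypothetical helper 
 written for this example; only the -chrom, -start and -end options come from bigBedToBed.</p>

```shell
# Hypothetical helper: turn a 1-based browser position such as
# chr1:200001-200400 into the 0-based options expected by bigBedToBed.
position_to_args() (
    # Drop any thousands separators, e.g. chr1:200,001-200,400.
    pos=$(printf '%s' "$1" | tr -d ',')
    chrom=${pos%%:*}
    range=${pos#*:}
    start1=${range%-*}
    end=${range#*-}
    # Only the start shifts by one; the end stays the same.
    start0=$((start1 - 1))
    printf '%s\n' "-chrom=${chrom} -start=${start0} -end=${end}"
)

position_to_args chr1:200001-200400   # prints "-chrom=chr1 -start=200000 -end=200400"
```

 <p>
 The printed options can then be appended to a <code>bigBedToBed</code> command like the one in the 
 first example.</p>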
 
 <p>
 The columns in the bigDbSnp/bigBed files and dbSnp153Details.tab.gz file are described in
 <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/lib/bigDbSnp.as"
    target=_blank>bigDbSnp.as</a> and
 <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/lib/dbSnpDetails.as"
    target=_blank>dbSnpDetails.as</a> respectively.
 </p><p>
 UCSC has an
 <a href="/goldenPath/help/api.html"
    target=_blank>API</a>
 that can be used to retrieve values from a particular chromosome range.
 A list of rs# IDs can also be pasted/uploaded in the
 <a href="/cgi-bin/hgVai" target=_blank>Variant Annotation Integrator</a>
 tool in order to find out which genes (if any) the variants are located in,
 as well as their functional effects, such as intron, coding-synonymous, missense, frameshift, etc.
 </p><p>
 See our searchable
 <A HREF="https://groups.google.com/a/soe.ucsc.edu/forum/?hl=en&fromgroups#!search/download+snps"
 target=_blank>mailing list archives</a>
 for more information and example queries. We also have information on
 <a href="http://genome.ucsc.edu/blog/">our blog</a> about
 <a href="http://genome.ucsc.edu/blog/?s=programmatic"> Accessing the Genome Browser Programmatically</a>
 to acquire data.
 </p>
 
 <a name="snpAlleles"></a>
 <h2>Why doesn't this SNP have two alleles?</h2>
 <p>
 When using the SNP tracks, you may find records that list one allele, or more than two alleles, 
 instead of the usual two alleles for a SNP. The following information explains how this is
 possible.</p>
 <dl>
   <dt>One allele (i.e. reference only):</dt>
   <dd>
     The human genome reference has gone through many different assembly versions. The reference
     genome has always been a mosaic of sequences from multiple individuals, so it contains some
     rare or singleton mutations and is not entirely free of errors. Some SNPs were discovered on
     previous assembly versions, and the latest assembly version has the corrected or common allele,
     which turns out to be the only observed allele (so the SNP was an artifact of the reference
     assembly having a rare mutation or error in the past, not a real SNP).</dd>
   <dt>Three alleles:</dt>
   <dd>
     It's rare, but possible, for the same base to be mutated to different values in different
     people.</dd>
   <dt>Four alleles:</dt>
   <dd>
     This is even rarer than three alleles. In the past, it has often been a symptom of strand
     errors: for example, the same variant is reported separately as A/G on the forward strand and
     C/T on the reverse strand, but the strand information is then lost in processing and the
     reports are merged to A/C/G/T.</dd>
 </dl>
 
 <a name="download37"></a>
 <h2>Obtaining GTF (Gene Transfer Format)</h2>
 <h6>What is the best method for obtaining GTF output?</h6>
 <p>
 Currently, the <a href="../cgi-bin/hgTables">Table Browser</a> option to return data in
 <a href="../FAQ/FAQformat.html#format4">GTF format</a> is limited, as explained below.
 To convert custom GenePred format data into GTF, the best method is to use the 
 command-line format conversion utility, <code>genePredToGtf</code>. This can optionally be set up 
 to automatically connect to the UCSC public SQL database and return GTF files in a few minutes using 
 <a href="http://genomewiki.ucsc.edu/index.php/Genes_in_gtf_or_gff_format#Using_kent_commands_with_the_public_database_server">
 this short guide</a>.</p>
 <p>
 For simplicity, GTF files have been generated using the <code>genePredToGtf</code> method 
 described above and are available on our download server for the main gene transcript sets.
 These can be found at the following download server address:
 <i>http://hgdownload.soe.ucsc.edu/goldenPath/$db/bigZips/genes/</i> 
 where <i>$db</i> is the assembly of interest. For example, the <a target="_blank" 
 href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/">hg38 GTF files</a>.</p>
 <p>Summary of Table Browser limitations:</p>
 <ul>
   <li>The Table Browser has transcript IDs only, so although it includes both &quot;gene_id&quot;
 and &quot;transcript_id&quot; fields in its output, the value for transcript ID (e.g., ENST#) is 
 used for both fields.</li>
   <li>The Table Browser adds start and stop codon annotations whether or not the transcript alignment 
 includes proper start and stop codons.</li>
   <li>Some tables in older genome assemblies are not supported.</li>
 </ul>
 <p>
 <a href="../FAQ/FAQformat.html#format9">GenePred</a> (short for Gene Predictions) is a table
 format commonly used for gene tracks in the UCSC Genome Browser where each transcript has a single
 row. Tables are not stored in GTF because that format would require many rows to describe a single 
 transcript, since each gene feature (e.g., exon) requires a separate line. The <code>genePredToGtf</code> command-line
 utility can be used to convert genePred to GTF. Download the <code>genePredToGtf</code> operating 
 system-specific command-line utility from the
 <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">utilities directory</a>.</p>
 <p>
 Please see the <a href="http://genomewiki.ucsc.edu/index.php/Genes_in_gtf_or_gff_format"> Genes in GTF
 or GFF Format wiki page</a> for examples and various methods for conversion. The <code>genePredToGtf</code>
 utility can convert files from several sources, such as Table Browser output from a genePred table,
 a local downloaded gene set table like refGene.txt, or from querying
 <a href="../goldenPath/help/mysql.html">public MariaDB tables</a>.</p>
 
 <a name="download38"></a>
 <h2>Table Browser output file order</h2>
 <h6>My Table Browser output file is not ordered by position. How is it ordered?</h6>
 <p>
 Most of our tables have a special first column called "bin" that helps with quickly displaying data on 
 the Genome Browser. This (chrom,bin) index causes query results to be ordered first by bin, then by 
 chromStart. This allows us to query and return results more quickly than if they were sorted by chromStart.
 </p>
 <p>
 A quick way to sort an output BED file by position is to use the following UNIX command on our
 <a href="../cgi-bin/hgTables">Table Browser</a> output BED file:
 <pre><code>sort -k1,1 -k2n,2n example.bed > example.sorted.bed</code></pre>
 </p>
 
 <a name="download39"></a>
 <h2>'Permission denied' error when trying to use command-line utilities</h2>
 <h6>Why do I get a 'Permission denied' error when I try to run command-line utilities?</h6>
 <p>
 In order for your computer to run a freshly downloaded utility, you will need to update the file
 system permissions to allow your operating system to run the program.
 <br>
 To make a utility usable, turn on its 'executable' bit, then run it: 
        <pre> <code>$ chmod +x ./filePath/utility_name </code></pre>
        <pre> <code>$ ./filePath/utility_name</code></pre>
 Example:
        <pre><code>$ chmod +x /home/user/liftover/liftOver</code></pre>
         See also: <a href="http://en.wikipedia.org/wiki/Chmod" target="_blank">http://en.wikipedia.org/wiki/Chmod</a>
 
 </p>
 
 <a name="download40"></a>
 <h2>Restricted Track Data</h2>
 <h6>Why can I not download some data in the Table Browser or find the download files?</h6>
 <p>
 Some data is provided by external groups and is not available for download or mirroring
 by any third party without the permission of the owners. One example is the OMIM track data, which
 is the property of Johns Hopkins University. Some tools return an error for restricted tracks; for 
 example, a getData request to our API returns a 403 'Forbidden' error. Please email our private internal
 <a href="mailto:&#103;&#101;&#110;&#111;me&#45;&#119;&#119;&#119;&#64;&#115;&#111;&#101;.uc&#115;&#99;.&#101;d&#117;"
 >&#103;&#101;&#110;&#111;me&#45;&#119;&#119;&#119;&#64;&#115;&#111;&#101;.uc&#115;&#99;.&#101;d&#117;</a>
 mailing list if you have any questions.
 </p>
 
 <a name="downloadAnalysis"></a>
 <h2>Analysis set</h2>
 <h6>Some genomes on the download server also reference an analysis set. What is the difference?</h6>
 <p>
 For certain genomes (GRCm38/mm10, GRCh37/hg19, GRCh38/hg38), NCBI provides an analysis set in 
 addition to the standard genome files. These are FASTA files with modified sequence identifiers 
 and index files convenient for analysis with Next Generation Sequencing tools. These files are 
 particularly helpful for NGS pipelines including variant calling and RNA-Seq analysis.</p>
 
 <p>
 Though not all analysis sets contain the same information, features include:</p>
 <ul>
 <li>Removal of alternate and fix sequences which can interfere with read alignment programs</li>
 <li>Hard masking of duplicate copies of the pseudo-autosomal regions (PARs) and centromeric 
 arrays</li>
 <li>Addition of &quot;decoy&quot; sequences</li>
 <li>Index files generated by BWA, Samtools, Bowtie and HISAT2</li></ul>
 
 <p>
 For more information on analysis sets, see the <a 
 href="https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/#seqsforalign" target="_blank">NCBI 
 FAQ</a>. Information on what is contained in each specific assembly analysis set can be 
 found in the README by clicking the <strong>Genome sequence files</strong> link for the 
 assembly of interest in our 
 <a href="http://hgdownload.soe.ucsc.edu/downloads.html">Downloads page</a>.
 </p>
 
 <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->