bbabbd5d2566d47d923d51dbe350634783455999 mspeir Sun Oct 26 12:14:52 2025 -0700 change soe to gi, refs #35031 diff --git src/hg/htdocs/FAQ/FAQdownloads.html src/hg/htdocs/FAQ/FAQdownloads.html index 054468a690e..1abecf9933f 100755 --- src/hg/htdocs/FAQ/FAQdownloads.html +++ src/hg/htdocs/FAQ/FAQdownloads.html @@ -62,68 +62,68 @@ <a href="index.html">Return to FAQ Table of Contents</a></p> <a name="download1"></a> <h2>Downloading sequence and annotation data</h2> <h6>How do I obtain the sequence and/or annotation data for a release?</h6> <p> Sequence and annotation data downloads are usually made available within the first week of the release of a new assembly. The download directories are automatically updated nightly to incorporate additions and modifications to the data.</p> <p> You can download sequence and annotation data <a href="../goldenPath/help/ftp.html">using our FTP server</a>, but we recommend using rsync, which, when run again after a failure, resumes where it left off. Please see the previous link for examples.</p> <p> You can also download data from our -<a href="http://hgdownload.soe.ucsc.edu/downloads.html">Downloads</a> page or our +<a href="http://hgdownload.gi.ucsc.edu/downloads.html">Downloads</a> page or our <a href="../cgi-bin/das/dsn" target="_blank">DAS server</a>. To download a specific subset of the data or to configure the output format of the data, use the <a href="../cgi-bin/hgTables">Table Browser</a>. For information on extracting a large set of sequences from an assembly, see <a href="#download32">Extracting sequence in batch from an assembly</a>.</p> <p> For more information on using the UCSC DAS server, see <a href="#download23">Downloading data from the UCSC DAS server</a>.</p> <p> Another option for querying sequence and annotation data is the <a href='../goldenPath/help/api.html' target=_blank>REST API</a>.
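To illustrate the REST API, here is a minimal Python sketch that only builds a sequence request URL and does not contact the server. The host api.genome.ucsc.edu, the getData/sequence endpoint, and the genome, chrom, start, and end parameter names are assumptions taken from the REST API help page rather than from this FAQ, and sequence_request_url is a hypothetical helper name.

```python
from urllib.parse import quote

# Assumed endpoint, per the UCSC REST API help page (not stated in this FAQ).
API_BASE = "https://api.genome.ucsc.edu"


def sequence_request_url(genome: str, chrom: str, start: int, end: int) -> str:
    """Build a getData/sequence request URL (hypothetical helper).

    Parameters are joined with ';', the separator used in the API's own
    examples; values are percent-encoded so unusual names stay valid.
    """
    if start < 0 or end <= start:
        raise ValueError("need 0 <= start < end")
    params = [("genome", genome), ("chrom", chrom), ("start", start), ("end", end)]
    query = ";".join(f"{key}={quote(str(value), safe='')}" for key, value in params)
    return f"{API_BASE}/getData/sequence?{query}"


print(sequence_request_url("hg38", "chrM", 4321, 5678))
# https://api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chrM;start=4321;end=5678
```

Keeping URL construction separate from the network call makes it easy to inspect a request before fetching it, for example with urllib.request.urlopen.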
The REST API allows extraction of sequence and annotations from both UCSC assemblies and hubs.</p> <p> <strong>To quickly download large volumes of data you can use UDR (UDT Enabled Rsync):</strong> UDR can provide much faster download rates. Here is an example that uses UDR, once installed, to download all of the mouse mm9 ENCODE data, which amounts to several terabytes:</p> -<pre><code>$ udr rsync -avP hgdownload.soe.ucsc.edu::goldenPath/mm9/encodeDCC/ /my/local/mm9/</code></pre> +<pre><code>$ udr rsync -avP hgdownload.gi.ucsc.edu::goldenPath/mm9/encodeDCC/ /my/local/mm9/</code></pre> <pre><code>$ udr rsync -avP hgdownload-euro.soe.ucsc.edu::goldenPath/mm9/encodeDCC/ /my/local/mm9/</code></pre> <p> Optional: download from our secondary download server. <pre><code>$ udr rsync -avP hgdownload2.soe.ucsc.edu::goldenPath/mm9/encodeDCC/ /my/local/mm9/</code></pre> Please read more about the new UDR method <a href="../../goldenPath/newsarch.html#030315" target="_blank">here</a>.</p> <a name="download35"></a> <h2>Metadata tables for GenBank and RefSeq moved to hgFixed database</h2> <h6>I can no longer find metadata tables like gbCdnaInfo for an assembly.</h6> <p> As of June 2016, the metadata tables that support the GenBank and RefSeq tracks (RefSeq, Other RefSeq, mRNA, EST, etc.)
have been moved from directories of individual assemblies to one global database, hgFixed.</p> <p> The tables below (previously found per assembly) can now be downloaded from the -<a href="http://hgdownload.soe.ucsc.edu/goldenPath/hgFixed/database/">hgFixed database</a>:</p> +<a href="http://hgdownload.gi.ucsc.edu/goldenPath/hgFixed/database/">hgFixed database</a>:</p> <!-- 3-column layout --> <div class="row"> <!-- Left column --> <div class="col-md-3"> <ul> <li>author</li> <li>cds</li> <li>cell</li> <li>description</li> <li>development</li> <li>gbCdnaInfo</li> <li>gbExtFile</li> <li>gbLoaded</li> </ul> @@ -164,36 +164,36 @@ The <a href="../cgi-bin/hgTables" >Table Browser</a>, as connected tables and joined fields described when clicking the "data format description " button</li> <li> One of our two <a href="../goldenPath/help/mysql.html">public access MariaDB servers</a> in the US and Europe</li> </ul> <a name="download32"></a> <h2>Extracting sequence in batch from an assembly</h2> <h6>I have a lot of coordinates for an assembly and want to extract the corresponding sequences. What is the best way to proceed? </h6> <p> There are two ways to extract genomic sequence in batch from an assembly:</p> <p> A. Download the appropriate fasta files from our -<a href="ftp://hgdownload.soe.ucsc.edu/goldenPath/">ftp server</a> and extract sequence data using +<a href="ftp://hgdownload.gi.ucsc.edu/goldenPath/">ftp server</a> and extract sequence data using your own tools or the tools from our source tree. This is the recommended method when you have very large sequence datasets or will be extracting data frequently. Sequence data for most assemblies is located in the assembly's "chromosomes" subdirectory on the downloads server. For example, the sequence for human assembly hg17 can be found in -<a href="ftp://hgdownload.soe.ucsc.edu/goldenPath/hg17/chromosomes/">ftp://hgdownload.soe.ucsc.edu/goldenPath/hg17/chromosomes/</a>. 
+<a href="ftp://hgdownload.gi.ucsc.edu/goldenPath/hg17/chromosomes/">ftp://hgdownload.gi.ucsc.edu/goldenPath/hg17/chromosomes/</a>. You'll find instructions for obtaining our source programs and utilities <a href="FAQlicense.html#license3">here</a>. Some programs that you may find useful are nibFrag and twoBitToFa, as well as other fa* programs. To obtain usage information for most programs, run the program without arguments.</p> <p> B. Use the Table Browser to extract sequence. This is a convenient way to obtain small amounts of sequence.</p> <ol> <li> Create a <a href="../goldenPath/help/hgTracksHelp.html#CustomTracks">custom track</a> of the genomic coordinates in <a href="FAQformat.html#format1">BED format</a> and upload it into the Genome Browser. </li> <li> Select the custom track in the Table Browser, then select the "sequence" output format to retrieve data. We recommend that you save the output locally as a gzip-compressed file. </li> @@ -650,85 +650,85 @@ reverse, and cross-species conversions, but does not accept batch input. The <a href="../cgi-bin/hgLiftOver">LiftOver</a> tool, accessed via the Tools link on the Genome Browser home page, also supports forward, reverse, and cross-species conversions, as well as batch conversions.</p> <p> <b>Note:</b> We do not recommend using LiftOver to convert SNPs between assemblies. For more information on converting SNPs between assemblies, see this <a href="FAQreleases.html#snpConversion">FAQ entry</a>.</p> <p> If you wish to update a large number of coordinates to a different assembly and have access to a Linux platform, you may find it useful to try the command-line version of the LiftOver tool. The executable file for this utility can be downloaded <a href="https://genome-store.ucsc.edu" target="_blank">here</a>. LiftOver requires a pre-generated <em>over.chain</em> file as input, available for selected assemblies from the
LiftOver requires a pre-generated <em>over.chain</em> file as input, available for selected assemblies from the -<a href="http://hgdownload.soe.ucsc.edu/downloads.html#liftover">Downloads</a> page. If the desired +<a href="http://hgdownload.gi.ucsc.edu/downloads.html#liftover">Downloads</a> page. If the desired file is not available, send a request to the <a href="../contacts.html">genome mailing list</a> and we may be able to provide you with one.</p> <h3 id="liftOver">Using liftOver</h3> <p> Here is an example on how to set up and run LiftOver from the command line:</p> <ol> <li>Download the LiftOver program for your computer's operating system <a href="https://genome-store.ucsc.edu" target="_blank">here</a> <li>Change permissions on that file so that it can be executed <pre> chmod +x liftOver</pre></li> <li>Run the program with no arguments to see the usage statement <pre> ./liftOver</pre> <pre> liftOver - Move annotations from one assembly to another usage: liftOver oldFile map.chain newFile unMapped ...</pre></li> <li>Download your genome conversion chain file from the - <a href="http://hgdownload.soe.ucsc.edu/downloads.html">downloads directory</a>. + <a href="http://hgdownload.gi.ucsc.edu/downloads.html">downloads directory</a>. For example, the human to mouse conversion (hg38ToMm10) can be downloaded like so: - <pre>wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/hg38ToMm10.over.chain.gz</pre> + <pre>wget http://hgdownload.gi.ucsc.edu/goldenPath/hg38/liftOver/hg38ToMm10.over.chain.gz</pre> </li> <li>Prepare your BED file input. Here is a few lines from a BED file you can copy into a text file, saved as "preLift.bed". 
<pre>
chr1 11166587 11191615 MTOR
chr9 136130562 136150630 ABO
chr12 25358179 25403854 KRAS
chrX 151335633 151619831 GABRA3</pre></li> <li>You can now use the following command to lift over the annotations in your original-genome file, "preLift.bed", writing successful conversions to "conversions.bed" and unsuccessful conversions to "unMapped". Because the example coordinates above are on hg19, this command uses the hg19ToHg38 chain file rather than the hg38ToMm10 file from the previous step. <pre> ./liftOver preLift.bed hg19ToHg38.over.chain.gz conversions.bed unMapped</pre></li> </ol> <a name="download33"></a> <h2>Linking gene name with accession number</h2> <h6>I have the accession number for a gene and would like to link it to the gene name. Is there a table that shows both pieces of information?</h6> <p> For RefSeq Genes, the <em>refFlat</em> table contains both the gene name (usually a HUGO Gene Nomenclature Committee (HGNC) symbol) and its accession number. For Known Genes, use the <em>kgAlias</em> table.</p> <a name="download31"></a> <h2>Obtaining a list of Known Genes</h2> <h6>How can I obtain a complete list of all the genes in the UCSC Known Genes table for a particular organism?</h6> <p> To obtain a complete copy of the entire Known Genes data set for an organism, open the Genome -Browser <a href="http://hgdownload.soe.ucsc.edu/downloads.html">Downloads page</a>, jump to the +Browser <a href="http://hgdownload.gi.ucsc.edu/downloads.html">Downloads page</a>, jump to the section specific to the organism, click the Annotation database link in that section, then click the link for the <em>knownGene.txt.gz</em> table.</p> <p> Data for a specific region or chromosome may be obtained from the Table Browser by selecting the "Genes and Gene Prediction Tracks" group, the "UCSC Genes" track and the "knownGene" table.
Set the position to the region of interest, then click the "get output" button.</p> <a name="JASPARfilter"></a> <h2>Filtering for a transcription factor in the JASPAR database</h2> <h6>How do I display only one transcription factor?</h6> <div class="container-fluid"> <div class="row"> <div class="col-md-4 text-left"> <div class="row"> @@ -784,31 +784,31 @@ </div> </div> </div> <div class="col-md-6 text-center image-column"> <img src="/images/jasparTB.png" height="475" alt="JASPAR Table Browser"> </div> </div> </div> <a name="download16"></a> <h2>Repeat-masking data</h2> <h6>What version of RepeatMasker do you use on your data? Which flags do you use?</h6> <p> UCSC uses the latest versions of RepeatMasker and repeat libraries available on the date when the assembly data is processed. RepeatMasker version information can usually be found in the README text -for the assembly's bigZips <a href="http://hgdownload.soe.ucsc.edu/downloads.html">downloads</a> +for the assembly's bigZips <a href="http://hgdownload.gi.ucsc.edu/downloads.html">downloads</a> directory.</p> <p> Masking is done using the RepeatMasker <em>-s</em> flag. For mouse repeats, we also use <em>-m</em>. In addition to RepeatMasker, we use the Tandem Repeat Finder (trf) program, masking out repeats of period 12 or less. The repeats are only "soft" masked (converted to lowercase) rather than replaced with Ns. Alignments are allowed to extend through repeats, but not to initiate in them.</p> <a name="download17"></a> <h2>Availability of repeat-masked data</h2> <h6>Are the repeat annotation files available for every chromosome?</h6> <p> Yes, you can obtain the repeat-masked files via the Table Browser or from the organism's annotation database downloads directory. The RepeatMasker annotation tables are named chr<em>N</em>_rmsk (where <em>N</em> represents the chromosome number) and the Tandem Repeat Finder (TRF) tables are named simpleRepeat.</p> @@ -817,31 +817,31 @@ <h2>RepeatMasker version differences - UCSC vs.
RepeatMasker website</h2> <h6>When I run RepeatMasker independently from the RepeatMasker web server, my results vary from those of UCSC. What's the cause?</h6> <p> UCSC occasionally uses updated versions of the RepeatMasker software and repeat libraries that are not yet available on the RepeatMasker website (see <a href="#download16">Repeat-masking data</a> for more information).</p> <a name="download18"></a> <h2>Obtaining promoter sequence</h2> <h6>How can I fetch promoter sequence upstream of a gene?</h6> <p> The UCSC Genome Browser offers several ways to obtain this information, depending on your requirements.</p> <p> -The Genome Browser <a href="http://hgdownload.soe.ucsc.edu/downloads.html">downloads site</a> +The Genome Browser <a href="http://hgdownload.gi.ucsc.edu/downloads.html">downloads site</a> provides prepackaged downloads of 1000 bp, 2000 bp, and 5000 bp upstream sequence for RefSeq genes that have a coding portion and annotated 5' and 3' UTRs. You can obtain these from the bigZips downloads directory for the assembly of interest.</p> <p> To fetch the upstream sequence for a specific gene, use the <a href="../cgi-bin/hgTables">Table Browser</a>. Select the genome and assembly, then select the knownGene table. Paste the gene name or accession number into the identifier field. Choose sequence as the output format, then click the get output button. On the next page, select genomic. On the final page, you can configure the amount of upstream promoter sequence to fetch, along with several other options. Click Get Sequence when you have finished configuring the output.</p> <p> You can also use the Genome Browser to obtain sequence for a specific gene. Open the Genome Browser to display the gene of interest. Click the entry for the gene in the RefSeq or Known Genes track, then click the Genomic Sequence link.
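If you compute promoter windows yourself from a gene table instead, the direction of "upstream" depends on the gene's strand. The minimal Python sketch below assumes 0-based, half-open txStart/txEnd coordinates as stored in genePred-style tables; upstream_window is a hypothetical helper, and the Table Browser's own handling (for example at chromosome ends) may differ.

```python
def upstream_window(tx_start: int, tx_end: int, strand: str, size: int) -> tuple:
    """Return (start, end) of the `size` bases upstream of a transcript.

    All coordinates are 0-based, half-open, like txStart/txEnd in genePred tables.
    Hypothetical helper for illustration only.
    """
    if size < 0 or tx_start > tx_end:
        raise ValueError("need size >= 0 and tx_start <= tx_end")
    if strand == "+":
        # Forward strand: upstream lies before txStart; clamp at position 0.
        return max(0, tx_start - size), tx_start
    if strand == "-":
        # Reverse strand: upstream lies after txEnd. Clamping to the
        # chromosome length is left to the caller.
        return tx_end, tx_end + size
    raise ValueError(f"unexpected strand: {strand!r}")


# A 1000 bp window upstream of a forward- and a reverse-strand transcript.
print(upstream_window(5000, 9000, "+", 1000))  # (4000, 5000)
print(upstream_window(5000, 9000, "-", 1000))  # (9000, 10000)
```

Sequence extracted for a reverse-strand window must also be reverse-complemented to read 5' to 3' relative to the gene.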
Alternatively, you can click the DNA link in the top menu bar of the Genome Browser tracks window to access options for displaying the @@ -898,31 +898,31 @@ contains the physical position of all STS markers, including those on the deCODE map. This file also contains information about the position on the genome-wide maps, including the deCODE map. A second file, stsInfo2, contains additional information about each marker, including aliases, primer sequence information, etc. This file is related to the first file by an ID (the identNo field in both files).</p> <a name="download29"></a> <h2>Direct MariaDB (MySQL) access to data</h2> <h6>Is it possible to run SQL queries directly on the database rather than using the Table Browser interface?</h6> <p> Yes. See our documentation on <a href="../goldenPath/help/mysql.html">Downloading Data using MariaDB (MySQL)</a>.</p> <p> Connect to the US MariaDB server using the command:</p> -<pre><code>mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A </code></pre> +<pre><code>mysql --user=genome --host=genome-mysql.gi.ucsc.edu -A </code></pre> <p>Or to the European MariaDB server using the command:</p> <pre><code>mysql --user=genome --host=genome-euro-mysql.soe.ucsc.edu -A </code></pre> <a name="download34"></a> <h2>Name of fourth column in BED output</h2> <h6>When using the Table Browser to extract exons from a Gene track, what does the "Name" column (fourth BED column) refer to?</h6> <p> The fourth column of the BED output contains several fields separated by underscores. For example:</p> <pre><code>uc009vjk.2_cds_1_0_chr1_324343_f </code></pre> <p> The fields are, in order:</p> <pre><code>ucscId_sequenceType_sequenceTypeNumber_basesAdded_chromosome_positionOfFirstBaseOfItem_strand</code></pre> <ul> @@ -949,158 +949,158 @@ browser (for more info see this <a href=FAQtracks.html#tracks1>FAQ</a>), but the start position listed in this field of the fourth column is 1-based.
It is the exact coordinate at which the feature starts, as displayed in the browser.</li> <li> Strand: forward (f) or reverse (-) strand.</li> </ul> <a name="download36"></a> <h2>Track Data Access</h2> <h6>How do I access the data underlying a track?</h6> <p> The raw data underlying a track can be explored interactively with the <a href="../cgi-bin/hgTables">Table Browser</a>, <a href="../cgi-bin/hgIntegrator">Data Integrator</a>, or <a href="../cgi-bin/hgVai">Variant Annotation Integrator</a>. For automated analysis, the genome annotation can be downloaded from the -<a href="http://hgdownload.soe.ucsc.edu/">downloads server</a>, one of our two +<a href="http://hgdownload.gi.ucsc.edu/">downloads server</a>, one of our two <a href="http://genome.ucsc.edu/goldenPath/help/mysql.html">public MariaDB servers</a>, or through our <a href='../goldenPath/help/api.html' target=_blank>REST API</a>.</p> <p> <strong>bigBed data:</strong> For <a href="FAQformat.html#format1.5">bigBed</a> files, individual regions or the whole genome annotation can be obtained using our bigBedToBed tool, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found -<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>. The tool can +<a href="http://hgdownload.gi.ucsc.edu/downloads.html#utilities_downloads">here</a>.
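Two coordinate conventions on this page are easy to mix up in scripts: the position inside the fourth-column Name field described in the previous entry is 1-based, while BED starts and the bigBedToBed -start option are 0-based. The Python sketch below splits a Name field into its parts and converts a 1-based browser position into bigBedToBed options. The helper names are hypothetical, and the regular expression assumes that IDs and chromosome names may contain underscores but that the sequence type is a single lowercase token.

```python
import re

# Fields in the documented order:
# ucscId_sequenceType_sequenceTypeNumber_basesAdded_chromosome_positionOfFirstBaseOfItem_strand
# Assumption: IDs (e.g. "NM_000546.6") and chromosome names may contain
# underscores, so the ID is matched lazily and the chromosome greedily.
NAME_RE = re.compile(
    r"(?P<id>.+?)_(?P<seq_type>[a-z0-9]+)_(?P<number>\d+)_(?P<bases_added>\d+)"
    r"_(?P<chrom>.+)_(?P<position>\d+)_(?P<strand>[^_]+)"
)


def parse_name_field(name: str) -> dict:
    """Split the Table Browser's fourth BED column into its parts (hypothetical helper)."""
    m = NAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized name field: {name!r}")
    fields = m.groupdict()
    for key in ("number", "bases_added", "position"):
        fields[key] = int(fields[key])
    # The position in the name is 1-based; BED starts are 0-based.
    fields["bed_start"] = fields["position"] - 1
    return fields


def bigbed_region_args(position: str) -> list:
    """Turn a 1-based browser position like "chr1:200,001-200,400" into
    bigBedToBed -chrom/-start/-end options, which are 0-based, half-open."""
    m = re.fullmatch(r"([^:\s]+):([\d,]+)-([\d,]+)", position.strip())
    if m is None:
        raise ValueError(f"expected chrom:start-end, got {position!r}")
    start = int(m.group(2).replace(",", ""))
    end = int(m.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid range: {position!r}")
    return [f"-chrom={m.group(1)}", f"-start={start - 1}", f"-end={end}"]


print(parse_name_field("uc009vjk.2_cds_1_0_chr1_324343_f"))
print(" ".join(["bigBedToBed", "http://hgdownload.gi.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb",
                *bigbed_region_args("chr1:200001-200400"), "stdout"]))
```

The second printed line is the same command as the dbSNP region example on this page.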
bigBedToBed can also retrieve only the features within a given range directly from one of the hgdownload servers, for example:</p> <ul> <li> North American server: - <pre><code>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/path/to/file/bigBedfile.bb -chrom=chr21 -start=0 -end=1000000 stdout </code></pre> + <pre><code>bigBedToBed http://hgdownload.gi.ucsc.edu/gbdb/path/to/file/bigBedfile.bb -chrom=chr21 -start=0 -end=1000000 stdout </code></pre> </li> <li> European server: <pre><code>bigBedToBed http://hgdownload-euro.soe.ucsc.edu/gbdb/path/to/file/bigBedfile.bb -chrom=chr21 -start=0 -end=1000000 stdout </code></pre> </li> </ul> <p> Read more in <a href="https://genome-blog.gi.ucsc.edu/blog/"> our blog</a> about <a href="https://genome-blog.gi.ucsc.edu/blog/?s=programmatic">Accessing the Genome Browser Programmatically</a> to acquire data. </p> <p> <a name="snp"></a> <h2>How do I download dbSNP data?</h2> <p> For versions dbSNP153 and above, the data is stored in bigBed files; earlier versions are stored as MySQL tables. For help with versions before dbSNP153, see <a href="#download29">accessing MySQL data</a>. This FAQ entry pertains to versions dbSNP153 and above.</p> <p> Since dbSNP has grown to include over 700 million variants, the size of the All dbSNP (153+) subtrack can cause the <a href="/cgi-bin/hgTables" target=_blank>Table Browser</a> and <a href="/cgi-bin/hgIntegrator" target=_blank>Data Integrator</a> to time out, leading to a blank page or truncated output, unless queries are restricted to a chromosomal region or to a specific set of rs# IDs (which can be pasted/uploaded into the Table Browser), or to one of the subset tracks such as Common or ClinVar. </p><p> For automated analysis, the track data files can be downloaded from the downloads server for -<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/" target=_blank>hg19</a> and -<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/" target=_blank>hg38</a>.
Below +<a href="http://hgdownload.gi.ucsc.edu/gbdb/hg19/snp/" target=_blank>hg19</a> and +<a href="http://hgdownload.gi.ucsc.edu/gbdb/hg38/snp/" target=_blank>hg38</a>. Below are specific examples for <b>dbSNP153</b>, however, the same methods and directories will work by substituting a more recent dbSNP release. <table class="descTbl"> <tr> <th colspan=3>file</th> <th>format</th> <th>subtrack</th> </tr> <tr> <td>dbSnp153.bb</td> - <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153.bb" + <td><a href="http://hgdownload.gi.ucsc.edu/gbdb/hg19/snp/dbSnp153.bb" target=_blank>hg19</a></td> - <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb" + <td><a href="http://hgdownload.gi.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb" target=_blank>hg38</a></td> <td>bigDbSnp (bigBed4+13)</td> <td>All dbSNP (153)</td> </tr> <tr> <td>dbSnp153ClinVar.bb</td> - <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153ClinVar.bb" + <td><a href="http://hgdownload.gi.ucsc.edu/gbdb/hg19/snp/dbSnp153ClinVar.bb" target=_blank>hg19</a></td> - <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153ClinVar.bb" + <td><a href="http://hgdownload.gi.ucsc.edu/gbdb/hg38/snp/dbSnp153ClinVar.bb" target=_blank>hg38</a></td> <td>bigDbSnp (bigBed4+13)</td> <td>ClinVar dbSNP (153)</td> </tr> <tr> <td>dbSnp153Common.bb</td> - <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153Common.bb" + <td><a href="http://hgdownload.gi.ucsc.edu/gbdb/hg19/snp/dbSnp153Common.bb" target=_blank>hg19</a></td> - <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153Common.bb" + <td><a href="http://hgdownload.gi.ucsc.edu/gbdb/hg38/snp/dbSnp153Common.bb" target=_blank>hg38</a></td> <td>bigDbSnp (bigBed4+13)</td> <td>Common dbSNP (153)</td> </tr> <tr> <td>dbSnp153Mult.bb</td> - <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153Mult.bb" + <td><a href="http://hgdownload.gi.ucsc.edu/gbdb/hg19/snp/dbSnp153Mult.bb" target=_blank>hg19</a></td> - 
<td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153Mult.bb" + <td><a href="http://hgdownload.gi.ucsc.edu/gbdb/hg38/snp/dbSnp153Mult.bb" target=_blank>hg38</a></td> <td>bigDbSnp (bigBed4+13)</td> <td>Mult. dbSNP (153)</td> </tr> <tr> <td>dbSnp153BadCoords.bb</td> - <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp153BadCoords.bb" + <td><a href="http://hgdownload.gi.ucsc.edu/gbdb/hg19/snp/dbSnp153BadCoords.bb" target=_blank>hg19</a></td> - <td><a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153BadCoords.bb" + <td><a href="http://hgdownload.gi.ucsc.edu/gbdb/hg38/snp/dbSnp153BadCoords.bb" target=_blank>hg38</a></td> <td>bigBed4</td> <td>Map Err (153)</td> </tr> <tr> <td colspan=3> - <a href="http://hgdownload.soe.ucsc.edu/gbdb/hgFixed/dbSnp/dbSnp153Details.tab.gz" + <a href="http://hgdownload.gi.ucsc.edu/gbdb/hgFixed/dbSnp/dbSnp153Details.tab.gz" target=_blank>dbSnp153Details.tab.gz</a> </td> <td>gzip-compressed tab-separated text</td> <td>Detailed variant properties, independent of genome assembly version</td> </tr> </table> </p> <p> Several utilities for working with bigBed-formatted binary files can be downloaded -<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads" +<a href="http://hgdownload.gi.ucsc.edu/downloads.html#utilities_downloads" target=_blank>here</a>. Run a utility with no arguments in order to see a brief description of the utility and its options. <ul> <li><b>bigBedInfo</b> provides summary statistics about a bigBed file including the number of items in the file. With the <b>-as</b> option, the output includes an autoSql definition of data columns, useful for interpreting the column values.</li> <li><b>bigBedToBed</b> converts the binary bigBed data to tab-separated text. 
Output can be restricted to a particular region by using the -chrom, -start, and -end options.</li> <li><b>bigBedNamedItems</b> extracts rows for one or more rs# IDs.</li> </ul> </p> <p><b>Example:</b> retrieve all variants in the region chr1:200001-200400 (the -start value is 0-based, so it is one less than the 1-based start position shown in the browser)</p> -<pre><tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb -chrom=chr1 -start=200000 -end=200400 stdout</tt></pre> +<pre><tt>bigBedToBed http://hgdownload.gi.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb -chrom=chr1 -start=200000 -end=200400 stdout</tt></pre> <p><b>Example:</b> retrieve variant rs6657048</p> <pre><tt>bigBedNamedItems dbSnp153.bb rs6657048 stdout</tt></pre> <p><b>Example:</b> retrieve all variants with rs# IDs in file myIds.txt</p> <pre><tt>bigBedNamedItems -nameFile dbSnp153.bb myIds.txt dbSnp153.myIds.bed</tt></pre> <p> The columns in the bigDbSnp/bigBed files and dbSnp153Details.tab.gz file are described in <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/lib/bigDbSnp.as" target=_blank>bigDbSnp.as</a> and <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/lib/dbSnpDetails.as" target=_blank>dbSnpDetails.as</a>, respectively. </p><p> UCSC has an <a href="/goldenPath/help/api.html" target=_blank>API</a> @@ -1149,51 +1149,51 @@ <a name="download37"></a> <h2>Obtaining GTF (Gene Transfer Format)</h2> <h6>What is the best method for obtaining GTF output?</h6> <p> Currently, the <a href="../cgi-bin/hgTables">Table Browser</a> option to return data in <a href="../FAQ/FAQformat.html#format4">GTF format</a> is limited, as explained below. To convert custom GenePred format data into GTF, the best method is to use the command-line format conversion utility, <code>genePredToGtf</code>.
This utility can optionally be set up to connect automatically to the UCSC public SQL database and return GTF files in a few minutes using <a href="http://genomewiki.ucsc.edu/index.php/Genes_in_gtf_or_gff_format#Using_kent_commands_with_the_public_database_server"> this short guide</a>.</p> <p> For convenience, GTF files have been generated using the <code>genePredToGtf</code> method described above and are available on our download server for the main gene transcript sets. These can be found at the following download server address: -<i>http://hgdownload.soe.ucsc.edu/goldenPath/$db/bigZips/genes/</i> +<i>http://hgdownload.gi.ucsc.edu/goldenPath/$db/bigZips/genes/</i> where <i>$db</i> is the assembly of interest. For example, see the <a target="_blank" -href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/">hg38 GTF files</a>.</p> +href="http://hgdownload.gi.ucsc.edu/goldenPath/hg38/bigZips/genes/">hg38 GTF files</a>.</p> <p>Summary of Table Browser limitations:</p> <ul> <li>The Table Browser has transcript IDs only, so although it includes both "gene_id" and "transcript_id" fields in its output, the value for transcript ID (e.g., ENST#) is used for both fields.</li> <li>The Table Browser adds start and stop codon annotations whether or not the transcript alignment includes proper start and stop codons.</li> <li>Some tables in older genome assemblies are not supported.</li> </ul> <p> <a href="../FAQ/FAQformat.html#format9">GenePred</a> (short for Gene Predictions) is a table format commonly used for gene tracks in the UCSC Genome Browser, in which each transcript occupies a single row. Tables are not stored in GTF because GTF requires a separate line for each gene feature (e.g., each exon), so describing a single transcript would take many rows. The <code>genePredToGtf</code> command-line utility can be used to convert genePred to GTF.
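To show why one genePred row becomes several GTF lines, here is a deliberately simplified Python sketch that emits one exon line per exon. It assumes the basic 10-column genePred layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) with no leading bin column, so a downloaded table such as refGene.txt would need its bin column removed first. It is not a substitute for genePredToGtf, which also writes CDS and start/stop codon features; genepred_to_gtf_exons and the "sketch" source label are hypothetical.

```python
def genepred_to_gtf_exons(row: str, source: str = "sketch") -> list:
    """Convert one 10-column genePred row into GTF exon lines (simplified)."""
    fields = row.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("expected at least 10 tab-separated genePred fields")
    name, chrom, strand = fields[0], fields[1], fields[2]
    exon_count = int(fields[7])
    # exonStarts/exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    if len(starts) != exon_count or len(ends) != exon_count:
        raise ValueError("exonCount does not match exonStarts/exonEnds")
    # Like the Table Browser, reuse the transcript name as the gene_id.
    attributes = f'gene_id "{name}"; transcript_id "{name}";'
    lines = []
    for start, end in zip(starts, ends):
        # genePred is 0-based, half-open; GTF is 1-based, fully closed.
        lines.append("\t".join([chrom, source, "exon", str(start + 1), str(end),
                                ".", strand, ".", attributes]))
    return lines


row = "tx1\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,"
for line in genepred_to_gtf_exons(row):
    print(line)
```

Even this two-exon transcript needs two GTF lines, and a full conversion adds more lines per transcript for its coding features.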
Download the <code>genePredToGtf</code> command-line utility for your operating system from the -<a href="http://hgdownload.soe.ucsc.edu/admin/exe/">utilities directory</a>.</p> +<a href="http://hgdownload.gi.ucsc.edu/admin/exe/">utilities directory</a>.</p> <p> Please see the <a href="http://genomewiki.ucsc.edu/index.php/Genes_in_gtf_or_gff_format"> Genes in GTF or GFF Format wiki page</a> for examples and various methods for conversion. The <code>genePredToGtf</code> utility can convert data from several sources, such as Table Browser output from a genePred table, a locally downloaded gene set table such as refGene.txt, or the results of querying our <a href="../goldenPath/help/mysql.html">public MariaDB tables.</a></p> <a name="download38"></a> <h2>Table Browser output file order</h2> <h6>My Table Browser output file is not ordered by position. How is it ordered?</h6> <p> Most of our tables have a special first column called "bin" that speeds up displaying data in the Genome Browser. This (chrom,bin) index causes query results to be ordered first by bin, then by chromStart, which lets us query and return results more quickly than if they were sorted by chromStart. If you need position-sorted output, you can sort the file yourself, for example with <code>sort -k1,1 -k2,2n</code> for BED output. </p> @@ -1244,58 +1244,58 @@ <p> Though not all analysis sets contain the same information, features include:</p> <ul> <li>Removal of alternate and fix sequences, which can interfere with read alignment programs</li> <li>Hard masking of duplicate copies of the pseudo-autosomal regions (PARs) and centromeric arrays</li> <li>Addition of "decoy" sequences</li> <li>Index files generated by BWA, Samtools, Bowtie and HISAT2</li></ul> <p> For more information on analysis sets, see the <a href="https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/#seqsforalign" target="_blank">NCBI FAQ</a>.
Information on what each assembly's analysis set contains can be found in the README by clicking the <strong>Genome sequence files</strong> link for the assembly of interest in our -<a href="http://hgdownload.soe.ucsc.edu/downloads.html">Downloads page</a>. +<a href="http://hgdownload.gi.ucsc.edu/downloads.html">Downloads page</a>. </p> <a name="downloadGenArk"></a> <h2>GenArk Downloads</h2> <h6>How do I download GenArk assembly hub data for my species?</h6> <p> We display more than 2,000 GenArk genomes as assembly hubs rather than as native assemblies like hg38 and mm39. These Genome Browsers can be accessed from our <a href="../cgi-bin/hgGateway">Genomes page</a> by searching for a common name or GCA/GCF accession. You can also access the browsers for these species directly with links in the following format:</p> <pre><a href="https://genome.ucsc.edu/h/GCF_000951035.1">https://genome.ucsc.edu/h/GCF_000951035.1</a></pre> <p> The download data for these assemblies is stored in a different location from our goldenPath, SQL, and gbdb file directories. There are two ways to access this data for download. First, you can go to the -<a href="https://hgdownload.soe.ucsc.edu/hubs">GenArk page</a> +<a href="https://hgdownload.gi.ucsc.edu/hubs">GenArk page</a> and select your clade (primates, mammals, birds, etc.), which brings you to a page with a table of species and GCA/GCF assembly identifiers. Find your genome and click the link in its third column, labeled "Scientific name and data download", which takes you to the download directory for that species. </p><p> Alternatively, you can build the directory URL from your GCA/GCF identifier: after the GCA or GCF prefix, split the nine-digit number into groups of three characters, separated by slashes.
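The directory rule above can be expressed as a small Python function. This is a sketch that assumes accessions of the form GCA_ or GCF_ followed by nine digits and an optional version suffix, which the directory path does not use; genark_download_dir is a hypothetical name.

```python
import re

GENARK_HUBS = "https://hgdownload.gi.ucsc.edu/hubs"


def genark_download_dir(accession: str) -> str:
    """Map a GCA_/GCF_ accession to its GenArk download directory URL."""
    m = re.fullmatch(r"(GC[AF])_(\d{3})(\d{3})(\d{3})(?:\.\d+)?", accession.strip())
    if m is None:
        raise ValueError(f"expected an accession like GCA_004027835.1, got {accession!r}")
    # Prefix, then the nine digits in groups of three; the version is not part of the path.
    return f"{GENARK_HUBS}/{'/'.join(m.groups())}/"


print(genark_download_dir("GCA_004027835.1"))
# https://hgdownload.gi.ucsc.edu/hubs/GCA/004/027/835/
```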
For example, the identifier "GCA_004027835.1" has data in the following directory: -<pre>https://hgdownload.soe.ucsc.edu/hubs/GCA/004/027/835/</pre> +<pre>https://hgdownload.gi.ucsc.edu/hubs/GCA/004/027/835/</pre> </p> <a name="downloadConservation"></a> <h2>Conservation scores downloads</h2> <h6>Why are the conservation scores on the UCSC Genome Browser site different from the ones in the download file?</h6> <p> The conservation scores, for both PhastCons and PhyloP, differ because the wiggle database format (from which the details page and Table Browser scores are extracted) uses lossy compression. This compression keeps enough resolution to draw the scores in the browser's graphical display, but the true original scores cannot be reconstructed from it. This is why we make the original score files available for download. </p> <a name="CAPTCHA"></a>