02fe6028996477c06ad88319e254475bd88127fe brianlee Thu May 19 11:54:33 2022 -0700 Adding examples on twoBit.html page about extracting squences refs MLQ #29456 diff --git src/hg/htdocs/goldenPath/help/twoBit.html src/hg/htdocs/goldenPath/help/twoBit.html index 4713e83..fe0c6a4 100755 --- src/hg/htdocs/goldenPath/help/twoBit.html +++ src/hg/htdocs/goldenPath/help/twoBit.html @@ -1,40 +1,68 @@ <!DOCTYPE html> <!--#set var="TITLE" value="Genome Browser TwoBit Sequence" --> <!--#set var="ROOT" value="../.." --> <!-- Relative paths to support mirror sites with non-standard GB docs install --> <!--#include virtual="$ROOT/inc/gbPageStart.html" --> <h1>TwoBit Sequence Archives</h1> <p> A twoBit file is a highly efficient way to store genomic sequence. The format is defined <a href="http://genome.ucsc.edu/FAQ/FAQformat.html#format7">here</a>. Note that lower-case nucleotides are considered masked in twoBit, which can cause such sequence to be ignored when using the <code>-mask</code> option with <code>gfServer</code>; therefore, you may wish to convert lower-case sequence to upper-case when preparing the FASTA format.</p> <p> To complete the steps below you must first download the <code>faToTwoBit</code>, <code>twoBitInfo</code>, and <code>twoBitToFa</code> utilities. For more information on downloading our command-line utilities, see these <A href="http://hgdownload.soe.ucsc.edu/downloads.html#source_downloads">instructions</a>.</p> <p> To create a twoBit file, follow these steps: <ol> <li> Prepare the sequence for your twoBit file in a FASTA-formatted file (i.e. genome.fa).</li> <li> Run the <code>faToTwoBit</code> program on your FASTA file: <pre><code> faToTwoBit genome.fa genome.2bit</code></pre></li> <li> Use <code>twoBitInfo</code> to verify the sequences in this assembly and create a chrom.sizes file, which is useful to construct the big* files in later processing steps:<br> <pre><code> twoBitInfo genome.2bit stdout | sort -k2rn > genome.chrom.sizes</code></pre></li> </ol> <p> The twoBit commands can function with the .2bit file as a URL: <pre><code> twoBitInfo -udcDir=. http://your-website.edu/~user/genome.2bit | sort -k2nr > genome.chrom.sizes</code></pre></p> <p> Sequence can be extracted from the .2bit file with the <code>twoBitToFa</code> command, for example: <pre><code> twoBitToFa -seq=chr1 -udcDir=. http://your-website.edu/~user/genome.2bit stdout > genome.chr1.fa</code></pre></p> +<h3 id=extract>Examples of extracting sequences</h3> +<p> +See these series of of blog posts about <a href="http://genome.ucsc.edu/blog/?s=programmatic" +target="_blank">Accessing the Genome Browser Programmatically</a> to see examples of extracting +sequences remotely, such as the following: +<pre> +$ twoBitToFa http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit:chr1:100100-100200 stdout +>chr1:100100-100200 +gcctagtacagactctccctgcagatgaaattatatgggatgctaaatta +taatgagaacaatgtttggtgagccaaaactacaacaagggaagctaatt +</pre></p> +<p> +Also, see the <a href="api.html#getData_examples" +target="_blank">API getData functions</a> to see examples of using the URL, such as the following:</p> +<p> +<a href="https://api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chr1;start=100100;end=100200" +target="_blank">https://api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chr1;start=100100;end=100200</a> +<pre> + downloadTime: "2022:05:19T18:45:56Z" + downloadTimeStamp: 1652985956 + genome: "hg38" + chrom: "chr1" + start: 100100 + end: 100200 + dna: "gcctagtacagactctccctgcagatgaaattatatgggatgctaaattataatgagaacaatgtttggtgagccaaaactacaacaagggaagctaatt" +</pre></p> + + <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->