433c60e82625b83fbe7b40fff62a5f814d0488be braney Thu Sep 10 16:28:27 2020 -0700 add placeholder for bigLolly.html so it gets copied by the install script diff --git src/hg/htdocs/goldenPath/help/bigLolly.html src/hg/htdocs/goldenPath/help/bigLolly.html new file mode 100755 index 0000000..00ca533 --- /dev/null +++ src/hg/htdocs/goldenPath/help/bigLolly.html @@ -0,0 +1,214 @@ +<!DOCTYPE html> +<!--#set var="TITLE" value="Genome Browser bigChain Track Format" --> +<!--#set var="ROOT" value="../.." --> + +<!-- Relative paths to support mirror sites with non-standard GB docs install --> +<!--#include virtual="$ROOT/inc/gbPageStart.html" --> + +<h1>bigChain Track Format</h1> +<p> +The bigChain format describes a pairwise alignment that allows gaps in both sequences simultaneously, +just as <a href="chain.html">chain</a> files do; however, bigChain files are compressed and indexed +as bigBeds. Chain files are converted to bigChain files using the program <code>bedToBigBed</code>, +run with the <code>-as</code> option to pull in a special +<a href="http://www.linuxjournal.com/article/5949" target="_blank">autoSql</a> (<em>.as</em>) file +that defines the fields of the bigChain.</p> +<p> +The bigChain files are in an indexed binary format. The main advantage of this format is that only +those portions of the file needed to display a particular region are transferred to the Genome +Browser server. Because of this, bigChain files have considerably faster display performance than +regular chain files when working with large data sets. The bigChain file remains on your local +web-accessible server (http, https or ftp), not on the UCSC server, and only the portion needed for +the currently displayed chromosomal position is locally cached as a "sparse file".
If +you do not have access to a web-accessible server and need hosting space for your bigChain files, +please see the <a href="hgTrackHubHelp.html#Hosting">Hosting</a> section of the Track Hub Help +documentation.</p> + +<a name=bigChain></a> +<h2>bigChain format definition</h2> +<p> +The following autoSql definition is used to specify bigChain pairwise alignment files. This +definition, contained in the file <a href="examples/bigChain.as"><em>bigChain.as</em></a>, will be +pulled in when the <code>bedToBigBed</code> utility is run with the <code>-as=bigChain.as</code> +option. +<!--Click this <a href="examples/bigChain.txt"><code>bed6+6</code></A> file for an example of +bigChain input. --> +</p> +<pre><code> table bigChain + "bigChain pairwise alignment" + ( + string chrom; "Reference sequence chromosome or scaffold" + uint chromStart; "Start position in chromosome" + uint chromEnd; "End position in chromosome" + string name; "Name or ID of item, ideally both human readable and unique" + uint score; "Score (0-1000)" + char[1] strand; "+ or - for strand" + uint tSize; "size of target sequence" + string qName; "name of query sequence" + uint qSize; "size of query sequence" + uint qStart; "start of alignment on query sequence" + uint qEnd; "end of alignment on query sequence" + uint chainScore; "score from chain" + )</code></pre> +<p> +Note that the <code>bedToBigBed</code> utility uses a substantial amount of memory: approximately +25% more RAM than the uncompressed BED input file.</p> +<p> + +<h2>Creating a bigChain track</h2> +<p> +To create a bigChain track, follow these steps:</p> +<p> +<strong>Step 1.</strong> +If you already have a chain file you would like to convert to a bigChain, skip to <em>Step 3</em>. 
+Otherwise download <a href="examples/chr22_KI270731v1_random.hg38.mm10.rbest.chain">this example +chain file</a> for the human GRCh38 (hg38) assembly.</p> +<p> +<strong>Step 2.</strong> +Download these autoSql files needed by <code>bedToBigBed</code>: +<em><a href="examples/bigChain.as">bigChain.as</a></em> and +<em><a href="examples/bigLink.as">bigLink.as</a></em>.</p> +<p> +<strong>Step 3.</strong> +Download the <code>bedToBigBed</code> and <code>hgLoadChain</code> programs from the UCSC +<a href="http://hgdownload.soe.ucsc.edu/admin/exe/">binary utilities directory</a>.</p> +<p> +<strong>Step 4.</strong> +Use the <code>fetchChromSizes</code> script from the +<a href="http://hgdownload.soe.ucsc.edu/admin/exe/">same directory</a> to create a +<em>chrom.sizes</em> file for the UCSC database with which you are working (e.g., hg38). +Alternatively, you can download the +<em>chrom.sizes</em> file for any assembly hosted at UCSC from our +<a href="http://hgdownload.soe.ucsc.edu/downloads.html">downloads</a> page (click on "Full +data set" for any assembly). 
For example, the <em>hg38.chrom.sizes</em> file for the hg38 +database is located at +<a href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes" +target="_blank">http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes</a>.</p> +<p> +<strong>Step 5.</strong> +Use the <code>hgLoadChain</code> utility to generate the <em>chain.tab</em> and <em>link.tab</em> +files needed to create the bigChain file:</p> +<pre><code>hgLoadChain -noBin -test hg38 bigChain chr22_KI270731v1_random.hg38.mm10.rbest.chain</code></pre> +<p> +<strong>Step 6.</strong> +Create the bigChain file from your input chain file using a combination of <code>sed</code>, +<code>awk</code> and the <code>bedToBigBed</code> utility:</p> +<pre><code>sed 's/\.000000//' chain.tab | awk 'BEGIN {OFS="\t"} {print $2, $4, $5, $11, 1000, $8, $3, $6, $7, $9, $10, $1}' > chr22_KI270731v1_random.hg38.mm10.rbest.bigChain +bedToBigBed -type=bed6+6 -as=bigChain.as -tab chr22_KI270731v1_random.hg38.mm10.rbest.bigChain hg38.chrom.sizes bigChain.bb</code></pre> +<p> +<strong>Step 7.</strong> +To display your data in the Genome Browser, you must also create a binary indexed link file to +accompany your bigChain file:</p> +<pre><code>awk 'BEGIN {OFS="\t"} {print $1, $2, $3, $5, $4}' link.tab | sort -k1,1 -k2,2n > bigChain.bigLink +bedToBigBed -type=bed4+1 -as=bigLink.as -tab bigChain.bigLink hg38.chrom.sizes bigChain.link.bb </code></pre> +<p> +<strong>Step 8.</strong> +Move the newly created bigChain (<em>bigChain.bb</em>) and bigLink (<em>bigChain.link.bb</em>) +files to a web-accessible http, https or ftp location.</p> +<p> +<strong>Step 9.</strong> +Construct a <a href="hgTracksHelp.html#CustomTracks">custom track</a> using a single +<a href="hgTracksHelp.html#TRACK">track line</a>. Note that any of the track attributes listed +<a href="customTrack.html#TRACK">here</a> are applicable to tracks of type bigBed.
The most basic +version of the track line will look something like this:</p> +<pre><code>track type=bigChain name="My Big Chain" bigDataUrl=http://myorg.edu/mylab/bigChain.bb linkDataUrl=http://myorg.edu/mylab/bigChain.link.bb </code></pre> +<p> +<strong>Step 10.</strong> +Paste the custom track line into the text box on the +<a href="../../cgi-bin/hgCustom">custom track management page</a>.</p> +<p> +The <code>bedToBigBed</code> program can be run with several additional options. For a full +list of the available options, type <code>bedToBigBed</code> (with no arguments) on the command line +to display the usage message. </p> + +<h2>Examples</h2> +<h3>Example #1</h3> +<p> +In this example, you will create a bigChain custom track using an existing bigChain file, +<em>bigChain.bb</em>, located on the UCSC Genome Browser http server. This file contains data for +the hg38 assembly.</p> +<p> +To create a custom track using this bigChain file:</p> +<ol> + <li> + Construct a track line that references the file: + <pre><code>track type=bigChain name="bigChain Example One" description="A bigChain file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb</code></pre></li> + <li> + Paste the track line into the <a href="../../cgi-bin/hgCustom?db=hg38">custom track management + page</a> for the human assembly hg38 (Dec. 2013).</li> + <li> + Click the "submit" button.</li> +</ol> +<p> +<!-- FIX ME --> +Custom tracks can also be loaded via one URL line.
+<a href="http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random&hgct_customText=track%20type=bigChain%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb%20linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb%20visibility=pack" +target="_blank">This link</a> loads the same <em>bigChain.bb</em> track and sets additional display parameters in the URL:</p> +<pre><code>http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random &hgct_customText=track%20type=bigChain%20name=Example %20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb %20linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb%20visibility=pack</code></pre> +<p> +After this example bigChain is loaded in the Genome Browser, click on a chain in the browser's +track display. Note that the details page displays information about the individual chains, similar +to that which is available for a standard chain track.</p> + +<h3>Example #2</h3> +<p> +In this example, you will create your own bigChain file from an existing chain input file.</p> +<ol> + <li> + Save <a href="examples/chr22_KI270731v1_random.hg38.mm10.rbest.chain">this chain file</a> to your + computer (<em>Step 1</em> in <em>Creating a bigChain track</em>, above).</li> + <li> + Save the autoSql files <a href="examples/bigChain.as"><em>bigChain.as</em></a> and + <a href="examples/bigLink.as"><em>bigLink.as</em></a> to your computer (<em>Step 2</em>, + above).</li> + <li> + Download the <code>bedToBigBed</code> and <code>hgLoadChain</code> + <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">utilities</a> (<em>Step 3</em>, above).</li> + <li> + Save the <a href="hg38.chrom.sizes"><em>hg38.chrom.sizes</em> text file</a> to your computer.
This + file contains the chrom.sizes for the human hg38 assembly (<em>Step 4</em>, above).</li> + <li> + Run the utilities in <em>Steps 5-7</em>, above, to create the bigChain and bigLink output + files. </li> + <li> + Place the newly created bigChain (<em>bigChain.bb</em>) and bigLink + (<em>bigChain.link.bb</em>) files on a web-accessible server (<em>Step 8</em>).</li> + <li> + Construct a track line that points to the bigChain file (<em>Step 9</em>, above).</li> + <li> + Create the custom track on the human assembly hg38 (Dec. 2013), and view it in the Genome Browser + (<em>Step 10</em>, above).</li> +</ol> + +<h2>Sharing your data with others</h2> +<p> +If you would like to share your bigChain data track with a colleague, learn how to create a URL by +looking at Example 6 on <a href="customTrack.html#EXAMPLE6">this page</a>.</p> + +<h2>Extracting data from the bigChain format</h2> +<p> +Because the bigChain files are an extension of bigBed files, which are indexed binary files, it can +be difficult to extract data from them. UCSC has developed the following programs to assist +in working with bigBed formats, available from the +<a href="http://hgdownload.soe.ucsc.edu/admin/exe/">binary utilities directory</a>.</p> +<ul> + <li> + <code>bigBedToBed</code> — converts a bigBed file to ASCII BED format.</li> + <li> + <code>bigBedSummary</code> — extracts summary information from a bigBed file.</li> + <li> + <code>bigBedInfo</code> — prints out information about a bigBed file.</li> +</ul> +<p> +As with all UCSC Genome Browser programs, simply type the program name (with no parameters) at the +command line to view the usage statement.</p> + +<h2>Troubleshooting</h2> +<p> +If you encounter an error when you run the <code>bedToBigBed</code> program, check your input +file for data coordinates that extend past the end of the chromosome.
If these are present, run +the <code>bedClip</code> program +(<a href="http://hgdownload.soe.ucsc.edu/admin/exe/">available here</a>) to remove the problematic +row(s) in your input file before running the <code>bedToBigBed</code> program.</p> + +<!--#include virtual="$ROOT/inc/gbPageEnd.html" -->