src/hg/htdocs/goldenPath/help/bigChain.html bbf9b5788beae5c61fe8a30bc824f4b8985927e9

bbf9b5788beae5c61fe8a30bc824f4b8985927e9
mspeir
  Wed Nov 20 10:51:05 2024 -0800
changing bigChain steps to use chainToBigChain rather than hgLoadChain, refs #33972

diff --git src/hg/htdocs/goldenPath/help/bigChain.html src/hg/htdocs/goldenPath/help/bigChain.html
index a483962..21b28bc 100755
--- src/hg/htdocs/goldenPath/help/bigChain.html
+++ src/hg/htdocs/goldenPath/help/bigChain.html
@@ -1,220 +1,214 @@
 <!DOCTYPE html>
 <!--#set var="TITLE" value="Genome Browser bigChain Track Format" -->
 <!--#set var="ROOT" value="../.." -->
 
 <!-- Relative paths to support mirror sites with non-standard GB docs install -->
 <!--#include virtual="$ROOT/inc/gbPageStart.html" -->
 
 <h1>bigChain Track Format</h1>
 <p>
 The bigChain format describes a pairwise alignment that allows gaps in both sequences simultaneously,
 just as <a href="chain.html">chain</a> files do; however, bigChain files are compressed and indexed 
 as <a href="/goldenPath/help/bigBed.html">bigBeds</a>. Chain files are converted to bigChain files
 using the program <code>bedToBigBed</code>, run with the <code>-as</code> option to pull in a
 special <a href="http://www.linuxjournal.com/article/5949" target="_blank">autoSql</a>
 (<em>.as</em>) file that defines the fields of the bigChain.</p>
 <p>
 The bigChain files are in an indexed binary format. The main advantage of this format is that only 
 those portions of the file needed to display a particular region are transferred to the Genome 
 Browser server. Because of this, bigChain files have considerably faster display performance than 
 regular chain files when working with large data sets. The bigChain file remains on your local 
 web-accessible server (http, https or ftp), not on the UCSC server, and only the portion needed for 
 the currently displayed chromosomal position is locally cached as a &quot;sparse file&quot;. If 
 you do not have access to a web-accessible server and need hosting space for your bigChain files, 
 please see the <a href="hgTrackHubHelp.html#Hosting">Hosting</a> section of the Track Hub Help
 documentation.</p>
 
 <a name=bigChain></a>
 <h2>bigChain format definition</h2>
 <p>
 The following autoSql definition is used to specify bigChain pairwise alignment files. This
 definition, contained in the file <a href="examples/bigChain.as"><em>bigChain.as</em></a>, will be 
 pulled in when the <code>bedToBigBed</code> utility is run with the <code>-as=bigChain.as</code> 
 option. 
 <!--Click this <a href="examples/bigChain.txt"><code>bed6+6</code></A> file for an example of 
 bigChain input. -->
 </p>
 <pre><code>    table bigChain
     "bigChain pairwise alignment"
         (
         string chrom;       "Reference sequence chromosome or scaffold"
         uint   chromStart;  "Start position in chromosome"
         uint   chromEnd;    "End position in chromosome"
         string name;        "Name or ID of item, ideally both human readable and unique"
         uint score;         "Score (0-1000)"
         char[1] strand;     "+ or - for strand"
         uint tSize;         "size of target sequence"
         string qName;       "name of query sequence"
         uint qSize;         "size of query sequence"
         uint qStart;        "start of alignment on query sequence"
         uint qEnd;          "end of alignment on query sequence"
         double chainScore;  "score from chain"
         )</code></pre>
 <p>
 Note that the <code>bedToBigBed</code> utility uses a substantial amount of memory: approximately 
 25% more RAM than the uncompressed BED input file.</p>
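 <p>
 As a rough illustration of the field layout defined above, the following Python sketch splits one
 tab-separated row into the twelve bigChain fields and applies two basic sanity checks. The function
 name <code>parse_bigchain_row</code> and the sample row are hypothetical and are not part of any
 UCSC utility; the field names and order are taken from <em>bigChain.as</em>.</p>

```python
# Field names and order follow the bigChain.as definition shown above.
BIGCHAIN_FIELDS = [
    ("chrom", str), ("chromStart", int), ("chromEnd", int), ("name", str),
    ("score", int), ("strand", str), ("tSize", int), ("qName", str),
    ("qSize", int), ("qStart", int), ("qEnd", int), ("chainScore", float),
]


def parse_bigchain_row(line):
    """Split one tab-separated row into a dict keyed by bigChain field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(BIGCHAIN_FIELDS):
        raise ValueError(
            f"expected {len(BIGCHAIN_FIELDS)} fields, got {len(values)}")
    row = {name: cast(value)
           for (name, cast), value in zip(BIGCHAIN_FIELDS, values)}
    if row["strand"] not in ("+", "-"):
        raise ValueError(f"strand must be + or -, got {row['strand']!r}")
    if row["chromStart"] > row["chromEnd"]:
        raise ValueError("chromStart must not exceed chromEnd")
    return row


# Hypothetical row, made up for illustration only (not real alignment data).
example = "\t".join([
    "chr22_KI270731v1_random", "100", "500", "chr1", "1000", "+",
    "150754", "chr1", "195471971", "2000", "2400", "12345",
])
row = parse_bigchain_row(example)
print(row["chrom"], row["chromEnd"] - row["chromStart"], row["chainScore"])
```

 <p>
 A check like this can be run over a pre-bigChain file before conversion to catch rows with a wrong
 number of columns.</p>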
 <p>
 
 <h2>Creating a bigChain track</h2>
 <p>
 To create a bigChain track, follow these steps:</p>
 <p>
 <strong>Step 1.</strong> 
 If you already have a chain file you would like to convert to a bigChain, skip to <em>Step 3</em>.
 Otherwise download <a href="examples/chr22_KI270731v1_random.hg38.mm10.rbest.chain">this example 
 chain file</a> for the human GRCh38 (hg38) assembly.</p>
 <p>
 <strong>Step 2.</strong> 
 Download these autoSql files needed by <code>bedToBigBed</code>: 
 <em><a href="examples/bigChain.as">bigChain.as</a></em> and 
 <em><a href="examples/bigLink.as">bigLink.as</a></em>.</p>
 <p>
 <strong>Step 3.</strong> 
-Download the <code>bedToBigBed</code> and <code>hgLoadChain</code> programs from the UCSC
+Download the <code>bedToBigBed</code> and <code>chainToBigChain</code> programs from the UCSC
 <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">binary utilities directory</a>.</p>
 <p>
 <strong>Step 4.</strong> 
 Use the <code>fetchChromSizes</code> script from the 
 <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">same directory</a> to create a 
 <em>chrom.sizes</em> file for the UCSC database with which you are working (e.g., hg38). 
 Alternatively, you can download the 
 <em>chrom.sizes</em> file for any assembly hosted at UCSC from our 
 <a href="http://hgdownload.soe.ucsc.edu/downloads.html">downloads</a> page (click on &quot;Full 
 data set&quot; for any assembly). For example, the <em>hg38.chrom.sizes</em> file for the hg38 
 database is located at 
 <a href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes" 
 target="_blank">http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes</a>.</p>
 <p>
 Here are wget commands to obtain the files above:
 <pre><code>wget https://genome.ucsc.edu/goldenPath/help/examples/bigChain.as
 wget https://genome.ucsc.edu/goldenPath/help/examples/bigLink.as
 wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
 wget https://genome.ucsc.edu/goldenPath/help/examples/chr22_KI270731v1_random.hg38.mm10.rbest.chain
 </code></pre></p>
+<p>
 <strong>Step 5.</strong>
-Use the <code>hgLoadChain</code> utility to generate the <em>chain.tab</em> and <em>link.tab</em> 
-files needed to create the bigChain file:</p> 
-<pre><code>hgLoadChain -noBin -test hg38 bigChain chr22_KI270731v1_random.hg38.mm10.rbest.chain</code></pre>
+Use the <code>chainToBigChain</code> utility to generate the pre-bigChain and pre-bigLink files:</p>
+<pre><code>chainToBigChain chr22_KI270731v1_random.hg38.mm10.rbest.chain bigChain.pre bigChain.link.pre</code></pre>
 <p>
 <strong>Step 6.</strong> 
-Create the bigChain file from your input chain file using a combination of <code>sed</code>, 
-<code>awk</code> and the <code>bedToBigBed</code> utility: 
-<pre><code>sed 's/\.000000//' chain.tab | awk 'BEGIN {OFS="\t"} {print $2, $4, $5, $11, 1000, $8, $3, $6, $7, $9, $10, $1}' &gt; chr22_KI270731v1_random.hg38.mm10.rbest.bigChain
-bedToBigBed -type=bed6+6 -as=bigChain.as -tab chr22_KI270731v1_random.hg38.mm10.rbest.bigChain hg38.chrom.sizes bigChain.bb</code></pre></p> 
+Create the bigChain and bigLink files using the <code>bedToBigBed</code> utility:
+<pre><code>bedToBigBed -type=bed6+6 -as=bigChain.as -tab bigChain.pre hg38.chrom.sizes bigChain.bb
+bedToBigBed -type=bed4+1 -as=bigLink.as -tab bigChain.link.pre hg38.chrom.sizes bigChain.link.bb
+</code></pre></p>
 <p>
 <strong>Step 7.</strong> 
-To display your date in the Genome Browser, you must also create a binary indexed link file to 
-accompany your bigChain file:</p> 
-<pre><code>awk 'BEGIN {OFS="\t"} {print $1, $2, $3, $5, $4}' link.tab | sort -k1,1 -k2,2n &gt; bigChain.bigLink
-bedToBigBed -type=bed4+1 -as=bigLink.as -tab bigChain.bigLink hg38.chrom.sizes bigChain.link.bb </code></pre>
-<p>
-<strong>Step 8.</strong> 
 Move the newly created bigChain (<em>bigChain.bb</em>) and bigLink (<em>bigChain.link.bb</em>)
 files to a web-accessible http, https or ftp location.</p>
 <p>
-<strong>Step 9.</strong> 
+<strong>Step 8.</strong>
 Construct a <a href="hgTracksHelp.html#CustomTracks">custom track</a> using a single 
 <a href="hgTracksHelp.html#TRACK">track line</a>. Note that any of the track attributes listed 
 <a href="customTrack.html#TRACK">here</a> are applicable to tracks of type bigBed. The most basic
 version of the track line will look something like this:</p>
 <pre><code>track type=bigChain name="My Big Chain" bigDataUrl=http://myorg.edu/mylab/bigChain.bb linkDataUrl=http://myorg.edu/mylab/bigChain.link.bb </code></pre>
 <p>
-<strong>Step 10.</strong> 
+<strong>Step 9.</strong>
 Paste the custom track line into the text box on the 
 <a href="../../cgi-bin/hgCustom">custom track management page</a>.</p>
 <p>
 The <code>bedToBigBed</code> program can be run with several additional options. For a full
 list of the available options, type <code>bedToBigBed</code> (with no arguments) on the command line
 to display the usage message. </p>
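 <p>
 Steps 5 and 6 above can also be scripted. The following Python sketch assembles the three command
 lines exactly as they appear in those steps and, by default, only prints them. The helper names
 <code>bigchain_commands</code> and <code>build_bigchain</code> and the <code>prefix</code> and
 <code>dry_run</code> parameters are assumptions made for this example; running the commands for
 real requires the UCSC utilities and the autoSql files to be present in the working directory.</p>

```python
import subprocess


def bigchain_commands(chain_file, chrom_sizes, prefix="bigChain"):
    """Return the command lines from Steps 5 and 6 as argument lists."""
    pre = f"{prefix}.pre"
    link_pre = f"{prefix}.link.pre"
    return [
        # Step 5: chain file -> pre-bigChain and pre-bigLink files
        ["chainToBigChain", chain_file, pre, link_pre],
        # Step 6: pre-bigChain -> bigChain
        ["bedToBigBed", "-type=bed6+6", "-as=bigChain.as", "-tab",
         pre, chrom_sizes, f"{prefix}.bb"],
        # Step 6: pre-bigLink -> bigLink
        ["bedToBigBed", "-type=bed4+1", "-as=bigLink.as", "-tab",
         link_pre, chrom_sizes, f"{prefix}.link.bb"],
    ]


def build_bigchain(chain_file, chrom_sizes, prefix="bigChain", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in bigchain_commands(chain_file, chrom_sizes, prefix):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step
            subprocess.run(cmd, check=True)


build_bigchain("chr22_KI270731v1_random.hg38.mm10.rbest.chain",
               "hg38.chrom.sizes")
```

 <p>
 Calling <code>build_bigchain(..., dry_run=False)</code> would run the utilities in order instead
 of printing them.</p>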
 
 <h2>Examples</h2>
 <h3>Example #1</h3>
 <p>
 In this example, you will create a bigChain custom track using an existing bigChain file,
 <em>bigChain.bb</em>, located on the UCSC Genome Browser http server. This file contains data for 
 the hg38 assembly.</p>
 <p>
 To create a custom track using this bigChain file: 
 <ol>
   <li>
   Construct a track line that references the file:</p>
   <pre><code>track type=bigChain name=&quot;bigChain Example One&quot; description=&quot;A bigChain file&quot; bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb</code></pre></li>
   <li>
   Paste the track line into the <a href="../../cgi-bin/hgCustom?db=hg38">custom track management 
   page</a> for the human assembly hg38 (Dec. 2013).</li> 
   <li>
   Click the &quot;submit&quot; button.</li>
 </ol>
 <p>
 <!-- FIX ME -->
 Custom tracks can also be loaded via one URL line. 
 <a href="http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random&hgct_customText=track%20type=bigChain%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb%20linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb%20visibility=pack"
 target="_blank">This link</a> loads the same <em>bigChain.bb</em> track and sets additional display parameters in the URL:</p>
 <pre><code>http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random &hgct_customText=track%20type=bigChain%20name=Example %20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb %20linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb%20visibility=pack</code></pre>
 <p>
 After this example bigChain is loaded in the Genome Browser, click into a chain on the browser's 
 track display. Note that the details page displays information about the individual chains, similar 
 to that which is available for a standard chain track.</p>
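 <p>
 The single-URL form shown above can be assembled programmatically. The following Python sketch
 builds the track line and encodes its spaces as <code>%20</code>; the function name
 <code>custom_track_url</code> and its parameters are hypothetical. It assumes the track name
 contains no spaces, since a name with spaces would need to be quoted in the track line.</p>

```python
from urllib.parse import quote


def custom_track_url(db, position, big_data_url, link_data_url,
                     name="Example", visibility="pack"):
    """Build an hgTracks URL that loads a bigChain custom track."""
    track_line = (
        f"track type=bigChain name={name} "
        f"bigDataUrl={big_data_url} "
        f"linkDataUrl={link_data_url} "
        f"visibility={visibility}"
    )
    # Encode spaces as %20 but keep ':', '/' and '=' readable,
    # matching the style of the example link above.
    encoded = quote(track_line, safe=":/=")
    return ("http://genome.ucsc.edu/cgi-bin/hgTracks"
            f"?db={db}&position={position}&hgct_customText={encoded}")


url = custom_track_url(
    "hg38",
    "chr22_KI270731v1_random",
    "http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb",
    "http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb",
)
print(url)
```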
 
 <h3>Example #2</h3>
 <p>
 In this example, you will create your own bigChain file from an existing chain input file.</p>
 <ol>
   <li>
   Save <a href="examples/chr22_KI270731v1_random.hg38.mm10.rbest.chain">this chain file</a> to your 
   computer (<em>Step 1</em> in <em>Creating a bigChain track</em>, above).</li>
   <li>
   Save the autoSql files <a href="examples/bigChain.as"><em>bigChain.as</em></a> and 
   <a href="examples/bigLink.as"><em>bigLink.as</em></a> to your computer (<em>Step 2</em>, 
   above).</li>
   <li>
-  Download the <code>bedToBigBed</code> and <code>hgLoadChain</code> 
+  Download the <code>bedToBigBed</code> and <code>chainToBigChain</code>
   <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">utilities</a> (<em>Step 3</em>, above).</li>
   <li>
   Save the <a href="hg38.chrom.sizes"><em>hg38.chrom.sizes</em> text file</a> to your computer. This
   file contains the chrom.sizes for the human hg38 assembly (<em>Step 4</em>, above).</li>
   <li>
   Run the utilities in <em>Steps 5-6</em>, above, to create the bigChain and bigLink output 
   files. </li> 
   <li>
   Place the newly created bigChain (<em>bigChain.bb</em>) and bigLink 
   (<em>bigChain.link.bb</em>) files on a web-accessible server (<em>Step 7</em>, above).</li>
   <li>
   Construct a track line that points to the bigChain file (<em>Step 8</em>, above).</li>
   <li>
   Create the custom track on the human assembly hg38 (Dec. 2013), and view it in the Genome Browser 
   (<em>Step 9</em>, above).</li>
 </ol>
 
 <h2>Sharing your data with others</h2>
 <p>
 If you would like to share your bigChain data track with a colleague, learn how to create a URL by 
 looking at Example 6 on <a href="customTrack.html#EXAMPLE6">this page</a>.</p>
 
 <h2>Extracting data from the bigChain format</h2>
 <p>
 Because the bigChain files are an extension of bigBed files, which are indexed binary files, it can 
 be difficult to extract data from them. UCSC has developed the following programs to assist
 in working with bigBed formats, available from the 
 <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">binary utilities directory</a>.</p>
 <ul>
   <li>
   <code>bigBedToBed</code> &mdash; converts a bigBed file to ASCII BED format.</li>
   <li>
   <code>bigBedSummary</code> &mdash; extracts summary information from a bigBed file.</li>
   <li>
   <code>bigBedInfo</code> &mdash; prints out information about a bigBed file.</li>
 </ul>
 <p>
 As with all UCSC Genome Browser programs, simply type the program name (with no parameters) at the 
 command line to view the usage statement.</p>
 
 <h2>Troubleshooting</h2>
 <p>
 If you encounter an error when you run the <code>bedToBigBed</code> program, check your input 
 file for data coordinates that extend past the end of the chromosome. If these are present, run 
 the <code>bedClip</code> program 
 (<a href="http://hgdownload.soe.ucsc.edu/admin/exe/">available here</a>) to remove the problematic
 row(s) in your input file before running the <code>bedToBigBed</code> program.</p> 
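 <p>
 To see which rows are affected before running <code>bedClip</code>, a short script can compare each
 row's end coordinate against the <em>chrom.sizes</em> file. The following Python sketch is only an
 inspection aid, not a replacement for <code>bedClip</code>; the function names and the sample data
 (a made-up sequence <code>chrTest</code> of length 1000) are hypothetical.</p>

```python
def load_chrom_sizes(lines):
    """Parse chrom.sizes lines (name, size) into a dict."""
    sizes = {}
    for line in lines:
        if line.strip():
            name, size = line.split()[:2]
            sizes[name] = int(size)
    return sizes


def rows_past_chrom_end(bed_lines, sizes):
    """Return (line_number, line) for rows that extend past the sequence end
    or name a sequence missing from chrom.sizes."""
    bad = []
    for number, line in enumerate(bed_lines, start=1):
        fields = line.rstrip("\n").split("\t")
        chrom, end = fields[0], int(fields[2])
        if chrom not in sizes or end > sizes[chrom]:
            bad.append((number, line.rstrip("\n")))
    return bad


# Made-up data for illustration only.
sizes = load_chrom_sizes(["chrTest\t1000\n"])
rows = [
    "chrTest\t0\t500\titemA\n",     # within bounds
    "chrTest\t900\t1200\titemB\n",  # extends past position 1000
]
for number, line in rows_past_chrom_end(rows, sizes):
    print(f"line {number}: {line}")
```

 <p>
 With real files, the two lists would be replaced by open file handles for <em>hg38.chrom.sizes</em>
 and the tab-separated input passed to <code>bedToBigBed</code>.</p>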
 
 <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->