src/hg/htdocs/goldenPath/help/bigBed.html 6710917f67d227676ec78692cce69b8b6a113285

6710917f67d227676ec78692cce69b8b6a113285
brianlee
  Tue May 10 11:20:14 2022 -0700
Adding a link to trix doc in bigBed help, refs #28817

diff --git src/hg/htdocs/goldenPath/help/bigBed.html src/hg/htdocs/goldenPath/help/bigBed.html
index 256cf07..5a9d6c7 100755
--- src/hg/htdocs/goldenPath/help/bigBed.html
+++ src/hg/htdocs/goldenPath/help/bigBed.html
@@ -1,351 +1,355 @@
 <!DOCTYPE html>
 <!--#set var="TITLE" value="Genome Browser bigBed Track Format" -->
 <!--#set var="ROOT" value="../.." -->
 
 <!-- Relative paths to support mirror sites with non-standard GB docs install -->
 <!--#include virtual="$ROOT/inc/gbPageStart.html" -->
 
 <h1>bigBed Track Format</h1> 
 <p>
 The bigBed format stores annotation items that can be either a simple or a linked collection of 
 exons, much as <a href="../../FAQ/FAQformat.html#format1">BED</a> files do. BigBed files are created
 from BED type files using the program <code>bedToBigBed</code>. The resulting bigBed 
 files are in an indexed binary format. The main advantage of the bigBed files is that only
 those portions of the files needed to display a particular region are transferred to the Genome 
 Browser server. Because of this, bigBed has considerably faster display performance than
 regular BED when working with large data sets. The bigBed file remains on your local web-accessible 
 server (http, https, or ftp), not on the UCSC server, and only the portion that is needed for the 
 currently displayed chromosomal position is locally cached as a "sparse file". If you do not have
 access to a web-accessible server and  need hosting space for your bigBed files, please see the
 <a href="hgTrackHubHelp.html#Hosting">Hosting</a> section of the Track Hub Help documentation.</p>
 <p>
 Additional indices can be created for the items in a bigBed file to support item search in track 
 hubs. See <a href="#Ex3">Example #3</a> below for an example of how to build an additional 
 index.</p>
 <p>
 See <a href="http://genomewiki.ucsc.edu/index.php/Selecting_a_graphing_track_data_format" 
 target="_blank">this wiki page</a> for help in selecting the graphing track data format that is most
 appropriate for your type of data. To see an example of turning a text-based bedDetail custom track
 into the <code>bigBed</code> format, see this
 <a href="https://genome-blog.soe.ucsc.edu/blog/2021/08/03/how-make-a-bigbed-file-part-1/"
 target="_blank">How to make a bigBed file</a> blog post.</p>
 <p>
 Note that the <code>bedToBigBed</code> utility uses a substantial amount of memory:
 approximately 25% more RAM than the uncompressed BED input file.</p>
 <p>
 
 <a name=quick></a>
 <h2>Quickstart example commands</h2>
 
 It is not hard to create a bigBed file. The following UNIX commands create one on a Linux machine
 (swap macOSX for linux for an Apple environment). The steps are
 explained in more detail in the following sections on this page:
 
 <pre><code>wget https://genome.ucsc.edu/goldenPath/help/examples/bedExample.txt
 wget https://genome.ucsc.edu/goldenPath/help/hg19.chrom.sizes
 wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedToBigBed
 chmod a+x bedToBigBed
 ./bedToBigBed bedExample.txt hg19.chrom.sizes myBigBed.bb
 mv myBigBed.bb ~/public_html/
 </code></pre>
 
 The last step assumes that your ~/public_html/ directory is accessible from the internet. This may not be the case
 on your server. You may have to copy the file to another server and web-accessible location at your University. 
 Once you know the URL to the file myBigBed.bb, you can paste this URL into the <a href="../../cgi-bin/hgCustom">custom track</a>
 box on the UCSC Genome Browser to display the file.
 
 <a name=overview></a>
 <h2>Creating a bigBed track</h2>
 <p>
 To create a bigBed track, follow these steps (for concrete Unix commands, see the examples below on this page):<p>
 <p>
 <strong>Step 1.</strong>
 Create a BED format file following the directions 
 <a href="../../FAQ/FAQformat.html#format1">here</a>. When converting a BED file to a bigBed file, 
 you are limited to one track of data in your input file; therefore, you must create a separate BED 
 file for each data track. Your BED file must be sorted.
 <p>
 If your BED file was originally a custom track, remove any existing
 &quot;track&quot; or &quot;browser&quot; lines from your BED file so that it
 contains only data. Also the input BED must be sorted, performed with this command:
 <code>sort -k1,1 -k2,2n unsorted.bed &gt; input.bed</code>
 </p>
 
 <p> 
 <strong>Step 2.</strong>
 Download the <code>bedToBigBed</code> program from the 
 <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">binary utilities directory</a>. 
 Example #2 below shows the exact Unix command.
 </p>
 
 <p>
 The <code>bedToBigBed</code> program can be run with several additional options. Some of these,
 such as the <code>-as</code> and <code>-type</code> options, are used in examples below. The 
 <code>-type</code> option, describes the size of the bigBed file, <code>-type=bedN[+[P]]</code>,
 where <code>N</code> is an integer between 3 and 12 and the optional <code>+[P]</code> parameter
 specifies the number of extra fields, not required, but preferred. Describing the size of the bigBed file
 is needed for access to extra fields like name, itemRgb, etc.
 Examples:<code>-type=bed6</code> or <code>-type=bed6+</code> or <code>-type=bed6+3 </code>.  
 For a full list of the available options, type <code>bedToBigBed</code> (with no arguments) on the
 command line to display the usage message.
 </p>
 
 <p>
 <strong>Step 3.</strong>
 Use the <code>fetchChromSizes</code> script from the 
 <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">same directory</a> 
 to create the <em>chrom.sizes</em> file for the UCSC database you are working with (e.g., hg19).
 If the assembly <code>genNom</code> is hosted by UCSC, chrom.sizes can be a URL like:
 <code>http://hgdownload.soe.ucsc.edu/goldenPath/genNom/bigZips/genNom.chrom.sizes</code>. 
 </p>
 
 <p> 
 <strong>Step 4.</strong>
 Use the <code>bedToBigBed</code> utility to create a bigBed file from your sorted BED file, using 
 the <em>input.bed</em> file and <em>chrom.sizes</em> files created in <em>Steps 1</em> and
 <em>3</em>:</p> 
 <pre><code><strong>bedToBigBed</strong> input.bed chrom.sizes myBigBed.bb</code></pre>
 <p>
 <strong>Step 5.</strong>
 Move the newly created bigBed file (<em>myBigBed.bb</em>) to a web-accessible http, https, or 
 ftp location. At this point you should have a URL to your data, such as &quot;https://institution.edu/myBigBed.bb&quot;, and the file should be accessible outside of your institution/hosting providers network. For more information on where to host your data, please see the <a href="hgTrackHubHelp.html#Hosting">Hosting</a> section of the Track Hub Help documentation.</p>
 <p> 
 <strong>Step 6.</strong>
 If the file name ends with a <em>.bigBed</em> or <em>.bb</em> suffix, you can paste the URL of the 
 file directly into the 
 <a href="../../cgi-bin/hgCustom">custom track</a> management page, click &quot;submit&quot; and 
 view the file as a track in the Genome Browser. By default, the file name will be used to name the track. To configure the track 
 name and descriptions, you must create a &quot;<a href="hgTracksHelp.html#TRACK">track 
 line</a>&quot;, as shown in Example 1 Configuration <em>Step 1</em>.</p>
 <p> 
 Alternatively, if you want to set the track labels and other options yourself,
 construct a <a href="hgTracksHelp.html#CustomTracks">custom track</a> using a single track line. 
 Note that any of the track attributes listed
 <a href="customTrack.html#TRACK">here</a> are applicable to tracks of type bigBed. The most basic 
 version of the track line will look something like this:</p>
 <pre><code><strong>track type=</strong>bigBed <strong>name=</strong>"My Big Bed" <strong>description=</strong>"A Graph of Data from My Lab" <strong>bigDataUrl=</strong>http://myorg.edu/mylab/myBigBed.bb</code></pre>
 <p> 
 Paste this custom track line into the text box on the <a href="../../cgi-bin/hgCustom">custom 
 track</a> management page.</p>
 
 <h2>Examples</h2>
 <a name=Ex1></a>
 <h3>Example #1: Load an existing bigBed file</h3>
 <p>
 In this example, you will load an existing bigBed file,
 <em>bigBedExample.bb</em>, on the UCSC http server. This file contains data on chromosome 21 on the 
 human hg19 assembly.</p>
 <p>
 To create a custom track using this bigBed file:</p>
 <ol>
   <li>
   Paste the URL <code>http://genome.ucsc.edu/goldenPath/help/examples/bigBedExample.bb</code> into 
   the <a href="../../cgi-bin/hgCustom?db=hg19">custom track</a> management page for the human assembly
   hg19 (Feb.  2009).</li> 
   <li>
   Click the &quot;submit&quot; button. </li>
   <li>
   On the next page that displays, click the &quot;go&quot; link. To view the data in the bigBed
   track in the Genome Browser navigate to <code>chr21:33,031,597-33,041,570</code>.</li>
 </ol>
 <h4>Configuration</h4>
 <p>
 You can customize the track display by including track and browser lines that define
 certain parameters: </p>
 <ol>
   <li> 
   Construct a track line that references the <em>bigBedExample.bb</em> file:
 <pre>track type=bigBed name=&quot;bigBed Example One&quot; description=&quot;A bigBed file&quot; bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigBedExample.bb</pre></li>
   <li>
   Paste the track line into the <a href="../../cgi-bin/hgCustom?db=hg19">custom track</a>
   management page, click the &quot;submit&quot; button. On the next page that displays, click
   the &quot;go&quot; link. To view the data in the bigBed track in the Genome Browser
   navigate to <code>chr21:33,031,597-33,041,570</code>.</li>
   <li>
   With the addition of the following browser line with the track line you can ensure that the
   custom track opens at the correct position when you paste in the information:
   <pre><code>browser position chr21:33,031,597-33,041,570
 track type=bigBed name=&quot;bigBed Example One&quot; description=&quot;A bigBed file&quot; bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigBedExample.bb</code></pre>
   Paste the browser and track lines into the <a href="../../cgi-bin/hgCustom?db=hg19">custom track</a>
   page, click the &quot;submit&quot; button and the &quot;go&quot; link to see the data.</li>
 </ol>
 
 <a name=Ex2></a>
 <h3>Example #2: Create a bigBed file from a BED file</h3>
 <p>
 In this example, you will convert a sample BED file to bigBed format.</p>
 <ol>
   <li> 
   Save the BED file <a href="examples/bedExample.txt"><em>bedExample.txt</em></a> to a server, ideally one
   that is accessible from the internet.
   (<em>Steps 1</em> and <em>2</em> in <em>Creating a bigBed track</em>, above).<br>
     <pre><code>wget https://genome.ucsc.edu/goldenPath/help/examples/bedExample.txt</code></pre>
   </li>  
 <li> 
   Save the file <a href="hg19.chrom.sizes"><em>hg19.chrom.sizes</em></a> to your computer. It 
   contains the chrom.sizes data for the human (hg19) assembly (<em>Step 3</em>, above).<br>
     <pre><code>wget https://genome.ucsc.edu/goldenPath/help/hg19.chrom.sizes</code></pre>
 </li>
 <li>If you use your own file, it has to be sorted, first on the
   <code>chrom</code> field, and secondarily on the <code>chromStart</code>
   field. You can use the utility <code>bedSort</code>
 available <a href="http://hgdownload.soe.ucsc.edu/admin/exe/" target="_blank">here</a>
 or the following UNIX <code>sort</code>
 command to do this: </p>
 <pre><code>sort -k1,1 -k2,2n unsorted.bed &gt; input.bed</code></pre>
 </li>
 
 <li> 
   Download the <code>bedToBigBed</code> utility (<em>Step 2</em>, above). Replace "linux" below with "macOSX" 
   if your server is a Mac.
     <pre><code>wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedToBigBed
 chmod a+x bedToBigBed</code></pre>
   </li>
   <li> 
   Run the utility to create the bigBed output file (<em>Step 4</em>, above):
 <pre><code><strong>./bedToBigBed</strong> bedExample.txt hg19.chrom.sizes myBigBed.bb</code></pre></li>
   <li> 
   Place the bigBed file you just created (<em>myBigBed.bb</em>) on a web-accessible server 
   (<em>Step 5</em>, above).<br>
     <pre><code>mv myBigBed.bb ~/public_html/</code></pre>
   At some Universities, this involves using the commands ftp, scp or rsync to copy the file to a different server, one that is accessible 
   from the internet. We have <a href="hgTrackHubHelp.html#Hosting">documentation</a> how to find such a server.
   </li>
   <li> 
   Paste the URL itself into the Custom Tracks entry form or construct a track line that 
   points to your bigBed file (<em>Step 5</em>, above).</li>
   <li> 
   Create the custom track on the human assembly hg19 (Feb. 2009), and view it in the Genome Browser 
   (<em>Step 6</em>, above). Note that the original BED file contains data on chromosome 21 
   only.</li>
 </ol>
 
 <a name=Ex3></a>
 <h3>Example #3: Create a bigBed file with extra (custom) fields</h3>
 <p>
 BigBed files can 
 store extra fields in addition to the <a href="../../FAQ/FAQformat.html#format1">predefined BED 
 fields</a>. 
 In this example, you will create your own bigBed file from a fully featured existing BED file that 
 contains the standard BED fields up to and including the <em>color</em> field called <em>itemRgb</em>
 (field 9), plus two 
 additional non-standard fields (two alternate names for each item in the file). 
 The standard BED column itemRgb contains an R,G,B color value (e.g. "255,0,0").
 The resulting
 bigBed file will have nine standard BED columns and two additional non-standard user-defined columns.
 </p>
 <p>
 If you add extra fields to your bigBed file, you must include an AutoSql format
 (<em>.as</em>) file describing the fields. 
 In this file, all fields (standard and non-standard) are described with a short
 internal name and also a human-readable description.
 For more information on AutoSql, see 
 <a href="http://www.linuxjournal.com/article/5949" target="_blank">Kent and Brumbaugh, 2002</a>, as 
 well as examples of <em>.as</em> files in 
 <a href="http://genome-source.soe.ucsc.edu/gitlist/kent.git/tree/master/src/hg/lib"
 target="_blank">this directory</a>. 
 Then, the bedToBigBed program is run with the arguments <code>-type=bed9+2</code> and also 
 <code>-as=bedExample2.as</code> to help correctly interpret all the columns in the data. 
 </p>
 
 <p>
 This example also demonstrates how to create an extra search index on the name field, and the first of the 
 extra fields to be used for track item search. The searchIndex setting requires the input BED data to be
 case-sensitive sorted (<code>sort -k1,1 -k2,2n</code>), where newer versions of the tool bedToBigBed
 (available <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">here</a>) are enhanced to catch
 improper input.</p>
 <ol>
   <li> 
   Save the BED file <a href="examples/bedExample2.bed"><em>bedExample2.bed</em></a> to your computer
   (<em>Steps 1</em> and <em>2</em> in <em>Creating a bigBed track</em>, above).</li>
   <li> 
   Save the file <a href="hg18.chrom.sizes"><em>hg18.chrom.sizes</em></a> to your computer. This file
   contains the chrom.sizes for the human (hg18) assembly (<em>Step 4</em>, above).</li>
   <li> 
   Save the AutoSql file <a href="examples/bedExample2.as"><em>bedExample2.as</em></a> to your 
   computer. This file contains descriptions of the BED fields, and is required when the BED file 
   contains a <em>color</em> field.</li>  
   <li> 
   Download the <code>bedToBigBed</code> utility (<em>Step 3</em>, above).</li>
   <li> 
   Run the utility to create a bigBed output file with an index on the name field and the first 
   extra field: (<em>Step 5</em>, above):
   <pre><code>bedToBigBed -as=bedExample2.as -type=bed9+2 -extraIndex=name,geneSymbol bedExample2.bed hg18.chrom.sizes myBigBed2.bb</code></pre></li>
   Place the bigBed file you just created (<em>myBigBed2.bb</em>) on a web-accessible server 
   (<em>Step 6</em>, above).</li> 
   <li> 
   Paste the URL of the file into the custom tracks entry form, or alternatively construct a track 
   line that points to your bigBed file (<em>Step 7</em>, above). Because this bigBed file includes a
   field for color, you must include the 
   <a href="../../FAQ/FAQformat.html#format1"><code>itemRgb</code></a> attribute in the track line. 
   It will look somewhat similar to this (note that you must insert the URL specific to your own 
   bigBed file):
   <pre><code>track type=bigBed name="bigBed Example Three" description="A bigBed File with Color and two Extra Fields" itemRgb="On" bigDataUrl=http://yourWebAddress/myBigBed2.bb</code></pre></li>
   <li> 
   Create the custom track on the human assembly hg18 (Mar. 2006), and view it in the Genome Browser 
   (<em>step 8</em>, above). Note that the original BED file contains data on chromosome 7 only.</li>
   <li> 
   If you are using the bigBed file in a track hub, you can use the additional indices for track
   item searches. See the setting &quot;searchIndex&quot; in the 
   <a href="trackDb/trackDbHub.html#searchIndex">Track Database Definition Document</a> for more 
   information. For example, if you run the <code>bedToBigBed</code> utility with the option 
   <code>-extraIndex=name</code>, you will be able to search on the &quot;name&quot; field by adding 
   the line <code>searchIndex name</code> to the stanza about your bigBed in the hub's 
-  <em>trackDb.txt</em> file.</li>
+  <em>trackDb.txt</em> file. While searchIndex expects a search string with an exact match in the
+  index, another setting for Track Hubs, <a href="trackDb/trackDbHub.html#searchTrix">searchTrix</a>
+  allows for a fast look-up of free text associated with a list of identifiers, when a
+  <code>searchIndex</code> has also been created. See a <a href="hubQuickStartSearch.html">Searchable
+  Track Hub Quick Start Guide</a> here.</li>
   <li>
   Extra fields can contain text for labels or for display with mouseover (if the BED
   &quot;name&quot; field is needed for something that is not the label). See the trackDb settings 
   &quot;<a href="trackDb/trackDbHub.html#mouseOverField">mouseOverField</a>&quot; and 
   &quot;<a href="trackDb/trackDbHub.html#labelField">labelField</a>&quot; for more information.</li>
   <li>
   When you click on features, the contents of all extra fields are shown as a table. You can modify 
   the layout of the resulting page with the trackDb settings 
   &quot;<a href="trackDb/trackDbHub.html#skipFields">skipFields</a>&quot;, 
   &quot;<a href="trackDb/trackDbHub.html#sepFields">sepFields</a>&quot; 
   and &quot;<a href="trackDb/trackDbHub.html#skipEmptyFields">skipEmptyFields</a>&quot;, and 
   transform text fields into links with the 
   &quot;<a href="trackDb/trackDbHub.html#urls">urls</a>&quot; trackDb setting.</li>
   <li>
   Extra fields that start with the character &quot;_&quot; are reserved for internal use (special 
   display code); their contents are not shown on the details page.</li>
 </ol>
 
 <h2>Sharing Your Data with Others</h2>
 <p>
 If you would like to share your bigBed data track with a colleague, the best solution is to 
 save your current view as a stable <a href="hgSessionHelp.html">Genome Browser Session Link</a>.
 This will save the position and all settings that you made, all track visibilities, filters, 
 highlights, etc.
 </p>
 <p>
 If you want to create URLs to your bigBed file programmatically from software, 
 look at Example #6 on <a href="customTrack.html#EXAMPLE6">this page</a>.
 </p>
 
 <h2>Extracting Data from the bigBed Format</h2>
 <p>
 Because the bigBed files are indexed binary files, it can be difficult to extract data from them. 
 UCSC has developed the following programs to assist in working with bigBed formats, available from 
 the <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">binary utilities directory</a>:</p>
 <ul>
   <li>
   <code>bigBedToBed</code> &mdash; converts a bigBed file to ASCII BED format.</li>
   <li>
   <code>bigBedSummary</code> &mdash; extracts summary information from a bigBed file.</li>
   <li>
   <code>bigBedInfo</code> &mdash; prints out information about a bigBed file.</li>
 </ul>
 <p>
 These programs accept either file names or URLs to files as input. As with all UCSC 
 Genome Browser programs, simply type the program name (with no parameters) on the command line to 
 view the usage statement.</p>
 
 <h2>Troubleshooting</h2>
 <p>
 If you get an error when you run the <code>bedToBigBed</code> program, check your input BED file for
 data coordinates that extend past the end of the chromosome. If these are present, run the 
 <code>bedClip</code> program
 (<a href="http://hgdownload.soe.ucsc.edu/admin/exe/">available here</a>) to remove the problematic 
 row(s) in your input BED file before using the <code>bedToBigBed</code> program. 
 
 <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->