3efabcabf4453c2ab402bb20230693af0e2e3963 lrnassar Wed Jun 3 16:18:27 2020 -0700 Help page for trackDbIndexBb utility refs #25532 diff --git src/hg/htdocs/goldenPath/help/trackDbIndexBb.html src/hg/htdocs/goldenPath/help/trackDbIndexBb.html new file mode 100755 index 0000000..f2cd385 --- /dev/null +++ src/hg/htdocs/goldenPath/help/trackDbIndexBb.html @@ -0,0 +1,235 @@ +<!DOCTYPE html> +<!--#set var="TITLE" value="Genome Browser Trix Indices" --> +<!--#set var="ROOT" value="../.." --> + +<!-- Relative paths to support mirror sites with non-standard GB docs install --> +<!--#include virtual="$ROOT/inc/gbPageStart.html" --> + +<h1>trackDbIndexBb</h1> +<p> +When there are many subtracks in a composite view, it may be useful to limit the +display to only those with data in the current viewing window. The trackDb setting, +<a target="_blank" +href="/goldenPath/help/trackDb/trackDbHub.html#hideEmptySubtracks">hideEmptySubtracks</a>, +does just that. This track setting produces a checkbox on the track configuration +page allowing the user to enable or disable this feature. If it is configured to 'on', +then the feature will be on by default (the checkbox is checked). In order to take +full advantage of this setting it is helpful, though not always required, to index the +underlying bigBed files. The <code>trackDbIndexBb</code> utility facilitates the indexing +process. There are two instances in which the index file is helpful or required:</p> +<ul> +<li><b>(helpful)</b> In large composites (dozens or hundreds of tracks), especially when subtrack +data is sparse, an index file will provide a substantial performance improvement.</li> +<li><b>(required)</b> In order to build track associations. An example of this is when peak and +signal tracks wish to be displayed together. Since <code>hideEmptySubtracks</code> works +only on bigBed tracks (peaks), associated tracks (such as bigWig peaks) can be designated +to be displayed alongside the bigBeds with data.</li> +</ul> + +<p> +In order to build the index files, you must first download the <code>trackDbIndexBb</code> utility. +For more information on downloading our command line utilities, see these +<a href="http://hgdownload.soe.ucsc.edu/downloads.html#source_downloads">instructions</a>.</p> +<p> +There are 3 other programs needed to run <code>trackDbIndexBb</code>. Two of them can be found +in the same downloads directory above: <code>bedToBigBed</code> and <code>bigBedToBed</code>. +The final dependency, <b>bedtools</b>, can be found on the <a target="_blank" +href="https://bedtools.readthedocs.io/en/latest/">bedtools site</a>.</p> + +<h2>Parameters</h2> +<p> +Kent utilities can be run with no parameters in order to display a usage message. +<code>trackDbIndexBb</code> can be run this way, and additionally can be passed the +<code>- h</code> flag in order to display a more verbose help message.</p> +<pre> +./trackDbIndexBb +./trackDbIndexBb -h +</pre> +<p> +Below is a short description of the parameters:</p> +<ul> +<li><b>trackName</b> | This is the name of the composite that contains the bigBed tracks +to be indexed. Higher-level composite names may be used in order to build track associations. +This means that all tracks <em>below</em> the name given will be searched for indexing and +associations.</li> +<li><b>raFile</b> | This is the location of the trackDb.ra file containing the bigDataUrls +of the bigBeds to be indexed.</li> +<li><b>chromSizes</b> | Location of the chrom.sizes file. This is a file containing the names +and sizes of the chromosomes for the working assembly. This can be generated from the 2bit genome, +downloaded from the respective assembly on our <a target="_blank" +href="http://hgdownload.soe.ucsc.edu/downloads.html">download server</a>, or fetched using the +<code>fetchChromSizes</code> utility found in the same directory as <code>trackDbIndexBb</code>.</li> +<li><b>-o --out</b> | (Optional) Path to the output directory where the resulting files will +be placed. Defaults to current directory.</li> +<li><b>-p --pathTools</b> | (Optional) Location where the dependent programs can be found. +Will automatically check current directory and user's path. <code>trackDbIndexBb</code> +has three dependencies listed above. Note that bedtools must be downloaded from an +external group.</li> +<li><b>-n --noDelete</b> | (Optional) Keep intermediary multiBed file.</li> +<li><b>-m --metaDataVar</b> | (Optional) Used only when building track associations. The variable +designated here will be a trackDb variable which can be used to associate tracks. Default value is +<em>subGroups</em>, though <em>metaData</em> is also commonly used. See example below for more +information.</li> +<li><b>-s --subGroupRemove</b> | (Optional) Used only when building track associations. In +conjunction with <b>--metaDataVar</b>, this variable is used to build track associations. The +value designated will be excluded as a matching requirement from the trackDb parameter. By +default <em>view</em> is used, though this can change depending on data and organization. +See example below for more information.</li> +</ul> + +<h2>Example 1</h2> +<p> +In this first example, we are looking to build an index file for a composite track +containing 12 bigBed files. An index file is not required for performance +reasons for a track with such few files, however, the steps would be the same +on a larger track.</p> +<p> +First, we can take a look at the header stanza for the composite. The complete +trackDb.ra file is available <a target=_blank" +href="/goldenPath/help/examples/trackDbIndexBb/smallExampleTrackDb.ra">here</a>.</p> +<pre> +track problematic +shortLabel Problematic Regions +longLabel Problematic Regions for NGS or Sanger sequencing or very variable regions +compositeTrack on +hideEmptySubtracks off +group map +visibility hide +type bigBed 3 + +</pre> +<p> +We can see that the <code>hideEmptySubtracks</code> setting is already enabled, +set off by default. The index we are building is not required, but instead +improves the performance of the feature. The other key information here is +the composite track name, <em>problematic</em>. This is what we will want to +pass for our <b>trackName</b> variable.</p> +<p> +The other two required parameters are the path to the trackDb.ra file, +and the chrom.sizes file. If we assume both of those are in the current directory, +and that the required dependencies are present in the path, we can run +<code>trackDbIndexBb</code> as such:</p> +<pre> +./trackDbIndexBb problematic smallExampleTrackDb.ra hg19.chrom.sizes +</pre> +<p> +This will result in two files being generated in the current directory:</p> +<pre> +problematic.multiBed.bb +problematic.multiBedSources.tab +</pre> +<p> +We can then enable the use of these index files for <code>hideEmptySubtracks</code> by +adding the following two lines to our trackDb.ra file, adjusting the path to the file +if needed:</p> +<pre> +hideEmptySubtracks off +hideEmptySubtracksMultiBedUrl problematic.multiBed.bb +hideEmptySubtracksSourcesUrl problematic.multiBedSources.tab +</pre> + +<h2>Example 2</h2> +<p> +In this longer example, we are looking to build an index file with track associations between +DNase-seq peak and signal tracks. There are 2 bigBed peak tracks, and 4 bigWig signal tracks. +The complete trackDb for the example can be found <a target=_blank" +href="/goldenPath/help/examples/trackDbIndexBb/exampleTrackDb.ra">here</a>.</p> +<p> +Looking at the +top level stanza, we see that the track is a composite track with two views, one for +peaks and one for signals. The data are associated with a few different subGroups:</p> +<pre> +track uniformDnase +subGroup4 lab Lab Duke=Duke UW=UW UWDuke=UW-Duke +subGroup3 view View Peaks=Peaks Signal=Signal +subGroup2 cellType Cell_Line GM12878=GM12878 H1-hESC=H1-hESC +</pre> +<p> +In order to decide how to best make these associations, let us see what the relevant parts of +the peak and signal stanzas we would like to associate:</p> +<pre> + track wgEncodeUWDukeDnaseGM12878FdrPeaks + type bigBed 6 + + parent uniformDnasePeaks on + bigDataUrl wgEncodeUWDukeDnaseGM12878.fdr01peaks.hg19.bb + subGroups view=Peaks tier=t1 cellType=GM12878 lab=UWDuke + metadata cell=GM12878 + + track wgEncodeDukeDnaseGM12878FdrSignal + type bigWig + parent uniformDnaseSignal on + bigDataUrl wgEncodeOpenChromDnaseGm12878Aln_5Reps.norm5.rawsignal.bw + subGroups view=Signal tier=t1 cellType=GM12878 lab=Duke + metadata cell=GM12878 lab=Duke + + track wgEncodeUWDnaseGM12878FdrSignal + type bigWig + parent uniformDnaseSignal on + bigDataUrl wgEncodeUwDnaseGm12878Aln_2Reps.norm5.rawsignal.bw + subGroups view=Signal tier=t1 cellType=GM12878 lab=UW + metadata cell=GM12878 lab=UW +</pre> +<p> +The first track is the bigBed peaks track, part of the peaks view, and the second +and third are bigWig signal tracks, part of the signal view. <code>hideEmptySubtracks</code> +allows for two optional variables to build track associations. The first, <b>-m --metaDataVar</b>, +designates which trackDb variable will be used to build the association. In this example, the peak +tracks are called on a combination of the signal tracks for each cell. So we would like to display +both of the signal tracks when the peak track has data.</p> +<p> +The <b>subGroups</b> parameter could be used, however, we see that there are two variables that +differ between the peak and signal stanzas, <em>view</em> and <em>lab</em>. We would have to strip +both of those in order to have matching parameters variables and build an association. On the +other hand, we could use the <b>metaData</b> parameter. This parameter associates the tracks +by the <em>cell</em>, with only the <em>lab</em> variable differing. This would be the best +choice.</p> +<p> +Now that we know which parameter we would like to use to build associations, we need to use the +second optional parameter, <b>-s --subGroupRemove</b>, to tell <code>hideEmptySubtracks</code> +which variables to strip out in making the association. In this case, we would like to +keep the <em>cell</em> variable, but strip the <em>lab</em>. This means that <em>lab</em> +will be the parameter passed. In this way, associations will be made between any tracks that +match the contents of their <b>metaData</b> parameter once the <em>lab</em> variable has been +stripped out.</p> +<p> +Now that we have chosen our parameters, we will run the utility assuming our chrom.sizes file, +our trackDb.ra file, and all the supporting programs (bedToBigBed, bigBedToBed, bedtools) are +present in the current directory. We will also choose the output to be the current directory:</p> +<pre> +./trackDbIndexBb uniformDnase exampleTrackDb.ra chrom.sizes -o . -p . -m metadata -s lab +</pre> +<p> +Note that in this case, we did not need to specify the <b>-o</b> or <b>-p</b> values as the +current directory is the default for both.</p> +<p> +In this small example, the utility would run in a few seconds. But larger inputs containing +hundreds of tracks can take hours. Upon completion, two files will be generated:</p> +<pre> +uniformDnase.multiBed.bb +uniformDnase.multiBedSources.tab +</pre> +<p> +The .bb file will be a big multibed containing the coordinates where the tracks intersect, +expediting data lookup, and the .tab file will serve as an index for the multibed while +also containing the track associations. The .tab file can be quickly examined to ensure +proper generation as it should contain a numerical first column, followed by the bigBed +track, then any number of desired track associations, e.x.</p> +<pre> +1 wgEncodeUWDukeDnaseGM12878FdrPeaks wgEncodeDukeDnaseGM12878FdrSignal wgEncodeUWDnaseGM12878FdrSignal +2 wgEncodeUWDukeDnaseH1hESCFdrPeaks wgEncodeDukeDnaseH1hESCFdrSignal wgEncodeUWDnaseH1hESCFdrSignal +</pre> +<p> +Finally, <a target="_blank" +href="/goldenPath/help/trackDb/trackDbHub.html#hideEmptySubtracks">hideEmptySubtracks</a> can be +enabled and pointed to the newly generated files on the top composite stanza:</p> +<pre> +hideEmptySubtracks on +hideEmptySubtracksMultiBedUrl uniformDnase.multiBed.bb +hideEmptySubtracksSourcesUrl uniformDnase.multiBedSources.tab +</pre> +<p> +More information on how to use track hubs can be found in the <a target="_blank" +href="/goldenPath/help/hgTrackHubHelp.html">Track Hub help page</a> as well as the +<a target="_blank" href="trackDb/trackDbHub.html#searchTrix"> +Track Database Definition Document</a>.</p> + +<!--#include virtual="$ROOT/inc/gbPageEnd.html" -->