3efabcabf4453c2ab402bb20230693af0e2e3963 lrnassar Wed Jun 3 16:18:27 2020 -0700 Help page for trackDbIndexBb utility refs #25532 diff --git src/hg/htdocs/goldenPath/help/trackDbIndexBb.html src/hg/htdocs/goldenPath/help/trackDbIndexBb.html new file mode 100755 index 0000000..f2cd385 --- /dev/null +++ src/hg/htdocs/goldenPath/help/trackDbIndexBb.html @@ -0,0 +1,235 @@ + + + + + + + +

trackDbIndexBb

+

+When there are many subtracks in a composite view, it may be useful to limit the +display to only those with data in the current viewing window. The trackDb setting, +hideEmptySubtracks, +does just that. This track setting produces a checkbox on the track configuration +page allowing the user to enable or disable this feature. If it is configured to 'on', +then the feature will be on by default (the checkbox is checked). In order to take +full advantage of this setting it is helpful, though not always required, to index the +underlying bigBed files. The trackDbIndexBb utility facilitates the indexing +process. There are two instances in which the index file is helpful or required:

+ + +

+In order to build the index files, you must first download the trackDbIndexBb utility. +For more information on downloading our command line utilities, see these +instructions.

+

+There are 3 other programs needed to run trackDbIndexBb. Two of them can be found +in the same downloads directory above: bedToBigBed and bigBedToBed. +The final dependency, bedtools, can be found on the bedtools site.
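+Before running the utility, it can help to confirm that the three dependencies named above are reachable. The following is a minimal sketch, assuming the programs are expected on the PATH; the helper name missing_tools is hypothetical and is not part of the Kent utilities.

```shell
# Print the name of every listed program that is not found on the PATH.
# missing_tools is a hypothetical helper written for this illustration.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The three dependencies named in this help page.
missing=$(missing_tools bedToBigBed bigBedToBed bedtools)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All trackDbIndexBb dependencies were found."
fi
```

+If the programs live in a directory that is not on the PATH, this check will report them as missing even though they are installed.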

+ +

Parameters

+

+Kent utilities can be run with no parameters in order to display a usage message. +trackDbIndexBb can be run this way, and additionally can be passed the +-h flag in order to display a more verbose help message.

+
+./trackDbIndexBb
+./trackDbIndexBb -h
+
+

+Below is a short description of the parameters:

+ + +

Example 1

+

+In this first example, we are looking to build an index file for a composite track +containing 12 bigBed files. An index file is not required for performance +reasons for a track with so few files; however, the steps would be the same +for a larger track.

+

+First, we can take a look at the header stanza for the composite. The complete +trackDb.ra file is available here.

+
+track problematic
+shortLabel Problematic Regions
+longLabel Problematic Regions for NGS or Sanger sequencing or very variable regions
+compositeTrack on
+hideEmptySubtracks off
+group map
+visibility hide
+type bigBed 3 +
+
+

+We can see that the hideEmptySubtracks setting is already present, +with the feature set to off by default. The index we are building is not required, but +improves the performance of the feature. The other key information here is +the composite track name, problematic. This is what we will want to +pass for our trackName variable.

+

+The other two required parameters are the path to the trackDb.ra file +and the path to the chrom.sizes file. If we assume both of those are in the current directory, +and that the required dependencies are present in the path, we can run +trackDbIndexBb as follows:

+
+./trackDbIndexBb problematic smallExampleTrackDb.ra hg19.chrom.sizes
+
+

+This will result in two files being generated in the current directory:

+
+problematic.multiBed.bb
+problematic.multiBedSources.tab
+
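+A quick way to confirm that the run produced both files is sketched below. It assumes the output was written to the current directory; the helper name index_files_present is hypothetical, and the check only tests that the files exist and are non-empty, not that their contents are valid.

```shell
# Succeed only if both index files for the given track exist and are non-empty.
# $1 = composite track name, $2 = output directory (defaults to the current one)
# index_files_present is a hypothetical helper written for this illustration.
index_files_present() {
    dir=${2:-.}
    [ -s "$dir/$1.multiBed.bb" ] && [ -s "$dir/$1.multiBedSources.tab" ]
}

if index_files_present problematic .; then
    echo "Both index files are present."
else
    echo "At least one index file is missing or empty."
fi
```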
+

+We can then enable the use of these index files for hideEmptySubtracks by +adding the two Url lines shown below to the composite stanza in our trackDb.ra file, directly after the existing hideEmptySubtracks line, adjusting the paths to the files +if needed:

+
+hideEmptySubtracks off
+hideEmptySubtracksMultiBedUrl problematic.multiBed.bb
+hideEmptySubtracksSourcesUrl problematic.multiBedSources.tab
+
+ +

Example 2

+

+In this longer example, we are looking to build an index file with track associations between +DNase-seq peak and signal tracks. There are 2 bigBed peak tracks, and 4 bigWig signal tracks. +The complete trackDb for the example can be found here.

+

+Looking at the +top level stanza, we see that the track is a composite track with two views, one for +peaks and one for signals. The data are associated with a few different subGroups:

+
+track uniformDnase
+subGroup4 lab Lab Duke=Duke UW=UW UWDuke=UW-Duke
+subGroup3 view View Peaks=Peaks Signal=Signal
+subGroup2 cellType Cell_Line GM12878=GM12878 H1-hESC=H1-hESC
+
+

+In order to decide how best to make these associations, let us look at the relevant parts of +the peak and signal stanzas we would like to associate:

+
+                track wgEncodeUWDukeDnaseGM12878FdrPeaks
+                type bigBed 6 +
+                parent uniformDnasePeaks on
+                bigDataUrl wgEncodeUWDukeDnaseGM12878.fdr01peaks.hg19.bb
+                subGroups view=Peaks tier=t1 cellType=GM12878 lab=UWDuke
+                metadata cell=GM12878
+
+                track wgEncodeDukeDnaseGM12878FdrSignal
+                type bigWig
+                parent uniformDnaseSignal on
+                bigDataUrl wgEncodeOpenChromDnaseGm12878Aln_5Reps.norm5.rawsignal.bw
+                subGroups view=Signal tier=t1 cellType=GM12878 lab=Duke
+                metadata cell=GM12878 lab=Duke
+
+                track wgEncodeUWDnaseGM12878FdrSignal
+                type bigWig
+                parent uniformDnaseSignal on
+                bigDataUrl wgEncodeUwDnaseGm12878Aln_2Reps.norm5.rawsignal.bw
+                subGroups view=Signal tier=t1 cellType=GM12878 lab=UW
+                metadata cell=GM12878 lab=UW
+
+

+The first track is the bigBed peaks track, part of the peaks view, and the second +and third are bigWig signal tracks, part of the signal view. trackDbIndexBb +allows for two optional parameters to build track associations. The first, -m --metaDataVar, +designates which trackDb variable will be used to build the association. In this example, the peak +tracks are called on a combination of the signal tracks for each cell, so we would like to display +both of the signal tracks when the peak track has data.

+

+The subGroups parameter could be used; however, we see that there are two variables that +differ between the peak and signal stanzas, view and lab. We would have to strip +both of those in order to have matching variables and build an association. On the +other hand, we could use the metadata parameter. This parameter associates the tracks +by the cell, with only the lab variable differing. This would be the best +choice.

+

+Now that we know which parameter we would like to use to build associations, we need to use the +second optional parameter, -s --subGroupRemove, to tell trackDbIndexBb +which variables to strip out in making the association. In this case, we would like to +keep the cell variable, but strip the lab variable. This means that lab +will be the value passed. In this way, associations will be made between any tracks whose +metadata contents match once the lab variable has been +stripped out.
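+The effect of stripping lab can be illustrated with the three metadata values from the stanzas above. This is only a sketch of the matching idea and makes no claim about how the utility is implemented internally; the helper name strip_var is hypothetical.

```shell
# Remove one "name=value" pair from a space-separated metadata string.
# $1 = metadata value, $2 = variable name to strip
# strip_var is a hypothetical helper written for this illustration.
strip_var() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep -v "^$2=" | paste -sd ' ' -
}

# Metadata values copied from the three example stanzas.
peaks=$(strip_var "cell=GM12878" lab)
duke=$(strip_var "cell=GM12878 lab=Duke" lab)
uw=$(strip_var "cell=GM12878 lab=UW" lab)

# After stripping lab, all three values are identical, so the two
# signal tracks can be associated with the peak track.
echo "peaks: $peaks"
echo "Duke signal: $duke"
echo "UW signal: $uw"
```

+Without stripping lab, the three strings would differ and no association could be made on the metadata line alone.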

+

+Now that we have chosen our parameters, we will run the utility assuming our chrom.sizes file, +our trackDb.ra file, and all the supporting programs (bedToBigBed, bigBedToBed, bedtools) are +present in the current directory. We will also choose the output to be the current directory:

+
+./trackDbIndexBb uniformDnase exampleTrackDb.ra chrom.sizes -o . -p . -m metadata -s lab
+
+

+Note that in this case, we did not need to specify the -o or -p values as the +current directory is the default for both.

+

+In this small example, the utility would run in a few seconds, but larger inputs containing +hundreds of tracks can take hours. Upon completion, two files will be generated:

+
+uniformDnase.multiBed.bb
+uniformDnase.multiBedSources.tab
+
+

+The .bb file will be a big multibed containing the coordinates where the tracks intersect, +expediting data lookup, and the .tab file will serve as an index for the multibed while +also containing the track associations. The .tab file can be quickly examined to ensure +proper generation, as it should contain a numerical first column, followed by the bigBed +track, then any number of desired track associations, e.g.:

+
+1	wgEncodeUWDukeDnaseGM12878FdrPeaks	wgEncodeDukeDnaseGM12878FdrSignal	wgEncodeUWDnaseGM12878FdrSignal
+2	wgEncodeUWDukeDnaseH1hESCFdrPeaks	wgEncodeDukeDnaseH1hESCFdrSignal	wgEncodeUWDnaseH1hESCFdrSignal
+
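+The layout described above can be checked roughly as follows. This sketch assumes the fields are tab-separated, as the file extension suggests, and only verifies that each line has a numeric first column followed by at least one more field; the helper name sources_tab_looks_valid is hypothetical.

```shell
# Succeed if every line has a numeric first column and at least one more
# tab-separated field. An empty file passes, so check the file size separately.
# sources_tab_looks_valid is a hypothetical helper written for this illustration.
sources_tab_looks_valid() {
    awk -F '\t' '
        $1 !~ /^[0-9]+$/ || NF < 2 { bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Only run the check if the file from this example is present.
if [ -f uniformDnase.multiBedSources.tab ]; then
    if sources_tab_looks_valid uniformDnase.multiBedSources.tab; then
        echo "Sources file layout looks as expected."
    else
        echo "Sources file has an unexpected layout."
    fi
fi
```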
+

+Finally, hideEmptySubtracks can be +enabled and pointed to the newly generated files on the top composite stanza:

+
+hideEmptySubtracks on
+hideEmptySubtracksMultiBedUrl uniformDnase.multiBed.bb 
+hideEmptySubtracksSourcesUrl uniformDnase.multiBedSources.tab
+
+

+More information on how to use track hubs can be found in the Track Hub help page as well as the + +Track Database Definition Document.

+ +