3efabcabf4453c2ab402bb20230693af0e2e3963 lrnassar Wed Jun 3 16:18:27 2020 -0700 Help page for trackDbIndexBb utility refs #25532 diff --git src/hg/htdocs/goldenPath/help/trackDbIndexBb.html src/hg/htdocs/goldenPath/help/trackDbIndexBb.html new file mode 100755 index 0000000..f2cd385 --- /dev/null +++ src/hg/htdocs/goldenPath/help/trackDbIndexBb.html @@ -0,0 +1,235 @@ + + + + + + + +
+When there are many subtracks in a composite view, it may be useful to limit the
+display to only those with data in the current viewing window. The trackDb setting,
+hideEmptySubtracks,
+does just that. This track setting produces a checkbox on the track configuration
+page allowing the user to enable or disable this feature. If it is configured to 'on',
+then the feature will be on by default (the checkbox is checked). In order to take
+full advantage of this setting it is helpful, though not always required, to index the
+underlying bigBed files. The trackDbIndexBb
utility facilitates the indexing
+process. There are two instances in which the index file is helpful or required:
Although hideEmptySubtracks
works
+only on bigBed tracks (peaks), associated tracks (such as bigWig signals) can be designated
+to be displayed alongside the bigBeds with data.
+In order to build the index files, you must first download the trackDbIndexBb
utility.
+For more information on downloading our command line utilities, see these
+instructions.
+There are 3 other programs needed to run trackDbIndexBb
. Two of them can be found
+in the same downloads directory above: bedToBigBed
and bigBedToBed
.
+The final dependency, bedtools, can be found on the bedtools site.
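+Before running the utility, it can help to confirm that the three helper programs are reachable.
+The following is a minimal sketch, not part of the Kent utilities: the function name
+find_missing is hypothetical, and it assumes the programs live either in a chosen
+directory (the current directory if none is given) or somewhere on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: print every program that is neither an executable file
# in the given directory nor found on the PATH.
find_missing() {
    dir=$1
    shift
    for tool in "$@"; do
        if [ ! -x "$dir/$tool" ] && ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Directory to look in; assumed to be the current directory unless one is passed.
tool_dir=${1:-.}

missing=$(find_missing "$tool_dir" bedToBigBed bigBedToBed bedtools)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on one line.
    echo "Missing dependencies:" $missing
else
    echo "All three dependencies were found."
fi
```

+Any program reported as missing would need to be downloaded before building the index.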
+Kent utilities can be run with no parameters in order to display a usage message.
+trackDbIndexBb
can be run this way, and additionally can be passed the
+-h
flag in order to display a more verbose help message.
+./trackDbIndexBb
+./trackDbIndexBb -h
+Below is a short description of the parameters:
+fetchChromSizes
utility found in the same directory as trackDbIndexBb
.trackDbIndexBb
+has three dependencies listed above. Note that bedtools must be downloaded from an
+external group.
+
+In this first example, we are looking to build an index file for a composite track
+containing 12 bigBed files. An index file is not required for performance
+reasons on a track with so few files; however, the steps would be the same
+on a larger track.
+
+First, we can take a look at the header stanza for the composite. The complete
+trackDb.ra file is available here.
+
+track problematic
+shortLabel Problematic Regions
+longLabel Problematic Regions for NGS or Sanger sequencing or very variable regions
+compositeTrack on
+hideEmptySubtracks off
+group map
+visibility hide
+type bigBed 3 +
+We can see that the hideEmptySubtracks
setting is already present,
+with the feature set to off by default. The index we are building is not required, but instead
+improves the performance of the feature. The other key information here is
+the composite track name, problematic. This is what we will want to
+pass for our trackName variable.
+The other two required parameters are the path to the trackDb.ra file,
+and the chrom.sizes file. If we assume both of those are in the current directory,
+and that the required dependencies are present in the path, we can run
+trackDbIndexBb
as follows:
+./trackDbIndexBb problematic smallExampleTrackDb.ra hg19.chrom.sizes
+This will result in two files being generated in the current directory:
+
+problematic.multiBed.bb
+problematic.multiBedSources.tab
+We can then enable the use of these index files for hideEmptySubtracks
by
+adding the two Url lines below, after the existing hideEmptySubtracks line, to our trackDb.ra file, adjusting the paths to the files
+if needed:
+hideEmptySubtracks off
+hideEmptySubtracksMultiBedUrl problematic.multiBed.bb
+hideEmptySubtracksSourcesUrl problematic.multiBedSources.tab
+In this longer example, we are looking to build an index file with track associations between
+DNase-seq peak and signal tracks. There are 2 bigBed peak tracks, and 4 bigWig signal tracks.
+The complete trackDb for the example can be found here.
+
+Looking at the
+top level stanza, we see that the track is a composite track with two views, one for
+peaks and one for signals. The data are associated with a few different subGroups:
+
+track uniformDnase
+subGroup4 lab Lab Duke=Duke UW=UW UWDuke=UW-Duke
+subGroup3 view View Peaks=Peaks Signal=Signal
+subGroup2 cellType Cell_Line GM12878=GM12878 H1-hESC=H1-hESC
+In order to decide how best to make these associations, let us look at the relevant parts of
+the peak and signal stanzas we would like to associate:
+
+    track wgEncodeUWDukeDnaseGM12878FdrPeaks
+    type bigBed 6 +
+    parent uniformDnasePeaks on
+    bigDataUrl wgEncodeUWDukeDnaseGM12878.fdr01peaks.hg19.bb
+    subGroups view=Peaks tier=t1 cellType=GM12878 lab=UWDuke
+    metadata cell=GM12878
+
+    track wgEncodeDukeDnaseGM12878FdrSignal
+    type bigWig
+    parent uniformDnaseSignal on
+    bigDataUrl wgEncodeOpenChromDnaseGm12878Aln_5Reps.norm5.rawsignal.bw
+    subGroups view=Signal tier=t1 cellType=GM12878 lab=Duke
+    metadata cell=GM12878 lab=Duke
+
+    track wgEncodeUWDnaseGM12878FdrSignal
+    type bigWig
+    parent uniformDnaseSignal on
+    bigDataUrl wgEncodeUwDnaseGm12878Aln_2Reps.norm5.rawsignal.bw
+    subGroups view=Signal tier=t1 cellType=GM12878 lab=UW
+    metadata cell=GM12878 lab=UW
+The first track is the bigBed peaks track, part of the peaks view, and the second
+and third are bigWig signal tracks, part of the signal view. trackDbIndexBb
+allows for two optional variables to build track associations. The first, -m --metaDataVar,
+designates which trackDb variable will be used to build the association. In this example, the peak
+tracks are called on a combination of the signal tracks for each cell. So we would like to display
+both of the signal tracks when the peak track has data.
+The subGroups parameter could be used; however, we see that there are two variables that
+differ between the peak and signal stanzas, view and lab. We would have to strip
+both of those in order to have matching variables and build an association. On the
+other hand, we could use the metadata parameter. This parameter associates the tracks
+by the cell, with only the lab variable differing. This would be the best
+choice.
+
+Now that we know which parameter we would like to use to build associations, we need to use the
+second optional parameter, -s --subGroupRemove, to tell trackDbIndexBb
+which variables to strip out in making the association. In this case, we would like to
+keep the cell variable, but strip the lab variable. This means that lab
+will be the value passed. In this way, associations will be made between any tracks whose
+metadata parameters match once the lab variable has been
+stripped out.
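+The matching idea can be illustrated with a small shell sketch. This is only an illustration of
+the concept under stated assumptions, not the utility's actual implementation: the function
+strip_var is hypothetical, and the metadata strings are copied from the three stanzas above.

```shell
#!/bin/sh
# Hypothetical illustration: remove one key=value pair from a settings string.
# $1 = variable name to strip, $2 = space-separated key=value pairs
strip_var() {
    printf '%s\n' "$2" | tr ' ' '\n' | grep -v "^$1=" | paste -sd ' ' -
}

# metadata values from the peak stanza and the two signal stanzas
peaks=$(strip_var lab "cell=GM12878")
duke=$(strip_var lab "cell=GM12878 lab=Duke")
uw=$(strip_var lab "cell=GM12878 lab=UW")

# Once lab is stripped, all three strings are identical, so the tracks associate.
if [ "$peaks" = "$duke" ] && [ "$peaks" = "$uw" ]; then
    echo "all three tracks share the key: $peaks"
else
    echo "no association"
fi
```

+With lab stripped, each string reduces to cell=GM12878, which is why the peak track
+would be associated with both signal tracks for that cell.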
+Now that we have chosen our parameters, we will run the utility assuming our chrom.sizes file,
+our trackDb.ra file, and all the supporting programs (bedToBigBed, bigBedToBed, bedtools) are
+present in the current directory. We will also choose the output to be the current directory:
+
+./trackDbIndexBb uniformDnase exampleTrackDb.ra chrom.sizes -o . -p . -m metadata -s lab
+Note that in this case, we did not need to specify the -o or -p values as the
+current directory is the default for both.
+
+In this small example, the utility would run in a few seconds, but larger inputs containing
+hundreds of tracks can take hours. Upon completion, two files will be generated:
+
+uniformDnase.multiBed.bb
+uniformDnase.multiBedSources.tab
+The .bb file will be a big multibed containing the coordinates where the tracks intersect,
+expediting data lookup, and the .tab file will serve as an index for the multibed while
+also containing the track associations. The .tab file can be quickly examined to ensure
+proper generation, as it should contain a numerical first column, followed by the bigBed
+track, then any number of desired track associations, e.g.:
+
+1 wgEncodeUWDukeDnaseGM12878FdrPeaks wgEncodeDukeDnaseGM12878FdrSignal wgEncodeUWDnaseGM12878FdrSignal
+2 wgEncodeUWDukeDnaseH1hESCFdrPeaks wgEncodeDukeDnaseH1hESCFdrSignal wgEncodeUWDnaseH1hESCFdrSignal
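+The quick examination described above could be scripted. Below is a minimal sketch, assuming the
+columns are separated by tabs or spaces; the function check_sources_tab is hypothetical and
+not part of the Kent utilities, and the sample rows are the two example lines shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file is non-empty and every line has
# an all-digit first column followed by at least one track name.
check_sources_tab() {
    awk '
        $1 !~ /^[0-9]+$/ || NF < 2 { bad++ }
        END { if (bad > 0 || NR == 0) exit 1 }
    ' "$1"
}

# Write the two example rows to a scratch file (tab-separated by assumption).
sample=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
    1 wgEncodeUWDukeDnaseGM12878FdrPeaks wgEncodeDukeDnaseGM12878FdrSignal wgEncodeUWDnaseGM12878FdrSignal \
    2 wgEncodeUWDukeDnaseH1hESCFdrPeaks wgEncodeDukeDnaseH1hESCFdrSignal wgEncodeUWDnaseH1hESCFdrSignal \
    > "$sample"

if check_sources_tab "$sample"; then
    echo "sources file looks well formed"
else
    echo "sources file has unexpected rows"
fi
rm -f "$sample"
```

+On a real run, the scratch file would be replaced by the generated uniformDnase.multiBedSources.tab.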
+Finally, hideEmptySubtracks can be
+enabled and pointed to the newly generated files on the top composite stanza:
+
+hideEmptySubtracks on
+hideEmptySubtracksMultiBedUrl uniformDnase.multiBed.bb
+hideEmptySubtracksSourcesUrl uniformDnase.multiBedSources.tab
+More information on how to use track hubs can be found in the Track Hub help page as well as the
+Track Database Definition Document.