3a5356e5e0d64e331f9639b86b712506f58e251d
lrnassar
Mon Jun 8 17:33:44 2020 -0700
Making changes in response to CR, refs #25695

diff --git src/hg/htdocs/goldenPath/help/trackDbIndexBb.html src/hg/htdocs/goldenPath/help/trackDbIndexBb.html
index f2cd385..197f88e 100755
--- src/hg/htdocs/goldenPath/help/trackDbIndexBb.html
+++ src/hg/htdocs/goldenPath/help/trackDbIndexBb.html
@@ -1,230 +1,240 @@
When there are many subtracks in a composite view, it may be useful to limit the
display to only those with data in the current viewing window. The trackDb setting,
hideEmptySubtracks,
-does just that. This track setting produces a checkbox on the track configuration
+enables this behavior. This track setting produces a checkbox on the track configuration
page allowing the user to enable or disable this feature. If it is configured to 'on',
-then the feature will be on by default (the checkbox is checked). In order to take
+then the feature will be on by default (the checkbox is checked). To take
full advantage of this setting it is helpful, though not always required, to index the
-underlying bigBed files. The trackDbIndexBb
utility facilitates the indexing
-process. There are two instances in which the index file is helpful or required:
trackDbIndexBb
utility. This utility
+creates multibed/index files containing the coordinates where the tracks intersect,
+expediting data lookup. There are two instances in which these files are helpful or required:
hideEmptySubtracks
works
-only on bigBed tracks (peaks), associated tracks (such as bigWig peaks) can be designated
+data are sparse, the index files will provide a substantial performance improvement.
hideEmptySubtracks
works
+only on bigBed tracks (peaks), associated tracks (such as bigWig signals) can be designated
to be displayed alongside the bigBeds with data.
-In order to build the index files, you must first download the trackDbIndexBb
utility.
+To build the index files, first download the trackDbIndexBb
utility.
For more information on downloading our command line utilities, see these
instructions.
-There are 3 other programs needed to run trackDbIndexBb
. Two of them can be found
-in the same downloads directory above: bedToBigBed
and bigBedToBed
.
+There are three other programs needed to run trackDbIndexBb
. Two of them,
+bedToBigBed
and bigBedToBed
, can be found
+in the same
+downloads directory.
The final dependency, bedtools, can be found on the bedtools site.
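Before running the utility, it can help to confirm that the three helper programs are reachable. The following is a minimal sketch, assuming the programs are expected on the PATH; the function name missing_deps is hypothetical and is not part of the Kent utilities.

```shell
# Print the name of each program that cannot be found on the PATH.
# missing_deps is a hypothetical helper, not part of the Kent utilities.
missing_deps() {
    for prog in "$@"; do
        command -v "$prog" >/dev/null 2>&1 || printf '%s\n' "$prog"
    done
}

# The three dependencies named above; no output means all were found.
missing_deps bedToBigBed bigBedToBed bedtools
```

Any name printed by the last line still needs to be downloaded or added to the PATH.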
-Kent utilities can be run with no parameters in order to display a usage message.
-trackDbIndexBb
can be run this way, and additionally can be passed the
-- h
flag in order to display a more verbose help message.
trackDbIndexBb
can be passed the
+- h
flag to display a more verbose help message.
./trackDbIndexBb
./trackDbIndexBb -h
Below is a short description of the parameters:
fetchChromSizes
utility found in the same directory as trackDbIndexBb
.trackDbIndexBb
has three dependencies listed above. Note that bedtools must be downloaded from an
external group.
-In this first example, we are looking to build an index file for a composite track
-containing 12 bigBed files. An index file is not required for performance
-reasons for a track with such few files, however, the steps would be the same
+In this first example, we are looking to build index files for a composite track
+containing 12 bigBed files. Index files are not required for performance
+reasons for a track with so few files, however, the steps would be the same
on a larger track.
First, we can take a look at the header stanza for the composite. The complete trackDb.ra file is available here.
track problematic
shortLabel Problematic Regions
longLabel Problematic Regions for NGS or Sanger sequencing or very variable regions
compositeTrack on
hideEmptySubtracks off
group map
visibility hide
type bigBed 3 +
We can see that the hideEmptySubtracks
setting is already enabled,
-set off by default. The index we are building is not required, but instead
-improves the performance of the feature. The other key information here is
-the composite track name, problematic. This is what we will want to
-pass for our trackName variable.
The other two required parameters are the path to the trackDb.ra file,
and the chrom.sizes file. If we assume both of those are in the current directory,
and that the required dependencies are present in the path, we can run
trackDbIndexBb
as such:
./trackDbIndexBb problematic smallExampleTrackDb.ra hg19.chrom.sizes
This will result in two files being generated in the current directory:
problematic.multiBed.bb
problematic.multiBedSources.tab
We can then enable the use of these index files for hideEmptySubtracks
by
adding the following two lines to our trackDb.ra file, adjusting the path to the file
if needed:
hideEmptySubtracks off
hideEmptySubtracksMultiBedUrl problematic.multiBed.bb
hideEmptySubtracksSourcesUrl problematic.multiBedSources.tab
-In this longer example, we are looking to build an index file with track associations between
+In this longer example, we are looking to build index files with track associations between
DNase-seq peak and signal tracks. There are 2 bigBed peak tracks, and 4 bigWig signal tracks.
The complete trackDb for the example can be found here.
Looking at the top level stanza, we see that the track is a composite track with two views, one for peaks and one for signals. The data are associated with a few different subGroups:
track uniformDnase
subGroup4 lab Lab Duke=Duke UW=UW UWDuke=UW-Duke
subGroup3 view View Peaks=Peaks Signal=Signal
subGroup2 cellType Cell_Line GM12878=GM12878 H1-hESC=H1-hESC
-In order to decide how to best make these associations, let us see what the relevant parts of
-the peak and signal stanzas we would like to associate:
+To help us decide how to best make these associations, let us see what parts of
+the peak and signal stanzas we would like to associate are relevant:
track wgEncodeUWDukeDnaseGM12878FdrPeaks
type bigBed 6 +
parent uniformDnasePeaks on
bigDataUrl wgEncodeUWDukeDnaseGM12878.fdr01peaks.hg19.bb
subGroups view=Peaks tier=t1 cellType=GM12878 lab=UWDuke
metadata cell=GM12878

track wgEncodeDukeDnaseGM12878FdrSignal
type bigWig
parent uniformDnaseSignal on
bigDataUrl wgEncodeOpenChromDnaseGm12878Aln_5Reps.norm5.rawsignal.bw
subGroups view=Signal tier=t1 cellType=GM12878 lab=Duke
metadata cell=GM12878 lab=Duke

track wgEncodeUWDnaseGM12878FdrSignal
type bigWig
parent uniformDnaseSignal on
bigDataUrl wgEncodeUwDnaseGm12878Aln_2Reps.norm5.rawsignal.bw
subGroups view=Signal tier=t1 cellType=GM12878 lab=UW
metadata cell=GM12878 lab=UW
The first track is the bigBed peaks track, part of the peaks view, and the second
and third are bigWig signal tracks, part of the signal view. hideEmptySubtracks
allows for two optional variables to build track associations. The first, -m --metaDataVar,
-designates which trackDb variable will be used to build the association. In this example, the peak
-tracks are called on a combination of the signal tracks for each cell. So we would like to display
-both of the signal tracks when the peak track has data.
-The subGroups parameter could be used, however, we see that there are two variables that
+At this point it is important to explain how trackDbIndexBb
makes track associations.
+It will look at the stanza variable line designated by -m --metaDataVar, then look for
+identical matching lines in other stanzas. Since at least one parameter within the line will usually
+differ, such as the designation between peak and signal, -s --subGroupRemove can be used
+to strip out one of the parameters in the line.
+The subGroups parameter could be used. However, we see that there are two variables that
differ between the peak and signal stanzas, view and lab. We would have to strip
-both of those in order to have matching parameters variables and build an association. On the
+both of those to have matching parameter variables and build an association. On the
other hand, we could use the metaData parameter. This parameter associates the tracks
by the cell, with only the lab variable differing. This would be the best
-choice.
+choice as only a single parameter would have to be stripped, lab, as opposed to two,
+lab and view, to have matching peak and signal parameters for related tracks.
Now that we know which parameter we would like to use to build associations, we need to use the
second optional parameter, -s --subGroupRemove, to tell hideEmptySubtracks
which variables to strip out in making the association. In this case, we would like to
keep the cell variable, but strip the lab. This means that lab
will be the parameter passed. In this way, associations will be made between any tracks that
match the contents of their metaData parameter once the lab variable has been
stripped out.
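The matching idea described above can be sketched in a few lines of shell. This is only an illustration under stated assumptions, not the trackDbIndexBb implementation: the helper name strip_param is hypothetical, and the input is assumed to be one track name and its metadata line per row, separated by a tab.

```shell
# strip_param NAME: read "track<TAB>settings" lines on stdin and drop every
# NAME=value token from the settings column. Hypothetical helper that only
# mimics the matching idea; it is not the trackDbIndexBb implementation.
strip_param() {
    awk -F'\t' -v key="$1" '{
        n = split($2, tok, " ")
        out = ""
        for (i = 1; i <= n; i++) {
            if (index(tok[i], key "=") == 1) continue   # token to strip
            out = (out == "" ? tok[i] : out " " tok[i])
        }
        print $1 "\t" out
    }'
}

# The metadata lines from the three stanzas shown earlier.
printf '%s\t%s\n' \
    wgEncodeUWDukeDnaseGM12878FdrPeaks 'cell=GM12878' \
    wgEncodeDukeDnaseGM12878FdrSignal  'cell=GM12878 lab=Duke' \
    wgEncodeUWDnaseGM12878FdrSignal    'cell=GM12878 lab=UW' |
    strip_param lab
```

Once lab is stripped, all three rows carry the same remaining value, cell=GM12878, which is why the two signal tracks end up associated with the peak track.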
-Now that we have chosen our parameters, we will run the utility assuming our chrom.sizes file,
+Now that we have chosen our parameters, we will run the utility -- assuming our chrom.sizes file,
our trackDb.ra file, and all the supporting programs (bedToBigBed, bigBedToBed, bedtools) are
present in the current directory. We will also choose the output to be the current directory:
./trackDbIndexBb uniformDnase exampleTrackDb.ra chrom.sizes -o . -p . -m metadata -s lab
-Note that in this case, we did not need to specify the -o or -p values as the
+Note that in this case, we could have omitted the -o and -p values as the
current directory is the default for both.
In this small example, the utility would run in a few seconds. But larger inputs containing hundreds of tracks can take hours. Upon completion, two files will be generated:
uniformDnase.multiBed.bb
uniformDnase.multiBedSources.tab
The .bb file will be a big multibed containing the coordinates where the tracks intersect,
expediting data lookup, and the .tab file will serve as an index for the multibed while also
containing the track associations. The .tab file can be quickly examined to ensure proper
generation as it should contain a numerical first column, followed by the bigBed
-track, then any number of desired track associations, e.x.
+track, then any number of desired track associations, e.g.
1 wgEncodeUWDukeDnaseGM12878FdrPeaks wgEncodeDukeDnaseGM12878FdrSignal wgEncodeUWDnaseGM12878FdrSignal
2 wgEncodeUWDukeDnaseH1hESCFdrPeaks wgEncodeDukeDnaseH1hESCFdrSignal wgEncodeUWDnaseH1hESCFdrSignal
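The quick examination of the .tab file described above could be automated roughly as follows. This is a sketch, assuming whitespace-separated columns; check_sources_tab is a hypothetical helper name, and it tests only the layout stated here (a numerical first column followed by at least one track name).

```shell
# check_sources_tab FILE: succeed only if every line has a numerical first
# column followed by at least one track name. Hypothetical helper.
check_sources_tab() {
    awk '$1 !~ /^[0-9]+$/ || NF < 2 { bad = 1 } END { exit (bad + 0) }' "$1"
}

# Example call, assuming the file from this walkthrough is in the current directory:
# check_sources_tab uniformDnase.multiBedSources.tab && echo "layout looks as expected"
```

A passing check says nothing about whether the associations themselves are the desired ones, so the listed track names are still worth a look.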
Finally, hideEmptySubtracks can be enabled and pointed to the newly generated files on the top composite stanza:
hideEmptySubtracks on
hideEmptySubtracksMultiBedUrl uniformDnase.multiBed.bb
hideEmptySubtracksSourcesUrl uniformDnase.multiBedSources.tab
More information on how to use track hubs can be found in the