4a6170904fe3901af94b7cf0494e9e991e40115e jcasper Wed Apr 22 03:39:05 2026 -0700 First pass at a faceted composite help doc, refs #36320 diff --git src/hg/htdocs/goldenPath/help/facetedComposite.html src/hg/htdocs/goldenPath/help/facetedComposite.html new file mode 100755 index 00000000000..ce3bf4e4eba --- /dev/null +++ src/hg/htdocs/goldenPath/help/facetedComposite.html @@ -0,0 +1,250 @@ +<!DOCTYPE html> +<!--#set var="TITLE" value="Genome Browser Faceted Composites" --> +<!--#set var="ROOT" value="../.." --> + +<!-- Relative paths to support mirror sites with non-standard GB docs install --> +<!--#include virtual="$ROOT/inc/gbPageStart.html" --> + +<h1>Faceted Composite Tracks</h1> + +<h2>Overview</h2> +<p>Composite tracks are a standard way to collect a large number of related tracks within +the browser and interact with them in a unified interface. For example, our Conservation tracks +are often organized as composites as are each of our "All GENCODE" tracks (see +<a href="../../cgi-bin/hgTrackUi?db=hg38&g=wgEncodeGencodeSuper">here</a>). +The standard user interface for these composite tracks presents a plain list of all of the +subtracks where each one can be configured individually. More structure can be added to that +interface by adding "Views" to the composite, which group subtracks together and provide a +track selection matrix. That system works for intermediate numbers of tracks (around 20-40), +but the matrix approach fails when the number of subtracks scales into the thousands. +Faceted composites provide an alternate user interface for composite tracks that is designed for +these situations.</p> +<p>In a faceted composite, the list of subtracks is presented as a plain list with a list of +facets and text filters that can be used to narrow down the view to tracks of interest. 
This +is particularly useful for data sets that include information on a wide variety of cell types, +for example, where only a few of them may be of interest for any particular user. Because the +focus is on simply helping users identify which subtracks are relevant to them, the subtrack +configuration options are reduced to "is this subtrack displayed or not". Users can then +alter the display of individual subtracks using the right-click Configure menu from the main +hgTracks browser display.</p> +<p> + +<div class="text-center"> + <img alt="Heatmap track showing a color-coded grid of expression values across genomic positions" src="/images/heatmap_example.png" style="width:80%;max-width:1083px"> +</div> + +<h2>Contents</h2> + +<h6><a href="#quickStart">Quick Start</a></h6> +<h6><a href="#facetedSettings">TrackDb Settings for Building a Faceted Composite</a></h6> +<h6><a href="#troubleshooting">Troubleshooting</a></h6> + +<a id="quickStart"></a> +<h2>Quick Start</h2> +<p> +The TL;DR version of this is that a faceted composite is like any other composite, +but don't add views or subgroups. All tracks in a mix of types live under the same +composite parent. The metadata file (which must be web-accessible) describes the +facet data for the tracks. The composite's trackDb settings must include a "primaryKey" +setting that names one of the fields in the metadata file. 
Child tracks must have +names that match "&lt;parent_name&gt;_&lt;primaryKey&gt;".<br> +Brief example:<br> +<b>TrackDb entries</b><br> +<pre> + track myComposite + compositeTrack faceted + metaDataUrl https://url/to/metadata.tsv + primaryKey name + shortLabel Blood tests + longLabel Blood tests + + track myComposite_ex1 + parent myComposite + type bigBed + bigDataUrl https://url/to/ex1.bb + shortLabel ex1 peaks + longLabel ex1 Blood data peaks + + track myComposite_ex2 + parent myComposite + type bigBed + bigDataUrl https://url/to/ex2.bb + shortLabel ex2 peaks + longLabel ex2 Blood data peaks +</pre> +<b>metadata.tsv</b><br> +<pre> +name collection_date cell_type lab +ex1 2026-01-01 erythrocyte Richter +ex2 2026-01-03 erythrocyte Helsing +</pre> +</p> + +<a id="facetedSettings"></a> +<h2>TrackDb Settings for Building a Faceted Composite</h2> +<p> +Our <a href="trackDb/trackDbHub.html#faceted_composite">trackDb documentation</a> +includes details about settings relevant to faceted composites, but some of them +deserve a bit more exposition. +</p> +<p> +<b>view</b> and <b>subGroups</b><br> +These settings are not used in faceted composites. Instead, the UI for a faceted +composite is governed by the <code>dataTypes</code> and <code>metaDataUrl</code> +settings. Most composite track needs can be addressed without using the +<code>dataTypes</code> setting at all, so we're going to ignore it to start with. +We'll first consider the case where the <code>dataTypes</code> setting +is not in use, and then a case where it might be helpful and what changes are +required. +</p><p> +In most situations, the desired user interface for a faceted composite track +presents a table where each row is a separate subtrack from the composite. The +user has full flexibility to decide which subtracks they want to see. +Clicking on individual rows adds them to the list of displayed subtracks; +clicking again deselects that track, removing it from the display. 
Facet filters +are provided to help narrow down the list interactively, as the list of subtracks +is often too long to easily scroll all the way through. +</p><p> +<b>metaDataUrl</b><br> +In order to set up the facets, however, the track needs to include a description +of which facets exist and what the associated values for each track are. This +data comes from a separate web-accessible TSV (tab-separated value) file named +in the <code>metaDataUrl</code> setting of the track.<br> +Example:<br> +<pre> +accession tissue protocol treatment _date __count +SRR11111 blood ATAC-seq control 2026-01-01 12 +SRR11112 blood ATAC-seq IFNg6h 2026-01-01 31 +SRR11113 spleen ATAC-seq control 2024-08-21 8 +SRR11114 spleen ATAC-seq IFNg6h 2026-08-22 17 +</pre> +<p> +This data would be pasted into a file called something like "myTrackMetadata.tsv", +and it would be added to your faceted composite by adding</p> +<pre> +metaDataUrl https://url/to/myTrackMetadata.tsv +</pre> +<p> +to the trackDb settings for the faceted composite track. Two field names in +this example file deserve particular note. The "date" field begins with one +underscore, and the "count" field begins with two underscores. These prefixes +modify the facet interface for the track. By default, each field apart from +the primaryKey field will have an associated facet created for it on the page, +and a search box will be provided in the table header. When a field name begins with +one underscore, however, no facet will be created (a search box will still +be provided in the table header). When a field name begins with two underscores, +there will be no facet for it and no search box in the table header. +</p><p> +<b>dataTypes</b><br> +In the above examples, the assumption is that there is one track for each accession. +In some situations, however, there may be multiple tracks associated with each +accession in a formulaic way. 
For example, each accession could have a raw counts +bigWig track, a scaled counts bigWig track, and a peak calls bigBed track. +Instead of having three entries in the table that all share the same metadata +(one for each track), you can use the <code>dataTypes</code> setting to describe +which data types (raw counts, scaled counts, and peaks) are available for each +sample accession.<br> +When this setting is used, an additional selector is placed near the top of the +page to permit users to identify which data types they want to display. The +selected data types will be turned on for every selected sample in the table, +so the interface is a bit less flexible than the plain one-row-per-track table. +In this alternate setup, however, the one-row-per-sample arrangement can save +significant space both in the configuration UI and in the metadata TSV file.<br> +<em>A very important note here</em>: the rules for subtrack names change when +the <code>dataTypes</code> setting is active. Without dataTypes, subtrack names +are expected to match &lt;parent track name&gt;_&lt;primary key&gt;. An example +of that can be seen in the quick start near the top of the page. When +dataTypes are used, however, subtrack names are expected to match +&lt;parent track name&gt;_&lt;primary key&gt;_&lt;dataType&gt;. 
For example, +if the data types "signal" and "peaks" are in use for that same quick start +example, then the tracks look like this:<br> +<pre> + track myComposite + compositeTrack faceted + metaDataUrl https://url/to/metadata.tsv + primaryKey name + shortLabel Blood tests + longLabel Blood tests + dataTypes signal peaks + + track myComposite_ex1_signal + parent myComposite + type bigWig + bigDataUrl https://url/to/ex1.bw + shortLabel ex1 signal + longLabel ex1 Blood data signal + + track myComposite_ex1_peaks + parent myComposite + type bigBed + bigDataUrl https://url/to/ex1.bb + shortLabel ex1 peaks + longLabel ex1 Blood data peaks + + track myComposite_ex2_signal + parent myComposite + type bigWig + bigDataUrl https://url/to/ex2.bw + shortLabel ex2 signal + longLabel ex2 Blood data signal + + track myComposite_ex2_peaks + parent myComposite + type bigBed + bigDataUrl https://url/to/ex2.bb + shortLabel ex2 peaks + longLabel ex2 Blood data peaks +</pre> +</p><p> +<b>metadata.tsv</b><br> +<pre> +name collection_date cell_type lab +ex1 2026-01-01 erythrocyte Richter +ex2 2026-01-03 erythrocyte Helsing +</pre> +</p><p> +One other note: sometimes you may wish to have more descriptive text than just "peaks" +or "signal" for the selector, but the better labels aren't compatible with being used +as part of a track name (maybe because they include spaces). This can be handled +by specifying each data type as <code>&lt;name&gt;|"&lt;label&gt;"</code>. The "name" +value will be used to generate track names, while the label will be used for display.<br> +Example:<br> +<pre> + dataTypes signal|"Methylation signal (scaled)" peaks|"Highly methylated regions" +</pre> +</p><p> +<b>subtrackUrls</b><br> +It can also be useful to have certain fields provide links out to external resources, +particularly when accessions are in use. The <code>subtrackUrls</code> setting describes +which fields are to be used to generate links out and what the format of those URLs should be. 
+Bringing back this example metadata file:<br> +<pre> +accession tissue protocol treatment _date __count +SRR11111 blood ATAC-seq control 2026-01-01 12 +SRR11112 blood ATAC-seq IFNg6h 2026-01-01 31 +SRR11113 spleen ATAC-seq control 2024-08-21 8 +SRR11114 spleen ATAC-seq IFNg6h 2026-08-22 17 +</pre> +</p><p> +It might be helpful to provide links out from the accession column to SRA, and the protocol +column to a description page of the protocol. This could be achieved by adding the following +subtrackUrls setting to the composite's trackDb block: +<pre> +subtrackUrls accession=https://www.ncbi.nlm.nih.gov/sra/$$ protocol=https://www.protocols.io/view/$$ +</pre> +</p><p> +For each of these URLs, $$ will be replaced with the relevant value from that field (whether one +of the SRR strings for the accession field, or "ATAC-seq" for the protocol field). +</p><p> + +<a id="troubleshooting"></a> +<h2>Troubleshooting</h2> +<p> +The most likely place to encounter problems when building a faceted composite is a mismatch +between the metadata TSV file and the subtrack names in the trackDb block. Check carefully +to ensure that the values in the primaryKey column match the names of the subtracks, +including capitalization. The hubCheck tool has not yet been updated to automate these +checks, but that work is in progress. +</p> + +<!--#include virtual="$ROOT/inc/gbPageEnd.html" -->