9e3b6ed0cea49664cac3413a3136ff262ee2b581 jcasper Mon May 11 10:01:04 2026 -0700 Working on faceted composite docs page, refs #36320 diff --git src/hg/htdocs/goldenPath/help/facetedComposite.html src/hg/htdocs/goldenPath/help/facetedComposite.html index ce3bf4e4eba..ab7599353d5 100755 --- src/hg/htdocs/goldenPath/help/facetedComposite.html +++ src/hg/htdocs/goldenPath/help/facetedComposite.html @@ -1,134 +1,189 @@ <!DOCTYPE html> <!--#set var="TITLE" value="Genome Browser Faceted Composites" --> <!--#set var="ROOT" value="../.." --> <!-- Relative paths to support mirror sites with non-standard GB docs install --> <!--#include virtual="$ROOT/inc/gbPageStart.html" --> <h1>Faceted Composite Tracks</h1> <h2>Overview</h2> -<p>Composite tracks are a standard way to collect a large number of related tracks within +<p>Composite tracks are a standard way to collect a number of related tracks within the browser and interact with them in a unified interface. For example, our Conservation tracks are often organized as composites as are each of our "All GENCODE" tracks (see <a href="../../cgi-bin/hgTrackUi?db=hg38&g=wgEncodeGencodeSuper">here</a>). -The standard user interface for these composite tracks presents a plain list of all of the -subtracks where each one can be configured individually. More structure can be added to that -interface by adding "Views" to the composite, which group subtracks together and provide a -track selection matrix. That system works for intermediate numbers of tracks (around 20-40), -but the matrix approach fails when the number of subtracks scales into the thousands. -Faceted composites provide an alternate user interface for composite tracks that is designed for -these situations.</p> -<p>In a faceted composite, the list of subtracks is presented as a plain list with a list of -facets and text filters that can be used to narrow down the view to tracks of interest. 
This -is particularly useful for data sets that include information on a wide variety of cell types, -for example, where only a few of them may be of interest for any particular user. Because the +The standard user interface works for intermediate numbers of subtracks (around 20-200), +but becomes unusable when the number of subtracks scales into the thousands. Faceted composites provide +an alternate user interface for composite tracks that is designed for these situations. +</p><p> +The faceted composite display is particularly useful for data sets where each subtrack has +many potential values to be filtered on (e.g., cell type, protocol, date, experiment scores), +and where only a few of them may be of interest for any particular user. Because the focus is on simply helping users identify which subtracks are relevant to them, the subtrack configuration options are reduced to "is this subtrack displayed or not". Users can then -alter the display of individual subtracks using the right-click Configure meny from the main +alter the display of individual subtracks using the right-click Configure menu from the main hgTracks browser display.</p> -<p> +<p></p> <div class="text-center"> - <img alt="Heatmap track showing a color-coded grid of expression values across genomic positions" src="/images/heatmap_example.png" style="width:80%;max-width:1083px"> + <img alt="Example of a faceted composite track" src="/images/facet_example.png" style="width:80%;max-width:1083px"> </div> <h2>Contents</h2> -<h6><a href="#quickStart">Quick Start</a> -<h6><a href="#facetedSettings">TrackDb Settings for Building a Faceted Composite</a></h6> +<h6><a href="#quickStart">Quick Start - For those familiar with composite tracks</a></h6> +<h6><a href="#facetedSettings">Long Start - TrackDb Settings for Building a Faceted Composite</a></h6> <h6><a href="#troubleshooting">Troubleshooting</a></h6> <a id="quickStart"></a> -<h2>Quick Start</h2> +<h2>Quick
Start - For those familiar with composite tracks</h2> <p> The TL;DR version of this is that a faceted composite is like any other composite, but don't add views or subgroups. All tracks in a mix of types live under the same composite parent. The metadata file (which must be web-accessible) describes the facet data for the tracks. The composite's trackDb settings must include a "primaryKey" setting that names one of the fields in the metadata file. Child tracks must have names that match "<parent_name>_<primaryKey>".<br> Brief example:<br> <b>TrackDb entries</b><br> <pre> track myComposite compositeTrack faceted metaDataUrl https://url/to/metadata.tsv primaryKey name shortLabel Blood tests longLabel Blood tests track myComposite_ex1 parent myComposite type bigBed bigDataUrl https://url/to/ex1.bb shortLabel ex1 peaks longLabel ex1 Blood data peaks track myComposite_ex2 parent myComposite type bigBed bigDataUrl https://url/to/ex2.bb - shortLabel ex1 peaks + shortLabel ex2 peaks longLabel ex2 Blood data peaks </pre> <b>metadata.tsv</b><br> <pre> name collection_date cell_type lab ex1 2026-01-01 erythrocyte Richter ex2 2026-01-03 erythrocyte Helsing </pre> </p> <a id="facetedSettings"></a> -<h2>TrackDb Settings for Building a Faceted Composite</h2> +<h2>Long Start - TrackDb Settings for Building a Faceted Composite</h2> +<p> +This section walks through building a faceted composite from the +ground up, starting with the bare minimum structure and adding features +piece by piece. By the end, you should have all of the trackDb settings +needed to assemble a fully faceted composite track for your own data. +</p> +<p> +Like any composite track, a faceted composite is built from two kinds of +trackDb entries: a single <b>parent</b> stanza that declares the composite +as a whole, and a collection of <b>child</b> stanzas (also called subtracks) +that each carry the underlying data. 
The parent is what users see in the +track list on the main hgTracks browser display; clicking it brings up the faceted +interface that lets users choose which children to display. +</p> +<p> +<b>The parent track</b><br> +At its simplest, a faceted composite parent looks like this: +<pre> + track myComposite + compositeTrack faceted + shortLabel Blood tests + longLabel Blood tests across cell types +</pre> +The line <code>compositeTrack faceted</code> is what tells the browser to use +the faceted user interface rather than the traditional composite matrix. +The <code>shortLabel</code> and <code>longLabel</code> are the names shown +in the track list and on the configuration page. +</p> +<p> +<b>The child tracks</b><br> +Each child stanza names its parent with a <code>parent</code> line and +points at its own data file. A minimal pair of children for the example +above might be: +<pre> + track myComposite_ex1 + parent myComposite + type bigBed + bigDataUrl https://url/to/ex1.bb + shortLabel ex1 peaks + longLabel ex1 Blood data peaks + + track myComposite_ex2 + parent myComposite + type bigBed + bigDataUrl https://url/to/ex2.bb + shortLabel ex2 peaks + longLabel ex2 Blood data peaks +</pre> +Two things are worth noting. First, unlike a traditional composite, a faceted +composite happily mixes data types - the children don't all need to be +bigBed, or all bigWig. Second, each child track name follows the convention +<code><parent_name>_<identifier></code>, where the identifier +will line up with a row in the metadata file introduced below. That naming +convention is how the browser ties each subtrack to its metadata, so the +match must be exact, capitalization included. +</p>
The remaining sections below cover +the trackDb settings that turn this plain composite into a fully faceted +one. Our <a href="trackDb/trackDbHub.html#faceted_composite">trackDb +documentation</a> gives the full reference for each setting; what follows +is some additional exposition. </p> <p> <b>view</b> and <b>subGroups</b><br> These settings are not used in faceted composites. Instead, the UI for a faceted composite is governed by the <code>dataTypes</code> and <code>metaDataUrl</code> settings. Most composite track needs can be addressed without using the -<code>dataTypes</code> setting at all, so we're going to ignore it to start with. -We'll first consider the case where the <code>dataTypes</code> setting -is not in use, and then a case where it might be helpful and what changes are -required. +<code>dataTypes</code> setting at all, so we are going to ignore it to start with. +After considering an example where the <code>dataTypes</code> setting +is not in use, we will then discuss where it might be helpful and what associated +changes are required. </p><p> In most situations, the desired user interface for a faceted composite track presents a table where each row is a separate subtrack from the composite. The user has full flexibility to decide which subtracks they want to see. Clicking on individual rows adds them to the list of displayed subtracks; clicking again deselects that track, removing it from the display. Facet filters are provided to help narrow down the list interactively, as the list of subtracks is often too long to easily scroll all the way through. </p><p> <b>metaDataUrl</b><br> In order to set up the facets, however, the track needs to include a description of which facets exist and what the associated values for each track are. 
This data comes from a separate web-accessible TSV (tab-separated value) file named in the <code>metaDataUrl</code> setting of the track.<br> Example:<br> <pre> accession tissue protocol treatment _date __count -SRR11111 blood ATAC-seq control 2026-01-01 12 -SRR11112 blood ATAC-seq IFNg6h 2026-01-01 31 -SRR11113 spleen ATAC-seq control 2024-08-21 8 -SRR11114 spleen ATAC-seq IFNg6h 2026-08-22 17 +SRR11111 blood Omni-ATAC-seq control 2026-01-01 12 +SRR11112 blood Omni-ATAC-seq IFNg6h 2026-01-01 31 +SRR11113 spleen Omni-ATAC-seq control 2024-08-21 8 +SRR11114 spleen Omni-ATAC-seq IFNg6h 2026-08-22 17 </pre> <p> This data would be pasted into a file called something like "myTrackMetadata.tsv", and it would be added to your faceted composite by adding</p> <pre> metaDataUrl https://url/to/myTrackMetadata.tsv </pre> <p> to the trackDb settings for the faceted composite track. A particular note about two field names in this example file. The "date" field begins with one underscore, and the "count" field begins with two underscores. These prefixes modify the facet interface for the track. By default, each field apart from the primaryKey field will have an associated facet created for it on the page, and a search box will be provided in the table. When a field name begins with one underscore, however, no facet will be created (a search box will still @@ -138,69 +193,69 @@ <b>dataTypes</b><br> In the above examples, the assumption is that there is one track for each accession. In some situations, however, there may be multiple tracks associated with each accession in a formulaic way. For example, each accession could have a raw counts bigWig track, a scaled counts bigWig track, and a peak calls bigBed track. 
Instead of having three entries in the table that all share the same metadata (one for each track), you can use the <code>dataTypes</code> setting to describe which data types (raw counts, scaled counts, and peaks) are available for each sample accession.<br> When this setting is used, an additional selector is placed near the top of the page to permit users to identify which data types they want to display. The selected data types will be turned on for every selected sample in the table, so the interface is a bit less flexible than the plain one-row-per-track table. In this alternate setup, however, the one-row-per-sample arrangement can save significant space both in the configuration UI and in the metadata TSV file.<br> -<em>A very important note here</em>: the rules for subtrack names changes when +<em>A very important note here</em>: the rules for subtrack names change when the <code>dataTypes</code> setting is active. Without dataTypes, subtrack names are expected to match <parent track name>_<primary key>. An example of that can be seen in the quick start near the top of the page. When dataTypes are used, however, then subtrack names are expected to match <parent track name>_<primary key>_<dataType>. 
For example, if the data types "signal" and "peaks" are in use for that same quick start -example, then the track look like this:<br> +example, then the set of tracks looks like this:<br> <pre> track myComposite compositeTrack faceted metaDataUrl https://url/to/metadata.tsv primaryKey name shortLabel Blood tests longLabel Blood tests dataTypes signal peaks track myComposite_ex1_signal parent myComposite type bigWig bigDataUrl https://url/to/ex1.bw shortLabel ex1 signal longLabel ex1 Blood data signal track myComposite_ex1_peaks parent myComposite type bigBed bigDataUrl https://url/to/ex1.bb shortLabel ex1 peaks longLabel ex1 Blood data peaks track myComposite_ex2_signal parent myComposite type bigWig bigDataUrl https://url/to/ex2.bw shortLabel ex2 signal longLabel ex2 Blood data signal - track myComposite_ex2 + track myComposite_ex2_peaks parent myComposite type bigBed bigDataUrl https://url/to/ex2.bb shortLabel ex2 peaks longLabel ex2 Blood data peaks </pre> </p><p> <b>metadata.tsv</b><br> <pre> name collection_date cell_type lab ex1 2026-01-01 erythrocyte Richter ex2 2026-01-03 erythrocyte Helsing </pre> </p><p> One other note: sometimes you may wish to have more descriptive text than just "peaks" @@ -208,43 +263,59 @@ as part of a track name (maybe because they include spaces). This can be handled by specifying each data type as <code><name>|"<label>"</code>. The "name" value will be used to generate track names, while the label will be used for display.<br> Example:<br> <pre> dataTypes signal|"Methylation signal (scaled)" peaks|"Highly methylated regions" </pre> </p><p> <b>subtrackUrls</b><br> It can also be useful to have certain fields provide links out to external resources, particularly when accessions are in use. The <code>subtrackUrls</code> setting describes which fields are to be used to generate links out and what the format of those URLs should be. 
Bringing back this example metadata file:<br> <pre> accession tissue protocol treatment _date __count -SRR11111 blood ATAC-seq control 2026-01-01 12 -SRR11112 blood ATAC-seq IFNg6h 2026-01-01 31 -SRR11113 spleen ATAC-seq control 2024-08-21 8 -SRR11114 spleen ATAC-seq IFNg6h 2026-08-22 17 +SRR11111 blood Omni-ATAC-seq control 2026-01-01 12 +SRR11112 blood Omni-ATAC-seq IFNg6h 2026-01-01 31 +SRR11113 spleen Omni-ATAC-seq control 2024-08-21 8 +SRR11114 spleen Omni-ATAC-seq IFNg6h 2026-08-22 17 </pre> </p><p> It might be helpful to provide links out from the accession column to SRA, and the protocol column to a description page of the protocol. This could be achieved by adding the following subtrackUrls setting to the composite's trackDb block: <pre> subtrackUrls accession=https://www.ncbi.nlm.nih.gov/sra/$$ protocol=https://www.protocols.io/view/$$ </pre> </p><p> For each of these URLs, $$ will be replaced with the relevant value from that field (whether one -of the SRR strings for the accession field, or "ATAC-seq" for the protocol field). +of the SRR strings for the accession field, or "Omni-ATAC-seq" for the protocol field). </p><p> +Similar to the dataTypes discussion, there is also a final note here about situations where +you want to use one value in the URL while having another value displayed in the column. And +just as in that case, the solution is to use <code><value>|"<label>"</code> in that +field in the metadata TSV file. The above example wouldn't quite work right because the actual +URL for the protocol is "https://www.protocols.io/view/omni-atac-seq-improved-atac-seq-protocol-14egn94jyl5d". +Clearly, however, we don't want to use "omni-atac-seq-improved-atac-seq-protocol-14egn94jyl5d" +in the display for people reading through the table. 
By setting up the rows like this instead, +we maintain a clean display while providing links to the right protocol: +<pre> +accession tissue protocol treatment _date __count +SRR11111 blood omni-atac-seq-improved-atac-seq-protocol-14egn94jyl5d|"Omni-ATAC-seq" control 2026-01-01 12 +SRR11112 blood omni-atac-seq-improved-atac-seq-protocol-14egn94jyl5d|"Omni-ATAC-seq" IFNg6h 2026-01-01 31 +SRR11113 spleen omni-atac-seq-improved-atac-seq-protocol-14egn94jyl5d|"Omni-ATAC-seq" control 2024-08-21 8 +SRR11114 spleen omni-atac-seq-improved-atac-seq-protocol-14egn94jyl5d|"Omni-ATAC-seq" IFNg6h 2026-08-22 17 +</pre> +</p><p> <a id="troubleshooting"></a> <h2>Troubleshooting</h2> <p> The most likely place to encounter problems when building a faceted composite is a mismatch between the metadata TSV file and the subtrack names in the trackDb block. Check carefully to ensure that the values in the primaryKey column match the names of the subtracks, including capitalization. The hubCheck tool has not yet been updated to automate these checks, but that work is in progress. </p> <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->
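The Troubleshooting note says hubCheck does not yet compare the metadata file against the subtrack names. Until it does, a rough check can be scripted by hand. The following is a minimal Python sketch, not a UCSC-provided tool: `parse_trackdb`, `datatype_names`, and `check_faceted` are hypothetical names. It assumes a single faceted composite, trackDb settings written as "name value" lines with blank lines between stanzas, and a plain tab-separated metadata file whose first row is the header. It applies the naming rules described on this page: `<parent>_<primaryKey value>`, or `<parent>_<primaryKey value>_<dataType>` when `dataTypes` is set, with any `|"label"` suffix stripped from the data type names.

```python
import re


# Hypothetical helper, not part of any UCSC tool: a rough trackDb reader that
# treats each "name value" line as a setting, starts a new stanza at every
# "track" line or after a blank line, and ignores "#" comment lines.
def parse_trackdb(text):
    stanzas, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current = None  # a blank line ends the current stanza
            continue
        if line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "track" or current is None:
            current = {}
            stanzas.append(current)
        current[key] = value
    return stanzas


def datatype_names(setting):
    """Return the names from a dataTypes value, dropping any |"label" suffix."""
    return re.findall(r'([^\s|]+)(?:\|"[^"]*")?', setting)


def check_faceted(trackdb_text, metadata_text):
    """Compare subtrack names with the metadata TSV for the first faceted
    composite found. Returns (missing, unmatched):
      missing   - names implied by the metadata that have no stanza
      unmatched - child stanzas whose names match no metadata row
    """
    stanzas = parse_trackdb(trackdb_text)
    parent = next((s for s in stanzas if s.get("compositeTrack") == "faceted"), None)
    if parent is None:
        raise ValueError("no 'compositeTrack faceted' stanza found")
    name, key = parent["track"], parent.get("primaryKey")
    if key is None:
        raise ValueError("the faceted composite has no primaryKey setting")

    # Plain tab-separated rows; the first non-empty row is the header.
    rows = [line.split("\t") for line in metadata_text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("the metadata file is empty")
    header, body = rows[0], rows[1:]
    if key not in header:
        raise ValueError(f"primaryKey {key!r} is not a column in the metadata file")
    col = header.index(key)

    # Naming rule: parent_key, or parent_key_dataType when dataTypes is set.
    types = datatype_names(parent.get("dataTypes", ""))
    expected = set()
    for row in body:
        if len(row) <= col:
            continue  # skip short or malformed rows
        base = f"{name}_{row[col]}"
        expected.update([f"{base}_{t}" for t in types] if types else [base])

    # "parent" may carry a trailing on/off, so compare only its first word.
    children = {s["track"] for s in stanzas
                if "track" in s and (s.get("parent") or "").split()[:1] == [name]}
    return sorted(expected - children), sorted(children - expected)


# Quick Start example with one deliberate capitalization mismatch (Ex2).
TRACKDB = """\
track myComposite
compositeTrack faceted
primaryKey name

    track myComposite_ex1
    parent myComposite
    type bigBed

    track myComposite_Ex2
    parent myComposite
    type bigBed
"""
METADATA = (
    "name\tcollection_date\tcell_type\tlab\n"
    "ex1\t2026-01-01\terythrocyte\tRichter\n"
    "ex2\t2026-01-03\terythrocyte\tHelsing\n"
)

missing, unmatched = check_faceted(TRACKDB, METADATA)
print("missing:", missing)      # missing: ['myComposite_ex2']
print("unmatched:", unmatched)  # unmatched: ['myComposite_Ex2']
```

Names in the second list are the likeliest sign of a problem, such as the capitalization mismatch in this example. Names in the first list may be harmless if, for instance, not every sample has every data type; that is an assumption about your data, not a documented rule. To check a live hub, the metadata text could first be fetched from the `metaDataUrl` with `urllib.request.urlopen(url).read().decode()` before calling `check_faceted`.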