08afd09bf225ea50691998f829926e9d71d55b49 lrnassar Wed Mar 11 19:21:42 2026 -0700
Fixing small grammatical and consistency errors with the heatmap page and trackDb entry. Also some bigger fixes like a broken anchor. Refs #36176
diff --git src/hg/htdocs/goldenPath/help/heatmap.html src/hg/htdocs/goldenPath/help/heatmap.html
index 0a81bdecdea..9a89c999b7d 100755
--- src/hg/htdocs/goldenPath/help/heatmap.html
+++ src/hg/htdocs/goldenPath/help/heatmap.html
@@ -3,43 +3,43 @@
 <!--#set var="ROOT" value="../.." -->
 <!-- Relative paths to support mirror sites with non-standard GB docs install -->
 <!--#include virtual="$ROOT/inc/gbPageStart.html" -->
 <h1>Positional Heatmap Display</h1>
 <h2>Overview</h2>
 <p>The standard display mode for a bigBed track is a simple block or exon/intron marker in the window. Extra fields in the bigBed, however, can contain a variety of additional data. When data in the extra fields meet the schema described below, then the simple block display can be replaced with a positional heatmap. The heatmap provides a sparse 2-dimensional grid for information like expression of allele-specific point mutations across a transcript.</p>
 <div class="text-center">
- <a href="http://genome.ucsc.edu/s/jcasper/heatmap_example" target="_blank">
+ <a href="https://genome.ucsc.edu/s/jcasper/heatmap_example" target="_blank">
  <img src="/images/heatmap_example.png" style="width:80%;max-width:1083px"></a>
 </div>
 <h2>Contents</h2>
 <h6><a href="#extrafields">The Extra Fields</a></h6>
 <h6><a href="#gettingStarted">Getting Started</a></h6>
 <h6><a href="#troubleshooting">Troubleshooting</a></h6>
-<a id="annotating"></a>
+<a id="extrafields"></a>
 <h2>The Extra Fields</h2>
-<p>A heatmap file bigBed file starts with the standard 12 BED fields, but adds 7 more. Moreover,
+<p>A heatmap bigBed file starts with the standard 12 BED fields, but adds 7 more. Moreover,
 the blockSizes and chromStarts fields take on a slightly different interpretation.
 </p>
 <pre>
 string chrom; "Chromosome (or contig, scaffold, etc.)"
 uint chromStart; "Start position in chromosome"
 uint chromEnd; "End position in chromosome"
 string name; "Name of item"
 uint score; "Score from 0-1000"
 char[1] strand; "+ or -"
 uint thickStart; "Start of where display should be thick (start codon)"
 uint thickEnd; "End of where display should be thick (stop codon)"
 uint reserved; "Used as itemRgb as of 2004-11-22"
 int blockCount; "Number of blocks"
 int[blockCount] blockSizes; "Comma separated list of block sizes"
 int[blockCount] chromStarts; "Start positions relative to chromStart"
@@ -53,85 +53,85 @@
 </pre>
 <p>
 The _rowCount field describes how many rows exist in each item's heatmap, while the blockCount field describes the number of columns. The labels field provides the labels for each row of the heatmap; no column labels are currently supported, as they are already tied to positions.</p>
 <p>The next two fields are a bit more complicated. Heatmaps convert numerical scores into different colors for the heatmap cells, but the way the scores are translated into colors is controlled by the <code>_colorBounds</code> and <code>_colorValues</code> arrays. Conceptually, these settings pick some score thresholds and associate colors with those scores. Any scores between two adjacent thresholds receive a color that is interpolated between those two thresholds. Any scores outside the bounds just copy the value of the nearest boundary.</p>
 <p>For example, if <code>_colorBounds</code> is set to 0,500,1000 and <code>_colorValues</code> is set to #000000,#FF0000,#FFFFFF (black, red, and white, respectively), then the following will hold. Any score 0 or below will be drawn in black, the score 500 will be drawn in pure red, and any score of 1000 or more will be drawn in white.
 The score 250 is halfway between the black and red thresholds,
-so it would be drawn in a color halfway between #000000 and #FF0000, or about #490000.</p>
+so it would be drawn in a color halfway between #000000 and #FF0000, or about #800000.</p>
 <p>The <code>_scoreArray</code> and <code>_labelArray</code> fields contain the actual scores and mouseover text for each cell within the heatmap in row-major order, meaning the list of scores will fill the first row, then overflow to begin filling the second row, then the third row and so on. Because these are comma-separated lists, a cell can be left empty simply by placing nothing between the commas that demarcate it.</p>
-<p>For example, if exonCount is 2 and _rowCount is 3, then the line will describe a 6-cell heatmap
+<p>For example, if blockCount is 2 and _rowCount is 3, then the line will describe a 6-cell heatmap
 with 3 rows and 2 columns. If the accompanying <code>_scoreArray</code> is "1,0,2,,0.5,,", then
-that will describe a heatmap where the top-left corner has the score 1, the top-middle score is 0,
-the top-right score is 2, and the bottom-middle score is 0.5. The bottom-left and bottom-right
-cells will be left empty, as no score was provided for them.</p>
+that will describe a heatmap where row 1 has scores 1 and 0, row 2 has the score 2 in the first
+column and an empty second column, and row 3 has 0.5 in the first column and an empty second
+column.</p>
 <p>This format is a bit awkward to describe with only 6 cells; it becomes difficult or impossible to edit manually when the heatmaps reach sufficient size (like hundreds or even thousands). We strongly recommend the use of automated scripts to create these files.
 </p>
 <p>
 The final field is <code>legend</code>, which is much simpler. The text in this field is used to create a legend for the heatmap that will be displayed at the top after the "name" from the BED file.
 </p>
 <a id="gettingStarted"></a>
 <h2>Getting started with heatmaps</h2>
 <p>
 <strong>The basic bed fields</strong>:<br>
 As noted above, it is difficult to manage sizeable heatmap examples without resorting to scripting and automation. Small examples are the easiest ones to experiment with. For the sake of this example, imagine that we already have a list of two transcripts, T1 and T2, that we want to make heatmaps for. T1 and T2 themselves are described in a bed file like so:</p>
 <pre>
 chr1 1000 2000 T1 1000 + 1000 2000 0 2 200,200, 0,800
 chr1 1200 2400 T2 1000 - 1200 2400 0 3 500,800,300 0,600,900
 </pre>
 <p>Separately, we have a list of four heatmap boxes that we want to draw for each of these transcripts. For T1, the heatmap boxes cover bases 100-200 of the transcript (1100-1200 in the genome) and 700-800 (1700-1800 in the genome), each with two boxes labeled Case1 and Case2. For T2, the labels are the same
-but the bases covered are 200-300 and 900-1000.</p> <p>Creating heapmaps for
+but the bases covered are 200-300 and 900-1000.</p> <p>Creating heatmaps for
 the transcripts means changing the exon structure of each one - instead of representing exon boundaries, those fields will be used to describe the regions where we want to draw the heatmap boxes. We'll also need to add extra fields with the remainder of the heatmap data, but let's start with the exons. We immediately have a problem - in a BED file, the exons are expected to span the length of the item. If our first line is intended to show a transcript on chr1 from base 1000 to 2000, then the first exon needs to start at 1000 and the last exon needs to end at 2000, even though we don't have heatmap data for those "exons". There are two ways to get around this.</p>
 <p>Option 1 is to simply reduce the size of each transcript to match the extent of where we want to draw the heatmap.
 For that first line, if we only have heatmap data for bases 100-200 and 700-800 in the transcript (bases 1100-1200 and 1700-1800 in the genome), then those are the new bounds for our transcript (note that this means changing chromStart/End, thickStart/End, and the relative chromStarts values for the exons):</p>
 <pre>
 chr1 1100 1800 T1 1000 + 1100 1800 0 2 100,100 0,600
-chr1 1400 2200 T2 1000 - 1200 2400 0 2 100,100 0,700
+chr1 1400 2200 T2 1000 - 1400 2200 0 2 100,100 0,700
 </pre>
 <p>Note that in addition to changing the blockCount, blockSizes, and chromStarts, we also needed to change thickStart and thickEnd to indicate where the first exon "starts" and the last one "ends".
 </p>
 <p>Option 2 is to add a fake exon on each edge of the transcript to pad out the exon list (assuming that our heatmap data doesn't already reach the edges of the transcript - if it does, then this problem goes away). These fake exons can then be associated with no score value in the list of heatmap scores, which means no heatmap color will be drawn there, but the bounding box of the heatmap will still extend for the full length of the transcript (1000-2000).</p>
 <pre>
 chr1 1000 2000 T1 1000 + 1000 2000 0 4 1,100,100,1 0,100,700,999
 chr1 1200 2400 T2 1000 - 1200 2400 0 4 1,100,100,1 0,200,900,1199
 </pre>
@@ -139,31 +139,31 @@
 with Option 2.
 </p>
 <p>
 <strong>The extra fields:</strong><br>
 Now that we've filled in the basic BED fields, we need to populate the remaining seven heatmap-specific fields. As we said earlier, in this example we have two rows that we want to draw in each heatmap, which we've decided to label "Case 1" and "Case 2". This means that our <code>_rowCount</code> value will be 2, and the <code>_labels</code> value will be "Case1,Case2".
 Here's the updated BED, though we're not done yet:</p>
 <pre>
 chr1 1000 2000 T1 1000 + 1000 2000 0 4 1,100,100,1 0,100,700,999 2 "Case 1","Case 2"
 chr1 1200 2400 T2 1000 - 1200 2400 0 4 1,100,100,1 0,200,900,1199 2 "Case 1","Case 2"
 </pre>
 <p>The next two fields, <code>_colorBounds</code> and <code>_colorValues</code>, are tied together. A heatmap displays a gradient of color, which is intended to convey score information. Frequently
-only two colors bounds are used, for the minimum anc maximum values, and intermediate scores are transformed
+only two colors bounds are used, for the minimum and maximum values, and intermediate scores are transformed
 into a color shade between the two. If the maximum score is 5, associated with red (<span style="background-color:red; color:red;">#</span>), and the minimum score is -5, associated with blue (<span style="background-color:blue; color:blue;">#</span>), then a score of 0 would be purple - midway between the two (<span style="background-color:#7f007f; color:#7f007f;">#</span>). That would be a heatmap with two colorBounds and two colorValues - the bounds are 5 and -5, and the values are red and blue. If you reduced the colorBounds, to 3 and -3, you'd keep the same color scheme, but some of your heatmap scores would now saturate at full red or blue because they went past the outer score boundary on either side.</p>
 <p>In some situations, though, you might want to keep the association of red with positive scores and blue with negative ones without mixing the two. A score of 0 should instead be drawn in white, positive scores should range from faint red (for positive scores close to 0) to full red (for positive scores at or exceeding the max threshold). Negative scores, meanwhile, should be drawn in blue (faint blue close to 0, intense blue at the minimum threshold). This display would use three colorBounds and three colorValues - the bounds are -5, 0, and 5, and the values are blue, white, and red.
 Here is an example of what that would look like in our BED:</p>
 <pre>
@@ -188,45 +188,46 @@
 chr1 1200 2400 T2 1000 - 1200 2400 0 4 1,100,100,1 0,200,900,1199 2 "Case 1","Case 2" 3 -5,0,5 #0000ff,#ffffff,#ff0000 ,0.1,-2,,,1.1,-3.5,,
 </pre>
 <p>
 If you are looking closely, you will note that there is an extra comma at the end of the scores list. We need that comma because the final score value is empty, and it's hard to tell the difference between whether that's an intentional empty value or just the end of the list. For this situation, we do it by including the extra comma. If the final value had data (like 3.5), then we could end the list with 3.5 and skip the final comma (though including it would also be okay - we assume there isn't a score after the final comma, so "3.5" and "3.5," will be treated the same way).</p>
 <p>
 All that is left now is to fill in a similarly-structured comma-separated list for cell-specific mouseover labels and a legend. The mouseover labels can be used to do things like indicate the numerical score value (because viewers will otherwise only see the heatmap color) or provide other useful contextual information like "no data" or the HGVS term describing a mutation at that position.</p>
 <pre>
-chr1 1000 2000 T1 1000 + 1000 2000 0 4 1,100,100,1 0,100,700,999 2 "Case 1","Case 2" 3 -5,0,5 #0000ff,#ffffff,#ff0000 ,2.8,,,,-4,8.9,, ,"2.8, medium","no data",,,"-4.8, low","8.9, extreme",,, "Example on transcript 1"
+chr1 1000 2000 T1 1000 + 1000 2000 0 4 1,100,100,1 0,100,700,999 2 "Case 1","Case 2" 3 -5,0,5 #0000ff,#ffffff,#ff0000 ,2.8,,,,-4,8.9,, ,"2.8, medium","no data",,,"-4, low","8.9, extreme",,, "Example on transcript 1"
 chr1 1200 2400 T2 1000 - 1200 2400 0 4 1,100,100,1 0,200,900,1199 2 "Case 1","Case 2" 3 -5,0,5 #0000ff,#ffffff,#ff0000 ,0.1,-2,,,1.1,-3.5,, ,"0.1, negligible","-2, low",,,"1.1 marginal","-3.5 low",, "Example on transcript 2"
 </pre>
 <p>Here is this example in a <a href="examples/heatmap.bed" target="_blank">BED file</a>
 (using tabs as field separators), and the corresponding <a href="examples/heatmap.bb" target="_blank">bigBed file</a>. The bigBed was created from the bed file using the following command:</p>
 <pre>
 bedToBigBed -tab -type=bed12+ -as=heatmap.as heatmap.bed chrom.sizes heatmap.bb
 </pre>
 <p>A copy of heatmap.as is available <a href="examples/heatmap.as" target="_blank">here</a>. chrom.sizes files for most assemblies can be found on our <a href="https://hgdownload.soe.ucsc.edu" target="_blank">download server</a>.
+</p>
 <div class="text-center">
- <a href="http://genome.ucsc.edu/s/jcasper/heatmap_example" target="_blank">
+ <a href="https://genome.ucsc.edu/s/jcasper/heatmap_example" target="_blank">
  <img src="/images/heatmap_example2.png" style="width:80%;max-width:1083px"></a>
 </div>
 <a id="troubleshooting"></a>
 <h2>Troubleshooting</h2>
 <p>
 The most likely place to encounter errors when building a heatmap file is when running the <code>bedToBigBed</code> program. The score and label arrays can be difficult to organize, and we highly recommend making use of a bit of scripting to automate the process. The errors reported by bedToBigBed are usually helpful for identifying which part of the input isn't organized correctly, but please <a href="../../contacts.html">contact us</a> if you continue to have issues.
 </p>
 <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->