src/hg/htdocs/goldenPath/help/bigRmsk.html e19b35681f137d5bcc141f968112bf77476fac28

e19b35681f137d5bcc141f968112bf77476fac28
markd
  Tue Sep 13 17:09:32 2022 -0700
added maxWindowToDraw to bigRmsk example

diff --git src/hg/htdocs/goldenPath/help/bigRmsk.html src/hg/htdocs/goldenPath/help/bigRmsk.html
index 8b061e0..497a48f 100755
--- src/hg/htdocs/goldenPath/help/bigRmsk.html
+++ src/hg/htdocs/goldenPath/help/bigRmsk.html
@@ -1,328 +1,329 @@
 <!DOCTYPE html>
 <!--#set var="TITLE" value="Genome Browser bigRmsk RepeatMasker Format" -->
 <!--#set var="ROOT" value="../.." -->
 
 <!-- Relative paths to support mirror sites with non-standard GB docs install -->
 <!--#include virtual="$ROOT/inc/gbPageStart.html" -->
 
 <h1>bigRmsk Track Format</h1>
 
 <p>
  The bigRmsk format allows for the display of genome annotations generated by the
  <a href="http://www.repeatmasker.org/" target="_blank">RepeatMasker</a>
  program, which screens DNA sequences for interspersed repeats and low-complexity DNA sequences.
  It is the recommended method for adding RepeatMasker tracks to assembly hubs.
 </p>
 <p>
  A bigRmsk file is built by converting the annotation output of RepeatMasker
  into a compressed and indexed
  <a href="/goldenPath/help/bigBed.html">bigBed</a> file.  Please see the bigBed page
  for details of the bigBed format, its use, and associated tools.
 </p>
 
 <h2>Display Conventions and Configuration</h2>
 
 <h4>Context Sensitive Zooming</h4>
 <p>
   This track employs a technique which chooses the appropriate visual representation for the data based on the
   zoom scale and the number of annotations currently in view.  The track will automatically switch from the
   most detailed visualization ('Full' mode) to the denser view ('Pack' mode) when the window size is greater
   than 45kb of sequence.  It will further switch to the even denser single line view ('Dense' mode) if more than
   500 annotations are present in the current view.
 </p>
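<p>
  The switching rule above can be sketched in Python as follows. This is a minimal illustration,
  not the Genome Browser's own code: the function name is hypothetical, the window is assumed to be
  half-open, and the 500-annotation limit is assumed to apply regardless of window size.
</p>

```python
# Minimal sketch of the context-sensitive zoom rule described above.
# The 45 kb and 500-annotation thresholds come from this page; the function
# name and the order in which the two checks are applied are assumptions.

FULL_MODE_MAX_WINDOW = 45_000   # bases; wider windows switch Full -> Pack
DENSE_ANNOTATION_LIMIT = 500    # more annotations than this -> Dense


def choose_display_mode(window_start: int, window_end: int, n_annotations: int) -> str:
    """Return "full", "pack" or "dense" for the half-open window [start, end)."""
    window_size = window_end - window_start
    if n_annotations > DENSE_ANNOTATION_LIMIT:
        return "dense"
    if window_size > FULL_MODE_MAX_WINDOW:
        return "pack"
    return "full"


print(choose_display_mode(8_890, 35_190, 120))   # 26,300 bp window -> full
print(choose_display_mode(0, 100_000, 300))      # wide window -> pack
print(choose_display_mode(0, 100_000, 800))      # too many repeats -> dense
```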
 <h4>Dense Mode Visualization</h4>
 <p>
   In dense display mode, a single line is displayed denoting the coverage of repeats using a series
   of colored boxes.  The boxes are colored based on the classification of the repeat (see below for legend).
 <br>
 <br>
 <img height="30" width="1250" src="/images/rmskDense.png">
 </p>
 <h4>Pack Mode Visualization</h4>
 <p>
  In pack mode, repeats are represented as sets of joined features.  These are color-coded as above, based on the
  class of the repeat, and further details such as orientation (denoted by chevrons) and a family label are provided.
  The family label may optionally be turned off in the track configuration.
 <br>
 <br>
 <img height="100" width="1250" src="/images/rmskPack.png">
 <br>
 <br>
  The pack display mode may also be configured to resemble the original UCSC repeat track.  In this visualization,
  repeat features are grouped by class (see below) and displayed on separate track lines.  The repeat ranges are
  drawn as grayscale boxes: the length of each box reflects the size of the repeat, and its shading reflects
  the amount of base mismatch, base deletion, and base insertion associated with the repeat element.
  The higher the combined divergence, the lighter the shading.
 <br>
 <br>
 <img height="100" width="1250" src="/images/rmskOrigPack.png">
 </p>
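<p>
  To illustrate the shading rule, the sketch below maps the combined divergence of a repeat (the percent
  divergence, deletion, and insertion values of a RepeatMasker <em>*.out</em> line) to a gray level.
  The linear scale and the 30% ceiling are assumptions chosen for illustration; the browser's actual
  formula may differ.
</p>

```python
# Illustrative sketch: more divergent repeats are drawn in lighter gray.
# Inputs are the percent divergence, deletion and insertion values of a
# RepeatMasker *.out line. The linear mapping and the 30% ceiling are
# assumptions for illustration, not the Genome Browser's actual formula.

MAX_COMBINED_DIVERGENCE = 30.0  # percent; at or above this the box is white


def gray_level(pct_div: float, pct_del: float, pct_ins: float) -> int:
    """Return a gray level from 0 (black, no divergence) to 255 (white)."""
    combined = pct_div + pct_del + pct_ins
    fraction = min(max(combined / MAX_COMBINED_DIVERGENCE, 0.0), 1.0)
    return round(fraction * 255)


def gray_hex(level: int) -> str:
    """Format a gray level as an HTML color such as #1F1F1F."""
    return f"#{level:02X}{level:02X}{level:02X}"


low = gray_level(1.3, 0.6, 1.7)      # a little-diverged copy
high = gray_level(20.0, 3.0, 2.0)    # a heavily diverged copy
print(gray_hex(low), gray_hex(high))  # the second color is lighter
```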
 <h4>Full Mode Visualization</h4>
 <p>
  In the most detailed visualization, repeats are displayed as chevron boxes indicating the size and orientation of
  the repeat.  The interior grayscale shading represents the divergence of the repeat (see above), while the outline color
  represents the class of the repeat.  Dotted lines above the repeat, extending to the left or right,
  indicate the length of unaligned repeat model sequence and show where a repeat fragment originates in its
  consensus sequence or profile HMM (pHMM).  If the unaligned sequence
  is long, an interruption line and its size in bp are shown instead of drawing the extension to scale.
 <br>
 <br>
 <img height="125" width="1250" src="/images/rmskFull.png">
 </p>
 
 <p>
   For example, the following repeat is a SINE element in the forward orientation with average
   divergence. Only the 5' proximal fragment of the consensus sequence is aligned to the genome.
   The 3' unaligned length (384bp) is not drawn to scale and is instead displayed using a set of
   interruption lines along with the length of the unaligned sequence.
 </p>
 
 <img src="/images/rmskExample1.svg">
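<p>
  The unaligned lengths drawn as dotted extensions can be derived from the consensus-position columns of a
  RepeatMasker <em>*.out</em> line. The following is a minimal sketch, assuming the standard <em>*.out</em>
  column layout. The 200 bp cutoff for switching to an interruption line is a hypothetical value, not the
  one used by the browser, and the sample line is made up to match the SINE example above.
</p>

```python
# Minimal sketch: unaligned consensus lengths for one RepeatMasker *.out line.
# Columns 12-14 hold consensus positions. For "+" matches they are
# begin, end, (left); for "C" (complement) matches they are (left), end, begin.
# A parenthesized value counts consensus bases after the aligned region.
from typing import NamedTuple

MAX_TO_SCALE_BP = 200  # hypothetical cutoff for drawing an interruption line


class Unaligned(NamedTuple):
    five_prime: int   # consensus bases before the aligned region
    three_prime: int  # consensus bases after the aligned region


def _unparen(field: str) -> int:
    return int(field.strip("()"))


def unaligned_consensus(out_line: str) -> Unaligned:
    fields = out_line.split()
    strand, c12, c14 = fields[8], fields[11], fields[13]
    if strand == "+":
        rep_begin, rep_left = int(c12), _unparen(c14)
    elif strand == "C":
        rep_begin, rep_left = int(c14), _unparen(c12)
    else:
        raise ValueError(f"unexpected strand {strand!r}")
    return Unaligned(five_prime=rep_begin - 1, three_prime=rep_left)


def extension_style(length_bp: int) -> str:
    """Describe how an unaligned extension would be drawn (hypothetical rule)."""
    if length_bp == 0:
        return "none"
    if length_bp > MAX_TO_SCALE_BP:
        return f"interruption line labeled {length_bp}bp"
    return "dotted line to scale"


# Hypothetical forward-strand SINE fragment: only the 5' part of the consensus
# is aligned, leaving 384 unaligned consensus bases at the 3' end.
line = ("  800  10.5  0.0  0.0  chr1  20001  20150 (100)  +  ExampleSINE "
        " SINE/ExampleFamily  1  150 (384)  7")
gaps = unaligned_consensus(line)
print(gaps)                               # Unaligned(five_prime=0, three_prime=384)
print(extension_style(gaps.three_prime))  # interruption line labeled 384bp
```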
 
 <p>
   Repeats that have been fragmented by insertions or large internal deletions are now represented
   by join lines.  In the example below, a LINE element is found as two fragments.  The solid
   connection lines indicate that there are no unaligned consensus bases between the two fragments.
  Also note that these fragments form the 3' end of the repeat, as there is no unaligned consensus
  sequence following the last fragment.
 </p>
 
 <img src="/images/rmskExample2.svg">
 
 <p>
   In cases where there is unaligned consensus sequence between the fragments, the repeat will look like
   the following.  The dotted line indicates the length of the unaligned sequence between the two
   fragments.  In this case the unaligned consensus is longer than the actual genomic distance between
   these two fragments.
 </p>
 
 <img src="/images/rmskExample3.svg">
 
 <p>
   If there is consensus overlap between the two fragments, the joining lines will be drawn to indicate
   how much of the left fragment is repeated in the right fragment.
 </p>
 
 <img src="/images/rmskExample4.svg">
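<p>
  The three join cases above depend only on how the aligned consensus intervals of neighboring fragments
  relate. The sketch below classifies a pair of fragments, assuming 1-based, inclusive consensus coordinates
  with the fragments ordered by consensus position; the function and type names are hypothetical.
</p>

```python
# Minimal sketch: classify the join between two fragments of one repeat from
# their aligned consensus intervals (1-based, inclusive), with the fragments
# ordered by consensus position. Function and type names are hypothetical.
from typing import NamedTuple, Tuple


class Fragment(NamedTuple):
    rep_start: int  # first aligned consensus base
    rep_end: int    # last aligned consensus base


def classify_join(left: Fragment, right: Fragment) -> Tuple[str, int]:
    """Return (kind, bases) describing the consensus between two fragments."""
    gap = right.rep_start - left.rep_end - 1
    if gap == 0:
        return ("contiguous", 0)   # solid join lines: no missing consensus
    if gap > 0:
        return ("unaligned", gap)  # dotted line: gap consensus bases missing
    return ("overlap", -gap)       # right fragment repeats -gap bases of left


print(classify_join(Fragment(1, 500), Fragment(501, 900)))  # ('contiguous', 0)
print(classify_join(Fragment(1, 500), Fragment(601, 900)))  # ('unaligned', 100)
print(classify_join(Fragment(1, 500), Fragment(451, 900)))  # ('overlap', 50)
```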
 
 <p>
   The following table lists the repeat class colors:
 </p>
 
 <table>
   <thead>
   <tr>
     <th style="border-bottom: 2px solid #6678B1;">Color</th>
     <th style="border-bottom: 2px solid #6678B1;">Repeat Class</th>
   </tr>
   </thead>
   <tr>
     
     <td bgcolor="#1F77B4"></td>
     <td align="left"><b>SINE</b> - Short Interspersed Nuclear Element</td>
   </tr>
   <tr>
     <td bgcolor="#FF7F0E"></td>
     <td align="left"><b>LINE</b> - Long Interspersed Nuclear Element</td>
   </tr>
   <tr>
     <td bgcolor="#2CA02C"></td>
     <td align="left"><b>LTR</b> - Long Terminal Repeat</td>
   </tr>
   <tr>
     <td bgcolor="#D62728"></td>
     <td align="left"><b>DNA</b> - DNA Transposon</td>
   </tr>
   <tr>
     <td bgcolor="#9467BD"></td>
     <td align="left"><b>Simple</b> - Single Nucleotide Stretches and Tandem Repeats</td>
   </tr>
   <tr>
     <td bgcolor="#8C564B"></td>
     <td align="left"><b>Low_complexity</b> - Low Complexity DNA</td>
   </tr>
   <tr>
     <td bgcolor="#E377C2"></td>
     <td align="left"><b>Satellite</b> - Satellite Repeats</td>
   </tr>
   <tr>
     <td bgcolor="#7F7F7F"></td>
     <td align="left"><b>RNA</b> - RNA Repeats (including RNA, tRNA, rRNA, snRNA, scRNA, srpRNA)</td>
   </tr>
   <tr>
     <td bgcolor="#BCBD22"></td>
     <td align="left"><b>Other</b> - Other Repeats (including class RC - Rolling Circle)</td>
   </tr>
   <tr>
     <td bgcolor="#17BECF"></td>
     <td align="left"><b>Unknown</b> - Unknown Classification</td>
   </tr>
 </table>
 
 
 <p>
   A &quot;?&quot; at the end of the &quot;Family&quot; or &quot;Class&quot; (for example, DNA?)
   signifies that the curator was unsure of the classification. At some point in the future,
   either the &quot;?&quot; will be removed or the classification will be changed.</p>
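<p>
  The color legend can be expressed as a lookup table. In this sketch, the handling of a trailing
  &quot;?&quot;, the RNA subclasses, class RC, the <em>Simple_repeat</em> class name used in
  <em>*.out</em> files, and the fallback for unlisted classes are assumptions based on the table above,
  not confirmed browser behavior.
</p>

```python
# Sketch of the color legend as a lookup table. Colors come from the table
# above. Assumptions (not confirmed by this page): a trailing "?" on the class
# is ignored, "Simple_repeat" (as written in *.out files) maps to Simple,
# RNA subclasses share the RNA color, RC maps to Other, and any other
# unlisted class also falls back to Other.

CLASS_COLORS = {
    "SINE": "#1F77B4",
    "LINE": "#FF7F0E",
    "LTR": "#2CA02C",
    "DNA": "#D62728",
    "Simple": "#9467BD",
    "Low_complexity": "#8C564B",
    "Satellite": "#E377C2",
    "RNA": "#7F7F7F",
    "Other": "#BCBD22",
    "Unknown": "#17BECF",
}

RNA_CLASSES = {"RNA", "tRNA", "rRNA", "snRNA", "scRNA", "srpRNA"}
ALIASES = {"Simple_repeat": "Simple", "RC": "Other"}


def class_color(class_family: str) -> str:
    """Return the legend color for a class/family string such as "LINE/L1"."""
    repeat_class = class_family.split("/", 1)[0].rstrip("?")
    if repeat_class in RNA_CLASSES:
        repeat_class = "RNA"
    repeat_class = ALIASES.get(repeat_class, repeat_class)
    return CLASS_COLORS.get(repeat_class, CLASS_COLORS["Other"])


for name in ("SINE/Alu", "DNA?/hAT-Charlie", "tRNA", "RC/Helitron", "Simple_repeat"):
    print(name, class_color(name))
```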
 
 
 
 <h2 id="bigRmsk">bigRmsk track definitions</h2>
 <p>
  A bigRmsk track consists of up to two bigBed files, each defined by an
  <a href="http://www.linuxjournal.com/article/5949" target="_blank">autoSql</a> schema:
 </p>
 <ul>
   <li>The primary bigRmsk file, defined by <a href="examples/bigRmskBed.as">
     <em>bigRmskBed.as</em></a>,
     which has the annotations of repeats.
   <li>The secondary bigRmskAlign file, defined by <a href="examples/bigRmskAlignBed.as">
     <em>bigRmskAlignBed.as</em></a>,
     which contains the alignments of the consensus repeats to the genome.  This file is optional;
     if it is omitted, the bigRmsk track still works, but the alignments cannot be viewed.
 </ul>
 
 <p>
  The text input for the bigRmsk files is created from the RepeatMasker <em>*.out</em> and
  <em>*.align</em> files
   using the <em>rmToTrackHub.pl</em> program that is included with RepeatMasker.  The bigRmsk
   format is not designed to work with any other type of data.
 </p>
 
 
 <h2 id="steps">Creating a bigRmsk track</h2>
 <p>
  To create a bigRmsk track and its supporting files, follow the steps below.
  These steps assume that you have already run RepeatMasker and have a <em>*.out</em> file and,
  optionally, an <em>*.align</em> file.
 </p>
 
 <p>
   RepeatMasker output files are converted to the bigRmsk textual form using the
   <em>RepeatMasker/util/rmToTrackHub.pl</em> program that is part of the
   <a href="http://www.repeatmasker.org/RepeatMasker/">RepeatMasker 4.1.3 or newer distribution</a>.
 </p>
 <p>
   <strong>Step 1.</strong>
   If you wish to experiment with quickly building an example track, download the
   example RepeatMasker output files for the human GRCh38 (hg38) assembly
   <a href="examples/bigRmskExample.out">bigRmskExample.out</a>
   and <a href="examples/bigRmskExample.align">bigRmskExample.align</a>
   used in this tutorial:
   <pre>
       wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.out
       wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.align</pre>
 <p>
  Otherwise, substitute your own <em>*.out</em> and <em>*.align</em> files in these instructions.
  Generating the bigRmskAlign file is optional: if you don&apos;t have the <em>*.align</em>
  file from RepeatMasker, the track will work without it, although alignments cannot be viewed.  Just skip the
  steps involved in building the alignment file.
 
 <p>
   <strong>Step 2.</strong>
  Download the autoSql schemas <a href="examples/bigRmskBed.as">bigRmskBed.as</a> and
   <a href="examples/bigRmskAlignBed.as">bigRmskAlignBed.as</a>:
   <pre>
       wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskBed.as
       wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskAlignBed.as</pre>
 <p>
  You will also need a chromosome sizes file for your genome.  For this example, download the hg38
  file:
   <pre>
       wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes</pre>
 <p>
   <strong>Step 3.</strong>
  Convert the RepeatMasker files to the bigRmsk text format with <em>rmToTrackHub.pl</em>.
  For this example, it writes <em>bigRmskExample.join.tsv</em> and <em>bigRmskExample.align.tsv</em>,
  already sorted for direct input to <em>bedToBigBed</em>:
   <pre>
       RepeatMasker/util/rmToTrackHub.pl -out bigRmskExample.out -align bigRmskExample.align</pre>
 <p>
   <strong>Step 4.</strong>
   Build the bigRmsk and optional bigRmskAlign files:
   <pre>
       bedToBigBed -tab -type=bed9+5 -as=bigRmskBed.as bigRmskExample.join.tsv hg38.chrom.sizes bigRmskExample.bb
       bedToBigBed -tab -type=bed3+14 -as=bigRmskAlignBed.as bigRmskExample.align.tsv hg38.chrom.sizes bigRmskExampleAlign.bb</pre>
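<p>
  Because <em>bedToBigBed</em> rejects malformed or unsorted input, it can help to check the text files
  first. The sketch below is a hypothetical pre-flight check, assuming that <code>-type=bed9+5</code>
  implies 14 tab-separated columns and <code>-type=bed3+14</code> implies 17, that every chromosome must
  appear in the chrom.sizes file, and that lines are ordered as by <code>sort -k1,1 -k2,2n</code>.
</p>

```python
# Hypothetical pre-flight check for the bedToBigBed inputs in Step 4.
# Assumptions: tab-separated text; bed9+5 means 14 columns and bed3+14 means
# 17; every chromosome is listed in chrom.sizes; end <= chromosome size; and
# lines are ordered as by `sort -k1,1 -k2,2n` (byte order on chromosome name).
from typing import Dict, Iterable, List, Optional, Tuple


def read_chrom_sizes(lines: Iterable[str]) -> Dict[str, int]:
    """Parse "name<TAB>size" lines, skipping blank ones."""
    sizes: Dict[str, int] = {}
    for line in lines:
        if line.strip():
            name, size = line.rstrip("\n").split("\t")[:2]
            sizes[name] = int(size)
    return sizes


def check_bed_lines(lines: Iterable[str], n_columns: int,
                    chrom_sizes: Dict[str, int]) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: List[str] = []
    previous: Optional[Tuple[str, int]] = None
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != n_columns:
            problems.append(f"line {number}: {len(fields)} columns, expected {n_columns}")
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in chrom_sizes:
            problems.append(f"line {number}: {chrom} not in chrom.sizes")
        elif start > end or end > chrom_sizes[chrom]:
            problems.append(f"line {number}: bad coordinates {start}-{end}")
        if previous is not None and (chrom, start) < previous:
            problems.append(f"line {number}: not sorted by chrom, start")
        previous = (chrom, start)
    return problems
```

<p>
  For example, read <em>hg38.chrom.sizes</em> with <code>read_chrom_sizes</code>, then pass the lines of
  <em>bigRmskExample.join.tsv</em> with <code>n_columns=14</code>, or of <em>bigRmskExample.align.tsv</em>
  with <code>n_columns=17</code>; an empty list means no problems were found.
</p>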
 
 <p>
  <strong>Step 5.</strong>
  Copy the newly created bigRmsk file (<em>bigRmskExample.bb</em>) and the optional
  bigRmskAlign file (<em>bigRmskExampleAlign.bb</em>) to a web-accessible http, https,
  or ftp location.
 </p>
 <p>
  <strong>Step 6.</strong>
  As with other bigBed-based tracks, bigRmsk tracks can be displayed as
  <a href="hgTracksHelp.html#CustomTracks">custom tracks</a>
  or included in <a href="hubQuickStart.html">track hubs</a>
  and <a href="hubQuickStartAssembly.html">assembly hubs</a>.
 </p>
 
 <p>
  The following settings are used for bigRmsk custom tracks (written as <code>key=value</code> on the track line)
  and for trackDb hub entries (written as <code>key value</code>):
   <ul>
     <li> <code>type bigRmsk</code>
     <li> <code>bigDataUrl<em> &lt;url&gt;</em></code> - URL or relative path of bigRmsk file
     <li> <code>xrefDataUrl<em> &lt;url&gt;</em></code> - URL or relative path of optional bigRmskAlign file
   </ul>
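<p>
  The two syntaxes differ only in how each setting is written. The sketch below renders the same
  settings both ways; the helper names are hypothetical, and quoting values that contain spaces follows
  the custom track example on this page.
</p>

```python
# Minimal sketch: the same bigRmsk settings written as a custom track line
# (key=value pairs) and as a trackDb.txt stanza (one "key value" per line).
# Helper names are hypothetical; values containing spaces are quoted on the
# track line, following the custom track example on this page.
from typing import Dict


def custom_track_line(settings: Dict[str, str]) -> str:
    parts = ["track"]
    for key, value in settings.items():
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{key}={value}")
    return " ".join(parts)


def trackdb_stanza(track_name: str, settings: Dict[str, str]) -> str:
    lines = [f"track {track_name}"]
    lines += [f"{key} {value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"


EXAMPLES = "http://genome.ucsc.edu/goldenPath/help/examples/"
settings = {
    "type": "bigRmsk",
    "bigDataUrl": EXAMPLES + "bigRmskExample.bb",
    "xrefDataUrl": EXAMPLES + "bigRmskExampleAlign.bb",  # optional
}
print(custom_track_line(settings))
print(trackdb_stanza("bigRmskExample", settings))
```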
 <p>
   A standard bigRmsk track description is available at <a href="../trackDescriptions/bigRmskTrackDesc.html">bigRmskTrackDesc.html</a>,
   which can be directly linked to with the URL:<br>
   <em>http://genome.ucsc.edu/goldenPath/trackDescriptions/bigRmskTrackDesc.html</em>.
 </p>
 <p>
   See the <a href="#examples">Examples</a> section below for detailed examples of bigRmsk custom tracks
   and track hub definitions.
 </p>
 
 <h2 id="examples">Examples</h2>
 
 <h3 id="example1">Example of a bigRmsk custom track</h3>
 <p>
 Construct a <a href="hgTracksHelp.html#CustomTracks">custom track</a> using a single 
 <a href="hgTracksHelp.html#TRACK">track line</a>. Note that any of the track attributes listed 
 <a href="customTrack.html#TRACK">here</a> are applicable to tracks of type bigBed.
 <p>
 To create a custom track using the example bigRmsk file: 
 <ol>
   <li>
     Construct a track line that references the file:<br>
     <pre><code>track type=bigRmsk name=&quot;bigRmsk Example&quot; description=&quot;RepeatMasker example&quot; visibility=full bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.bb xrefDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExampleAlign.bb</code></pre>
   </li>
   <li>
     Paste the track line into the <a href="../../cgi-bin/hgCustom?db=hg38">custom track management page</a>
     for the human assembly hg38 (Dec. 2013).
   </li> 
   <li>
     Click the &quot;submit&quot; button.
   </li>
   <li>
     Navigate to <code>chr1:8,890-35,190</code> to see the track.
   </li>
 </ol>
 <h3 id="example2">Example of a bigRmsk track hub </h3>
 <p>
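  This example can also be loaded in a track hub or assembly hub <em>trackDb.txt</em> file
  with a stanza such as the following.  The <code>maxWindowToDraw</code> setting limits drawing
  of the track to windows of at most 10,000,000 bases:</p>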
 <pre>
     track bigRmskExample
     shortLabel Example bigRmsk
     longLabel This is an example bigRmsk Track Hub Stanza
     type bigRmsk
+    maxWindowToDraw 10000000
     visibility full
     html http://genome.ucsc.edu/goldenPath/trackDescriptions/bigRmskTrackDesc.html
     bigDataUrl http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.bb
     xrefDataUrl http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExampleAlign.bb
 </pre>
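<p>
  A stanza like the one above is a sequence of <code>key value</code> lines. The sketch below parses one
  into a dictionary, lists missing settings, and checks whether a given window would be drawn under
  <code>maxWindowToDraw</code>. The helper names are hypothetical, the required-settings list is based on
  this page, and treating the limit as inclusive is an assumption.
</p>

```python
# Minimal sketch: parse a trackDb stanza into a dict, list missing settings,
# and test a window size against maxWindowToDraw. Helper names are
# hypothetical; treating the limit as inclusive is an assumption.
from typing import Dict, List

REQUIRED = ("track", "type", "bigDataUrl")  # xrefDataUrl is optional


def parse_stanza(text: str) -> Dict[str, str]:
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # key, then the rest of the line as value
        settings[parts[0]] = parts[1] if len(parts) > 1 else ""
    return settings


def missing_settings(settings: Dict[str, str]) -> List[str]:
    return [key for key in REQUIRED if key not in settings]


def would_draw(settings: Dict[str, str], window_bp: int) -> bool:
    limit = settings.get("maxWindowToDraw")
    return limit is None or window_bp <= int(limit)


stanza = """
    track bigRmskExample
    type bigRmsk
    maxWindowToDraw 10000000
    bigDataUrl http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.bb
"""
example = parse_stanza(stanza)
print(missing_settings(example))         # []
print(would_draw(example, 26_300))       # True
print(would_draw(example, 50_000_000))   # False
```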
 
 <h2 id="share">Additional information</h2>
 <p>
   See the <a href="bigBed.html">bigBed documentation</a> for guidance on
   sharing, troubleshooting, and extracting data from bigRmsk files.
 </p>
 
 <h2 id="credits">Credits</h2>
 <p>
   The bigRmsk system was developed by Robert Hubley of the Institute for Systems Biology.
 </p>
 <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->