8b8e3dd40b5f555d315dafce8a4bd64ad07bb1b2
dschmelt
  Fri Sep 9 12:51:21 2022 -0700
Proofread and edited for clarity refs #29356

diff --git src/hg/htdocs/goldenPath/help/bigRmsk.html src/hg/htdocs/goldenPath/help/bigRmsk.html
index 839b9ea..8b061e0 100755
--- src/hg/htdocs/goldenPath/help/bigRmsk.html
+++ src/hg/htdocs/goldenPath/help/bigRmsk.html
@@ -1,155 +1,287 @@
 <!DOCTYPE html>
 <!--#set var="TITLE" value="Genome Browser bigRmsk RepeatMasker Format" -->
 <!--#set var="ROOT" value="../.." -->
 
 <!-- Relative paths to support mirror sites with non-standard GB docs install -->
 <!--#include virtual="$ROOT/inc/gbPageStart.html" -->
 
 <h1>bigRmsk Track Format</h1>
 
 <p>
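-  The bigRmsk format allows for the display of annotations of a genome generated by the
-  <a href="http://www.repeatmasker.org/" target="_blank">RepeatMasker</a>
-  program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.
+  The bigRmsk format allows for the display of genome annotations generated by the
+  <a href="http://www.repeatmasker.org/" target="_blank">RepeatMasker</a>
+  program, which screens DNA sequences for interspersed repeats and low-complexity DNA sequences.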
-It is the recommend method of adding RepeatMaster tracks to assembly hubs. 
-For a descriptions of this features of this track type, with examples, see
-<a href="bigRmskTrackDescExample.html">standard bigRmsk track description</a>.
+  It is the recommended method of adding RepeatMasker tracks to assembly hubs. 
+</p>
 <p>
   The bigRmsk format enables taking the annotation output of RepeatMasker and
   converting it into a compressed and indexed
   <a href="/goldenPath/help/bigBed.html">bigBed</a> file.  Please see this page
-  for a details of the bigBed format, its use, and associated tools.
+  for details of the bigBed format, its use, and associated tools.
 </p>
 
+<h2>Display Conventions and Configuration</h2>
+
+<h4>Context Sensitive Zooming</h4>
+<p>
+  This track uses a technique that chooses the appropriate visual representation for the data based on the
+  zoom scale and the number of annotations currently in view.  The track automatically switches from the
+  most detailed visualization ('Full' mode) to the denser view ('Pack' mode) when the window spans more
+  than 45 kb of sequence.  It further switches to the even denser single-line view ('Dense' mode) if more than
+  500 annotations are present in the current view.
+</p>
+<h4>Dense Mode Visualization</h4>
+<p>
+  In dense display mode, repeat coverage is shown on a single line as a series
+  of colored boxes.  The boxes are colored by repeat class (see the legend below).
+<br>
+<br>
+<img height="30" width="1250" src="/images/rmskDense.png">
+</p>
+<h4>Pack Mode Visualization</h4>
+<p>
+  In pack mode, repeats are represented as sets of joined features.  These are color coded by repeat class as above, and further
+  details, such as orientation (denoted by chevrons) and a family label, are provided.
+  The family label can optionally be turned off in the track configuration.
+<br>
+<br>
+<img height="100" width="1250" src="/images/rmskPack.png">
+<br>
+<br>
+  The pack display mode can also be configured to resemble the original UCSC repeat track.  In this visualization,
+  repeat features are grouped by class (see below) and displayed on separate track lines.  Each repeat is
+  drawn as a grayscale box whose length reflects the size of the repeat and whose shading reflects
+  the amount of base mismatch, base deletion, and base insertion associated with the repeat element.
+  The higher the combined divergence, the lighter the shading.
+<br>
+<br>
+<img height="100" width="1250" src="/images/rmskOrigPack.png">
+</p>
+<h4>Full Mode Visualization</h4>
+<p>
+  In the most detailed visualization, repeats are displayed as chevron boxes indicating the size and orientation of
+  the repeat.  The interior grayscale shading represents the divergence of the repeat (see above), while the outline color
+  represents the class of the repeat.  Dotted lines above the repeat, extending to the left or right,
+  indicate the length of unaligned repeat model sequence and show where a repeat fragment originates in its
+  consensus sequence or profile HMM (pHMM).  If the unaligned sequence
+  is long, an interruption line and its length in bp are shown instead of drawing the extension to scale.
+<br>
+<br>
+<img height="125" width="1250" src="/images/rmskFull.png">
+</p>
+
+<p>
+  For example, the following repeat is a SINE element in the forward orientation with average
+  divergence. Only the 5' proximal fragment of the consensus sequence is aligned to the genome.
+  The 3' unaligned length (384bp) is not drawn to scale and is instead displayed using a set of
+  interruption lines along with the length of the unaligned sequence.
+</p>
+
+<img src="/images/rmskExample1.svg">
+
+<p>
+  Repeats that have been fragmented by insertions or large internal deletions are represented
+  by join lines.  In the example below, a LINE element is found as two fragments.  The solid
+  connection lines indicate that there are no unaligned consensus bases between the two fragments.
+  Also note that these fragments form the 3' end of the repeat, as there is no unaligned consensus
+  sequence following the last fragment.
+</p>
+
+<img src="/images/rmskExample2.svg">
+
+<p>
+  In cases where there is unaligned consensus sequence between the fragments, the repeat will look like
+  the following.  The dotted line indicates the length of the unaligned sequence between the two
+  fragments.  In this case the unaligned consensus is longer than the actual genomic distance between
+  these two fragments.
+</p>
+
+<img src="/images/rmskExample3.svg">
+
+<p>
+  If there is consensus overlap between the two fragments, the joining lines will be drawn to indicate
+  how much of the left fragment is repeated in the right fragment.
+</p>
+
+<img src="/images/rmskExample4.svg">
+
+<p>
+  The following table lists the repeat class colors:
+</p>
+
+<table>
+  <thead>
+  <tr>
+    <th style="border-bottom: 2px solid #6678B1;">Color</th>
+    <th style="border-bottom: 2px solid #6678B1;">Repeat Class</th>
+  </tr>
+  </thead>
+  <tr>
+    
+    <td bgcolor="#1F77B4"></td>
+    <td align="left"><b>SINE</b> - Short Interspersed Nuclear Element</td>
+  </tr>
+  <tr>
+    <td bgcolor="#FF7F0E"></td>
+    <td align="left"><b>LINE</b> - Long Interspersed Nuclear Element</td>
+  </tr>
+  <tr>
+    <td bgcolor="#2CA02C"></td>
+    <td align="left"><b>LTR</b> - Long Terminal Repeat</td>
+  </tr>
+  <tr>
+    <td bgcolor="#D62728"></td>
+    <td align="left"><b>DNA</b> - DNA Transposon</td>
+  </tr>
+  <tr>
+    <td bgcolor="#9467BD"></td>
+    <td align="left"><b>Simple</b> - Single Nucleotide Stretches and Tandem Repeats</td>
+  </tr>
+
+  <tr>
+    <td bgcolor="#8C564B"></td>
+    <td align="left"><b>Low_complexity</b> - Low Complexity DNA</td>
+  </tr>
+  <tr>
+    <td bgcolor="#E377C2"></td>
+    <td align="left"><b>Satellite</b> - Satellite Repeats</td>
+  </tr>
+  <tr>
+    <td bgcolor="#7F7F7F"></td>
+    <td align="left"><b>RNA</b> - RNA Repeats (including RNA, tRNA, rRNA, snRNA, scRNA, srpRNA)</td>
+  </tr>
+  <tr>
+    <td bgcolor="#BCBD22"></td>
+    <td align="left"><b>Other</b> - Other Repeats (including class RC - Rolling Circle)</td>
+  </tr>
+  <tr>
+    <td bgcolor="#17BECF"></td>
+    <td align="left"><b>Unknown</b> - Unknown Classification</td>
+  </tr>
+</table>
+
+
+<p>
+  A &quot;?&quot; at the end of the &quot;Family&quot; or &quot;Class&quot; (for example, DNA?)
+  signifies that the curator was unsure of the classification. At some point in the future,
+  either the &quot;?&quot; will be removed or the classification will be changed.</p>
+
+
+
 <h2 id="bigRmsk">bigRmsk track definitions</h2>
 <p>
-  The bigRmsk tracks consist of two bigBed files define by
+  The bigRmsk tracks consist of two bigBed files, each defined by an
   <a href="http://www.linuxjournal.com/article/5949" target="_blank">autoSql</a> schema:
 </p>
 <ul>
-  <li>The primary bigRmsk file, define by <a href="examples/bigRmskBed.as">
+  <li>The primary bigRmsk file, defined by <a href="examples/bigRmskBed.as">
     <em>bigRmskBed.as</em></a>,
     which has the annotations of repeats.
-  <li>The secondary bigRmskAlign file, define by <a href="examples/bigRmskAlignBed.as">
+  <li>The secondary bigRmskAlign file, defined by <a href="examples/bigRmskAlignBed.as">
     <em>bigRmskAlignBed.as</em></a>,
-   which contains the alignments of the consensus repeats to the genome.  This file is optional, 
-    if omitted, the bigRmsk track will function, without the ability to view the alignments.
+   which contains the alignments of the consensus repeats to the genome.  This file is optional;
+    if omitted, the bigRmsk track will function without the ability to view the alignments.
 </ul>
 
 <p>
   The input files for the bigRmsk files are created from the RepeatMasker <em>*.out</em> and 
   <em>*.align</em> files
   using the <em>rmToTrackHub.pl</em> program that is included with RepeatMasker.  The bigRmsk
   format is not designed to work with any other type of data.
 </p>
 
 
 <h2 id="steps">Creating a bigRmsk track</h2>
 <p>
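-  To create a bigRmsk track and its supporting files, follow the below steps.
-  This assumes that you have already run RepeatMasker and have a <em>*.out</em>, and
-  optionally <em>*.align</em> file.
+  To create a bigRmsk track and its supporting files, follow the steps below.
+  This assumes that you have already run RepeatMasker and have a <em>*.out</em> file and,
+  optionally, an <em>*.align</em> file.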
 </p>
 
 <p>
-  RepeatMasker output files are convert to the bigRmsk textual form using the
+  RepeatMasker output files are converted to the bigRmsk textual form using the
   <em>RepeatMasker/util/rmToTrackHub.pl</em> program that is part of the
   <a href="http://www.repeatmasker.org/RepeatMasker/">RepeatMasker 4.1.3 or newer distribution</a>.
 </p>
 <p>
   <strong>Step 1.</strong>
   If you wish to experiment with quickly building an example track, download the
   example RepeatMasker output files for the human GRCh38 (hg38) assembly
   <a href="examples/bigRmskExample.out">bigRmskExample.out</a>
   and <a href="examples/bigRmskExample.align">bigRmskExample.align</a>
   used in this tutorial:
   <pre>
-    <code>
       wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.out
-      wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.align
-    </code>
-  </pre>
+      wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.align</pre>
 <p>
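-  Otherwise, substitute your <em>*.out</em> and <em>*.align</em> in theses instructions.
-  Generating the alignment bigRmsk file is optional if you don&apos;t have the <em>*.align</em>
-  files from RepeatMasker, the track will function with reduced functionality without them.  Just skip the
-  steps involved in build the alignment files.
+  Otherwise, substitute your own <em>*.out</em> and <em>*.align</em> files in these instructions.
+  Generating the alignment bigRmsk file is optional; if you don&apos;t have the <em>*.align</em>
+  files from RepeatMasker, the track will work with reduced functionality without them.  Just skip the
+  steps involved in building the alignment files.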
 
 <p>
   <strong>Step 2.</strong>
-  Download the autoSql schemes <a href="examples/bigRmskBed.as">bigRmskBed.as</a> and
+  Download the autoSql schemas <a href="examples/bigRmskBed.as">bigRmskBed.as</a> and
   <a href="examples/bigRmskAlignBed.as">bigRmskAlignBed.as</a>:
   <pre>
-    <code>
       wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskBed.as
-      wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskAlignBed.as
-    </code>
-  </pre>
+      wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskAlignBed.as</pre>
 <p>
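-  You will also need a file of chromosome sizes for your genome, or download the hg38
-  file for the example:
+  You will also need a chromosome sizes file for your genome; for the example, download the hg38
+  file: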
   <pre>
-    <code>
-      wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
-    </code>
-  </pre>
+      wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes</pre>
 <p>
   <strong>Step 3.</strong>
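-  Convert the RepeatMasker files to the text format bigRmsk files for conversion to the bigRmsk files with
-  <em>rmToTrackHub.pl</em>, which sorts the output for direct input to <em>bedToBigBed</em>:
+  Convert the RepeatMasker files to the bigRmsk text format, ready for conversion to bigBed, with
+  <em>rmToTrackHub.pl</em>, which sorts the output so it can be passed directly to <em>bedToBigBed</em>: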
   <pre>
-    <code>
-      RepeatMasker/util/rmToTrackHub.pl -out bigRmskExample.out -align bigRmskExample.align
-    </code>
-  </pre>
+      RepeatMasker/util/rmToTrackHub.pl -out bigRmskExample.out -align bigRmskExample.align</pre>
 <p>
   <strong>Step 4.</strong>
   Build the bigRmsk and optional bigRmskAlign files:
   <pre>
-    <code>
       bedToBigBed -tab -type=bed9+5 -as=bigRmskBed.as bigRmskExample.join.tsv hg38.chrom.sizes bigRmskExample.bb
-      bedToBigBed -tab -type=bed3+14 -as=bigRmskAlignBed.as bigRmskExample.align.tsv hg38.chrom.sizes bigRmskExampleAlign.bb
-    </code>
-  </pre>
+      bedToBigBed -tab -type=bed3+14 -as=bigRmskAlignBed.as bigRmskExample.align.tsv hg38.chrom.sizes bigRmskExampleAlign.bb</pre>
 
 <p>
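-<strong>Step 6.</strong> 
-  Place the newly created bigRmsk file (<em>bigRmskExample.bb</em>), and optional
-  bigRmskAlign (<em>bigRmskExampleAlign.bb</em>) to a web-accessible http, https
-  or ftp location.
-</p>
-<strong>Step 7.</strong>
-<p>
+<strong>Step 5.</strong>
+  Place the newly created bigRmsk file (<em>bigRmskExample.bb</em>) and the optional
+  bigRmskAlign file (<em>bigRmskExampleAlign.bb</em>) in a web-accessible http, https,
+  or ftp location.
+</p>
+<p>
+<strong>Step 6.</strong>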
   As with other bigBed-based tracks, bigRmsk tracks can be displayed as
   <a href="hgTracksHelp.html#CustomTracks">custom tracks</a>,
   included in <a href="hubQuickStart.html">track hubs</a>,
   or <a href="hubQuickStartAssembly.html">assembly hubs</a>.
 </p>
 
 <p>
-  The following options are used for bigRmsk custom tracks or trackDb entries:
+  The following settings are used in bigRmsk trackDb hub entries as shown below; in custom track lines, put an equals sign between each key and value (for example, <code>type=bigRmsk</code>):
   <ul>
     <li> <code>type bigRmsk</code>
     <li> <code>bigDataUrl<em> &lt;url&gt;</em></code> - URL or relative path of bigRmsk file
     <li> <code>xrefDataUrl<em> &lt;url&gt;</em></code> - URL or relative path of optional bigRmskAlign file
   </ul>
-
+<p>
   A standard bigRmsk track description is available at <a href="../trackDescriptions/bigRmskTrackDesc.html">bigRmskTrackDesc.html</a>,
-  which can be directly to with as the URL:<br>
+  which can be directly linked to with the URL:<br>
   <em>http://genome.ucsc.edu/goldenPath/trackDescriptions/bigRmskTrackDesc.html</em>.
-
+</p>
 <p>
   See the <a href="#examples">Examples</a> section below for detailed examples of bigRmsk custom tracks
   and track hub definitions.
 </p>
 
 <h2 id="examples">Examples</h2>
 
 <h3 id="example1">Example of a bigRmsk custom track</h3>
 <p>
 Construct a <a href="hgTracksHelp.html#CustomTracks">custom track</a> using a single 
 <a href="hgTracksHelp.html#TRACK">track line</a>. Note that any of the track attributes listed 
 <a href="customTrack.html#TRACK">here</a> are applicable to tracks of type bigBed.
 <p>
 To create a custom track using the example bigRmsk file: 
 <ol>
@@ -174,22 +306,23 @@
   with a stanza such as the following:</p>
 <pre>
     track bigRmskExample
     shortLabel Example bigRmsk
     longLabel This is an example bigRmsk Track Hub Stanza
     type bigRmsk
     visibility full
     html http://genome.ucsc.edu/goldenPath/trackDescriptions/bigRmskTrackDesc.html
     bigDataUrl http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.bb
     xrefDataUrl http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExampleAlign.bb
 </pre>
 
 <h2 id="share">Additional information</h2>
 <p>
   See the <a href="bigBed.html">bigBed documentation</a> for guidance on
-  sharing, trouble shooting and extracting data from bigRmsk files.
+  sharing, troubleshooting, and extracting data from bigRmsk files.
 </p>
 
 <h2 id="credits">Credits</h2>
+<p>
   The bigRmsk system was developed by Robert Hubley of the Institute for Systems Biology.
-
+</p>
 <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->