ab2f57f4ebbb1b98aaf1ac0033eb2fa1bc36ad87
gperez2
  Mon Sep 23 09:06:12 2024 -0700
Adding information to the chain.html about the minus strand in chain files, refs #24858

diff --git src/hg/htdocs/goldenPath/help/chain.html src/hg/htdocs/goldenPath/help/chain.html
index e93df00..e61db16 100755
--- src/hg/htdocs/goldenPath/help/chain.html
+++ src/hg/htdocs/goldenPath/help/chain.html
@@ -1,112 +1,150 @@
 <!DOCTYPE html>
 <!--#set var="TITLE" value="Genome Browser Chain Format" -->
 <!--#set var="ROOT" value="../.." -->
 
 <!-- Relative paths to support mirror sites with non-standard GB docs install -->
 <!--#include virtual="$ROOT/inc/gbPageStart.html" -->
 
 <h1>Chain Format</h1> 
 <p> 
 The chain format describes a pairwise alignment that allow gaps in both sequences simultaneously. 
 Each set of chain alignments starts with a header line, contains one or more alignment data lines, 
 and terminates with a blank line. The format is deliberately quite dense.</p>  
 <p> 
 <strong><em>Example:</em></strong></p>
 <pre><code>    chain 4900 chrY 58368225 + 25985403 25985638 chr5 151006098 - 43257292 43257528 1
      9       1       0
      10      0       5
      61      4       0
      16      0       4
      42      3       0
      16      0       8
      14      1       0
      3       7       0
      48
 
      chain 4900 chrY 58368225 + 25985406 25985566 chr5 151006098 - 43549808 43549970 2
      16      0       2
      60      4       0
      10      0       4
      70</code></pre> 
 <p>
 <strong>Header Lines</strong> 
 <p>
 <pre><code>    <strong>chain</strong> <em>score</em> <em>tName</em> <em>tSize</em> <em>tStrand</em> <em>tStart</em> <em>tEnd</em> <em>qName</em> <em>qSize</em> <em>qStrand</em> <em>qStart</em> <em>qEnd</em> <em>id</em> </code></pre> 
 <p>
 The initial header line starts with the keyword <strong><code>chain</code></strong>, followed by 
 11 required attribute values, and ends with a blank line. The attributes include: 
 <ul> 
   <li> 
   <strong><code>score</code></em></strong> -- chain score</li>
   <li>
   <strong><code>tName</code></strong> -- chromosome (reference/target sequence)</li>
   <li>
   <strong><code>tSize</code></strong> -- chromosome size (reference/target sequence)</li>
   <li>
   <strong><code>tStrand</code></strong> -- strand (reference/target sequence)</li>
   <li>
   <strong><code>tStart</code></strong> -- alignment start position (reference/target sequence)</li>
   <li>
   <strong><code>tEnd</code></strong> -- alignment end position (reference/target sequence)</li>
   <li>
   <strong><code>qName</code></strong> -- chromosome (query sequence)</li>
   <li>
   <strong><code>qSize</code></strong> -- chromosome size (query sequence)</li>
   <li>
   <strong><code>qStrand</code></strong> -- strand (query sequence)</li>
   <li>
   <strong><code>qStart</code></strong> -- alignment start position (query sequence)</li>
   <li>
   <strong><code>qEnd</code></strong> -- alignment end position (query sequence)</li>
   <li>
   <strong><code>id</code></strong> -- chain ID</li>
 </ul> 
 <p> 
 The alignment start and end positions are represented as zero-based half-open intervals. For 
 example, the first 100 bases of a sequence would be represented with start position = 0 and end 
 position = 100, and the next 100 bases would be represented as start position = 100 and end 
-position = 200. When the strand value is &quot;-&quot;, position coordinates are listed in terms of 
-the reverse-complemented sequence.</p> 
+position = 200.</p>
+
+<p>
+<b>NOTE</b>: When the <code>qStrand</code> value is &quot;-&quot;, the query coordinates
+(<code>qStart</code> and <code>qEnd</code>) are given on the reverse strand. To obtain
+forward-strand coordinates in the query genome, subtract them from the query chromosome size
+(<code>qSize</code>), swapping start and end:</p>
+<pre><code>    qStartForward = qSize - qEnd
+    qEndForward   = qSize - qStart</code></pre>
+
+<p>
+For example, using the query coordinates from chain 5 in
+<a href="https://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/hg38ToMm10.over.chain.gz"
+target="_blank">hg38ToMm10.over.chain.gz</a>:</p>
+<pre><code>    chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
+    chain 442878230 chr1 248956422 + 158547112 207360161 chr1 195471971 - 21022354 65032227 5</code></pre>
+<p>
+The reverse strand coordinates are subtracted from the chromosome size:</p>
+<pre><code>    mm10Start = 195471971 - 65032227 = 130439744
+    mm10End   = 195471971 - 21022354 = 174449617</code></pre>
+<p>The forward strand coordinates for chain 5 on mm10 are
+<code>chr1 130439744 174449617</code> (zero-based, half-open), or
+chr1:130,439,745-174,449,617 as a 1-based position range.</p>
+
+<p>
+To reverse the calculation and derive the corresponding hg38 coordinates, use chain 5 in
+<a href="https://hgdownload.soe.ucsc.edu/goldenPath/mm10/liftOver/mm10ToHg38.over.chain.gz"
+target="_blank">mm10ToHg38.over.chain.gz</a>. Note that the mm10 coordinates derived above match
+its <code>tStart</code> and <code>tEnd</code> values:</p>
+<pre><code>    chain 442878230 chr1 195471971 + 130439744 174449617 chr1 248956422 - 41596261 90409310 5</code></pre>
+
+<p>
+The hg38 query coordinates are subtracted from the hg38 chromosome size as follows:</p>
+<pre><code>    hg38Start = 248956422 - 90409310 = 158547112
+    hg38End   = 248956422 - 41596261 = 207360161</code></pre>
+
+<p>These coordinates match the <code>tStart</code> and <code>tEnd</code> values of chain 5 in
+<a href="https://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/hg38ToMm10.over.chain.gz"
+target="_blank">hg38ToMm10.over.chain.gz</a>.</p>
+
 <p>
 <strong>Alignment Data Lines</strong></p> 
 <p> 
 Alignment data lines contain three required attribute values:<p>
 <pre>    <em>size</em> <em>dt</em> <em>dq</em></pre>
 <ul> 
   <li> 
   <strong><code>size</code></strong> -- the size of the ungapped alignment</li>
   <li> 
   <strong><code>dt</code></strong> -- the difference between the end of this block and the beginning of 
   the next block (reference/target sequence)</li>
   <li> 
   <strong><code>dq</code></strong> -- the difference between the end of this block and the beginning of 
   the next block (query sequence)</li>
 </ul> 
 <p> 
 NOTE: The last line of the alignment section contains only one number: the ungapped alignment size 
 of the last block.</p>
 
 <a name="rearrangement"></a> 
 <h3>&quot;Snake&quot; rearrangement display</h3>
 <p>
 Rearrangement display, sometimes called snakes display, is an alternative way to view pairwise alignments.
 It is available for PSL and chain format tracks.</p>
 <p>
 Rearrangement display is a representation of the path that the sequence follows in the &quot;other&quot; sequence. 
 You start in the upper left and move to the right, following the lines if you come to the end of a block. If a block 
 is red, which means it is a match on the negative strand, then you reverse your course and start going from right to left.
 The gray lines mean there are no bases in the other sequence between the blocks. Orange lines means there are some 
 bases in there that are not aligning.</p>
 <p>The display type can be enabled on the track configuration page of eligible tracks. Below are two examples for clarity.</p>
 <p>
 <div class="text-left">
   <img src="../../images/rearrangementExample1.png" alt="Rearrangement display example 1" width='55%'>
 </div><br>
 This example shows a tandem duplication on CDH1 which duplicates 3 exons.</p>
 <p>
 <div class="text-left">
   <img src="../../images/rearrangementExample2.png" alt="Rearrangement display example 2" width='55%'>
 </div><br>
 This example shows an inversion, which can be identified by the red colored sequence.</p>
 
 <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->