ab2f57f4ebbb1b98aaf1ac0033eb2fa1bc36ad87
gperez2
Mon Sep 23 09:06:12 2024 -0700
Adding information to the chain.html about the minus strand in chain files, refs #24858

diff --git src/hg/htdocs/goldenPath/help/chain.html src/hg/htdocs/goldenPath/help/chain.html
index e93df00..e61db16 100755
--- src/hg/htdocs/goldenPath/help/chain.html
+++ src/hg/htdocs/goldenPath/help/chain.html
@@ -53,32 +53,70 @@
 <li>
 <strong><code>qSize</code></strong> -- chromosome size (query sequence)</li>
 <li>
 <strong><code>qStrand</code></strong> -- strand (query sequence)</li>
 <li>
 <strong><code>qStart</code></strong> -- alignment start position (query sequence)</li>
 <li>
 <strong><code>qEnd</code></strong> -- alignment end position (query sequence)</li>
 <li>
 <strong><code>id</code></strong> -- chain ID</li>
 </ul>
 <p>
 The alignment start and end positions are represented as zero-based half-open intervals. For
 example, the first 100 bases of a sequence would be represented with start position = 0 and end
 position = 100, and the next 100 bases would be represented as start position = 100 and end
-position = 200. When the strand value is "-", position coordinates are listed in terms of
-the reverse-complemented sequence.</p>
+position = 200.</p>
+
+<p>
+<b>NOTE</b>: When the strand value is "-", the query coordinates (qStart and qEnd) are on
+the reverse strand and must be subtracted from the chromosome size to obtain the correct position
+on the forward strand in the other genome. The reverse coordinates are subtracted as follows to get
+forward strand coordinates:</p>
+<pre><code> qStartForward = qSize - qEnd
+ qEndForward = qSize - qStart</code></pre>
+
+<p>
+For example, using the query coordinates from chain 5 in
+<a href="https://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/hg38ToMm10.over.chain.gz"
+target="_blank">hg38ToMm10.over.chain.gz</a>:
+<pre><code> chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
+ chain 442878230 chr1 248956422 + 158547112 207360161 chr1 195471971 - 21022354 65032227 5</code></pre>
+<p>
+The reverse strand coordinates are subtracted from the chromosome size:</p>
+<pre><code> mm10Start = 195471971 - 65032227 = 130439744
+ mm10End = 195471971 - 21022354 = 174449617</code></pre>
+<p>The forward strand coordinates for chain 5 on mm10 are
+<code>chr1 130439744 174449617</code>, or with 1-based coordinates for a position range,
+chr1:130,439,745-174,449,617.</p>
+
+<p>
+To reverse the calculation and derive the corresponding hg38 coordinates using chain 5 in
+<a href="https://hgdownload.soe.ucsc.edu/goldenPath/mm10/liftOver/mm10ToHg38.over.chain.gz"
+target="_blank">mm10ToHg38.over.chain.gz</a>, note that the derived mm10 coordinates match the
+<code>tStart</code> and <code>tEnd</code> values:</p>
+<pre><code> chain 442878230 chr1 195471971 + 130439744 174449617 chr1 248956422 - 41596261 90409310 5</code></pre>
+
+<p>
+The hg38 coordinates are subtracted as follows:</p>
+<pre><code> hg38Start = 248956422 - 90409310 = 158547112
+ hg38End = 248956422 - 41596261 = 207360161</code></pre>
+
+<p>These coordinates match the target coordinates in
+<a href="https://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/hg38ToMm10.over.chain.gz"
+target="_blank">hg38ToMm10.over.chain.gz</a>.</p>
+
 <p>
 <strong>Alignment Data Lines</strong></p>
 <p>
 Alignment data lines contain three required attribute values:<p>
 <pre> <em>size</em> <em>dt</em> <em>dq</em></pre>
 <ul>
 <li>
 <strong><code>size</code></strong> -- the size of the ungapped alignment</li>
 <li>
 <strong><code>dt</code></strong> -- the difference between the end of this block and the beginning
 of the next block (reference/target sequence)</li>
 <li>
 <strong><code>dq</code></strong> -- the difference between the end of this block and the beginning
 of the next block (query sequence)</li>
 </ul>
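
The minus-strand subtraction and the size/dt/dq fields described in this change can be sketched in Python. This is only an illustration under stated assumptions, not code from the UCSC source tree: the names ChainHeader, parse_chain_header, query_forward_interval and iter_blocks are hypothetical, and the three-block toy chain at the end is invented for the example. It also assumes that the final data line of a chain carries only a size value.

```python
from typing import Iterator, List, NamedTuple, Tuple


class ChainHeader(NamedTuple):
    """Fields of one chain header line (hypothetical container, not a UCSC API)."""
    score: int
    t_name: str
    t_size: int
    t_strand: str
    t_start: int
    t_end: int
    q_name: str
    q_size: int
    q_strand: str
    q_start: int
    q_end: int
    chain_id: str


def parse_chain_header(line: str) -> ChainHeader:
    """Split a 'chain ...' header line into typed fields."""
    f = line.split()
    if len(f) != 13 or f[0] != "chain":
        raise ValueError(f"not a chain header line: {line!r}")
    return ChainHeader(int(f[1]), f[2], int(f[3]), f[4], int(f[5]), int(f[6]),
                       f[7], int(f[8]), f[9], int(f[10]), int(f[11]), f[12])


def query_forward_interval(h: ChainHeader) -> Tuple[int, int]:
    """Return the query interval on the forward strand (zero-based, half-open)."""
    if h.q_strand == "-":
        # qStartForward = qSize - qEnd, qEndForward = qSize - qStart
        return h.q_size - h.q_end, h.q_size - h.q_start
    return h.q_start, h.q_end


def iter_blocks(h: ChainHeader,
                data_lines: List[str]) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (tStart, tEnd, qStart, qEnd) per ungapped block, in the chain's own
    coordinates (query values stay on the strand given in the header)."""
    t_pos, q_pos = h.t_start, h.q_start
    for line in data_lines:
        fields = [int(x) for x in line.split()]
        size = fields[0]
        yield t_pos, t_pos + size, q_pos, q_pos + size
        if len(fields) == 3:  # size dt dq; assumed absent on the last line
            t_pos += size + fields[1]
            q_pos += size + fields[2]


# Chain 5 header quoted in the diff above (hg38ToMm10.over.chain.gz).
chain5 = parse_chain_header(
    "chain 442878230 chr1 248956422 + 158547112 207360161 "
    "chr1 195471971 - 21022354 65032227 5"
)
print(query_forward_interval(chain5))  # (130439744, 174449617)

# Invented toy chain, used only to show how size/dt/dq advance the positions.
toy = parse_chain_header("chain 100 chrA 50 + 0 25 chrB 40 - 5 33 1")
for block in iter_blocks(toy, ["10 2 0", "8 0 5", "5"]):
    print(block)
```

To turn the zero-based half-open result into a 1-based position range, add 1 to the start and keep the end, which gives chr1:130,439,745-174,449,617 for chain 5.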