ab2f57f4ebbb1b98aaf1ac0033eb2fa1bc36ad87 gperez2 Mon Sep 23 09:06:12 2024 -0700 Adding information to the chain.html about the minus strand in chain files, refs #24858 diff --git src/hg/htdocs/goldenPath/help/chain.html src/hg/htdocs/goldenPath/help/chain.html index e93df00..e61db16 100755 --- src/hg/htdocs/goldenPath/help/chain.html +++ src/hg/htdocs/goldenPath/help/chain.html @@ -53,32 +53,70 @@
qSize -- chromosome size (query sequence)
qStrand -- strand (query sequence)
qStart -- alignment start position (query sequence)
qEnd -- alignment end position (query sequence)
id -- chain ID

The alignment start and end positions are represented as zero-based half-open intervals. For example, the first 100 bases of a sequence would be represented with start position = 0 and end position = 100, and the next 100 bases would be represented as start position = 100 and end
-position = 200. When the strand value is "-", position coordinates are listed in terms of
-the reverse-complemented sequence.
+position = 200. + ++NOTE: When the strand value is "-", the query coordinates (qStart and qEnd) are on +the reverse strand and must be subtracted from the chromosome size to obtain the correct position +on the forward strand in the other genome. The reverse coordinates are subtracted as follows to get +forward strand coordinates:
+ qStartForward = qSize - qEnd
+ qEndForward = qSize - qStart
+
++For example, using the query coordinates from chain 5 in +hg38ToMm10.over.chain.gz: +
chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
+ chain 442878230 chr1 248956422 + 158547112 207360161 chr1 195471971 - 21022354 65032227 5
++The reverse strand coordinates are subtracted from the chromosome size:
+ mm10Start = 195471971 - 65032227 = 130439744
+ mm10End = 195471971 - 21022354 = 174449617
+The forward strand coordinates for chain 5 on mm10 are
+chr1 130439744 174449617, or with 1-based coordinates for a position range,
+chr1:130,439,745-174,449,617.
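The subtraction above can be sketched in a few lines of Python. This is a minimal illustration, not part of any UCSC tool: the function names `minus_to_forward` and `to_one_based` are hypothetical, and the inputs are the chain 5 values quoted above.

```python
def minus_to_forward(q_size, q_start, q_end):
    """Convert zero-based half-open minus-strand coordinates to the forward strand.

    Start and end swap roles because the interval is read in the opposite direction.
    """
    return q_size - q_end, q_size - q_start


def to_one_based(chrom, start, end):
    """Format a zero-based half-open interval as a 1-based position range."""
    return f"{chrom}:{start + 1:,}-{end:,}"


# Values taken from chain 5 in hg38ToMm10.over.chain.gz (quoted above).
mm10_start, mm10_end = minus_to_forward(195471971, 21022354, 65032227)
print("chr1", mm10_start, mm10_end)                # chr1 130439744 174449617
print(to_one_based("chr1", mm10_start, mm10_end))  # chr1:130,439,745-174,449,617
```

Only the start coordinate shifts by one when moving to the 1-based form, because the end of a half-open interval already equals the last 1-based position.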
+To reverse the calculation and derive the corresponding hg38 coordinates using chain 5 in
+mm10ToHg38.over.chain.gz, note that the derived mm10 coordinates match the
+tStart and tEnd values:
chain 442878230 chr1 195471971 + 130439744 174449617 chr1 248956422 - 41596261 90409310 5
+
++The hg38 coordinates are subtracted as follows:
+ hg38Start = 248956422 - 90409310 = 158547112
+ hg38End = 248956422 - 41596261 = 207360161
+
+These coordinates match the target coordinates in +hg38ToMm10.over.chain.gz.
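The round trip between the two chain files can be checked programmatically. The sketch below assumes a header line has exactly the 13 whitespace-separated fields listed earlier. `ChainHeader`, `parse_chain_header`, and `query_forward` are hypothetical helper names, and the two header lines are the ones quoted above.

```python
from typing import NamedTuple, Tuple


class ChainHeader(NamedTuple):
    score: int
    t_name: str
    t_size: int
    t_strand: str
    t_start: int
    t_end: int
    q_name: str
    q_size: int
    q_strand: str
    q_start: int
    q_end: int
    chain_id: str


def parse_chain_header(line: str) -> ChainHeader:
    """Parse one 'chain ...' header line into typed fields."""
    f = line.split()
    if len(f) != 13 or f[0] != "chain":
        raise ValueError(f"not a chain header line: {line!r}")
    return ChainHeader(int(f[1]), f[2], int(f[3]), f[4], int(f[5]), int(f[6]),
                       f[7], int(f[8]), f[9], int(f[10]), int(f[11]), f[12])


def query_forward(h: ChainHeader) -> Tuple[int, int]:
    """Return query coordinates on the forward strand (zero-based, half-open)."""
    if h.q_strand == "-":
        return h.q_size - h.q_end, h.q_size - h.q_start
    return h.q_start, h.q_end


hg38_to_mm10 = parse_chain_header(
    "chain 442878230 chr1 248956422 + 158547112 207360161 "
    "chr1 195471971 - 21022354 65032227 5")
mm10_to_hg38 = parse_chain_header(
    "chain 442878230 chr1 195471971 + 130439744 174449617 "
    "chr1 248956422 - 41596261 90409310 5")

# Forward-strand query coordinates of one file equal the target coordinates of the other.
print(query_forward(hg38_to_mm10) == (mm10_to_hg38.t_start, mm10_to_hg38.t_end))
print(query_forward(mm10_to_hg38) == (hg38_to_mm10.t_start, hg38_to_mm10.t_end))
```

Both comparisons hold for these two header lines, which mirrors the manual calculation above.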
+Alignment Data Lines
Alignment data lines contain three required attribute values:
size dt dq
size -- the size of the ungapped alignment
dt -- the difference between the end of this block and the beginning of the next block (reference/target sequence)
dq -- the difference between the end of this block and the beginning of the next block (query sequence)
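To illustrate how size, dt, and dq combine, the following sketch walks a list of alignment data lines and reports where each ungapped block starts. It rests on two assumptions not stated in this excerpt: the final data line of a chain carries only a size value, and the walk begins at the tStart and qStart values from the header. The function name `ungapped_blocks` and the toy data lines are made up for illustration and do not come from a real chain file.

```python
def ungapped_blocks(t_start, q_start, data_lines):
    """Return (target_start, query_start, size) for each ungapped block.

    Query positions stay in the coordinate system of qStrand, so for a
    "-" strand chain they are reverse-strand coordinates.
    """
    t_pos, q_pos = t_start, q_start
    blocks = []
    for line in data_lines:
        fields = [int(x) for x in line.split()]
        size = fields[0]
        blocks.append((t_pos, q_pos, size))
        # Advance past the aligned block on both sequences.
        t_pos += size
        q_pos += size
        # Assumption: the last line has only a size, so there is no gap to skip.
        if len(fields) == 3:
            t_pos += fields[1]  # dt: gap in the reference/target sequence
            q_pos += fields[2]  # dq: gap in the query sequence
    return blocks


# Made-up data lines for a chain whose header has tStart = 100 and qStart = 200.
toy_lines = ["9 1 0", "10 0 5", "61"]
for block in ungapped_blocks(100, 200, toy_lines):
    print(block)
# (100, 200, 9)
# (110, 209, 10)
# (120, 224, 61)
```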