ab2f57f4ebbb1b98aaf1ac0033eb2fa1bc36ad87 gperez2 Mon Sep 23 09:06:12 2024 -0700
Adding information to the chain.html about the minus strand in chain files, refs #24858

diff --git src/hg/htdocs/goldenPath/help/chain.html src/hg/htdocs/goldenPath/help/chain.html
index e93df00..e61db16 100755
--- src/hg/htdocs/goldenPath/help/chain.html
+++ src/hg/htdocs/goldenPath/help/chain.html
@@ -53,32 +53,70 @@
  • qSize -- chromosome size (query sequence)
  • qStrand -- strand (query sequence)
  • qStart -- alignment start position (query sequence)
  • qEnd -- alignment end position (query sequence)
  • id -- chain ID
  • The alignment start and end positions are represented as zero-based half-open intervals. For example, the first 100 bases of a sequence would be represented with start position = 0 and end position = 100, and the next 100 bases would be represented as start position = 100 and end -position = 200. When the strand value is "-", position coordinates are listed in terms of -the reverse-complemented sequence.

    +position = 200.

    + +

    +NOTE: When the strand value is "-", the query coordinates (qStart and qEnd) are on +the reverse strand and must be subtracted from the query chromosome size (qSize) to obtain the correct position +on the forward strand in the other genome. The reverse coordinates are subtracted as follows to get +forward strand coordinates:

    +
        qStartForward = qSize - qEnd 
    +    qEndForward   = qSize - qStart
    + +
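The subtraction above can be sketched as a small Python helper. This is a minimal illustration, not part of any UCSC tool: the function name `query_to_forward` and its argument names are hypothetical, and it assumes the inputs are the qSize, qStrand, qStart and qEnd fields of a chain header line, given as zero-based half-open coordinates.

```python
def query_to_forward(q_size, q_strand, q_start, q_end):
    """Return (start, end) on the forward strand, zero-based half-open.

    Hypothetical helper for illustration only.
    """
    if q_strand == "+":
        # Plus-strand coordinates are already forward-strand coordinates.
        return q_start, q_end
    if q_strand == "-":
        # Flipping the interval swaps the roles of start and end.
        return q_size - q_end, q_size - q_start
    raise ValueError(f"unexpected strand: {q_strand!r}")


# Toy values: bases 100-300 on the reverse strand of a 1000-base sequence.
print(query_to_forward(1000, "-", 100, 300))  # (700, 900)
```

The interval length is unchanged by the conversion (here 200 bases in both cases), which is a quick sanity check on the result.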

    +For example, using the query coordinates from chain 5 in +hg38ToMm10.over.chain.gz: +

        chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
    +    chain 442878230 chr1 248956422 + 158547112 207360161 chr1 195471971 - 21022354 65032227 5
    +

    +The reverse strand coordinates are subtracted from the chromosome size:

    +
        mm10Start = 195471971 - 65032227 = 130439744
    +    mm10End   = 195471971 - 21022354 = 174449617
    +

    The forward strand coordinates for chain 5 on mm10 are +chr1 130439744 174449617, or with 1-based coordinates for a position range, +chr1:130,439,745-174,449,617.
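The switch from the zero-based half-open interval to the 1-based position range only changes the start, which is incremented by one; the end stays the same. A minimal Python sketch, where `to_position_range` is a hypothetical name chosen for this example:

```python
def to_position_range(chrom, start, end):
    """Format a zero-based half-open interval as a 1-based position range.

    Hypothetical helper for illustration only.
    """
    # Add 1 to the start; the half-open end already equals the 1-based last base.
    return f"{chrom}:{start + 1:,}-{end:,}"


print(to_position_range("chr1", 130439744, 174449617))
# chr1:130,439,745-174,449,617
```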

    + +

    +To reverse the calculation and derive the corresponding hg38 coordinates using chain 5 in +mm10ToHg38.over.chain.gz, note that the derived mm10 coordinates match the +tStart and tEnd values:

    +
        chain 442878230 chr1 195471971 + 130439744 174449617 chr1 248956422 - 41596261 90409310 5
    + +

    +The hg38 query coordinates (qStart and qEnd) are subtracted from the hg38 chromosome size (qSize) as follows:

    +
        hg38Start = 248956422 - 90409310 = 158547112
    +    hg38End   = 248956422 - 41596261 = 207360161
    + +

    These coordinates match the target coordinates in +hg38ToMm10.over.chain.gz.
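The round trip between the two chain 5 header lines can be checked with a short Python sketch. The names `parse_chain_header` and `query_forward` are hypothetical, and the parser assumes a header line with all twelve fields present after the `chain` keyword, as in the two lines quoted above.

```python
FIELDS = ["score", "tName", "tSize", "tStrand", "tStart", "tEnd",
          "qName", "qSize", "qStrand", "qStart", "qEnd", "id"]
INT_FIELDS = {"score", "tSize", "tStart", "tEnd", "qSize", "qStart", "qEnd"}


def parse_chain_header(line):
    """Split a chain header line into a dict keyed by field name."""
    tokens = line.split()
    if len(tokens) != 13 or tokens[0] != "chain":
        raise ValueError("not a complete chain header line")
    header = dict(zip(FIELDS, tokens[1:]))
    for name in INT_FIELDS:
        header[name] = int(header[name])
    return header


def query_forward(header):
    """Return the query interval in forward-strand coordinates."""
    if header["qStrand"] == "-":
        return (header["qSize"] - header["qEnd"],
                header["qSize"] - header["qStart"])
    return header["qStart"], header["qEnd"]


hg38_to_mm10 = parse_chain_header(
    "chain 442878230 chr1 248956422 + 158547112 207360161 "
    "chr1 195471971 - 21022354 65032227 5")
mm10_to_hg38 = parse_chain_header(
    "chain 442878230 chr1 195471971 + 130439744 174449617 "
    "chr1 248956422 - 41596261 90409310 5")

# The forward-strand query interval of one chain is the target interval of the other.
assert query_forward(hg38_to_mm10) == (mm10_to_hg38["tStart"], mm10_to_hg38["tEnd"])
assert query_forward(mm10_to_hg38) == (hg38_to_mm10["tStart"], hg38_to_mm10["tEnd"])
```

Both assertions pass for the two header lines shown, which mirrors the manual subtraction in the worked example.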

    +

    Alignment Data Lines

    Alignment data lines contain three required attribute values:

        size dt dq