c572830f58d57d62ce6d87db55efeedaa5bf6f7a
hiram
Wed Nov 17 10:51:49 2021 -0800
add note about the calculation command and where to obtain source for the program refs #28401

diff --git src/hg/makeDb/trackDb/cpgIslandSuper.html src/hg/makeDb/trackDb/cpgIslandSuper.html
index 8106cfc..f139f5b 100644
--- src/hg/makeDb/trackDb/cpgIslandSuper.html
+++ src/hg/makeDb/trackDb/cpgIslandSuper.html
@@ -46,41 +46,57 @@
 The entire genome sequence, masking areas included, was
 used for the construction of the track <em>Unmasked CpG</em>.
 The track <em>CpG Islands</em> is constructed on the sequence after
 all masked sequence is removed.
 </p>

 <p>The CpG count is the number of CG dinucleotides in the island.
 The Percentage CpG is the ratio of CpG nucleotide bases
 (twice the CpG count) to the length. The ratio of observed to expected
 CpG is calculated according to the formula (cited in
 Gardiner-Garden <em>et al</em>. (1987)):

 <pre>    Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)</pre>

 where N = length of sequence.</p>
+<p>
+The calculation of the track data is performed by the following command sequence:
+<pre>
+twoBitToFa <em>assembly.2bit</em> stdout | maskOutFa stdin hard stdout \
+ | cpg_lh /dev/stdin 2> cpg_lh.err \
+ | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \
+ | sort -k1,1 -k2,2n > cpgIsland.bed
+</pre>
+The <em>unmasked</em> track data is constructed from
+<em>twoBitToFa -noMask</em> output for the <em>twoBitToFa</em> command.
+</p>

 <h2>Data access</h2>
 <p>
 CpG islands and its associated tables can be explored interactively using the
 <a href="../goldenPath/help/api.html" target="_blank">REST API</a>, the
 <a href="/cgi-bin/hgTables" target="_blank">Table Browser</a> or the
 <a href="/cgi-bin/hgIntegrator" target="_blank">Data Integrator</a>.
 All the tables can also be queried directly from our public MySQL servers, with more information
 available on our <a target="_blank" href="/goldenPath/help/mysql.html">help page</a> as well as on
 <a target="_blank" href="http://genome.ucsc.edu/blog/tag/mysql/">our blog</a>.</p>
+<p>
+The source for the <em>cpg_lh</em> program can be obtained from
+<a href="https://genome-source.gi.ucsc.edu/gitlist/kent.git/tree/master/src/utils/cpgIslandExt/" target=_blank>src/utils/cpgIslandExt/</a>.
+The <em>cpg_lh</em> program binary can be obtained from: <a href="http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh" download="cpg_lh">http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh</a> (choose "save file")
+</p>

 <h2>Credits</h2>

 <p>This track was generated using a modification of a program developed by G. Miklem and L. Hillier
 (unpublished).</p>

 <h2>References</h2>

 <p>
 Gardiner-Garden M, Frommer M.
 <a href="https://www.sciencedirect.com/science/article/pii/0022283687906899" target="_blank">
 CpG islands in vertebrate genomes</a>.
 <em>J Mol Biol</em>. 1987 Jul 20;196(2):261-82.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/3656447" target="_blank">3656447</a>
 </p>
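
Note on the definitions touched by this diff: the page text defines the CpG count, the Percentage CpG (twice the CpG count relative to the length) and the Obs/Exp CpG ratio. The sketch below applies those three definitions to a single short sequence so the arithmetic can be checked by hand. It is only an illustration: the function name `cpg_stats` is hypothetical, the sequence is made up, and this is not how `cpg_lh` itself finds islands. Returning 0 when the sequence has no C or no G is an assumption made here to avoid dividing by zero.

```shell
#!/bin/sh
# cpg_stats SEQUENCE   (hypothetical helper, not part of the kent source tree)
# Prints three tab-separated values for the given sequence:
#   CpG count, Percentage CpG, Obs/Exp CpG
cpg_stats() {
    printf '%s\n' "$1" | awk '{
        s = toupper($0)
        n = length(s)                 # N = length of sequence
        # gsub returns the number of substitutions made; replacing each
        # match with itself leaves s unchanged and yields a count.
        cpg = gsub(/CG/, "CG", s)     # number of CG dinucleotides
        c   = gsub(/C/,  "C",  s)     # number of C
        g   = gsub(/G/,  "G",  s)     # number of G
        # Percentage CpG = 100 * (2 * CpG count) / length
        per_cpg = (n > 0) ? 100.0 * 2 * cpg / n : 0
        # Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
        # Assumption: report 0 when there is no C or no G.
        obs_exp = (c * g > 0) ? cpg * n / (c * g) : 0
        printf("%d\t%.1f\t%.2f\n", cpg, per_cpg, obs_exp)
    }'
}

# Example: 10 bases, 3 CG dinucleotides, 3 C, 3 G
cpg_stats ACGCGTTCGA
```

For `ACGCGTTCGA` the values work out to a CpG count of 3, a Percentage CpG of 100 * 2 * 3 / 10 = 60.0, and an Obs/Exp CpG of 3 * 10 / (3 * 3), roughly 3.33. The Percentage CpG expression is the same one the `awk` step in the diff uses (`100.0*2*$6/width`).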