d43ec6011a8adc1ba82e3b2c2473e2043d300c03
hiram
  Thu Apr 19 14:10:07 2012 -0700
UCSC source from CVS source tree 2005 version
diff --git src/utils/cpgIslandExt/README src/utils/cpgIslandExt/README
new file mode 100644
index 0000000..0d98529
--- /dev/null
+++ src/utils/cpgIslandExt/README
@@ -0,0 +1,26 @@
+README (added at UCSC)
+
+"make" --> gcc readseq.c cpg_lh.c -o cpglh.exe
+
+We've been running this on hard-masked sequence (RepeatMasker and TRF 
+with period <= 12 results are masked to 'N') in order to avoid CpG 
+islands in Alu repeats in human.
+
+The original cpg.c was written by Gos Miklem from the Sanger Center.  
+LaDeana Hillier added some modifications --> cpg_lh.c, and UCSC has 
+added some further modifications to cpg_lh.c, so that its expected 
+number of CpGs in an island is calculated as described in 
+  Gardiner-Garden, M. and M. Frommer, 1987 
+  CpG islands in vertebrate genomes. J. Mol. Biol. 196:261-282.
+
+    Expected = (Number of C's * Number of G's) / Length
+
+Instead of a sliding-window search for CpG islands, this cpg program
+uses a running-sum score where a 'C' followed by a 'G' increases the
+score by 17 and anything else decreases the score by 1.  When the
+score transitions from positive to 0 (and at the end of the sequence),
+the sequence in the current span is evaluated to see if it qualifies
+as a CpG island (>200 bp length, >50% GC, >0.6 ratio of observed CpG
+to expected).  Then the search recurses on the span from the position
+with the max running score up to the current position.
+