src/hg/makeDb/trackDb/human/hg18/wgEncodeHudsonalphaChipSeq.html 198c9b8daecc44fbda6a6494c566c723920f030a

198c9b8daecc44fbda6a6494c566c723920f030a
lrnassar
  Wed Mar 11 18:25:21 2026 -0700
Fixing a few hundred clear typos with the help of Claude. Some are less important in code comments, but the majority of them are in user-facing places. I manually approved 60%+ of the changes and didn't see any that were an incorrect suggestion; at worst a change was potentially unnecessary, like a code comment having cant instead of can't. No RM.

diff --git src/hg/makeDb/trackDb/human/hg18/wgEncodeHudsonalphaChipSeq.html src/hg/makeDb/trackDb/human/hg18/wgEncodeHudsonalphaChipSeq.html
index e56edd96759..ad7b5374894 100644
--- src/hg/makeDb/trackDb/human/hg18/wgEncodeHudsonalphaChipSeq.html
+++ src/hg/makeDb/trackDb/human/hg18/wgEncodeHudsonalphaChipSeq.html
@@ -1,204 +1,204 @@
 <h2>Description</h2>
 
 <DIV style='max-width: 900px'>
 <p>This track displays binding sites of the specified transcription factors
 in the given cell types as identified by chromatin immunoprecipitation
 followed by high-throughput sequencing (ChIP-Seq &mdash; see Johnson DS,
 <em>et al</em>., 2007 and Fields S, 2007).
 </P><P>
 The ChIP-seq method was used to assay chromatin fragments bound by
 specific or general transcription factors as described below.  DNA isolated
 by ChIP-seq was size-selected (~225 bp) and sequenced.  Short reads of
 25-35 nt were mapped to the human reference genome, and enriched regions
 of high read density relative to total input chromatin control reads
 were identified.
 </P><P>
 Included for each cell type is a control signal, which represents the control
 condition where the DNA:protein crosslinks were reversed and DNA fragments
 were sequenced with no immunoprecipitation (IP).
 </P><P>
 The sequence reads, quality scores, and alignment coordinates from
 these experiments are available for download.
 </P>
 
 <h2>Track Conventions</h2>
 
 This track is a multi-view composite track that contains multiple data types
 (<EM>views</EM>). For each view, there are multiple subtracks that
 display individually on the browser.  Instructions for configuring multi-view
 tracks are  <A HREF="/goldenPath/help/multiView.html" TARGET=_BLANK>here</A>.
 The subtracks in this track are grouped by transcription factor targeted with
 an antibody for ChIP and by cell type.  For each experiment (cell type vs.
 antibody), the following views are included:
 <DL>
 <DT><I>Peaks</I></DT><DD>Sites with the greatest evidence of transcription
 factor binding.</DD>
 <DT><I>Raw Signal</I></DT><DD>A continuous signal which indicates density of
 aligned reads.
 The sequence reads were extended to the size-selected length (225 bp),
 and the read density was computed using the bedItemOverlapCount
 utility.  This annotation was generated by the ENCODE Data Coordination
 Center at UCSC.</DD>
 </DL>
 </p>
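The Raw Signal computation described above (extend each read to the 225 bp size-selected length, then count overlapping fragments at each position) can be sketched as follows. This is a minimal illustration of the idea, not the source of the bedItemOverlapCount utility. The function names `extend_read` and `overlap_counts` are hypothetical, coordinates are assumed to be 0-based half-open, and each read is assumed to be extended from its 5' end in the direction of its strand.

```python
from collections import defaultdict

# Size-selected fragment length stated in the track description.
FRAGMENT_LEN = 225


def extend_read(start, end, strand, frag_len=FRAGMENT_LEN):
    """Extend a read (0-based, half-open) to frag_len bp from its 5' end.

    Plus-strand reads extend rightward from `start`; minus-strand reads
    extend leftward from `end`, clipped at position 0.
    """
    if strand == "+":
        return start, start + frag_len
    return max(0, end - frag_len), end


def overlap_counts(intervals):
    """Return bedGraph-style (start, end, depth) runs for one chromosome.

    Sweeps over interval boundaries; adjacent runs with equal depth are
    merged and zero-depth gaps are omitted.
    """
    events = defaultdict(int)
    for s, e in intervals:
        events[s] += 1
        events[e] -= 1
    runs = []
    depth = 0
    prev = None
    for pos in sorted(events):
        if prev is not None and depth > 0 and pos > prev:
            if runs and runs[-1][1] == prev and runs[-1][2] == depth:
                runs[-1] = (runs[-1][0], pos, depth)
            else:
                runs.append((prev, pos, depth))
        depth += events[pos]
        prev = pos
    return runs


# Example: two 25 nt reads on opposite strands that point at each other.
reads = [(1000, 1025, "+"), (1200, 1225, "-")]
fragments = [extend_read(s, e, strand) for s, e, strand in reads]
print(overlap_counts(fragments))  # [(1000, 1225, 2)]
```

Both extended fragments cover the same 225 bp, so the whole span has a depth of 2; a sweep over boundaries avoids allocating a per-base array for a whole chromosome.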
 
 <a name="Methods"></a><h2>Methods</h2>
 
 <P>
 Cells were grown according to the approved
 <A HREF="/ENCODE/protocols/cell" TARGET=_BLANK>ENCODE cell culture protocols</A>.
 Briefly, cross-linked chromatin was immunoprecipitated with antibody, the
 protein:DNA crosslinks were reversed and the DNA fragments were recovered
 and sequenced.  Because these experiments were carried out over the course
 of two years, several changes and improvements were made to the original
 protocol (see Johnson DS, <em>et al</em>. 2007).  The major differences between
-protocols are the number of cells and magnetic beads used for IP, the the
+protocols are the number of cells and magnetic beads used for IP, the
 method of sonication used to fragment DNA, and the number of cycles of PCR
 used to amplify the sequencing library.  The most current protocol used by
 the Myers Lab can be found
 <a href="http://hudsonalpha.org/myers-lab/protocols"
 TARGET=_blank title="http://hudsonalpha.org/myers-lab/protocols">
 here</a>. The sequencing libraries labeled as PCR2x were
 made with two rounds of amplification (25 and 15 cycles) and those labeled
 as PCR1x were made with one 15-cycle round of amplification.  Biological
 replicates were completed for each experiment.  For specific details on the
 protocol used for a ChIP of interest (number of cells, DNA fragmentation and
 sequencing library construction), please contact the Myers Lab using the contact
 information provided below.
 </P><P>
 Libraries were sequenced with an Illumina Genome Analyzer I or an Illumina
 Genome Analyzer IIx according to the manufacturer's recommendations.  Sequence
 data produced by the Illumina data pipeline software were quality filtered and
 then mapped to NCBI Build36 (hg18) using the integrated Eland software; 25 to
 36 bp of the sequence reads were used for alignment; up to two mismatches were
 tolerated; reads that mapped to multiple sites in the genome were discarded.
 </P><P>
 To identify likely binding sites, peak calling was applied to the aligned
 sequence data sets using either Quantitative Enrichment of Sequence Tags
 (<a href="http://mendel.stanford.edu/sidowlab/downloads/quest" TARGET=_blank
 title="http://mendel.stanford.edu/sidowlab/downloads/quest">QuEST</a>,
 see Valouev A, <em>et al.</em>, 2008) or Model-based Analysis of ChIP-Seq
 (<a href="http://liulab.dfci.harvard.edu/MACS/00README.html" TARGET=_blank
 title="http://liulab.dfci.harvard.edu/MACS/00README.html">MACS</a>,
 see Zhang Y, <em>et al.</em>, 2008).  Experiments for which peak calling was completed
 using MACS are labeled as "<i>softwareVersion:</i> MACS" in the list above and
 can be found by clicking on the metadata link "..." for the Peaks subtrack.  Experiments
 for which QuEST was used do not have a software version annotated.  QuEST is based
 on the kernel density estimation approach, which uses ChIP-seq data to determine
 positions where protein complexes contact DNA.  QuEST uses data in the form of
 genome coordinates ('tags') obtained from mapping several million sequencing
 reads to a reference genome.  Tags from forward and reverse reads cluster on
 opposite sides of the transcription factor binding site.  QuEST first
 constructs two separate kernel density profiles, one for forward and one for
 reverse tags, shifts them toward each other, and sums them into a combined
 density profile (CDP).  QuEST then identifies candidate CDP peaks as
 positions in the reference genome corresponding to local maxima of the CDP with
 sufficient enrichment compared to the control data.  MACS empirically models
 the shift size of ChIP-seq tags, and uses it to improve the spatial resolution
 of predicted binding sites.  MACS also uses a dynamic Poisson distribution to
 capture local biases in the genome, allowing for more robust predictions
 (see Zhang Y, <em>et al.</em>, 2008).
 </P>
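The two peak-calling ideas described above can be illustrated with a simplified sketch. The first part follows the QuEST outline: separate kernel density profiles for forward and reverse tags, shifted toward each other and summed into a CDP, then local maxima that are sufficiently enriched over the control CDP. The second part computes a MACS-style local Poisson rate, taken as the maximum of a genome-wide background rate and rates estimated from 1, 5 and 10 kb control windows centered on a candidate peak (Zhang Y, <em>et al.</em>, 2008). This is not the code of either tool: QuEST estimates the shift and its cutoffs from the data, and MACS adds further steps. The bandwidth, shift, thresholds, pseudocount and all function names here are hypothetical.

```python
import math

# ---- QuEST-style combined density profile (simplified) ----

def kde_profile(positions, length, bandwidth):
    """Gaussian kernel density of tag positions, evaluated at bases 0..length-1."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    reach = int(4 * bandwidth)  # ignore kernel tails beyond 4 bandwidths
    profile = [0.0] * length
    for p in positions:
        for x in range(max(0, p - reach), min(length, p + reach + 1)):
            profile[x] += norm * math.exp(-0.5 * ((x - p) / bandwidth) ** 2)
    return profile


def combined_density(fwd, rev, length, bandwidth, shift):
    """Shift forward tags right and reverse tags left by shift // 2, then sum."""
    half = shift // 2
    f = kde_profile([p + half for p in fwd], length, bandwidth)
    r = kde_profile([p - half for p in rev], length, bandwidth)
    return [a + b for a, b in zip(f, r)]


def call_peaks(chip_cdp, control_cdp, min_density, min_fold, pseudo=1e-3):
    """Local maxima of the ChIP CDP that pass density and fold-enrichment cutoffs."""
    peaks = []
    for x in range(1, len(chip_cdp) - 1):
        v = chip_cdp[x]
        is_max = v >= chip_cdp[x - 1] and v > chip_cdp[x + 1]
        if is_max and v >= min_density and v / (control_cdp[x] + pseudo) >= min_fold:
            peaks.append(x)
    return peaks


# ---- MACS-style local Poisson rate (simplified) ----

def local_lambda(center, d, control_pos, n_chip, n_control, genome_size,
                 windows=(1000, 5000, 10000)):
    """Expected tags in a d-bp window: max of background and control-window rates."""
    lam_bg = n_chip * d / genome_size
    scale = n_chip / n_control  # rescale control depth to ChIP depth
    rates = [lam_bg]
    for w in windows:
        lo, hi = center - w // 2, center + w // 2
        k = sum(1 for p in control_pos if lo <= p < hi)
        rates.append(k * scale * d / w)
    return max(rates)


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1)
    return max(0.0, 1.0 - cdf)


# Example: forward tags at 485 and reverse tags at 515 meet at 500 after a 30 bp shift.
chip = combined_density([485] * 10, [515] * 10, 1000, bandwidth=5, shift=30)
ctrl = combined_density([], [], 1000, bandwidth=5, shift=30)
print(call_peaks(chip, ctrl, min_density=0.5, min_fold=2.0))  # [500]
```

Taking the maximum over several window sizes means a region with locally elevated control coverage is tested against a higher expected rate, which is how the dynamic Poisson model guards against local biases.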
 
 <h2>Validation</h2>
 
 <p>Quantitative polymerase chain reaction (qPCR) assays can be used to
 validate the transcription factor binding sites found using ChIP-Seq.  Regions
 of enriched read density identified by QuEST are reported as a single genomic
 coordinate (peak) for each enriched region.  These peaks are ranked according
 to the ChIP-Seq enrichment ratio and qPCR assays are used to validate the set
 of overlapping peaks between replicates for each cell line.  qPCR primer pairs
 were designed to interrogate the list of ordered peaks in common between
 replicates.  Amplicons were 60-100 bp in length and were completely contained
 within 250 bp of either side of the peak coordinate.  For each primer pair, qPCR
 assays were performed on biological replicate ChIP samples on both cell lines
 and total input chromatin DNA was recovered.  Enrichment was calculated as a
 ratio of the amount of target DNA over the average of a pair of negative control
 primers.
 An assay was considered positive when the average of its qPCR replicates showed
 two-fold or greater enrichment (see Valouev A, <em>et al.</em>, 2008).
 </p>
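The enrichment calculation and two-fold call described above can be sketched as follows. The text does not say how DNA amounts were derived from the qPCR measurements, so this sketch assumes amounts are estimated from threshold cycles (Ct) with perfect doubling per cycle, that each qPCR replicate measures the target and both negative-control primer pairs, and that per-replicate ratios are averaged before the 2-fold cutoff is applied. The function names are hypothetical.

```python
from statistics import mean


def relative_amount(ct, efficiency=2.0):
    """Relative template amount from a threshold cycle, assuming ideal doubling."""
    return efficiency ** (-ct)


def enrichment(target_ct, neg1_ct, neg2_ct):
    """Target amount divided by the mean of two negative-control primer amounts."""
    negative = mean([relative_amount(neg1_ct), relative_amount(neg2_ct)])
    return relative_amount(target_ct) / negative


def is_positive(replicates, threshold=2.0):
    """replicates: list of (target_ct, neg1_ct, neg2_ct) tuples, one per qPCR replicate."""
    return mean(enrichment(*r) for r in replicates) >= threshold


# Example: the target amplifies 2 and 1 cycles earlier than the negative controls.
reps = [(20.0, 22.0, 22.0), (21.0, 22.0, 22.0)]
print([enrichment(*r) for r in reps])  # [4.0, 2.0]
print(is_positive(reps))               # True
```

Each cycle earlier corresponds to twice as much template under the ideal-doubling assumption, so the replicates give 4-fold and 2-fold enrichment, averaging 3-fold and passing the cutoff.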
 
 <P>
 <B>Notes:</B><BR>
 <a name="pA"><em>Protocol pA</em></a>: Unless otherwise noted, datasets were
 generated using a protocol that involved a single round of PCR (15 cycles)
 to prepare DNA fragment libraries.  Certain earlier datasets, however, were
 generated using a protocol with two rounds of PCR (25 + 15 cycles).  These
 datasets contain "PCR2x" in their label and metadata.  Peaks for these
 experiments were called using "Input <i>(PCR2x)</i>" for background.
 </p>
 
 <h2>Release Notes</h2>
 
 <P>
 Update of Release 2 (Dec 2010): Six experiments previously labeled as performed on the SK-N-SH_RA
 cell line were relabeled as the HTB-11 cell line. Additionally, another set of experiments
 previously labeled with protocol PCR1x was relabeled as PCR2x.
 <P>
 This is Release 2 (May 2010) of this track, which includes 73 new experiments
 covering 13 new cell lines and 27 antibodies.  Additionally, DEX and EtOH
 treatments have been included on the A549 cell line.  The HepG2/SRF experiment
 currently only has the first replicate.  For A549/GR, data from the two
 replicates have been combined into one.  Finally, the inputs for GM12878 and
 K562 for protocol PCR1x were resubmitted to ensure naming integrity and
 consistency with the PCR2x versions. They have been marked 'V2'. For new
 versions of previously released data, the affected database tables and files
 include 'V2' in the name, and metadata is marked with "submittedDataVersion=V2",
 followed by the reason for replacement. Previous versions of these files are
 available for download from the
 <A HREF="ftp://hgdownload.soe.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeHudsonalphaChipSeq/"
  TARGET=_BLANK>FTP site</A>.
 
 <a name="Credits"></a><h2>Credits</h2>
 <p>These data were provided by the <a href='http://myers.hudsonalpha.org/'
 TARGET=_blank>Myers Lab</a> at the <a href='http://www.hudsonalpha.org/'
 TARGET=_blank>HudsonAlpha Institute for Biotechnology</a>.</p>
 <P>
 Contact:
 <A HREF="mailto:&#102;p&#97;&#117;&#108;&#105;&#64;&#104;&#117;&#100;&#115;&#111;&#110;&#97;&#108;&#112;&#104;a.
 &#111;r&#103;">Flo Pauli</A>.
 <!-- above address is fpauli at hudsonalpha.org (encodeEmail.pl) -->
 <!--<A HREF="mailto:&#114;r&#97;&#117;&#99;h&#64;&#115;&#116;a&#110;&#102;o&#114;&#100;.
 e&#100;&#117;">Rami Rauch</A>-->
 </P>
 
 <a name="References"></a><h2>References</h2>
 
 <p>
 Fields S.
 <a href="https://www.ncbi.nlm.nih.gov/pubmed/17556576" target="_blank">
 Molecular biology. Site-seeing by sequencing</a>.
 <em>Science</em>. 2007 Jun 8;316(5830):1441-2.
 </p>
 
 <p>
 Johnson DS, Mortazavi A, Myers RM, Wold B.
 <a href="https://www.ncbi.nlm.nih.gov/pubmed/17540862" target="_blank">
 Genome-wide mapping of in vivo protein-DNA interactions</a>.
 <em>Science</em>. 2007 Jun 8;316(5830):1497-502.
 </p>
 
 <p>
 Valouev A, Johnson DS, Sundquist A, Medina C, Anton E, Batzoglou S, Myers RM, Sidow A.
 <a href="https://www.ncbi.nlm.nih.gov/pubmed/19160518" target="_blank">
 Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data</a>.
 <em>Nat Methods</em>. 2008 Sep;5(9):829-34.
 </p>
 
 <p>
 Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, <em>et al</em>.
 <a href="https://www.ncbi.nlm.nih.gov/pubmed/18798982" target="_blank">
 Model-based analysis of ChIP-Seq (MACS)</a>.
 <em>Genome Biol</em>. 2008;9(9):R137.
 </p>
 
 <H2>Data Release Policy</H2>
 
 <P>Data users may freely use ENCODE data, but may not, without prior
 consent, submit publications that use an unpublished ENCODE dataset until
 nine months following the release of the dataset.  This date is listed in
 the <EM>Restricted Until</EM> column on the track configuration page and
 the download page.  The full data release policy for ENCODE is available
 <A HREF="../ENCODE/terms.html" TARGET=_BLANK>here</A>.</P>
 
 </DIV>