src/hg/makeDb/trackDb/human/hg19/wgEncodeRegTfbsClusteredV3.html 198c9b8daecc44fbda6a6494c566c723920f030a

198c9b8daecc44fbda6a6494c566c723920f030a
lrnassar
  Wed Mar 11 18:25:21 2026 -0700
Fixing a few hundred clear typos with the help of Claude. Some are less important in code comments, but majority of them are in user-facing places. I manually approved 60%+ of the changes and didn't see any that were an incorrect suggestion, at worst it was potentially uncessesary, like a code comment having cant instead of can't. No RM.

diff --git src/hg/makeDb/trackDb/human/hg19/wgEncodeRegTfbsClusteredV3.html src/hg/makeDb/trackDb/human/hg19/wgEncodeRegTfbsClusteredV3.html
index 34dccc967bb..657fff5c969 100644
--- src/hg/makeDb/trackDb/human/hg19/wgEncodeRegTfbsClusteredV3.html
+++ src/hg/makeDb/trackDb/human/hg19/wgEncodeRegTfbsClusteredV3.html
@@ -1,200 +1,200 @@
 <h2>Description</h2>
 <p>
 This track shows regions of transcription factor binding derived from a large collection
 of ChIP-seq experiments performed by the ENCODE project, together with DNA binding motifs 
 identified within these regions by the ENCODE
 <a target="_blank" href="http://v1.factorbook.org/mediawiki/index.php/Welcome_to_factorbook">
 Factorbook</a> repository.</p>
 <p>
 Transcription factors (TFs) are proteins that bind to DNA and interact with RNA polymerases to
 regulate gene expression.  Some TFs contain a DNA binding domain and can bind directly to 
 specific short DNA sequences ('motifs');
 others bind to DNA indirectly through interactions with TFs containing a DNA binding domain.
 High-throughput antibody capture and sequencing methods (e.g. chromatin immunoprecipitation
 followed by sequencing, or 'ChIP-seq') can be used to identify regions of
 TF binding genome-wide.  These regions are commonly called ChIP-seq peaks.</p>
 <p>
 ENCODE TFBS ChIP-seq data were processed using the computational pipeline developed
 by the ENCODE Analysis Working Group to generate uniform peaks of TF binding.
 Peaks for 161 transcription factors in
 91 cell types are combined here into clusters to produce a summary display showing 
 occupancy regions for each factor and motif sites within the regions when identified.
 Additional views of the underlying ChIP-seq data and documentation on the methods used
 to generate it are available from the
 <a href="../cgi-bin/hgTrackUi?db=hg19&g=wgEncodeAwgTfbsUniform" 
 target="_blank">ENCODE Uniform TFBS</a> track.
 <!-- 
 The 
 <a href="../cgi-bin/hgTrackUi?db=hg19&g=factorbookMotifPos" target=_blank">
 Factorbook Motif</a> track shows the complete set of motif locations
 identified in the uniform ENCODE ChIP-seq peaks.
 -->
 </p>
 
 <h2>Display Conventions</h2>
 <p>
 A gray box encloses each peak cluster of transcription factor occupancy, with the
 darkness of the box being proportional to the maximum signal strength observed in any cell line
 contributing to the cluster. The HGNC gene name for the transcription factor is shown 
 to the left of each cluster. Within a cluster, a green highlight indicates 
 the highest scoring site of a Factorbook-identified canonical motif for
 the corresponding factor. (NOTE: motif highlights are shown
 only in browser windows of size 50,000 bp or less, and their display can be suppressed by unchecking
 the highlight motifs box on the track configuration page).
 Arrows on the highlight designate the matching strand of the motif.
 </p>
 <p>
 The cell lines where signal was detected for the factor are identified by single-letter 
 abbreviations shown to the right of the cluster.  
 The darkness of each letter is proportional to the signal strength observed in the cell line. 
 Abbreviations starting with capital letters designate
 <a href="https://www.encodeproject.org/search/?type=Biosample&organism.scientific_name=Homo+sapiens"
 target="_blank">ENCODE cell types</a> identified for intensive study - Tier 1 and Tier 2 - 
 while those starting with lowercase letters designate Tier 3 cell lines.</p>
 <p>
 Click on a peak cluster to see more information about the TF/cell assays contributing to the
 cluster, the cell line abbreviation table, and details about the highest scoring canonical 
 motif in the cluster.
 </p>
 
 <h2>Methods</h2>
 <p>
 <p>
 Peaks of transcription factor occupancy from uniform processing of ENCODE ChIP-seq data
 by the ENCODE Analysis Working Group were filtered to exclude datasets that did not pass the
 integrated quality metric
 (see &quot;Quality Control&quot; section of <a target="_blank" href=../cgi-bin/hgTrackUi?g=wgEncodeAwgTfbsUniform>Uniform TFBS</a>) 
 and then were clustered using the UCSC hgBedsToBedExps tool.  
 Scores were assigned to peaks by multiplying the input signal values by a normalization
 factor calculated as the ratio of the maximum score value (1000) to the signal value at one
 standard deviation from the mean, with values exceeding 1000 capped at 1000. This has the
 effect of distributing scores up to mean plus one 1 standard deviation across the score range,
 but assigning all above to the maximum score.
 The cluster score is the highest score for any peak contributing to the cluster.</p>  
 <p>
 The Factorbook motif discovery and annotation pipeline uses
 the MEME-ChIP and FIMO tools from the <a href="https://meme-suite.org/meme/doc/overview.html"
 target="_blank">MEME</a> software suite in conjunction with machine learning methods and
 manual curation to merge discovered motifs with known motifs reported in 
 <a target="blank" href="https://jaspar.genereg.net//">Jaspar</a> and
 <a href="https://portal.biobase-international.com/build_t/idb/1.0/html/bkldoc/source/bkl/transfac%20suite/transfac/tf_intro.html"
 target="_blank">TransFac</a>.
 Motif identifications reported in Wang et al. 2012 (below) were supplemented in this track
 with more recent data (derived from newer ENCODE datasets - Jan 2011 through Mar 2012 freezes),
 provided by the Factorbook team.  Motif identifications from all datasets were merged, with
 the most significant value (qvalue) reported being picked when motifs were duplicated in
 multiple cell lines.  The scores for the selected best-scoring motif sites were then transformed
 to -log10.
 
 </p>
 
 <h2>Release Notes</h2>
 <p>
 Release 4 (February 2014) of this track adds display of the Factorbook motifs.
 Release 3 (August 2013) added 124 datasets (690 total, vs. 486 in Release 2),
 representing all ENCODE TF ChIP-seq passing quality assessment through 
 the ENCODE March 2012 data freeze.
 The peaks used to generate these clusters were called with less stringent thresholds than 
 used during the January 2011 uniform processing shown in Release 2 of this track.
 The contributing datasets are displayed as individual
 tracks in the ENCODE Uniform TFBS track, which is  available along with the primary data tracks
 in the <a href="../cgi-bin/hgTrackUi?g=wgEncodeTfBindingSuper" 
 target="_blank">ENC TF Binding Supertrack</a> page.
 The clustering for V3/V4 is based on the transcription factor target, and so differs from V2 where clustering was based on antibody.
 </p>
 <p>
 For the V3/V4 releases, a new track table format, 'factorSource' was used to represent 
 the primary clusters table and downloads file, <em>wgEncodeRegTfbsClusteredV3</em>.  
 This format consists of standard BED5
 fields (see <a target="_blank" href="../FAQ/FAQformat.html#format1">File Formats</a>) 
 followed by an experiment count field (expCount) and finally two fields containing comma-separated lists.
 The first list field (expNums) contains numeric identifiers for experiments,
 keyed to the <em>wgEncodeRegTfbsClusteredInputsV3</em> table,
 which includes such information as the experiment's underlying Uniform TFBS table name,
 factor targeted, antibody used, cell type, treatment (if any), and laboratory source.  
 The second list field (expScores) contains the scores for the corresponding experiments. 
 For convenience, the 
 <a href="http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRegTfbsClustered/">
 file downloads directory</a>
 for this track also contains a BED file,
 <em>wgEncodeRegTfbsClusteredWithCellsV3</em>, that lists each cluster with the cluster score followed by a comma-separated list of cell types.
 </p>
 <p>
 The <a target=_blank
 href="http://v1.factorbook.org/mediawiki/index.php/Welcome_to_factorbook">Factorbook</a>
 motif positions that display as green boxes on the track come from an additional table
 called <em>factorbookMotifPos</em>, and are supported by additional metadata tables such as
 <em>factorbookMotifCanonical</em> that connects different terms used
 for the same factor (RELA &lt;--&gt; NFKB1), and <em>factorbookGeneAlias</em>
-that connects terms to the the link used at factorbook.org (EGR1 &lt;--&gt;
+that connects terms to the link used at factorbook.org (EGR1 &lt;--&gt;
 <a href="http://v1.factorbook.org/mediawiki/index.php/EGR-1" target="_blank">EGR-1</a>),
 and lastly a position weight matrix table, <em>factorbookMotifPwm</em>, used in
 building the graphical sequence logo for each motif on the item details page.
 These tables are available on our <a href="/goldenPath/help/mysql.html"
 target="_blank">public MySQL server</a> and as files on our
 <a href="http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/" target="_blank">download server</a>.
 </p>
 
 <h2>Credits</h2>
 <p>
 This track shows ChIP-seq data from the <a href='http://archive.hudsonalpha.org/myers-lab'
 target="_blank">Myers Lab</a> at the <a href="https://hudsonalpha.org/"
 target="_blank">HudsonAlpha Institute for Biotechnology</a> and by the labs of
 <a href="https://med.stanford.edu/snyderlab.html" target="_blank">Michael Snyder</a>,
 <a href="http://bioinfo.mbb.yale.edu/" target="_blank">Mark Gerstein</a>,
 <a href="https://medicine.yale.edu/bbs/people/sherman_weissman.profile" target="_blank">Sherman Weissman</a>
 at Yale University,
 <a href="http://ww25.farnhamlab.com/?subid1=20241120-1213-14b7-b655-39771964508d" target="_blank">Peggy Farnham</a> 
 at the University of Southern California,
 <a href="https://struhl.med.harvard.edu/" target="_blank">Kevin Struhl</a> at Harvard,
 <a href="http://www.igsb.org/labs/kevin-white/" target="_blank">Kevin White</a> 
 at the University of Chicago, and
 <a href="http://microarray.icmb.utexas.edu/research.html" target="_blank">Vishy Iyer</a> 
 at the University of Texas, Austin.
 These data were processed into uniform peak calls by the ENCODE Analysis Working Group pipeline
 developed by
 <a href="https://sites.google.com/site/anshulkundaje" target="_blank">Anshul Kundaje</a>
 The clustering of the uniform peaks was performed by UCSC.
 The Factorbook motif identifications and localizations (and valuable assistance with 
 interpretation) were provided by Jie Wang, Bong Hyun Kim and Jiali Zhuang of the 
 <a target="_blank" href="https://www.umassmed.edu/zlab/">Zlab (Weng Lab)</a> at UMass Medical
 School.</p>
 
 <h2>References</h2>
 
 <p>
 Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, Mu XJ, Khurana E, Rozowsky J,
 Alexander R <em>et al</em>.
 <a href="https://www.nature.com/articles/nature11245" target="_blank">
 Architecture of the human regulatory network derived from ENCODE data</a>.
 <em>Nature</em>. 2012 Sep 6;489(7414):91-100.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/22955619" target="_blank">22955619</a>
 </p>
 <p>
 Wang J, Zhuang J, Iyer S, Lin X, Whitfield TW, Greven MC, Pierce BG, Dong X, Kundaje A, Cheng Y
 <em>et al</em>.
 <a href="https://genome.cshlp.org/content/22/9/1798.long" target="_blank">
 Sequence features and chromatin structure around the genomic regions bound by 119 human
 transcription factors</a>.
 <em>Genome Res</em>. 2012 Sep;22(9):1798-812.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/22955990" target="_blank">22955990</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431495/" target="_blank">PMC3431495</a>
 </p>
 <p>
 Wang J, Zhuang J, Iyer S, Lin XY, Greven MC, Kim BH, Moore J, Pierce BG, Dong X, Virgil D <em>et
 al</em>.
 <a href="https://academic.oup.com/nar/article/41/D1/D171/1069417" target="_blank">
 Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE
 consortium</a>.
 <em>Nucleic Acids Res</em>. 2013 Jan;41(Database issue):D171-6.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/23203885" target="_blank">23203885</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531197/" target="_blank">PMC3531197</a>
 </p>
 
 <h2>Data Release Policy</h2>
 <p>
 While primary ENCODE data was subject to a restriction period as described in the 
 <a href="../ENCODE/terms.html" target="_blank">
 ENCODE data release policy</a>, this restriction does not apply to the integrative 
 analysis results, and all primary data underlying this track have passed the restriction date. 
 The data in this track are freely available.</p>