src/hg/makeDb/trackDb/human/encRegTfbsClustered.html aba55984f35dacd02061eafb2ca9a9784efb8bae

aba55984f35dacd02061eafb2ca9a9784efb8bae
kate
  Thu May 9 16:31:31 2019 -0700
Add/polish track labels and descriptions for ENCODE 3 TF tracks. refs #21139

diff --git src/hg/makeDb/trackDb/human/encRegTfbsClustered.html src/hg/makeDb/trackDb/human/encRegTfbsClustered.html
new file mode 100644
index 0000000..9de3cf4
--- /dev/null
+++ src/hg/makeDb/trackDb/human/encRegTfbsClustered.html
@@ -0,0 +1,128 @@
+<h2>Description</h2>
+<p>
+This track shows regions of transcription factor binding derived from a large collection
+of ChIP-seq experiments performed by the ENCODE project between February 2011 and November 2018,
+spanning the first production phase of ENCODE ("ENCODE 2") through the second full production
+phase ("ENCODE 3").
+</p>
+<p>
+Transcription factors (TFs) are proteins that bind to DNA and interact with RNA polymerases to
+regulate gene expression.  Some TFs contain a DNA binding domain and can bind directly to 
+specific short DNA sequences ('motifs');
+others bind to DNA indirectly through interactions with TFs containing a DNA binding domain.
+High-throughput antibody capture and sequencing methods (e.g. chromatin immunoprecipitation
+followed by sequencing, or 'ChIP-seq') can be used to identify regions of
+TF binding genome-wide.  These regions are commonly called ChIP-seq peaks.</p>
+<p>
+ENCODE TF ChIP-seq data were processed using the 
+<a target="_blank" href="https://www.encodeproject.org/chip-seq/transcription_factor/">ENCODE Transcription Factor ChIP-seq Processing Pipeline</a> to generate peaks of TF binding.
+Peaks from 1264 experiments (1256 in hg38) representing 338 transcription factors 
+(340 in hg38) in 130 cell types (129 in hg38) are combined here into clusters to produce a 
+summary display showing occupancy regions for each factor.
+The underlying ChIP-seq peak data are available from the
+<i>ENCODE 3 TF ChIP Peaks</i> tracks (
+<a target="_blank" href="../cgi-bin/hgTrackUi?db=hg19&g=encTfChipPk">hg19</a>,
+<a target="_blank" href="../cgi-bin/hgTrackUi?db=hg38&g=encTfChipPk">hg38</a>).</p>
+
+<h2>Display Conventions</h2>
+<p>
+A gray box encloses each peak cluster of transcription factor occupancy, with the
+darkness of the box being proportional to the maximum signal strength observed in any cell type
+contributing to the cluster. The HGNC gene name for the transcription factor is shown 
+to the left of each cluster.</p>
+<p>
+To the right of the cluster, a configurable label can optionally display information about the
+cell types contributing to the cluster and how many cell types were assayed for the factor
+(count where detected / count where assayed).
+For brevity in the display, each cell type is abbreviated to a single letter.
+The darkness of the letter is proportional to the signal strength observed in the cell line. 
+Abbreviations starting with capital letters designate
+<a href="https://www.encodeproject.org/search/?type=Biosample&organism.scientific_name=Homo+sapiens"
+target="_blank">ENCODE cell types</a> initially identified for intensive study, 
+while those starting with lowercase letters designate cell lines added later in the project.</p>
+<p>
+Click on a peak cluster to see more information about the TF/cell assays contributing to the
+cluster and the cell line abbreviation table.
+</p>
+
+<h2>Methods</h2>
+<p>
+Peaks of transcription factor occupancy ("optimal peak set") from ENCODE ChIP-seq datasets
+were clustered using the UCSC hgBedsToBedExps tool.  
+Scores were assigned to peaks by multiplying the input signal values by a normalization
+factor calculated as the ratio of the maximum score value (1000) to the signal value at one
+standard deviation from the mean, with values exceeding 1000 capped at 1000. This has the
+effect of distributing scores up to the mean plus one standard deviation across the score range,
+while assigning all higher values the maximum score.
+The cluster score is the highest score for any peak contributing to the cluster.</p>  
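The scoring described in the Methods paragraph can be sketched in Python as follows. This is a minimal illustration, not the hgBedsToBedExps implementation: the function names are hypothetical, and the use of the population standard deviation, rounding to integers, and normalizing one set of peaks at a time are assumptions that the description does not specify.

```python
from statistics import mean, pstdev

MAX_SCORE = 1000  # maximum score value named in the track description


def normalize_scores(signal_values):
    """Scale signal values so that (mean + 1 SD) maps to MAX_SCORE.

    Assumptions (not stated in the description): population standard
    deviation and rounding to the nearest integer.
    """
    # Signal value at one standard deviation above the mean.
    cutoff = mean(signal_values) + pstdev(signal_values)
    if cutoff <= 0:
        raise ValueError("mean + SD of the signal values must be positive")
    factor = MAX_SCORE / cutoff  # normalization factor
    # Multiply each value by the factor and cap anything above MAX_SCORE.
    return [min(MAX_SCORE, round(v * factor)) for v in signal_values]


def cluster_score(peak_scores):
    """Return the highest score of any peak contributing to a cluster."""
    return max(peak_scores)


if __name__ == "__main__":
    signals = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values
    scores = normalize_scores(signals)
    print(scores, cluster_score(scores))
```

With these made-up signal values the mean is 5 and the standard deviation is 2, so any signal of 7 or more receives the maximum score, and lower signals are spread proportionally below it.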
+
+<H2>Credits</H2>
+<p>
+Thanks to the ENCODE Consortium, the ENCODE ChIP-seq production laboratories, and the
+ENCODE Data Coordination Center for generating and processing the TF ChIP-seq datasets used here.
+The ENCODE accession numbers of the constituent datasets are available from the peak details page.
+Special thanks to Henry Pratt, Jill Moore, Michael Purcaro, and Zhiping Weng, PI, at the 
+<a target="_blank" href="https://www.umassmed.edu/zlab/">ENCODE Data Analysis Center (ZLab at UMass Medical Center)</a> for providing the peak datasets, metadata, and guidance
+developing this track.</p>
+<P>
+The integrative view presented here was developed by Jim Kent at UCSC.</P>
+
+<h2>References</h2>
+
+<p>ENCODE Project Consortium.
+<a href="https://www.ncbi.nlm.nih.gov/pubmed/21526222" title="https://www.ncbi.nlm.nih.gov/pubmed/21526222"  rel="nofollow" TARGET="_BLANK">
+A user's guide to the encyclopedia of DNA elements (ENCODE)</a>.
+<em>PLoS Biol</em>. 2011 Apr;9(4):e1001046. PMID: 21526222; PMCID: PMC3079585
+</p>
+
+<p>ENCODE Project Consortium.
+<a href="https://www.ncbi.nlm.nih.gov/pubmed/22955616" title="https://www.ncbi.nlm.nih.gov/pubmed/22955616"  rel="nofollow" TARGET="_BLANK">
+An integrated encyclopedia of DNA elements in the human genome</a>.
+<em>Nature</em>. 2012 Sep 6;489(7414):57-74. PMID: 22955616; PMCID: PMC3439153
+</p>
+<p>
+Sloan CA, Chan ET, Davidson JM, Malladi VS, Strattan JS, Hitz BC, Gabdank I, Narayanan AK, Ho M, Lee
+BT <em>et al</em>.
+<a href="https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkv1160" target="_blank">
+ENCODE data at the ENCODE portal</a>.
+<em>Nucleic Acids Res</em>. 2016 Jan 4;44(D1):D726-32.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/26527727" target="_blank">26527727</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702836/" target="_blank">PMC4702836</a>
+</p>
+<p>
+Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, Mu XJ, Khurana E, Rozowsky J,
+Alexander R <em>et al</em>.
+<a href="https://www.nature.com/articles/nature11245" target="_blank">
+Architecture of the human regulatory network derived from ENCODE data</a>.
+<em>Nature</em>. 2012 Sep 6;489(7414):91-100.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/22955619" target="_blank">22955619</a>
+</p>
+<p>
+Wang J, Zhuang J, Iyer S, Lin X, Whitfield TW, Greven MC, Pierce BG, Dong X, Kundaje A, Cheng Y
+<em>et al</em>.
+<a href="https://genome.cshlp.org/content/22/9/1798.long" target="_blank">
+Sequence features and chromatin structure around the genomic regions bound by 119 human
+transcription factors</a>.
+<em>Genome Res</em>. 2012 Sep;22(9):1798-812.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/22955990" target="_blank">22955990</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431495/" target="_blank">PMC3431495</a>
+</p>
+<p>
+Wang J, Zhuang J, Iyer S, Lin XY, Greven MC, Kim BH, Moore J, Pierce BG, Dong X, Virgil D <em>et
+al</em>.
+<a href="https://academic.oup.com/nar/article/41/D1/D171/1069417" target="_blank">
+Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE
+consortium</a>.
+<em>Nucleic Acids Res</em>. 2013 Jan;41(Database issue):D171-6.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/23203885" target="_blank">23203885</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531197/" target="_blank">PMC3531197</a>
+</p>
+
+<H2> Data Use Policy </H2>
+<P> <B>Users may freely download, analyze and publish results based on any ENCODE data without 
+restrictions.</B>
+Researchers using unpublished ENCODE data are encouraged to contact the data producers to discuss possible coordinated publications; however, this is optional. </p>
+<p><B><I>Users of ENCODE datasets are requested to cite the ENCODE Consortium and ENCODE
+production laboratory(s) that generated the datasets used, as described in
+<A target="_blank" href="https://www.encodeproject.org/help/citing-encode/">Citing ENCODE</A>.</I></B></p>