aba55984f35dacd02061eafb2ca9a9784efb8bae
kate
Thu May 9 16:31:31 2019 -0700
Add/polish track labels and descriptions for ENCODE 3 TF tracks. refs #21139

diff --git src/hg/makeDb/trackDb/human/encRegTfbsClustered.html src/hg/makeDb/trackDb/human/encRegTfbsClustered.html
new file mode 100644
index 0000000..9de3cf4
--- /dev/null
+++ src/hg/makeDb/trackDb/human/encRegTfbsClustered.html
@@ -0,0 +1,128 @@
+

Description

+This track shows regions of transcription factor binding derived from a large collection +of ChIP-seq experiments performed by the ENCODE project between February 2011 and November 2018, +spanning the first production phase of ENCODE ("ENCODE 2") through the second full production +phase ("ENCODE 3"). +

+Transcription factors (TFs) are proteins that bind to DNA and interact with RNA polymerases to +regulate gene expression. Some TFs contain a DNA binding domain and can bind directly to +specific short DNA sequences ('motifs'); +others bind to DNA indirectly through interactions with TFs containing a DNA binding domain. +High-throughput antibody capture and sequencing methods (e.g. chromatin immunoprecipitation +followed by sequencing, or 'ChIP-seq') can be used to identify regions of +TF binding genome-wide. These regions are commonly called ChIP-seq peaks.

+ENCODE TF ChIP-seq data were processed using the +ENCODE Transcription Factor ChIP-seq Processing Pipeline to generate peaks of TF binding. +Peaks from 1264 experiments (1256 in hg38) representing 338 transcription factors +(340 in hg38) in 130 cell types (129 in hg38) are combined here into clusters to produce a +summary display showing occupancy regions for each factor. +The underlying ChIP-seq peak data are available from the +ENCODE 3 TF ChIP Peaks tracks ( +hg19, +hg38).

+ +

Display Conventions

+A gray box encloses each peak cluster of transcription factor occupancy, with the +darkness of the box being proportional to the maximum signal strength observed in any cell type +contributing to the cluster. The HGNC gene name for the transcription factor is shown +to the left of each cluster.

+To the right of the cluster a configurable label can optionally display information about the +cell types contributing to the cluster and how many cell types were assayed for the factor +(count where detected / count where assayed). +For brevity in the display, each cell type is abbreviated to a single letter. +The darkness of the letter is proportional to the signal strength observed in the cell line. +Abbreviations starting with capital letters designate +ENCODE cell types initially identified for intensive study, +while those starting with lowercase letters designate cell lines added later in the project.
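The label described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `cluster_label`, the abbreviation table, and the exact label layout (letters, a space, then detected/assayed) are hypothetical and are not taken from the Genome Browser source.

```python
def cluster_label(detected, assayed, abbreviations):
    """Build a hypothetical cluster label: one letter per detected cell type,
    followed by (count where detected / count where assayed)."""
    # One single-letter abbreviation for each cell type where the factor was detected.
    letters = "".join(abbreviations[cell] for cell in detected)
    return f"{letters} {len(detected)}/{len(assayed)}"


# Hypothetical abbreviation table; the real table is shown on the cluster details page.
abbreviations = {"GM12878": "G", "K562": "K", "HepG2": "L"}

label = cluster_label(
    detected=["GM12878", "K562"],
    assayed=["GM12878", "K562", "HepG2"],
    abbreviations=abbreviations,
)
print(label)  # G and K detected, out of three cell types assayed
```

The sketch leaves out the darkness of each letter, which the display sets in proportion to signal strength.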

+Click on a peak cluster to see more information about the TF/cell assays contributing to the +cluster and the cell line abbreviation table. +

+ +

Methods

+Peaks of transcription factor occupancy ("optimal peak set") from ENCODE ChIP-seq datasets +were clustered using the UCSC hgBedsToBedExps tool. +Scores were assigned to peaks by multiplying the input signal values by a normalization +factor calculated as the ratio of the maximum score value (1000) to the signal value at one +standard deviation above the mean, with values exceeding 1000 capped at 1000. This has the +effect of distributing signal values up to the mean plus one standard deviation across the score range, +while assigning the maximum score to all higher values. +The cluster score is the highest score for any peak contributing to the cluster.
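The scoring rule above can be illustrated with a short sketch. It is not the hgBedsToBedExps implementation: the function names are hypothetical, and the use of the population standard deviation and of rounding to integer scores are assumptions made for this example.

```python
from statistics import mean, pstdev

MAX_SCORE = 1000


def normalization_factor(signals):
    """Ratio of the maximum score to the signal value one standard
    deviation above the mean (population standard deviation assumed)."""
    return MAX_SCORE / (mean(signals) + pstdev(signals))


def peak_scores(signals):
    """Scale each signal value by the normalization factor, capping at MAX_SCORE."""
    factor = normalization_factor(signals)
    # Rounding to integers is an assumption of this sketch.
    return [min(MAX_SCORE, round(s * factor)) for s in signals]


def cluster_score(scores):
    """The cluster score is the highest score of any contributing peak."""
    return max(scores)


signals = [10.0, 20.0, 30.0, 40.0, 100.0]
scores = peak_scores(signals)
print(scores, cluster_score(scores))
```

In this example the mean plus one standard deviation is about 71.6, so the signal value of 100 exceeds it and receives the capped score of 1000, while the lower values are spread across the rest of the range.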

+ +

Credits

+Thanks to the ENCODE Consortium, the ENCODE ChIP-seq production laboratories, and the +ENCODE Data Coordination Center for generating and processing the TF ChIP-seq datasets used here. +The ENCODE accession numbers of the constituent datasets are available from the peak details page. +Special thanks to Henry Pratt, Jill Moore, Michael Purcaro, and Zhiping Weng, PI, at the +ENCODE Data Analysis Center (ZLab at UMass Medical Center) for providing the peak datasets, metadata, and guidance +in developing this track.

+The integrative view presented here was developed by Jim Kent at UCSC.

+ +

References

+ +

ENCODE Project Consortium. + +A user's guide to the encyclopedia of DNA elements (ENCODE). +PLoS Biol. 2011 Apr;9(4):e1001046. PMID: 21526222; PMCID: PMC3079585 +

+ +

ENCODE Project Consortium. + +An integrated encyclopedia of DNA elements in the human genome. +Nature. 2012 Sep 6;489(7414):57-74. PMID: 22955616; PMCID: PMC3439153 +

+Sloan CA, Chan ET, Davidson JM, Malladi VS, Strattan JS, Hitz BC, Gabdank I, Narayanan AK, Ho M, Lee +BT et al. + +ENCODE data at the ENCODE portal. +Nucleic Acids Res. 2016 Jan 4;44(D1):D726-32. +PMID: 26527727; PMCID: PMC4702836 +

+Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, Mu XJ, Khurana E, Rozowsky J, +Alexander R et al. + +Architecture of the human regulatory network derived from ENCODE data. +Nature. 2012 Sep 6;489(7414):91-100. +PMID: 22955619 +

+Wang J, Zhuang J, Iyer S, Lin X, Whitfield TW, Greven MC, Pierce BG, Dong X, Kundaje A, Cheng Y +et al. + +Sequence features and chromatin structure around the genomic regions bound by 119 human +transcription factors. +Genome Res. 2012 Sep;22(9):1798-812. +PMID: 22955990; PMCID: PMC3431495 +

+Wang J, Zhuang J, Iyer S, Lin XY, Greven MC, Kim BH, Moore J, Pierce BG, Dong X, Virgil D et +al. + +Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE +consortium. +Nucleic Acids Res. 2013 Jan;41(Database issue):D171-6. +PMID: 23203885; PMCID: PMC3531197 +

+ +

Data Use Policy

Users may freely download, analyze and publish results based on any ENCODE data without +restrictions. +Researchers using unpublished ENCODE data are encouraged to contact the data producers to discuss possible coordinated publications; however, this is optional.

+Users of ENCODE datasets are requested to cite the ENCODE Consortium and ENCODE +production laboratory(s) that generated the datasets used, as described in +Citing ENCODE.