d87e5f63bc000d560c21bca638b6d5f3dee0173d kate Wed Apr 3 12:06:45 2019 -0700 Add scripts and make docs for ENCODE 3 TF binding site tracks. refs #21139
diff --git src/hg/makeDb/trackDb/human/encode3RegTfbsCluster.html src/hg/makeDb/trackDb/human/encode3RegTfbsCluster.html
new file mode 100644
index 0000000..2f8ead9
--- /dev/null
+++ src/hg/makeDb/trackDb/human/encode3RegTfbsCluster.html
@@ -0,0 +1,140 @@
+

Description

+

+This track shows regions of transcription factor binding derived from a large collection
+of ChIP-seq experiments performed by the ENCODE project between February 2011 and November 2018.

+

+Transcription factors (TFs) are proteins that bind to DNA and interact with RNA polymerases to
+regulate gene expression. Some TFs contain a DNA-binding domain and can bind directly to
+specific short DNA sequences ('motifs');
+others bind to DNA indirectly through interactions with TFs containing a DNA-binding domain.
+High-throughput antibody capture and sequencing methods (e.g. chromatin immunoprecipitation
+followed by sequencing, or 'ChIP-seq') can be used to identify regions of
+TF binding genome-wide. These regions are commonly called ChIP-seq peaks.

+

+ENCODE TF ChIP-seq data were processed using the
+ENCODE Transcription Factor ChIP-seq Processing Pipeline to generate peaks of TF binding.
+Peaks from 1264 experiments (1256 in hg38) representing 338 transcription factors
+(340 in hg38) in 130 cell types (129 in hg38) are combined here into clusters to produce a
+summary display showing occupancy regions for each factor and, where identified, motif sites
+within those regions.
+
+
+
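The clustering step can be illustrated with a minimal Python sketch. It assumes that peaks for the same factor on the same chromosome are merged whenever their intervals overlap, and that each peak already carries its normalized score. The `Peak` and `Cluster` types, the `cluster_peaks` function, and the example peaks are hypothetical; the exact rules applied by hgBedsToBedExps may differ.

```python
from collections import defaultdict
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int  # 0-based, half-open BED coordinates
    end: int
    factor: str
    cell: str  # hypothetical: each experiment identified by its cell type
    score: int  # normalized peak score (0-1000)


class Cluster(NamedTuple):
    chrom: str
    start: int
    end: int
    factor: str
    cells: frozenset
    score: int  # highest score of any contributing peak


def cluster_peaks(peaks):
    """Merge overlapping peaks of the same factor on the same chromosome."""
    groups = defaultdict(list)
    for p in peaks:
        groups[(p.factor, p.chrom)].append(p)

    clusters = []
    for (factor, chrom), group in sorted(groups.items()):
        group.sort(key=lambda p: (p.start, p.end))
        start, end = group[0].start, group[0].end
        cells, best = {group[0].cell}, group[0].score
        for p in group[1:]:
            if p.start < end:  # overlaps the open cluster (bookended peaks do not)
                end = max(end, p.end)
                cells.add(p.cell)
                best = max(best, p.score)
            else:  # gap: close the current cluster and start a new one
                clusters.append(Cluster(chrom, start, end, factor, frozenset(cells), best))
                start, end = p.start, p.end
                cells, best = {p.cell}, p.score
        clusters.append(Cluster(chrom, start, end, factor, frozenset(cells), best))
    return clusters


peaks = [
    Peak("chr1", 100, 200, "CTCF", "K562", 400),
    Peak("chr1", 150, 260, "CTCF", "HepG2", 900),
    Peak("chr1", 500, 600, "CTCF", "K562", 300),
    Peak("chr1", 120, 180, "MAX", "K562", 700),
]
clusters = cluster_peaks(peaks)
for c in clusters:
    print(c.factor, c.chrom, c.start, c.end, sorted(c.cells), c.score)
```

Here the two overlapping CTCF peaks form one cluster spanning 100-260 with score 900. The distant CTCF peak becomes its own cluster, and the MAX peak is clustered separately because clusters are built per factor.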

+ +

Display Conventions

+

+A gray box encloses each peak cluster of transcription factor occupancy, with the
+darkness of the box proportional to the maximum signal strength observed in any cell line
+contributing to the cluster. The HGNC gene name for the transcription factor is shown
+to the left of each cluster.
+
+

+

+The cell lines where signal was detected for the factor are identified by single-letter
+abbreviations shown to the right of the cluster.
+The darkness of each letter is proportional to the signal strength observed in the cell line.
+Abbreviations starting with capital letters designate
+ENCODE cell types initially identified for intensive study,
+while those starting with lowercase letters designate cell lines added later in the project.
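A minimal sketch of these shading rules, assuming a linear mapping from the 0-1000 score range onto an 8-bit gray value (255 = white, 0 = black) and using the normalized score as the measure of signal strength. The Genome Browser's actual rendering may quantize scores differently; `gray_level`, `cluster_shading`, and the letter codes below are hypothetical.

```python
def gray_level(score: int, max_score: int = 1000) -> int:
    """Map a score to an 8-bit gray value (255 = white, 0 = black).

    Darkness grows linearly with the score; out-of-range scores are clamped.
    """
    clamped = min(max(score, 0), max_score)
    return round(255 * (1 - clamped / max_score))


def cluster_shading(cell_scores: dict[str, int]) -> tuple[int, dict[str, int]]:
    """Return the box shade and one shade per cell-line letter.

    The box uses the strongest score among the contributing cell lines;
    each letter uses that cell line's own score.
    """
    box = gray_level(max(cell_scores.values()))
    letters = {cell: gray_level(s) for cell, s in cell_scores.items()}
    return box, letters


# Hypothetical cluster with three contributing cell-line letters.
box, letters = cluster_shading({"K": 1000, "H": 600, "g": 0})
print(box, letters)
```

The box takes the darkest shade (0) because its strongest contributor scores 1000, while each letter is shaded independently from its own cell line's score.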

+

+Click on a peak cluster to see more information about the TF/cell assays contributing to the
+cluster and the cell line abbreviation table.
+

+ +

Methods

+

+

+Peaks of transcription factor occupancy from ENCODE ChIP-seq datasets provided by the
+ENCODE Data Analysis Center in November 2018
+were clustered using the UCSC hgBedsToBedExps tool.
+Scores were assigned to peaks by multiplying the input signal values by a normalization
+factor calculated as the ratio of the maximum score value (1000) to the signal value at one
+standard deviation above the mean, with values exceeding 1000 capped at 1000. This has the
+effect of distributing signal values up to the mean plus one standard deviation across the
+full score range, while all signal values above that threshold receive the maximum score.
+The cluster score is the highest score for any peak contributing to the cluster.
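The scoring rule can be sketched as follows. With mean m and standard deviation s of the signal values, the normalization factor is 1000 / (m + s) and each peak score is min(1000, signal * factor). The sketch assumes that m and s are computed per experiment using the population standard deviation and that scores are rounded to integers; the description specifies neither. `normalize_scores` and `cluster_score` are hypothetical names.

```python
import statistics


def normalize_scores(signal_values, max_score=1000):
    """Scale one experiment's peak signal values onto 0..max_score.

    Signal values up to mean + 1 SD are spread linearly across the range;
    anything above that threshold is capped at max_score.
    Assumption: population SD over this experiment's peaks, integer rounding.
    """
    threshold = statistics.mean(signal_values) + statistics.pstdev(signal_values)
    if threshold <= 0:
        raise ValueError("mean + SD must be positive to normalize")
    factor = max_score / threshold
    return [min(max_score, round(v * factor)) for v in signal_values]


def cluster_score(peak_scores):
    """The cluster score is the highest score of any contributing peak."""
    return max(peak_scores)


# Mean 5, population SD 2, so the threshold is 7 and factor = 1000 / 7.
scores = normalize_scores([2, 4, 4, 4, 5, 5, 7, 9])
print(scores)
print(cluster_score(scores[:5]))
```

In this example, the signal value 7 (mean plus one SD) maps exactly to 1000. The value 9 exceeds the threshold and is therefore capped at 1000, the same score as 7, which shows how values above the threshold become indistinguishable.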

+ + +

Release Notes

+

+Release 5 (2019) of this track comprises 1264 datasets (1256 in hg38),
+representing work performed through the third phase of ENCODE.
+Release 4 (February 2014) of this track added display of the Factorbook motifs.
+Release 3 (August 2013) added 124 datasets (690 total, vs. 486 in Release 2),
+

+ +

Credits

+ +

References

+ +

+Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, Mu XJ, Khurana E, Rozowsky J,
+Alexander R et al.
+Architecture of the human regulatory network derived from ENCODE data.
+Nature. 2012 Sep 6;489(7414):91-100.
+PMID: 22955619
+

+

+Wang J, Zhuang J, Iyer S, Lin X, Whitfield TW, Greven MC, Pierce BG, Dong X, Kundaje A, Cheng Y
+et al.
+Sequence features and chromatin structure around the genomic regions bound by 119 human
+transcription factors.
+Genome Res. 2012 Sep;22(9):1798-812.
+PMID: 22955990; PMC: PMC3431495
+

+

+Wang J, Zhuang J, Iyer S, Lin XY, Greven MC, Kim BH, Moore J, Pierce BG, Dong X, Virgil D et
+al.
+Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE
+consortium.
+Nucleic Acids Res. 2013 Jan;41(Database issue):D171-6.
+PMID: 23203885; PMC: PMC3531197
+

+ +

Data Release Policy

+

+While primary ENCODE data were subject to a restriction period as described in the
+ENCODE data release policy, this restriction does not apply to the integrative
+analysis results, and all primary data underlying this track have passed the restriction date.
+The data in this track are freely available.