0acf7bb36b85100c30813e4c07a41a5b7e6b185b lrnassar Fri Jan 24 11:08:31 2025 -0800
I forgot to commit this page when I first staged the track, refs #34930

diff --git src/hg/makeDb/trackDb/human/TFrPeakClusters.html src/hg/makeDb/trackDb/human/TFrPeakClusters.html
new file mode 100644
index 00000000000..dece5049c4f
--- /dev/null
+++ src/hg/makeDb/trackDb/human/TFrPeakClusters.html
@@ -0,0 +1,90 @@

Description

This track displays regulatory regions in the human genome identified using data from ENCODE phases 2 through 4. It highlights genomic regions bound by DNA-associated proteins involved in transcriptional regulation, such as RNA polymerase, transcription factors (TFs), and chromatin remodeling proteins. Sequence-specific TFs bind directly to short DNA motifs through their DNA-binding domains. Other DNA-associated proteins interact with DNA indirectly, through protein-protein interactions with sequence-specific TFs. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a high-throughput method for mapping protein-DNA interactions genome-wide. Regions of high ChIP signal, commonly called ChIP-seq peaks, indicate protein binding sites. For each DNA-associated protein, all ENCODE ChIP-seq peaks across biosamples were integrated to generate a set of representative peaks (rPeaks). This track displays these rPeaks alongside detected DNA motif sites.

Display Conventions and Configuration

Each rPeak is represented as a gray box. The shade of gray corresponds to the maximum ChIP-seq signal observed across contributing biosamples. The HGNC gene name of the associated protein is displayed to the left of the box. If the rPeak overlaps a cognate TF motif site from a previously built motif collection (PMID: 37104580; DOI: 10.1126/science.abn7930), the motif site is highlighted in green.

Clicking on an rPeak displays details about the biosamples in which the rPeak was detected. These details include the number of biosamples with contributing ChIP-seq peaks and the total number of biosamples assayed for the protein. Links to the relevant ENCODE ChIP-seq experiments and to overlapping ENCODE candidate cis-regulatory elements (cCREs) are also provided.

By default, rPeaks for all 912 DNA-associated proteins with ENCODE ChIP-seq data are displayed. Users can customize the display by selecting specific DNA-associated proteins in the track settings.

Methods

A total of 2,509 ENCODE ChIP-seq experiments, covering 912 DNA-associated proteins across 1,152 unique biosamples, were integrated to produce representative peaks (rPeaks) for each protein. The processing steps were as follows:

  1. ChIP-seq peaks for each protein were downloaded from the ENCODE Portal. These peaks were generated with the ENCODE Transcription Factor ChIP-seq Processing Pipeline.
  2. ChIP-seq peaks from the protein’s experiments across all biosamples were clustered using bedtools merge.
  3. In each cluster, the peak with the highest ChIP signal (normalized by sequencing depth) was selected as the rPeak.
  4. All ChIP-seq peaks overlapping this rPeak by at least one nucleotide were marked as represented and removed from subsequent clustering rounds.
  5. Steps 2-4 were repeated until a final list of non-overlapping rPeaks was generated, representing all ChIP-seq peaks for the protein.
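The clustering-and-selection loop described in the steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used to build the track. Peaks for one protein are held in memory as hypothetical `Peak` records with BED-style half-open coordinates. Each record carries a signal value assumed to be already normalized by sequencing depth. The helper `merge_clusters` reimplements the default behaviour of bedtools merge (overlapping or book-ended intervals are merged) instead of calling the tool. Ties in signal are broken arbitrarily.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Peak:
    """One ChIP-seq peak (hypothetical record type for this sketch)."""
    chrom: str
    start: int     # 0-based start, inclusive (BED convention)
    end: int       # 0-based end, exclusive
    signal: float  # ChIP signal, assumed already normalized by sequencing depth


def overlaps(a: Peak, b: Peak) -> bool:
    """True if the two peaks share at least one nucleotide."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def merge_clusters(peaks):
    """Group peaks into clusters of overlapping or book-ended intervals,
    mimicking the default behaviour of `bedtools merge`."""
    clusters = []
    ordered = sorted(peaks, key=lambda p: (p.chrom, p.start, p.end))
    for _, group in groupby(ordered, key=lambda p: p.chrom):
        current, current_end = [], -1
        for p in group:
            if current and p.start <= current_end:
                current.append(p)  # overlaps or touches the open cluster
                current_end = max(current_end, p.end)
            else:
                if current:
                    clusters.append(current)
                current, current_end = [p], p.end
        if current:
            clusters.append(current)
    return clusters


def select_rpeaks(peaks):
    """Repeatedly pick the highest-signal peak per cluster until every peak is represented."""
    remaining = list(peaks)
    rpeaks = []
    while remaining:
        next_round = []
        for cluster in merge_clusters(remaining):
            best = max(cluster, key=lambda p: p.signal)  # ties: first in sorted order
            rpeaks.append(best)
            # Any peak overlapping `best` lies in the same cluster, so only the
            # cluster's own peaks need checking; overlapping ones are dropped.
            next_round.extend(p for p in cluster if not overlaps(p, best))
        remaining = next_round  # shrinks every round, since `best` overlaps itself
    return sorted(rpeaks, key=lambda p: (p.chrom, p.start, p.end))


if __name__ == "__main__":
    example = [  # made-up peaks, for illustration only
        Peak("chr1", 100, 200, 9.0),
        Peak("chr1", 180, 300, 5.0),
        Peak("chr1", 250, 400, 4.0),
        Peak("chr2", 50, 150, 1.0),
    ]
    for r in select_rpeaks(example):
        print(r.chrom, r.start, r.end, r.signal)
```

With the made-up peaks above, the three chr1 peaks form one cluster in the first round. The 100-200 peak is selected, and the 180-300 peak that overlaps it is removed. The 250-400 peak does not overlap the selected rPeak, so it survives into a second round and becomes an rPeak of its own. The final rPeaks therefore do not overlap one another.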

Data Access

The raw data for the ENCODE TF rPeak track will soon be available.

The raw data can be explored interactively with the Table Browser, which also supports downloading the data and computing intersections or correlations with other tracks. To join this track with others based on chromosome positions, use the Data Integrator.

For automated download and analysis, the genome annotation is stored in a bigBed file that can be downloaded from our download server. The file for this track is called TFrPeakClusters.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed. The tool can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only the features within a given range, for example:
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/bbi/ENCODE4/TFrPeakClusters.bb -chrom=chr21 -start=0 -end=100000000 stdout
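To script the range query shown above, the command can be wrapped as in the minimal Python sketch below. It assumes bigBedToBed is installed and on your PATH. It passes the arguments in the same order as the example command and splits each returned BED line on tabs. The helper names are hypothetical, and the sketch makes no assumptions about columns beyond the standard BED fields.

```python
import subprocess

BB_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/bbi/ENCODE4/TFrPeakClusters.bb"


def bigbed_region_command(bb_path, chrom, start, end, tool="bigBedToBed"):
    """Build the bigBedToBed argument list that writes features in chrom:start-end to stdout."""
    return [tool, bb_path, f"-chrom={chrom}", f"-start={start}", f"-end={end}", "stdout"]


def fetch_region(bb_path, chrom, start, end):
    """Run bigBedToBed (assumed to be on PATH) and return each BED line as a list of fields."""
    result = subprocess.run(
        bigbed_region_command(bb_path, chrom, start, end),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool exits with an error
    )
    return [line.split("\t") for line in result.stdout.splitlines() if line]
```

For example, `fetch_region(BB_URL, "chr21", 0, 100000000)` runs the same command as the example above and returns one list of string fields per feature. The first three fields are the chromosome, start, and end.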

For automated access, this track, like all others, is also available via our API. However, for bulk processing in pipelines, downloading the data or using the bigBed files as described above is usually faster.
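As a minimal sketch of API access, the Python helpers below build and fetch a region query against the UCSC REST API's getData/track endpoint using only the standard library. The track name TFrPeakClusters is an assumption inferred from the bigBed file name, and the helper names are hypothetical. The response is returned as decoded JSON, with no assumption about its exact layout.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_region_url(track, chrom, start, end, genome="hg38"):
    """Build a getData/track request URL for one genomic region."""
    params = {"genome": genome, "track": track, "chrom": chrom, "start": start, "end": end}
    return f"{API_BASE}?{urlencode(params)}"


def fetch_track_region(track, chrom, start, end, genome="hg38"):
    """Fetch the region from the API and return the decoded JSON response."""
    with urlopen(track_region_url(track, chrom, start, end, genome)) as response:
        return json.load(response)
```

For example, `fetch_track_region("TFrPeakClusters", "chr21", 0, 1000000)` requests the first megabase of chr21, assuming that is the track's API name.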

Credits

This track was made possible thanks to the efforts of the ENCODE Consortium, the ENCODE ChIP-seq production laboratories, and the ENCODE Data Coordination Center in generating and processing the ChIP-seq datasets. The ENCODE accession numbers for the constituent datasets are accessible from the peak details page. Special thanks go to Drs. Mingshi Gao, Greg Andrews, Jill Moore, and Zhiping Weng at UMass Chan Medical School, who were members of the ENCODE Data Analysis Center. They developed this track, from providing the rPeak and motif datasets and associated metadata to building the track. We also extend our gratitude to Max Haeussler and Jonathan Casper from the UCSC Genome Browser Project Team for their assistance in developing this track. For updates on the track, please contact the Weng lab.