8a3f24f0e70e94dc4812283bbd60c715651d6c46 lrnassar Mon Nov 3 14:50:27 2025 -0800 Staging the ENCODE4 cCREs track. Next step is asking for feedback from the authors, refs #34923 diff --git src/hg/makeDb/trackDb/human/hg38/cCREregistry.html src/hg/makeDb/trackDb/human/hg38/cCREregistry.html new file mode 100644 index 00000000000..bc99e2638b8 --- /dev/null +++ src/hg/makeDb/trackDb/human/hg38/cCREregistry.html @@ -0,0 +1,121 @@ +<h2>Description</h2> +<p> +This track displays the <em>ENCODE Registry of candidate cis-Regulatory Elements</em> (cCREs) +in the human genome from ENCODE 4. A total of <b>2,348,854</b> elements were identified and classified by the +ENCODE Data Analysis Center according to biochemical signatures. Most cCREs are anchored on +DNase hypersensitive sites further annotated with histone modifications (H3K4me3 and H3K27ac) +or CTCF binding measured by ChIP-seq experiments. In this latest version of the Registry (V4), +the representative DNase hypersensitive sites (rDHSs) were supplemented +with 86,748 representative transcription factor ChIP-seq peaks (TF +rPeaks), peaks that represent binding sites for at least five TFs. The Registry of cCREs is +one of the core components of the integrative level of the ENCODE Encyclopedia of DNA Elements.</p> + +<p>Additional exploration of the cCREs and underlying raw ENCODE signal data can be done with the +<b>Core Collection</b> track. 
The data is also available on the <a +target="_blank" href="https://screen.wenglab.org/">SCREEN</a> (Search Candidate cis-Regulatory +Elements) web tool, designed specifically for the Registry, accessible by item mouseovers and linkouts from the +track details page.</p> + +<h2>Display Conventions and Configurations</h2> +<p> +Each cCRE is displayed as a colored box by type, which reflects its putative functional assignment +based on biochemical signatures and genomic context:</p> +<p> +<img src="../images/encode4cCREs.png" alt="Graphic of cCRE classifications" width="40%"></p> +<p> +Mousing over the data will display the accession ID, the assigned cCRE class type, and the Max-Z scores +for the various underlying biosignals (DNase, H3K4me3, H3K27ac, CTCF). A track filter is also available +to selectively show items based on their cCRE class type.</p> + +<h2>Methods</h2> +<p> +Candidate cis-regulatory elements (cCREs) were first anchored on nucleosome-sized DNase +hypersensitive sites (rDHSs) identified from DNase-seq data. These rDHSs were then annotated +using ChIP-seq data for histone modifications (H3K4me3 and H3K27ac, marking promoters and +enhancers, respectively) and CTCF, marking insulators. To supplement rDHS-anchored cCRE +definitions, transcription factor ChIP-seq peaks were incorporated, enabling identification +of cCREs even in regions of low chromatin accessibility. 
Although not used for anchoring, +ATAC-seq data were used to assess chromatin accessibility in biosamples lacking DNase-seq.</p> + +<p> +Classification of cCREs was performed based on the following criteria:</p> +<ol> +<li><strong><span style="color: #ff0000;">Promoter-like signatures (promoter)</span></strong> +must fall within 200 bp of a TSS and have high chromatin accessibility and H3K4me3 signals.</li> +<li><strong><span style="color: #ffa700;">TSS-proximal enhancer-like signatures (proximal +enhancer)</span></strong> have high chromatin accessibility and H3K27ac signals and are +within 2 kb of an annotated TSS. If they are within 200 bp of a TSS, they must +also have low H3K4me3 signal.</li> +<li><strong><span style="color: #ffcd00;">TSS-distal enhancer-like signatures +(distal enhancer)</span></strong> have high chromatin accessibility and H3K27ac signals +and are farther than 2 kb from an annotated TSS.</li> +<li><strong><span style="color: #ffaaaa;">Chromatin accessibility + +H3K4me3 (CA-H3K4me3)</span></strong> have high chromatin accessibility and H3K4me3 +signals but low H3K27ac signals and do not fall within 200 bp of a TSS.</li> +<li><strong><span style="color: #00b0f0;">Chromatin accessibility + +CTCF (CA-CTCF)</span></strong> have high chromatin accessibility and CTCF signals +but low H3K4me3 and H3K27ac signals.</li> +<li><strong><span style="color: #be28e5;">Chromatin accessibility + +transcription factor (CA-TF)</span></strong> have high chromatin accessibility, +low H3K4me3, H3K27ac, and CTCF signals, and are bound by a transcription factor.</li> +<li><strong><span style="color: #06da93;">Chromatin accessibility +(CA)</span></strong> have high chromatin accessibility and low H3K4me3, H3K27ac, and +CTCF signals.</li> +<li><strong><span style="color: #d876ec;">Transcription factor +(TF)</span></strong> have low chromatin accessibility, low H3K4me3, H3K27ac, +and CTCF signals, and are bound by a transcription factor.</li> +</ol> + +<h2>Data 
Access</h2> +<p> +The ENCODE accession numbers of the constituent datasets at the <a target="_blank" +href="https://encodeproject.org/">ENCODE Portal</a> are available from the cCRE details page.</p> +<p> +The data in this track can be interactively explored with the <a target="_blank" +href="/cgi-bin/hgTables">Table Browser</a> or the <a target="_blank" +href="/cgi-bin/hgIntegrator">Data Integrator</a>. The data can be accessed from +scripts through our <a target="_blank" href="https://api.genome.ucsc.edu/">API</a>; +the track name is "cCREregistry".</p> +<p> +For automated download and analysis, this annotation is stored in a bigBed file +that can be downloaded from <a target="_blank" +href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/encode4/ccre/">our download server</a>. +The file for this track is called cCREregistry.bb. Individual regions or the whole genome +annotation can be obtained using our tool bigBedToBed, which can be compiled from the source +code or downloaded as a precompiled binary for your system. Instructions for downloading +source code and binaries can be found <a target="_blank" +href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>. +The tool can also be used to obtain only features within a given range, e.g.<br><br> +<code>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/encode4/ccre/cCREregistry.bb -chrom=chr21 -start=0 -end=100000000 stdout</code></p> + +<h2>Credits</h2> +<p> +Data were generated by the ENCODE Consortium. The data were further processed for +visualization through a collaborative effort between the <a target="_blank" +href="https://www.umassmed.edu/zlab">Weng lab</a> and the <a target="_blank" +href="https://sites.google.com/view/moore-lab/">Moore lab</a> at UMass Chan Medical +School (funded by NIH grant HG012343). Integration and visualization were developed +by Drs. Mingshi Gao, Jill Moore, and Zhiping Weng at UMass Chan Medical School, who were +part of the ENCODE Data Analysis Center. 
We thank the ENCODE production labs +for generating the data.</p> + +<h2>References</h2> +<p> +ENCODE Project Consortium, Moore JE, Purcaro MJ, Pratt HE, Epstein CB, Shoresh N, Adrian J, Kawli T, +Davis CA, Dobin A <em>et al</em>. +<a href="https://doi.org/10.1038/s41586-020-2493-4" target="_blank"> +Expanded encyclopaedias of DNA elements in the human and mouse genomes</a>. +<em>Nature</em>. 2020 Jul;583(7818):699-710. +PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/32728249" target="_blank">32728249</a>; PMC: <a +href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7410828/" target="_blank">PMC7410828</a> +</p> +<p> +Moore JE, Pratt HE, Fan K, Phalke N, Fisher J, Elhajjajy SI, Andrews G, Gao M, Shedd N, Fu Y <em>et +al</em>. +<a href="https://doi.org/10.1101/2024.12.26.629296" target="_blank"> +An Expanded Registry of Candidate cis-Regulatory Elements for Studying Transcriptional +Regulation</a>. +<em>bioRxiv</em>. 2024 Dec 26;. +PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/39763870" target="_blank">39763870</a>; PMC: <a +href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11703161/" target="_blank">PMC11703161</a> +</p>