6b1648806fc1de0da4b16bb23f51e0fd52065ef7
gperez2
Wed Apr 13 13:12:33 2022 -0700
making ReMap hub into native track, refs #28960

diff --git src/hg/makeDb/trackDb/mouse/reMap.html src/hg/makeDb/trackDb/mouse/reMap.html
new file mode 100644
index 0000000..dd19da9
--- /dev/null
+++ src/hg/makeDb/trackDb/mouse/reMap.html
@@ -0,0 +1,183 @@
+<h2>Description</h2>
+<p>
+This track represents the ReMap Atlas of regulatory regions, which consists of a
+large-scale integrative analysis of all public ChIP-seq data for transcriptional
+regulators from GEO, ArrayExpress, and ENCODE.
+</p>
+
+<p>
+Below is a schematic diagram of the types of regulatory regions:
+<ul>
+<li>ReMap 2022 Atlas (all peaks for each analyzed data set)</li>
+<li>ReMap 2022 Non-redundant peaks (merged similar target)</li>
+<li>ReMap 2022 Cis Regulatory Modules</li>
+</ul>
+</p>
+
+<img src="http://remap.univ-amu.fr/storage/public/hubReMap2022/img/schema_datatype_remap.png"
+alt="Schematic diagram data types" style="display: block; margin-left: left;
+margin-right: auto; max-width:800px">
+<br>
+
+<h2> Display Conventions and Configuration </h2>
+<ul>
+<li>
+Each transcription factor is assigned a specific RGB color.
+</li>
+<li>
+ChIP-seq peak summits are represented by vertical bars.
+</li>
+<li>
+Hsap: A data set is defined as a ChIP/Exo-seq experiment in a given
+GEO/ArrayExpress/ENCODE series (e.g. GSE41561), for a given TF (e.g. ESR1), in
+a particular biological condition (e.g. MCF-7).
+<br>Data sets are labeled with the concatenation of these three pieces of
+information (e.g. GSE41561.ESR1.MCF-7).
+</li>
+<li>
+Atha: A data set is defined as a ChIP-seq experiment in a given series
+(e.g. GSE94486), for a given target (e.g. ARR1), in a particular biological
+condition (i.e. ecotype, tissue type, experimental conditions; e.g.
+Col-0_seedling_3d-6BA-4h).
+<br>Data sets are labeled with the concatenation of these three pieces of
+information (e.g. GSE94486.ARR1.Col-0_seedling_3d-6BA-4h).
+</li>
+</ul>
+
+<h2>Methods</h2>
+
+<p>
+This release of ReMap (2022) presents the analysis of 5,505 quality-controlled
+mouse ChIP-seq data sets (n=7,317 before QC) from public sources (GEO &amp;
+ENCODE). These ChIP-seq data sets have been mapped to the GRCm38/mm10 mouse
+assembly. A data set is defined as a ChIP-seq experiment in a given series (e.g.
+GSE122715), for a given TF (e.g. USF1), in a particular biological condition
+(i.e. cell line, tissue type, disease state, or experimental conditions; e.g.
+mESC). Data sets were labeled by concatenating these three pieces of
+information, such as GSE122715.USF1.mESC.
+</p>
+<p>These merged analyses cover 656 DNA-binding proteins
+(transcriptional regulators), including transcription factors (TFs),
+transcription co-activators (TCFs), and chromatin-remodeling factors (CRFs),
+for a total of 123 million peaks.
+</p>
+
+<img src="http://remap.univ-amu.fr/storage/public/hubReMap2022/img/Arhgap26_hgt_genome_euro_bc5a_b868b0.png"
+alt="Schematic diagram" style="display: block; margin-left: left; margin-right: auto; max-width:800px">
+<br>
+
+<h4>ENCODE</h4>
+Available ENCODE ChIP-seq data sets for transcriptional regulators from the
+www.encodeproject.org portal were processed with the standardized ReMap pipeline.
+The ENCODE data were retrieved as FASTQ files from the ENCODE portal
+(https://www.encodeproject.org/) using the following filters: Assay: "ChIP-seq",
+Organism: "Homo sapiens", Target of assay: "transcription factor", Available data:
+"fastq", on June 21, 2016. Metadata information in JSON format and FASTQ files
+were retrieved using the Python requests module.
+
+
+<h4>ChIP-seq processing</h4>
+Both public and ENCODE data were processed similarly. Bowtie 2 (<a href=
+"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3322381/" TARGET = _BLANK>PMC3322381
+</a>) (version 2.2.9) with options --end-to-end --sensitive was used to align all
+reads to the mouse genome (GRCm38/mm10 assembly).
+Biological and technical
+replicates for each unique combination of GSE/TF/cell type or biological condition
+were used for peak calling. TFBS were identified using the MACS2 peak-calling tool
+(<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3120977/" TARGET = _BLANK>
+PMC3120977</a>) (version 2.1.1.2) in order to follow ENCODE ChIP-seq guidelines,
+with stringent thresholds (MACS2 default thresholds, p-value: 1e-5). An input data
+set was used when available.
+
+
+<h4>Quality assessment</h4>
+To assess the quality of public data sets, a score was computed based on the
+cross-correlation and the FRiP (fraction of reads in peaks) metrics developed by
+the ENCODE Consortium (<a href="http://genome.ucsc.edu/ENCODE/qualityMetrics.html"
+TARGET = _BLANK>http://genome.ucsc.edu/ENCODE/qualityMetrics.html</a>). Two
+thresholds were defined for each of the two cross-correlation ratios (NSC,
+normalized strand coefficient: 1.05 and 1.10; RSC, relative strand coefficient:
+0.8 and 1.0). Detailed descriptions of the ENCODE quality coefficients can be
+found at <a href="http://genome.ucsc.edu/ENCODE/qualityMetrics.html"
+TARGET = _BLANK>http://genome.ucsc.edu/ENCODE/qualityMetrics.html</a>. The
+phantompeak tools suite was used
+(<a href="https://code.google.com/p/phantompeakqualtools/"
+TARGET = _BLANK>https://code.google.com/p/phantompeakqualtools/</a>) to compute
+RSC and NSC.
+<p>
+Please refer to the ReMap 2022, 2020, and 2018 publications for more details
+(citations below).
+</p>
+
+<!--
+<p>
+<img src="http://pedagogix-tagc.univ-mrs.fr/remap2/hubDirectory/trackhub/img/remap2_figure3_web.png" alt="Detailed view of FOXA1" align="middle">
+</p>
+This is a detailed view of the data increase in ReMap v2 with FOXA1 peaks at a specific location.
+<br>
+-->
+
+<h2>Data Access</h2>
+<p>
+ReMap Atlas of regulatory regions data can be explored interactively with the
+<a href="../cgi-bin/hgTables">Table Browser</a> and cross-referenced with the
+<a href="../cgi-bin/hgIntegrator">Data Integrator</a>. For programmatic access,
+the track can be queried using the Genome Browser's
+<a href="../../goldenPath/help/api.html">REST API</a>.
+ReMap annotations can be downloaded from the
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/$db/reMap">Genome Browser's download server</a>
+as a bigBed file. This compressed binary format can be remotely queried through
+command line utilities. Please note that some of the download files can be quite large.</p>
+
+<p>
+Individual BED files for specific TFs, cells/biotypes, or data sets can be
+found and downloaded on the ReMap website <a href="http://remap.univ-amu.fr/"
+target="_blank">http://remap.univ-amu.fr/</a> or at <a href="http://remap2022.univ-amu.fr/"
+target="_blank">http://remap2022.univ-amu.fr/</a>.
+</p>
+
+The ReMap BED files for all versions [2022, 2020, 2018, 2015] are available for
+download at the ReMap website <a href="http://remap.univ-amu.fr/"
+target="_blank">http://remap.univ-amu.fr/</a> in the download tab.
+
+
+
+<h2>References</h2>
+
+<p>
+Chèneby J, Gheorghe M, Artufel M, Mathelier A, Ballester B.
+<a href="https://www.ncbi.nlm.nih.gov/pubmed/29126285" target="_blank">
+ReMap 2018: an updated atlas of regulatory regions from an integrative analysis of DNA-binding
+ChIP-seq experiments</a>.
+<em>Nucleic Acids Res</em>. 2018 Jan 4;46(D1):D267-D275.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/29126285" target="_blank">29126285</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753247/" target="_blank">PMC5753247</a>
+</p>
+<p>
+Chèneby J, Ménétrier Z, Mestdagh M, Rosnet T, Douida A, Rhalloussi W, Bergon A, Lopez
+F, Ballester B.
+<a href="https://www.ncbi.nlm.nih.gov/pubmed/31665499" target="_blank">
+ReMap 2020: a database of regulatory regions from an integrative analysis of Human and Arabidopsis
+DNA-binding sequencing experiments</a>.
+<em>Nucleic Acids Res</em>. 2020 Jan 8;48(D1):D180-D188.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/31665499" target="_blank">31665499</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7145625/" target="_blank">PMC7145625</a>
+</p>
+<p>
+Griffon A, Barbier Q, Dalino J, van Helden J, Spicuglia S, Ballester B.
+<a href="https://www.ncbi.nlm.nih.gov/pubmed/25477382" target="_blank">
+Integrative analysis of public ChIP-seq experiments reveals a complex multi-cell regulatory
+landscape</a>.
+<em>Nucleic Acids Res</em>. 2015 Feb 27;43(4):e27.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/25477382" target="_blank">25477382</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4344487/" target="_blank">PMC4344487</a>
+</p>
+<p>
+Hammal F, de Langen P, Bergon A, Lopez F, Ballester B.
+<a href="https://www.ncbi.nlm.nih.gov/pubmed/34751401" target="_blank">
+ReMap 2022: a database of Human, Mouse, Drosophila and Arabidopsis regulatory regions from an
+integrative analysis of DNA-binding sequencing experiments</a>.
+<em>Nucleic Acids Res</em>. 2022 Jan 7;50(D1):D316-D325.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/34751401" target="_blank">34751401</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8728178/" target="_blank">PMC8728178</a>
+</p>
+