e9cb8eda87388e889a16f5737ea831979118445e
gperez2
Thu Apr 21 12:48:29 2022 -0700
Code review edits for the ReMap track, refs #29293
diff --git src/hg/makeDb/trackDb/mouse/reMap.html src/hg/makeDb/trackDb/mouse/reMap.html
index e882ea2..62b3c21 100644
--- src/hg/makeDb/trackDb/mouse/reMap.html
+++ src/hg/makeDb/trackDb/mouse/reMap.html
@@ -18,31 +18,31 @@
 <img src="http://remap.univ-amu.fr/storage/public/hubReMap2022/img/schema_datatype_remap.png" alt="Schematic diagram data types" style="display: block; margin-left: left; margin-right: auto; max-width:800px">
 <br>
 <h2> Display Conventions and Configuration </h2>
 <ul>
 <li>
 Each transcription factor follows a specific RGB color.
 </li>
 <li>
 ChIP-seq peak summits are represented by vertical bars.
 </li>
 <li>
 Hsap: A data set is defined as a ChIP/Exo-seq experiment in a given
-GEO/ArrayExpress/ENCODE series (e.g. GSE41561), for a given TF (e.g.: ESR1), in
+GEO/ArrayExpress/ENCODE series (e.g. GSE41561), for a given TF (e.g. ESR1), in
 a particular biological condition (e.g. MCF-7).
 <br>Data sets are labeled with the concatenation of these three pieces of information (e.g. GSE41561.ESR1.MCF-7).
 </li>
 <li>
 Atha: The data set is defined as a ChIP-seq experiment in a given series (e.g. GSE94486), for a given target (e.g. ARR1), in a particular biological condition (i.e. ecotype, tissue type, experimental conditions; e.g. Col-0_seedling_3d-6BA-4h).
 <br>Data sets are labeled with the concatenation of these three pieces of information (e.g. GSE94486.ARR1.Col-0_seedling_3d-6BA-4h).
 </li>
 </ul>
 <h2>Methods</h2>
@@ -59,45 +59,43 @@
 </p>
 <p>Those merged analyses cover a total of 656 DNA-binding proteins (transcriptional regulators) such as a variety of transcription factors (TFs), transcription co-activators (TCFs), and chromatin-remodeling factors (CRFs) for 123 million peaks.
 </p>
 <img src="http://remap.univ-amu.fr/storage/public/hubReMap2022/img/Arhgap26_hgt_genome_euro_bc5a_b868b0.png" alt="Schematic diagram" style="display: block; margin-left: left; margin-right: auto; max-width:800px">
 <br>
 <h4>ENCODE</h4>
 Available ENCODE ChIP-seq data sets for transcriptional regulators from the www.encodeproject.org portal were processed with the standardized ReMap pipeline. The list of ENCODE data was retrieved as FASTQ files from the ENCODE portal
-(https://www.encodeproject.org/) using the following filters: Assay: "ChIP-seq",
-Organism: "Homo sapiens", Target of assay: "transcription factor", Available data:
-"fastq" on 2016 June 21st. Metadata information in JSON format and FASTQ files
-were retrieved using the Python requests module.
+(https://www.encodeproject.org/) using filters. Metadata information in JSON
+format and FASTQ files were retrieved using the Python requests module.
 <h4>ChIP-seq processing</h4>
 Both Public and ENCODE data were processed similarly. Bowtie 2 (<a href=
-"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3322381/" TARGET = _BLANK>PMC3322381
-</a>) (version 2.2.9) with options -end-to-end -sensitive was used to align all
+"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3322381/" TARGET =_BLANK
+>PMC3322381</a>) (version 2.2.9) with options -end-to-end -sensitive was used to align all
 reads on the human genome (GRCh38/hg38 assembly). Biological and technical replicates for each unique combination of GSE/TF/Cell type or Biological condition were used for peak calling. TFBS were identified using MACS2 peak-calling tool
-(<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3120977/" TARGET = _BLANK>
-PMC3120977</a>) (version 2.1.1.2) in order to follow ENCODE ChIP-seq guidelines,
+(<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3120977/" TARGET =_BLANK
+>PMC3120977</a>) (version 2.1.1.2) in order to follow ENCODE ChIP-seq guidelines,
 with stringent thresholds (MACS2 default thresholds, p-value: 1e-5).
 An input data set was used when available.
 <h4>Quality assessment</h4>
 To assess the quality of public data sets, a score was computed based on the cross-correlation and the FRiP (fraction of reads in peaks) metrics developed by the ENCODE Consortium (<a href="http://genome.ucsc.edu/ENCODE/qualityMetrics.html" TARGET = _BLANK>http://genome.ucsc.edu/ENCODE/qualityMetrics.html</a>). Two thresholds were defined for each of the two cross-correlation ratios (NSC, normalized strand coefficient: 1.05 and 1.10; RSC, relative strand coefficient: 0.8 and 1.0). Detailed descriptions of the ENCODE quality coefficients can be found at <a href="http://genome.ucsc.edu/ENCODE/qualityMetrics.html" TARGET = _BLANK>http://genome.ucsc.edu/ENCODE/qualityMetrics.html</a>. The phantompeak tools suite was used
@@ -118,37 +116,37 @@
 -->
 <h2>Data Access</h2>
 <p>
 ReMap Atlas of regulatory regions data can be explored interactively with the <a href="../cgi-bin/hgTables">Table Browser</a> and cross-referenced with the <a href="../cgi-bin/hgIntegrator">Data Integrator</a>. For programmatic access, the track can be accessed using the Genome Browser's <a href="../../goldenPath/help/api.html">REST API</a>. ReMap annotations can be downloaded from the <a href="http://hgdownload.soe.ucsc.edu/gbdb/$db/reMap`">Genome Browser's download server</a> as a bigBed file. This compressed binary format can be remotely queried through command line utilities. Please note that some of the download files can be quite large.</p>
 <p>
-Individual BED files for specific TFs, or Cells/Biotypes, or data sets can be
+Individual BED files for specific TFs, cells/biotypes, or data sets can be
 found and downloaded on the ReMap website <a href="http://remap.univ-amu.fr/" target="_blank">http://remap.univ-amu.fr/</a> or at <a href="http://remap2022.univ-amu.fr/" target="_blank">http://remap2022.univ-amu.fr/</a>.
 </p>
-The ReMap BED files for all versions [2022, 2020, 2018, 2015] are available for
+The ReMap BED files for all versions (2022, 2020, 2018, 2015) are available for
 download at the ReMap website <a href="http://remap.univ-amu.fr/" target="_blank">http://remap.univ-amu.fr/</a> in the download tab.
 <h2>References</h2>
 <p>
 Chèneby J, Gheorghe M, Artufel M, Mathelier A, Ballester B. <a href="https://www.ncbi.nlm.nih.gov/pubmed/29126285" target="_blank"> ReMap 2018: an updated atlas of regulatory regions from an integrative analysis of DNA-binding ChIP- seq experiments</a>. <em>Nucleic Acids Res</em>. 2018 Jan 4;46(D1):D267-D275. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/29126285" target="_blank">29126285</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753247/" target="_blank">PMC5753247</a>