f4ace97438e173daf68c76153a4481de73879798 jcasper Fri Nov 8 16:46:02 2019 -0800
Initial commit of supporting docs for hic track type, refs #22316

diff --git src/hg/htdocs/goldenPath/help/hgTrackHubHelp.html src/hg/htdocs/goldenPath/help/hgTrackHubHelp.html
index 87a3abe..6335988 100755
--- src/hg/htdocs/goldenPath/help/hgTrackHubHelp.html
+++ src/hg/htdocs/goldenPath/help/hgTrackHubHelp.html
@@ -68,39 +68,42 @@
 Browser (please note that hosting hub files on HTTP tends to work even better than FTP and local hubs can be displayed on <a href="hubQuickStartAssembly.html#blatGbib" target="_blank">GBiB</a>). Track hubs can be displayed on genomes that UCSC directly supports, or on your own sequence. Hubs are a useful tool for visualizing a large number of genome-wide data sets. For example, a project that has produced several wiggle plots of data can use the hub utility to organize the tracks into composite and super-tracks, making it possible to show the data for a large collection of tissues and experimental conditions in a visually elegant way, similar to how the ENCODE native data tracks are displayed in the browser.</p>
 <p>
 The track hub utility allows efficient access to data sets from around the world through the familiar Genome Browser interface. Browser users can display tracks from any public track hub that has been registered with UCSC. Additionally, users can import data from unlisted hubs or can set up, display, and share their own track hubs. Genome assemblies that UCSC does not support can be loaded and viewed with associated data.</p>
 <p>
-The data underlying the tracks and optional sequence in a hub reside on the remote server of the
-data provider rather than at UCSC. Genomic annotations are stored in compressed binary indexed
-files in bigBed, bigBarChart, bigGenePred, bigNarrowPeak, bigPsl, bigChain, bigInteract, bigMaf, bigWig, BAM, CRAM, HAL or VCF
-format that contain the data at several resolutions. In the case of assemblies that UCSC does not
-support, genomic sequence is stored in the efficient twoBit format. When a hub track is displayed
-in the Genome Browser, only the relevant data needed to support the view of the current genomic
-region are transmitted rather than the entire file. The transmitted data are cached on the UCSC
-server to expedite future access. This on-demand transfer mechanism eliminates the need to transmit
-large data sets across the Internet, thereby minimizing upload time into the browser.</p>
+The data underlying the tracks and optional sequence in a hub reside on the
+remote server of the data provider rather than at UCSC. Genomic annotations are
+stored in compressed binary indexed files in bigBed, bigBarChart, bigGenePred,
+bigNarrowPeak, bigPsl, bigChain, bigInteract, bigMaf, bigWig, BAM, CRAM, HAL,
+hic or VCF format that contain the data at several resolutions. In the case of
+assemblies that UCSC does not support, genomic sequence is stored in the
+efficient twoBit format. When a hub track is displayed in the Genome Browser,
+only the relevant data needed to support the view of the current genomic region
+are transmitted rather than the entire file. The transmitted data are cached on
+the UCSC server to expedite future access. This on-demand transfer mechanism
+eliminates the need to transmit large data sets across the Internet, thereby
+minimizing upload time into the browser.</p>
 <p>
 The track hub utility offers a convenient way to view and share very large sets of data. Individuals wishing to display only a few small data sets may find it easier to use the Genome Browser <a href="hgTracksHelp.html#CustomTracks">custom track</a> utility. As with hub tracks, custom tracks can be uploaded to the UCSC Genome Browser and viewed alongside the native annotation tracks. Custom tracks can be constructed from a wide range of data types; hub tracks are limited to compressed binary indexed formats that can be remotely hosted. However, the custom tracks utility does not offer the data persistence and track configurability provided by the track hub mechanism: hub tracks can be grouped into composite or super-tracks and configured to display the data using a wide variety of options. There is no way to create a browser on your own sequence with custom tracks. In general, for users who have large data sets that would be prohibitive to upload, need to ensure the persistence of their data, or would like to take full advantage of track functionality, or create a browser on sequence not natively supported by UCSC or a genome browser mirror, track hubs are a better solution. Both mechanisms give data providers the flexibility to directly add, update, and remove data from their display as needed.</p>
@@ -178,32 +181,32 @@
 tracks</a> in the same manner as other tracks. The data underlying data hub tracks can be viewed, manipulated, and downloaded using the <a href="../../cgi-bin/hgTables">UCSC Table Browser</a>.</p>
 <!--Question: Are there any restrictions to these uses?-->
 <!-- ========== Setting Up Your Own Track Hub ============================== -->
 <a name="Setup"></a>
 <h2>Setting up your own Track Hub</h2>
 <p>
 This section provides a step-by-step description of the process used to set up a track hub on your own server.</p>
 <p>
 To create your own hub you will need:</p>
 <ul>
 <li>
 one or more data sets formatted in one of the compressed binary index formats supported by the
- Genome Browser: bigBed, bigBarChart, bigGenePred, bigNarrowPeak, bigPsl, bigChain, bigInteract, bigMaf, bigWig, BAM, CRAM,
- HAL or VCF</li>
+ Genome Browser: bigBed, bigBarChart, bigGenePred, bigNarrowPeak, bigPsl, bigChain, bigInteract,
+ bigMaf, bigWig, BAM, CRAM, HAL, hic or VCF</li>
 <li>
 a set of text files that specify properties for the track hub and for each of the data tracks within it</li>
 <li>
 a twoBit file with your sequence if you are setting up an assembly hub.</li>
 <li>
 an Internet-enabled web server or ftp server </li>
 <!--Question: do we have some general minimum requirements for this?-->
 </ul>
 <p>
 The files are placed on the server in a file hierarchy like the one shown in <em>Example 1</em>. Users experienced in setting up Genome Browser mirrors that contain their own data will find that setting up a track hub is similar, but is usually much easier. Depending on the number and complexity of the data sets, a track hub can typically be set up in a day or two. It is generally easiest to run the command-line data formatting programs in a Linux programming environment,
@@ -249,90 +252,100 @@
 rnaSeqLung.bigBed - intron/exon lists for lung
 </code></pre>
 <hr>
 <p>
 <strong>Step 1. Format the data</strong><br>
 The data tracks provided by a hub must be formatted in one of the compressed binary index formats supported by the Genome Browser:
 <a href="bigWig.html">bigWig</a>,
 <a href="bigBed.html">bigBed</a>,
 <a href="bigGenePred.html">bigGenePred</a>,
 <a href="bigChain.html">bigChain</a>,
 <a href="bigNarrowPeak.html">bigNarrowPeak</a>,
 <a href="barChart.html">bigBarChart</a>,
 <a href="interact.html">bigInteract</a>,
 <a href="bigPsl.html">bigPsl</a>,
 <a href="bigMaf.html">bigMaf</a>,
-<a href="bigWig.html">bigWig</a>,
+<a href="hic.html">hic</a>,
 <a href="bam.html">BAM</a>,
-<a href="cram.html">CRAM</a>, HAL or
+<a href="cram.html">CRAM</a>,
+<a href="https://github.com/ComparativeGenomicsToolkit/hal" target="_blank">HAL</a> or
 <a href="vcf.html">VCF</a>.</p>
 <p>
 <em>bigWig</em> - The bigWig format is best for displaying continuous value plot data, such as read depths from short read sequencing projects or levels of conservation observed in a multiple-species alignment. A bigWig file contains a list of chromosome segments, each of which is associated with a floating point value. When graphed, the segments may appear as a big "wiggle". Although each bigWig file can contain only a single value for any given base, bigWig tracks are often combined into "container multiWig" or "compositeTrack on" tagged tracks. For information on creating and configuring bigWig tracks, see the <a href="bigWig.html">bigWig Track Format</a> help page.</p>
 <p>
 <em>bigBed</em> - BigBed files are binary indexed versions of Browser Extensible Data (<a href="../../FAQ/FAQformat.html#format1">BED</a>) files. BED format is useful for associating a name and (optionally) a color and a score with one or more related regions on the same chromosome, such as all the exons of a gene. See the <a href="bigBed.html">bigBed Track Format</a> help page for information on creating and configuring bigBed tracks.</p>
 <p>
-<em>bigNarrowPeak</em> - BigNarrowPeak files are binary indexed versions of Browser Extensible Data
-(<a href="../../FAQ/FAQformat.html#format1">BED</a>) files with first six fields being the same as bed,
-and an extra four fields that contain various scores and the offset of the base within the block that is
-the peak.
- See the <a href="bigNarrowPeak.html">bigNarrowPeak Track
-Format</a> help page for information on creating and configuring bigNarrowPeak tracks.</p>
-<p>
 <em>bigGenePred</em> - BigGenePred files are binary indexed versions of Browser Extensible Data (<a href="../../FAQ/FAQformat.html#format1">BED</a>) files with an extra eight fields that are useful for describing gene predictions that are modeled after the fields in <a href="../../FAQ/FAQformat.html#format9">genePred</a> files. BigGenePred format is useful for associating a name and (optionally) a color and a score with one or more related regions on the same chromosome, such as all the exons of a gene. See the <a href="bigGenePred.html">bigGenePred Track Format</a> help page for information on creating and configuring bigGenePred tracks.</p>
 <p>
+<em>bigChain</em> - BigChain files are binary indexed versions of
+<a href="chain.html">chain</a> files. BigChain format is useful for large pairwise alignment data
+sets. See the <a href="bigChain.html">bigChain Track Format</a> help page for more information on
+creating and configuring bigChain tracks.</p>
+<p>
+<em>bigNarrowPeak</em> - BigNarrowPeak files are binary indexed versions of Browser Extensible Data
+(<a href="../../FAQ/FAQformat.html#format1">BED</a>) files with first six fields being the same as bed,
+and an extra four fields that contain various scores and the offset of the base within the block that is
+the peak.
+ See the <a href="bigNarrowPeak.html">bigNarrowPeak Track
+Format</a> help page for information on creating and configuring bigNarrowPeak tracks.</p>
+<p>
 <em>bigBarChart</em> - BigBarChart files are binary indexed versions of <a href="../../FAQ/FAQformat.html#format21">barChart</a> files. BigBarChart format is useful for bringing barChart display into track hubs, and supports schema customization and label configuration that is not supported for regular barChart format. See the <a href="barChart.html">barChart Track Format</a> help page for information on creating and configuring bigBarChart tracks.</p>
 <p>
 <em>bigInteract</em> - BigInteract files are binary indexed versions of <a href="../../FAQ/FAQformat.html#format22">interact</a> files. BigInteract format is useful for
-bringing interacdt display into track hubs, and supports schema customization and label configuration
+bringing interact display into track hubs, and supports schema customization and label configuration
 that is not supported for regular interact format. See the <a href="interact.html">interact Track Format</a> help page for information on creating and configuring bigInteract tracks.</p>
 <p>
 <em>bigPsl</em> - BigPsl files are binary indexed versions of <a href="../../FAQ/FAQformat.html#format2">PSL</a> files. BigPsl format is useful for large data sets created by BLAT or other tools. See the <a href="bigPsl.html">bigPsl Track Format</a> help page for more information on creating and configuring bigPsl tracks.</p>
-<p><em>bigChain</em> - BigChain files are binary indexed versions of
-<a href="chain.html">chain</a> files. BigChain format is useful for large pairwise alignment data
-sets. See the <a href="bigChain.html">bigChain Track Format</a> help page for more information on
-creating and configuring bigChain tracks.</p>
-<p><em>bigMaf</em> - BigMaf files are binary indexed versions of
+<p>
+<em>bigMaf</em> - BigMaf files are binary indexed versions of
 <a href="../../FAQ/FAQformat.html#format5">MAF</a> files. BigMaf format is useful for large multiple alignment data sets. See the <a href="bigMaf.html">bigMaf Track Format</a> help page for more information on creating and configuring bigMaf tracks.</p>
-<p><em>BAM</em> - BAM files contain alignments of (generally short) DNA reads to a reference
+<p>
+<em>hic</em> - Hic files are binary files that store contact matrices from chromatin
+conformation experiments. This format is useful for displaying interactions at a scale and depth
+that exceeds what can be easily visualized with the interact and bigInteract formats.
+See the <a href="hic.html">hic Track Format</a> help page for more information on creating and
+configuring hic tracks.</p>
+<p>
+<em>BAM</em> - BAM files contain alignments of (generally short) DNA reads to a reference
 sequence, usually a complete genome. BAM files are binary versions of Sequence Alignment/Map (<a href="http://samtools.sourceforge.net/" target="_blank">SAM</a>) format files. Unlike bigWig and bigBed formats, the index for a BAM file is in a separate file, which the track hub expects to be in the same directory with the same root name as the BAM file with the addition of a <em>.bai</em> suffix. See the <a href="bam.html">BAM Track Format</a> help page for more information.</p>
 <p>
 <em>CRAM</em> - The CRAM file format is a more dense form of <a href="bam.html">BAM</a> files with the benefit of saving much disk space. While BAM files contain all sequence data within a file, CRAM files are smaller by taking advantage of an additional external "reference sequence" file. This file is needed to both compress and decompress the read information. See the <a href="cram.html">CRAM Track Format</a> help page for more information.</p>
 <p>
 <em>HAL</em> - HAL (Hierarchical Alignment Format) is a graph-based structure to efficiently store and index multiple genome alignments and ancestral reconstructions. <a href="https://github.com/glennhickey/hal/blob/master/README.md" target="_blank">HAL</a> files are
@@ -531,35 +544,36 @@
 remaining characters must be letters, numbers, or under-bar ("_"). Each track must have a unique name. This tag pair must be the first entry in the trackDb.txt file.</p>
 <p>
 <em>bigDataUrl</em> - the file name, path, or Web location of the track's data file. The bigDataUrl can be a full URL. If it is not prefaced by a protocol, such as <em>http://</em>, <em>https://</em> or <em>ftp://</em>, then it is considered to be a path relative to the trackDb.txt file.</p>
 <p>
 <em>shortLabel</em> - the short name for the track displayed in the track list, in the configuration and track settings, and on the details pages. Suggested maximum length is 17 characters.</p>
 <p>
 <em>longLabel</em> - the longer description label for the track that is displayed in the configuration and track settings, and on the details pages. Suggested maximum length is 80 characters.</p>
 <p>
 <em>type</em> - the format of the file specified by bigDataUrl. Must be either <em>bigWig</em>,
-<em>bigBed</em>, <em>bigBarChart</em>, <em>bigGenePred</em>, <em>bigInteract</em>, <em>bigNarrowPeak</em>, <em>bigChain</em>, <em>bigPsl</em>,
-<em>bigMaf</em>, <em>bam</em>, <em>halSnake</em> or <em>vcfTabix</em> (Note: use <em> type bam</em>
-for CRAM files). If the type is <em>bigBed</em>, it may be followed by an optional number denoting
-the number of fields in the bigBed file (e.g., "type bigBed 12" for a file with 12 fields
-or "type bigBed 12 +" for a file that contains additional <a href="../../FAQ/FAQformat.html#format1"
+<em>bigBed</em>, <em>bigBarChart</em>, <em>bigGenePred</em>, <em>bigInteract</em>, <em>bigNarrowPeak</em>,
+<em>bigChain</em>, <em>bigPsl</em>, <em>bigMaf</em>, <em>hic</em>, <em>bam</em>, <em>halSnake</em> or
+<em>vcfTabix</em> (Note: use <em> type bam</em> for CRAM files). If the type is <em>bigBed</em>, it
+may be followed by an optional number denoting the number of fields in the bigBed file (e.g.,
+"type bigBed 12" for a file with 12 fields or "type bigBed 12 +" for a file
+that contains additional <a href="../../FAQ/FAQformat.html#format1"
 target="_blank">non-standard columns</a>). If no number is given, a default value of 3 is assumed (a very limited display that omits names, strand information, and exon boundaries).</p>
 <p>
 <strong><em>Example 4:</em></strong> Sample trackDb.txt file containing two simple tracks.</p>
 <pre><code><strong>track</strong> dnaseSignal
 <strong>bigDataUrl</strong> dnaseSignal.bigWig
 <strong>shortLabel</strong> DNAse Signal
 <strong>longLabel</strong> Depth of alignments of DNAse reads
 <strong>type</strong> bigWig
 <br>
 <strong>track</strong> dnaseReads
 <strong>bigDataUrl</strong> dnaseReads.bam
 <strong>shortLabel</strong> DNAse Reads
 <strong>longLabel</strong> DNAse reads mapped with MAQ