33adf27afa5f8d959d92d6c6fc5ab52b06e2b62f brianlee Mon May 10 14:56:06 2021 -0700 Adding a longTabix entry on data formats after Max email, refs #20885 diff --git src/hg/htdocs/FAQ/FAQformat.html src/hg/htdocs/FAQ/FAQformat.html index 5d50cd0..e34a433 100755 --- src/hg/htdocs/FAQ/FAQformat.html +++ src/hg/htdocs/FAQ/FAQformat.html @@ -28,30 +28,31 @@ <li><a href="#format9.6">bigLolly table format</a></li> <li><a href="#format6.1">bigWig format</a></li> <li><a href="../goldenPath/help/chain.html">Chain format</a></li> </ul> </div> <!-- Right column --> <div class="col-md-6"> <ul> <li><a href="#format5.2">CRAM format</a></li> <li><a href="#format9">GenePred table format</a></li> <li><a href="#format3">GFF format</a></li> <li><a href="#format4">GTF format</a></li> <li><a href="#format20">HAL format</a></li> <li><a href="#format23">Hic format</a></li> <li><a href="#format22">Interact and bigInteract format</a></li> + <li><a href="#format24">Longrange longTabix format</a></li> <li><a href="#format5">MAF format</a></li> <li><a href="#format6.5">Microarray format</a></li> <li><a href="../goldenPath/help/net.html">Net format</a></li> <li><a href="#format10">Personal Genome SNP format</a></li> <li><a href="#format2">PSL format</a></li> <li><a href="#format10.1">VCF format</a></li> <li><a href="#format6">WIG format</a></li> </ul> </div> </div> <a name="ENCODE"></a> <h6>ENCODE-specific formats</h6> <ul> <li><a href="#format13">ENCODE broadPeak format</a></li> <li><a href="#format14">ENCODE gappedPeak format</a></li> @@ -527,45 +528,58 @@ <a name="format23"></a> <h2>Hic format</h2> <p> Hic files are binary files that store contact matrices from chromatin conformation experiments. This format is useful for displaying interactions at a scale and depth that exceeds what can be easily visualized with the interact and bigInteract formats. See the <a href="../goldenPath/help/hic.html">hic Track Format</a> help page for more information on creating and configuring hic tracks. 
More information on the hic format itself can be found in the documentation on <a href="https://github.com/aidenlab/juicer/wiki/Data/#hic-files" target="_blank">Github</a>. The hic format was created by the <a href="https://www.aidenlab.org" target="_blank">Aiden Lab</a> at <a href="https://www.bcm.edu" target="_blank">Baylor College of Medicine</a>. </p> - <a name="format22"></a> <h2>Interact format</h2> <p> The interact (and bigInteract) track format displays pairwise interactions as arcs or half-rectangles connecting two genomic regions on the same chromosome. Cross-chromosomal interactions can also be represented in this format. This format is useful for displaying functional element interactions such as SNP/gene interactions, and is also suitable for low-density chromatin interactions, such as ChIA-PET, and other use cases with a limited number of interactions on the genome. It is not suitable for high-density chromatin data such as Hi-C.</p> <p> Click <a href="../goldenPath/help/interact.html">here</a> for more information on the interact and bigInteract formats.</p> +<a name="format24"></a> +<h2>Longrange longTabix format</h2> +<p> +The longrange format is a BED-like text format. Each row contains columns +that define the chromosome, start position (0-based), end position (not included), +and the interaction target, written in the form <code>chr2:333-444,55</code> (the interacting region followed by a score). For examples, +see the source of this format, the +<a href="https://epigenomegateway.readthedocs.io/en/latest/tracks.html#longrange" +target="_blank">WashU Epigenome Browser</a> documentation.</p> +<p> +See also the enhanced <a href="../goldenPath/help/interact.html">interact</a> format, +which displays pairwise interactions as arcs or half-rectangles in the browser. +</p> + <a name="format5"></a> <h2>MAF format</h2> <p> The multiple alignment format stores a series of multiple alignments in a format that is easy to parse and relatively easy to read. 
This format stores multiple alignments at the DNA level between entire genomes. Previously used formats are suitable for multiple alignments of single proteins or regions of DNA without rearrangements, but would require considerable extension to cope with genomic issues such as forward and reverse strand directions, multiple pieces to the alignment, and so forth.</p> <p> <strong>General Structure</strong><br> The <em>.maf</em> format is line-oriented. Each multiple alignment ends with a blank line. Each sequence in an alignment is on a single line, which can get quite long, but there is no length limit. Words in a line are delimited by any white space. Lines starting with # are considered to be comments. Lines starting with ## can be ignored by most programs, but contain meta-data of one form