f1bfefbd4ee51872a1da796930a0fa33f101e8e2
jnavarr5
Fri Nov 22 14:11:06 2024 -0800
Adding documentation about using the .csi index for VCF files and adding a note about using the bigDataIndex to the list of steps to generate a VCF track, refs #28368

diff --git src/hg/htdocs/goldenPath/help/vcf.html src/hg/htdocs/goldenPath/help/vcf.html
index 3fdf505..0d33460 100755
--- src/hg/htdocs/goldenPath/help/vcf.html
+++ src/hg/htdocs/goldenPath/help/vcf.html
@@ -97,51 +97,62 @@
   If you haven't done so already,
   <a href="http://sourceforge.net/projects/samtools/files/tabix/" target="_blank">
   download</a> and build the
   <a href="http://samtools.sourceforge.net/tabix.shtml" target="_blank">tabix and bgzip</a>
   programs. Test your installation by running tabix with no command-line arguments; it should
   print a brief usage message. For help with tabix, please contact the
   <a href="https://lists.sourceforge.net/lists/listinfo/samtools-help" target="_blank">samtools-help mailing list</a>
   (tabix is part of the samtools project).</li>
   <li>
   Create VCF or convert another format to VCF. Items must be sorted by genomic position.</li>
   <li>
   Compress your <em>.vcf</em> file using the <code>bgzip</code> program:
   <pre><code>bgzip my.vcf</code></pre>
   For more information about the <code>bgzip</code> command, run it with no arguments to display
   the usage message.</li>
   <li>
-  Create a tabix index file for the bgzip-compressed VCF (<em>.vcf.gz</em>):
+  Create a tabix index file (<code>.tbi</code> or <code>.csi</code>) for the bgzip-compressed VCF
+  (<em>.vcf.gz</em>).
+  By default, the tabix command appends <em>.tbi</em> to the <em>my.vcf.gz</em> filename, creating
+  a binary index file named <em>my.vcf.gz.tbi</em> with which genomic
+  coordinates can quickly be translated into file offsets in <em>my.vcf.gz</em>:
   <pre><code>tabix -p vcf my.vcf.gz</code></pre>
-  The tabix command appends <em>.tbi</em> to the <em>my.vcf.gz</em> filename, creating a binary
-  index file named <em>my.vcf.gz.tbi</em> with which genomic coordinates can quickly be translated
-  into file offsets in <em>my.vcf.gz</em>.</li>
+  The tabix (<code>.tbi</code>) and BAI index formats can handle individual chromosomes up to 512 Mbp
+  (2^29 bases) in length. If your input file contains data lines with start or end positions
+  greater than 512 Mbp, you will need to use a CSI (<code>.csi</code>) index instead.
+  <pre><code>tabix --csi -p vcf my.vcf.gz</code></pre>
+  </li>
   <li>
-  Move both the compressed VCF file (<em>my.vcf.gz</em>) and tabix index file
-  (<em>my.vcf.gz.tbi</em>) to an http, https, or ftp location.Note that the Genome Browser
-  looks for an index file with the same URL as the VCF file with the .tbi suffix added. If your
-  hosting site does not use the filename as the URL link, you will have to specifically
-  call the location of this .vcf.tbi index file with the <code>bigDataIndex</code> keyword.
-  This keyword is relevant for Custom Tracks and Track Hubs. You can read more about
-  <em>bigDataIndex</em> in
+  Move both the compressed VCF file (<em>my.vcf.gz</em>) and index file
+  (<em>my.vcf.gz.tbi</em> or <em>my.vcf.gz.csi</em>) to an http, https, or ftp location. Note that
+  the Genome Browser looks for an index file with the same URL as the VCF file with the .tbi or .csi
+  suffix added.
+  <br><br>
+  If your hosting site does not use the filename as the URL link, you will have to specifically
+  call the location of this .vcf.tbi/csi index file with the <code>bigDataIndex</code> keyword.
+  You can read more about <em>bigDataIndex</em> in
   <a href="trackDb/trackDbHub.html#bigDataIndex">the TrackDb Database Definition page</a>.</li>
   <li>
   Construct a <a href="hgTracksHelp.html#CustomTracks">custom track</a> using a single
   <a href="hgTracksHelp.html#TRACK">track line</a>. The basic version of the track line will look
   something like this:
   <pre><code>track type=vcfTabix name="My VCF" bigDataUrl=<em>http://myorg.edu/mylab/my.vcf.gz</em></code></pre>
   Again, in addition to <em>http://myorg.edu/mylab/my.vcf.gz</em>, the associated index file
-  <em>http://myorg.edu/mylab/my.vcf.gz.tbi</em> must also be available at the same location.</li>
+  <em>http://myorg.edu/mylab/my.vcf.gz.tbi</em> must also be available at the same location.
+  If the file is in a different location or uses a different filename, then use the
+  <em>bigDataIndex</em> attribute in the track line to point to the index file.
+  <pre><code>track type=vcfTabix name="My VCF" bigDataUrl=<em>http://myorg.edu/mylab/my.vcf.gz</em> bigDataIndex=<em>http://myorg.edu/someOtherDirectory/myvcf.gz.tbi</em></code></pre>
+  </li>
   <li>
   Paste the custom track line into the text box in the
   <a href="../../cgi-bin/hgCustom" target="_blank">custom track management page</a>, click
   "submit" and view in the Genome Browser.</li>
   </ol>
   <h2>Parameters for VCF custom track definition lines</h2>
   <p>
   All options are placed in a single line separated by spaces (lines are broken only for readability
   here):</p>
   <pre><code><strong>track type=vcfTabix bigDataUrl=</strong><em>http://...</em>
   <strong>hapClusterEnabled=</strong><em>true|false</em>
   <strong>hapClusterColorBy=</strong><em>altOnly|refAlt|base</em>
   <strong>hapClusterTreeAngle=</strong><em>triangle|rectangle</em>
   <strong>hapClusterHeight=</strong><em>N</em>