a941de9925864ae6f9f3ee113354f4ef9bf9e123 dschmelt Thu Sep 24 17:24:27 2020 -0700 Fixing doc up for refs #26076 diff --git src/hg/htdocs/goldenPath/help/vcf.html src/hg/htdocs/goldenPath/help/vcf.html index 409b674..9c63ce8 100755 --- src/hg/htdocs/goldenPath/help/vcf.html +++ src/hg/htdocs/goldenPath/help/vcf.html @@ -13,32 +13,31 @@ target="_blank">1000 Genomes Project</a> for releases of single nucleotide variants, indels, copy number variants and structural variants discovered by the project. When a VCF file is compressed and indexed using <a href="http://samtools.sourceforge.net/tabix.shtml" target="_blank">tabix</a>, and made web-accessible, the Genome Browser is able to fetch only the portions of the file necessary to display items in the viewed region. This makes it possible to display variants from files that are so large that the connection to UCSC would time out when attempting to upload the whole file to UCSC. Both the VCF file and its tabix index file remain on your web-accessible server (http, https, or ftp), not on the UCSC server. UCSC temporarily caches the accessed portions of the files to speed up interactive display. If you do not have access to a web-accessible server and need hosting space for your VCF files, please see the <a href="hgTrackHubHelp.html#Hosting">Hosting</a> section of the Track Hub Help documentation.</p> <p>The UCSC tools support VCF versions 3.3 and greater.</p> <p>If you have VCF trio data, you may be interested in formatting your track as a -<a href="#trio">Phased Trios</a> track instead. Please see the Phased Trio section -<a href="#trio">below</a> for more information. +<a href="#trio">Phased Trios</a> track as described below. 
<h2>Generating a VCF track</h2> <p> The typical workflow for generating a VCF custom track is this:</p> <ol> <li> If you haven't done so already, <a href="http://sourceforge.net/projects/samtools/files/tabix/" target="_blank"> download</a> and build the <a href="http://samtools.sourceforge.net/tabix.shtml" target="_blank">tabix and bgzip</a> programs. Test your installation by running tabix with no command-line arguments; it should print a brief usage message. For help with tabix, please contact the <a href="https://lists.sourceforge.net/lists/listinfo/samtools-help" target="_blank">samtools-help mailing list</a> (tabix is part of the samtools project).</li> <li> Create VCF or convert another format to VCF. Items must be sorted by genomic position.</li> @@ -113,51 +112,59 @@ <pre><code><strong>name </strong><em>track label </em> # default is "User Track" <strong>description </strong><em>center label </em> # default is "User Supplied Track" <strong>visibility </strong><em>squish|pack|full|dense|hide</em> # default is hide (will also take numeric values 4|3|2|1|0) <strong>priority </strong><em>N </em> # default is 100 <strong>db </strong><em>genome database </em> # e.g. hg19 for Human Feb. 2009 (GRCh37) <strong>maxWindowToDraw </strong><em>N </em> # don't display track when viewing more than N bases <strong>chromosomes </strong><em>chr1,chr2,... </em> # track contains data only on listed reference assembly sequences </code></pre> The <a target="_blank" href="hgVcfTrackHelp.html">VCF track configuration</a> help page describes the VCF track configuration page options.</p> <a name="trio"></a> <h2>Phased Trio format</h2> <p>The vcfPhasedTrio track type is available for users whose VCF contains genotype data from one to three individuals. The underlying VCF follows the standard VCF format as described above, with the added caveat that there must be GENOTYPE columns for each of the individuals present. 
An -example of the trio display is shown below for the 1000 Genomes Trio track on Human/GRCh38: +example of the trio display is shown below for the +<a href="../../cgi-bin/hgTrackUi?db=hg38&g=tgpTrios">1000 Genomes Trio track on Human/GRCh38</a>: </p> + <div class="text-center"> -<img width="80%" height="80%" src="../../images/trioExample.png" alt="3 VCF Phased Trio tracks along with the GENCODE v32 genes from the Human/GRCh38 assembly. Each of two diploid haplotypes for each individual in a trio is drawn as a black lane, with snps as vertical ticks on the haplotype they fall on. Ticks are shaded blue,red,green or black according to their predicted functional effect."> +<a href="../../cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=dschmelt&hgS_otherUserSessionName=tgpTrios"> +<img width="80%" height="80%" +src="../../images/trioExample.png" alt="3 VCF Phased Trio tracks along with the GENCODE +v32 genes from the Human/GRCh38 assembly. Each of two diploid haplotypes for each individual +in a trio is drawn as a black lane, with snps as vertical ticks on the haplotype they fall on. +Ticks are shaded blue,red,green or black according to their predicted functional effect."></a> </div> -</p> + <p> Unlike a regular genome browser track, Trio tracks display the genome variants of each individual as two haplotypes; SNPs, small insertions and deletions are mapped to each haplotype based on the -phasing information of the VCF file. Each haplotype is displayed as two separate black lanes for -the browser window region. Each variant is drawn as a vertical dash. Homozygous variants will -show two identical dashes on both haplotype lanes. Phased heterozygous variants are placed on one +phasing information of the VCF file. Each haplotype is displayed on two separate, horizontal black +lines across the browser window. Each variant is drawn as a vertical dash. Homozygous variants will +show two identical dashes on both haplotype lines. 
Phased heterozygous variants are placed on one of the haplotype lanes and unphased heterozygous variants are displayed in the area between the -two haplotype lanes. +two haplotype lines. </p> <p> -After generating this VCF and moving the file to a web accessible location as described above, -then you can use the following required vcfPhasedTrio trackDb settings to get the trio display: +Follow the steps for a normal VCF file, including moving the file to a web accessible location +and generating a tabix index file, then use the following required vcfPhasedTrio trackDb settings +to view the trio display: <pre><code><strong>type </strong><em>vcfPhasedTrio </em> # The track type is required and must be "vcfPhasedTrio" -<strong>bigDataUrl </strong><em>http://url.to.vcfFile </em> The bigDataUrl is required +<strong>bigDataUrl </strong><em>http://url.to.vcfFile </em> # The bigDataUrl is required <strong>vcfChildSample </strong><em>GT ID|alias </em> # the Genotype column ID of the "child" sample, with an optional "|" followed by a human readable alias for the ID </code></pre> <p>There are also two optional settings for vcfPhasedTrio tracks:</p> <pre><code><strong>vcfParentSamples </strong><em>GT ID1|alias1,GT ID2|alias2 </em> # comma separated (no spaces) list of the "parent" samples, with optional aliases <strong>vcfUseAltSampleNames </strong><em>GT ID </em> # Use the aliases in the display by default instead of the Genotype column ID </code></pre> <p>Other optional settings are not specific to VCF, but relevant:</p> <pre><code><strong>maxWindowToDraw </strong><em>N </em> # don't display track when viewing more than N bases <strong>chromosomes </strong><em>chr1,chr2,... </em> # track contains data only on listed reference assembly sequences </code></pre> <h2>Examples</h2> <h3>Example #1</h3> <p> In this example, you will create a custom track for an indexed VCF file that is already on a public server — variant calls generated by the <a href="http://1000genomes.org/"