6cc1e1e4ce2f0553e9a45485279b6497e06a64e9
kuhn
  Thu Jan 7 17:09:44 2021 -0800
added a clarifyng phrase

diff --git src/hg/htdocs/goldenPath/help/vcf.html src/hg/htdocs/goldenPath/help/vcf.html
index cb4135b..49fb004 100755
--- src/hg/htdocs/goldenPath/help/vcf.html
+++ src/hg/htdocs/goldenPath/help/vcf.html
@@ -1,307 +1,308 @@
 <!DOCTYPE html>
 <!--#set var="TITLE" value="Genome Browser VCF+tabix Track Format" -->
 <!--#set var="ROOT" value="../.." -->
 
 <!-- Relative paths to support mirror sites with non-standard GB docs install -->
 <!--#include virtual="$ROOT/inc/gbPageStart.html" -->
 
 <h1>VCF+tabix Track Format</h1>
 <p>
 <a href="http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41" 
 target="_blank">Variant Call Format</a> (VCF) is a flexible and extendable line-oriented text
 format developed by the <a href="http://www.1000genomes.org/" 
 target="_blank">1000 Genomes Project</a> for releases of single nucleotide variants, indels, copy 
 number variants and structural variants discovered by the project. When a VCF file is compressed and
 indexed using <a href="http://samtools.sourceforge.net/tabix.shtml" target="_blank">tabix</a>, and 
 made web-accessible, the Genome Browser is able to fetch only the portions of the file necessary to 
 display items in the viewed region. This makes it possible to display variants from files that are 
 so large that the connection to UCSC would time out when attempting to upload the whole file to 
 UCSC. Both the VCF file and its tabix index file remain on your web-accessible server (http, https, 
 or ftp), not on the UCSC server. UCSC temporarily caches the accessed portions of the files to speed
 up interactive display. If you do not have access to a web-accessible server and need hosting space
 for your VCF files, please see the <a href="hgTrackHubHelp.html#Hosting">Hosting</a> section of the
 Track Hub Help documentation.</p>
 
 <p>The UCSC tools support VCF versions 3.3 and greater.</p>
 
 <p>If you have VCF trio data, you may be interested in formatting your track as a
 <a href="#trio">Phased Trios</a> track as described below.
 
 <h2>VCF Format </h2>
 <p>
 VCF is an all-purpose format for defining variants of all types: SNVs,
-CNVs and translocations.  It can annotate all the variants in an individual as well
+CNVs and translocations relative to a reference assembly.  It can annotate all the
+variants in an individual as well
 as a population.  Typically, a VCF file is too large to load directly into a custom
 track on the Browser and must be loaded as binary tabix-indexed file as described
 below.
 
 The full specification of VCF is found in
 <a href = "http://samtools.github.io/hts-specs/VCFv4.3.pdf" target = _blank>documents
 on github</a>.
 
 <p>Here is a look at an example from that file showing a few rows of data for
 three samples.  Details and descriptions of the data fields are in the
 <a href = "http://samtools.github.io/hts-specs/VCFv4.3.pdf" target = _blank>.pdf</a>.
 The data here can be pasted directly into hg18 human assembly on the Genome Browser.  We
 have added two lines at the top of the entry that are not in the official example
 to make the display work in the Browser.
 </p>
 <p>
 Note that the first data field, identifying the chromosome, is in official VCF
 format which does not include the "chr" usually associated with Genome Browser
 chrom names.  Either version will work in the Browser.
 </p>
 
 <p>
 <pre>
 track type=vcf name="vcf example" description="three samples in a vcf" db=hg18 visibility="full"
 browser position chr20:1-1306000
 ##fileformat=VCFv4.2
 ##fileDate=20090805
 ##source=myImputationProgramV3.1
 ##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta
 ##contig=&lt;ID=20,length=62435964,assembly=B36,md5=f126cdf8a6e0c7f379d618ff66beb2da,species="Homo sapiens",taxonomy=x&gt;
 ##phasing=partial
 ##INFO=&lt;ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data"&gt;
 ##INFO=&lt;ID=DP,Number=1,Type=Integer,Description="Total Depth"&gt;
 ##INFO=&lt;ID=AF,Number=A,Type=Float,Description="Allele Frequency"&gt;
 ##INFO=&lt;ID=AA,Number=1,Type=String,Description="Ancestral Allele"&gt;
 ##INFO=&lt;ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129"&gt;
 ##INFO=&lt;ID=H2,Number=0,Type=Flag,Description="HapMap2 membership"&gt;
 ##FILTER=&lt;ID=q10,Description="Quality below 10"&gt;
 ##FILTER=&lt;ID=s50,Description="Less than 50% of samples have data"&gt;
 ##FORMAT=&lt;ID=GT,Number=1,Type=String,Description="Genotype"&gt;
 ##FORMAT=&lt;ID=GQ,Number=1,Type=Integer,Description="Genotype Quality"&gt;
 ##FORMAT=&lt;ID=DP,Number=1,Type=Integer,Description="Read Depth"&gt;
 ##FORMAT=&lt;ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality"&gt;
 #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
 20	14370	rs6054257	G	A	29	PASS	NS=3;DP=14;AF=0.5;DB;H2	GT:GQ:DP:HQ	0|0:48:1:51,51	1|0:48:8:51,51	1/1:43:5:.,.
 20	17330	.	T	A	3	q10	NS=3;DP=11;AF=0.017	GT:GQ:DP:HQ	0|0:49:3:58,50	0|1:3:5:65,3	0/0:41:3
 20	1110696	rs6040355	A	G,T	67	PASS	NS=2;DP=10;AF=0.333,0.667;AA=T;DB	GT:GQ:DP:HQ	1|2:21:6:23,27	2|1:2:0:18,2	2/2:35:4
 20	1230237	.	T	.	47	PASS	NS=3;DP=13;AA=T	GT:GQ:DP:HQ	0|0:54:7:56,60	0|0:48:4:51,51	0/0:61:2
 20	1234567	microsat1	GTC	G,GTCT	50	PASS	NS=3;DP=9;AA=G	GT:GQ:DP	0/1:35:4	0/2:17:2	1/1:40:3
 
 </pre>
 
 </p>
 
 <h2>Generating a VCF track</h2>
 <p>
 The typical workflow for generating a VCF custom track is this:</p>
 <ol>
   <li>
   If you haven't done so already, 
   <a href="http://sourceforge.net/projects/samtools/files/tabix/" target="_blank"> download</a> 
   and build the <a href="http://samtools.sourceforge.net/tabix.shtml" target="_blank">tabix and 
   bgzip</a> programs. Test your installation by running tabix with no command-line 
   arguments; it should print a brief usage message. For help with tabix, please contact
   the <a href="https://lists.sourceforge.net/lists/listinfo/samtools-help" 
   target="_blank">samtools-help mailing list</a> (tabix is part of the samtools project).</li>
   <li>
   Create VCF or convert another format to VCF. Items must be sorted by genomic position.</li>
   <li>
   Compress your <em>.vcf</em> file using the <code>bgzip</code> program:
   <pre><code>bgzip my.vcf</code></pre>
   For more information about the <code>bgzip</code> command, run it with no arguments to
   display the usage message.</li>
   <li>
   Create a tabix index file for the bgzip-compressed VCF (<em>.vcf.gz</em>):
   <pre><code>tabix -p vcf my.vcf.gz</code></pre>
   The tabix command appends <em>.tbi</em> to the <em>my.vcf.gz</em> filename, creating a binary 
   index file named <em>my.vcf.gz.tbi</em> with which genomic coordinates can quickly be translated 
   into file offsets in <em>my.vcf.gz</em>.</li>
   <li>
   Move both the compressed VCF file (<em>my.vcf.gz</em>) and tabix index file 
   (<em>my.vcf.gz.tbi</em>) to an http, https, or ftp location.Note that the Genome Browser
   looks for an index file with the same URL as the VCF file with the .tbi suffix added. If your
   hosting site does not use the filename as the URL link, you will have to specifically
   call the location of this .vcf.tbi index file with the <code>bigDataIndex</code> keyword.
   This keyword is relevant for Custom Tracks and Track Hubs. You can read more about 
   <em>bigDataIndex</em> in
   <a href="trackDb/trackDbHub.html#bigDataIndex">the TrackDb Database Definition page</a>.</li>
   <li>
   Construct a <a href="hgTracksHelp.html#CustomTracks">custom track</a> using a single 
   <a href="hgTracksHelp.html#TRACK">track line</a>. The basic version of the track line will look 
   something like this:
   <pre><code>track type=vcfTabix name="My VCF" bigDataUrl=<em>http://myorg.edu/mylab/my.vcf.gz</em></code></pre>
   Again, in addition to <em>http://myorg.edu/mylab/my.vcf.gz</em>, the associated index file 
   <em>http://myorg.edu/mylab/my.vcf.gz.tbi</em> must also be available at the same location.</li>
   <li>
   Paste the custom track line into the text box in the <a href="../../cgi-bin/hgCustom" 
   target="_blank">custom track management page</a>, click &quot;submit&quot; and view in the Genome 
   Browser.</li>
 </ol>
 
 <h2>Parameters for VCF custom track definition lines</h2>
 <p>
 All options are placed in a single line separated by spaces (lines are broken only for readability 
 here):</p>
 <pre><code><strong>track type=vcfTabix bigDataUrl=</strong><em>http://...</em>
     <strong>hapClusterEnabled=</strong><em>true|false</em>
     <strong>hapClusterColorBy=</strong><em>altOnly|refAlt|base</em>
     <strong>hapClusterTreeAngle=</strong><em>triangle|rectangle</em>
     <strong>hapClusterHeight=</strong><em>N</em>
     <strong>applyMinQual=</strong><em>true|false</em> <strong>minQual=</strong><em>Q</em>
     <strong>minFreq=</strong><em>F</em>
     <strong>name=</strong><em>track_label</em>
     <strong>description=</strong><em>center_label</em> 
     <strong>visibility=</strong><em>display_mode</em>
     <strong>priority=</strong><em>priority</em>
     <strong>db=</strong><em>db</em> <strong>maxWindowToDraw=</strong><em>N</em> 
     <strong>chromosomes=</strong><em>chr1,chr2,...</em> </code></pre>
 <p>
 Note if you copy/paste the above example, you must remove the line breaks.
 Click <a href="examples/vcfExample.txt">here</a> for a text version that you can paste 
 without editing.</p>
 <p>
 The track type and bigDataUrl are REQUIRED:</p>
 <pre><code><strong>type=vcfTabix bigDataUrl=</strong><em>http://myorg.edu/mylab/my.vcf.gz</em></strong> </code></pre>
 <p>
 The remaining settings are OPTIONAL.  Some are specific to VCF:</p>
 <pre><code><strong>hapClusterEnabled   </strong><em>true|false             </em> # if file has phased genotypes, sort by local similarity
 <strong>hapClusterColorBy   </strong><em>altOnly|refAlt|base    </em> # coloring scheme, default altOnly, conditional on hapClusterEnabled
 <strong>hapClusterTreeAngle </strong><em>triangle|rectangle     </em> # draw leaves as < or [, default <, conditional on hapClusterEnabled
 <strong>hapClusterHeight    </strong><em>N                      </em> # height of track in pixels, default 128, conditional on hapClusterEnabled
 <strong>applyMinQual        </strong><em>true|false             </em> # if true, don't display items with QUAL < minQual; default false
 <strong>minQual             </strong><em>Q                      </em> # minimum value of Q column to display item, conditional on applyMinQual
 <strong>minFreq             </strong><em>F                      </em> # minimum minor allele frequency to display item; default 0.0 </code></pre>
 <p>
 Other optional settings are not specific to VCF, but relevant:</p>
 <pre><code><strong>name            </strong><em>track label                </em> # default is "User Track"
 <strong>description     </strong><em>center label               </em> # default is "User Supplied Track"
 <strong>visibility      </strong><em>squish|pack|full|dense|hide</em> # default is hide (will also take numeric values 4|3|2|1|0)
 <strong>priority        </strong><em>N                          </em> # default is 100
 <strong>db              </strong><em>genome database            </em> # e.g. hg19 for Human Feb. 2009 (GRCh37)
 <strong>maxWindowToDraw </strong><em>N                          </em> # don't display track when viewing more than N bases
 <strong>chromosomes     </strong><em>chr1,chr2,...              </em> # track contains data only on listed reference assembly sequences </code></pre>
 The <a target="_blank" href="hgVcfTrackHelp.html">VCF track configuration</a> help page
 describes the VCF track configuration page options.</p>
 
 <a name="trio"></a>
 <h2>Phased Trio format</h2>
 <p>The vcfPhasedTrio track type is available for users whose VCF contains genotype data from one
 to three individuals. The underlying VCF follows the standard VCF format as described above, with
 the added caveat that there must be GENOTYPE columns for each of the individuals present. An
 example of the trio display is show below for the 
 <a href="../../cgi-bin/hgTrackUi?db=hg38&g=tgpTrios">1000 Genomes Trio track on Human/GRCh38</a>:
 </p>
 
 <div class="text-center">
 <a href="../../cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=dschmelt&hgS_otherUserSessionName=tgpTrios">
 <img width="80%" height="80%" 
 src="../../images/trioExample.png" alt="3 VCF Phased Trio tracks along with the GENCODE 
 v32 genes from the Human/GRCh38 assembly. Each of two diploid haplotypes for each individual 
 in a trio is drawn as a black lane, with snps as vertical ticks on the haplotype they fall on. 
 Ticks are shaded blue,red,green or black according to their predicted functional effect."></a>
 </div>
 
 <p>
 Unlike a regular genome browser track, Trio tracks display the genome variants of each individual
 as two haplotypes; SNPs, small insertions and deletions are mapped to each haplotype based on the
 phasing information of the VCF file. Each haplotype is displayed on two separate, horizontal black
 lines across the browser window. Each variant is drawn as a vertical dash. Homozygous variants will
 show two identical dashes on both haplotype lines. Phased heterozygous variants are placed on one
 of the haplotype lanes and unphased heterozygous variants are displayed in the area between the
 two haplotype lines.
 </p>
 
 <p>
 Follow the steps for a normal VCF file, including moving the file to a web accessible location 
 and generating a tabix index file, then use the following required vcfPhasedTrio trackDb settings 
 to view the trio display:
 <pre><code><strong>type                   </strong><em>vcfPhasedTrio                </em> # The track type is required and must be &quot;vcfPhasedTrio&quot;
 <strong>bigDataUrl             </strong><em>http://url.to.vcfFile        </em> # The bigDataUrl is required
 <strong>vcfChildSample         </strong><em>GT ID|alias                  </em> # the Genotype column ID of the "child" sample, with an optional &quot;|&quot; followed by a human readable alias for the ID
 </code></pre>
 <p>There are also two optional settings for vcfPhasedTrio tracks:</p>
 <pre><code><strong>vcfParentSamples       </strong><em>GT ID1|alias1,GT ID2|alias2  </em> # comma separated (no spaces) list of the &quot;parent&quot; samples, with optional aliases
 <strong>vcfUseAltSampleNames   </strong><em>GT ID                        </em> # Use the aliases in the display by default instead of the Genotype column ID
 </code></pre>
 <p>Other optional settings are not specific to VCF, but relevant:</p>
 <pre><code><strong>maxWindowToDraw        </strong><em>N                            </em> # don't display track when viewing more than N bases
 <strong>chromosomes            </strong><em>chr1,chr2,...                </em> # track contains data only on listed reference assembly sequences </code></pre>
 
 <h2>Examples</h2>
 <h3>Example #1</h3>
 <p>
 In this example, you will create a custom track for an indexed VCF file that is already on a public 
 server &mdash; variant calls generated by the <a href="http://1000genomes.org/" 
 target="_blank">1000 Genomes Project</a>. The line breaks inserted here for readability must be 
 removed before submitting the track line:</p>
 <pre><code>browser position chr21:33,034,804-33,037,719
 track type=vcfTabix name="VCF Example One" description="VCF Ex. 1: 1000 Genomes phase 1 interim SNVs"
     chromosomes=chr21 maxWindowToDraw=200000
     db=hg19 visibility=pack
     bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/vcfExample.vcf.gz </code></pre>
 <p>
 The &quot;browser&quot; line above is used to view a small region of chromosome 21 with variants 
 from the <em>.vcf.gz</em> file.</p>
 <p>
 Note if you copy/paste the above example, you must remove the line breaks (or, click 
 <a href="examples/vcfExampleOne.txt">here</a> for a text version that you can paste without 
 editing).</p>
 <p>
 Paste the &quot;browser&quot; line and &quot;track&quot; line into the 
 <a href="../../cgi-bin/hgCustom" target="_blank">custom track management page</a> for the human 
 assembly hg19 (Feb. 2009), then click the &quot;submit&quot; button. On the following page, click 
 the &quot;chr21&quot; link in the custom track listing to view the VCF track in the Genome 
 Browser.</p>
 
 <h3>Example #2</h3>
 <p>
 In this example, you will create compressed, indexed VCF from an existing VCF text file. First, save
 this VCF file -- <em><a href="examples/vcfExampleTwo.vcf" target="_blank">vcfExampleTwo.vcf</a></em>
 -- to your machine. Perform steps 1 and 3-7 in the workflow described above, but substitute
 <em>vcfExampleTwo.vcf</em> for <em>my.vcf</em>. On the <a href="../../cgi-bin/hgCustom" 
 target="_blank">custom track management page</a>, click the &quot;add custom tracks&quot; button if 
 necessary and make sure that the genome is set to &quot;Human&quot; and the assembly is set to 
 &quot;Feb. 2009 (hg19) &quot; before pasting the track line and submitting. Remember to remove the 
 line breaks that have been added to the track line for readability (or, click 
 <a href="examples/vcfExampleTwo.txt">here</a> for a text version that you can paste without 
 editing):</p>
 <pre><code>track type=vcfTabix name="VCF Example Two" bigDataUrl=<em>http://myorg.edu/mylab/vcfExampleTwo.vcf.gz</em>
     description="VCF Ex. 2: More variants from 1000 Genomes" visibility=pack db=hg19 chromosomes=chr21
 
 browser position chr21:33,034,804-33,037,719
 browser pack snp132Common</code></pre>
 
 <h3>Example #3</h3>
 <p>
 In this example, you will load a hub that has VCF data described in a hub's trackDb.txt file.
 First, navigate to the <a href="hubQuickStart.html" target="_blank">Basic Hub Quick Start Guide</a>
 and review an introduction to hubs.</p>
 <p>
 Visualizing VCF files in hubs involves creating three text files: <em>hub.txt</em>, 
 <em>genomes.txt</em>, and <em>trackDb.txt</em>. The browser is passed a URL to the top-level 
 <em>hub.txt</em> file that points to the related <em>genomes.txt</em> and <em>trackDb.txt</em> 
 files. The <em>trackDb.txt</em> file contains stanzas for each track that outlines the details 
 and type of each track to display, such as these lines for a VCF file located at the bigDataUrl 
 location:</p>
 <pre><code>track vcf1
 bigDataUrl http://hgdownload.soe.ucsc.edu/gbdb/hg19/1000Genomes/ALL.chr21.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.vcf.gz
 #Note: there is a corresponding fileName.vcf.gz.tbi in the same above directory
 shortLabel chr21 VCF example
 longLabel This chr21 VCF file is an example from the 1000 Genomes Phase 1 Integrated Variant Calls Track
 type vcfTabix
 visibility dense </code></pre>
 <p>
 <b>Note:</b> there is now a <code>useOneFile on</code> hub setting that allows the hub
 properties to be specified in a single file. More information about this setting can be found on the
 <a href="./hgTracksHelp.html#UseOneFile" target="_blank">Genome Browser User Guide</a>.</p>
 <p>
 Here is a direct link to the <em><a href="examples/hubDirectory/hg19/trackDb.txt" 
 target="_blank">trackDb.txt</a></em> file to see more information about this example hub, and below 
 is a direct link to visualize the hub in the browser, where this example VCF file displays in dense 
 mode alongside the other tracks in this hub. You can find more Track Hub VCF display options on the 
 <a href="trackDb/trackDbHub.html#vcfTabix" target="_blank">Track Database (trackDb) Definition 
 Document</a> page.</p>
 <pre><code><a href="../../cgi-bin/hgTracks?db=hg19&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubDirectory/hub.txt"
 target="_blank">http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubDirectory/hub.txt</a> </code></pre>
 
 <h2>Sharing Your Data with Others</h2>
 <p>
 If you would like to share your VCF data track with a colleague, learn how to create a URL by 
 looking at <strong>Example 6</strong> on the <a href="customTrack.html#SHARE">custom tracks</a> 
 page.</p>
 
 <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->