c9814cbb9c9556663ee2cb8c9ea096a784582aac
dschmelt
  Fri Apr 19 16:48:30 2019 -0700
Adding a note about function of hap sort display as requested #22039

diff --git src/hg/htdocs/goldenPath/help/hgVcfTrackHelp.html src/hg/htdocs/goldenPath/help/hgVcfTrackHelp.html
index bf91342..45b4916 100755
--- src/hg/htdocs/goldenPath/help/hgVcfTrackHelp.html
+++ src/hg/htdocs/goldenPath/help/hgVcfTrackHelp.html
@@ -1,133 +1,134 @@
 <!DOCTYPE html>
 <!--#set var="TITLE" value="Genome Browser VCF Tracks" -->
 <!--#set var="ROOT" value="../.." -->
 
 <!-- Relative paths to support mirror sites with non-standard GB docs install -->
 <!--#include virtual="$ROOT/inc/gbPageStart.html" -->
 
 <!-- VCF Track Help Page -->
 
 <a name="Topic5"></a>
 <h1>Configuring VCF tracks</h1>
 <p>
 Genome Browser VCF tracks may be configured in a variety of ways to highlight different aspects of 
 the displayed information. By default, VCFs will display alleles with base-specific coloring.
 Homozygote data are shown as one letter, while heterozygotes will be displayed with both letters.
 </p>
 
 <p class='text-center'>
   <img src="../../images/vcfDefault.png" alt="VCF default display" width=600px>
   <p class='gbsCaption text-center'>The default VCF custom track will display colored bases and 
 will not show clustering unless specified as VCF/tabix in the custom track page.</p>
 </p>
 <p>
 The following section describes configuration settings available to VCF files compressed and
 indexed in the Tabix format. This requires VCF manipulation, separate index files, and a web
 accessible directory to reference from the bigDataUrl track line. For more information
 on setting up and uploading VCF/Tabix data, click the link on <a href="vcf.html" target="_blank">
 VCF custom track creation</a>.
 </p>
 <h2>Configuring the haplotype sorting display</h2>
 <p>
 If the VCF file contains genotype columns for at least two samples (four haplotypes), then a 
-haplotype sorting display can be configured.</p> 
+haplotype sorting display can be configured. This can be useful for determining similarity
+between the samples and possible inheritance at a particular loci. </p> 
 <p>
 <strong>Enable Haplotype sorting display:</strong> When this option is checked, each sample's phased
 and/or homozygous genotypes are split into haplotypes, clustered by similarity around a central 
 variant, and sorted for display by their position in the clustering tree. The tree (as 
 space allows) is drawn in the label area next to the track image. Leaf clusters, in 
 which all haplotypes are identical (at least for the variants used in clustering), are 
 colored purple. 
 </p>
 <p class='text-center'>
   <img src="../../images/vcfTree.png" alt="VCF tree diagram" width=600px>
   <p class='gbsCaption text-center'>The haplotype tree can be seen to the left of the track.</p>
 </p>
 <p>
 Each variant is drawn as a vertical column, using color to distinguish between 
 reference alleles and alternate alleles of the horizontally running haplotypes. If unchecked, then 
 the display is the same as for VCF without genotypes: a stacked bar graph of the top two alleles, 
 showing the proportion of alleles if allele counts are available. This setting is enabled by 
 default.</p>
 
 <p>
 The following options are applicable only when the haplotype sorting display is enabled:</p>
 <p> 
 <strong>Haplotype sorting order:</strong> Haplotypes are sorted using a distance function that 
 uses a central variant. Differences between haplotypes are penalized with weights that decrease 
 for each successive variant away from the central variant.  By default, the median variant in 
 the window is used. By clicking on a variant in the display, you will get the option to always 
 use that variant when it is in the current view.</p>
 <p>
 <strong>Haplotype coloring scheme:</strong> There are three ways that reference and alternate 
 alleles can be colored:</p>
 <ul>
   <li>
   By default, the reference allele is invisible and the alternate allele is black. When multiple
   haplotypes must be combined into the same pixel row, grayscale is used to shade according to 
   the proportions of reference and alternate alleles. The central variant has a thin purple 
   outline. Extra pixel rows at the top and bottom show the locations of variants in case they 
   are hard to see due when the invisible reference allele is the major allele. Variants used in 
   clustering have purple marks in these rows; variants outside the clustered regions have black 
   marks.</li>
   <li>
   The reference allele is blue and the alternate allele is red. Purple indicates a mix of 
   reference and alternate alleles. The central variant has a thick black outline.</li>
   <li>
   Both alleles are colored using the same color scheme as when there are no genotypes: A is 
   red, C is blue, G is green and T is magenta. Gray indicates a mix of reference and alternate 
   alleles. The central variant has a thick black outline.</li>
 </ul>
 <p>
 In all coloring modes, if some alleles in a haplotype are undefined, a pale yellowish color is 
 used for those alleles.</p>
 <p> 
 <strong>Haplotype clustering leaf shape:</strong> Leaf clusters are collections of identical 
 haplotypes. By default, they are drawn as open triangles &lt;. They can also be displayed as 
 open rectangles [.</p>
 
 <p class='text-center'>
   <img src="../../images/vcfSquare.png" alt="VCF options" width=600px>
 </p>
 
 
 <p>
 <strong>Haplotype sorting display height:</strong> This number represents the track height in
 pixels. If the number of pixels is fewer than the number of haplotypes (2 * the number of genotype
 columns), some horizontal pixel rows must represent multiple haplotypes; with differing haplotypes'
 colors combined according to the selected coloring scheme.</p>
 
 <p class='text-center'>
   <img src="../../images/vcfOptions.png" alt="VCF options" width=600px>
 </p>
 
 <h2>Filtering out variants</h2>
 <p>
 Variants can be filtered out of the display according to several properties:</p>
 <ul>
   <li>
   <strong>Exclude items with QUAL score less than <em>N</em>:</strong> If the checkbox is checked, 
   then all variants whose QUAL column has a non-numeric value (e.g. &quot;.&quot;) or a value less 
   than <em>N</em> are excluded from display. By default, the checkbox is not checked and <em>N</em> 
   is 0.</li>
   <li>
   <strong>Exclude items with these FILTER values:</strong> This option appears only if the VCF 
   header defines at least one FILTER code. There is a checkbox for each code defined in the header.
   If checked, then all variants with that code in the FILTER column are excluded from display. By 
   default, no checkboxes are checked, so all variants are displayed regardless of FILTER column 
   values.</li>
   <li>
   <strong>Minimum minor allele frequency (if INFO column includes AF or AC+AN):</strong> If a 
   variant's INFO field includes AF (alternate allele frequency) or both AC and AN (alternate allele 
   count and total number of alleles), then its minor allele frequency can be compared against this 
   threshold. If the minor allele frequency is less than the threshold, the variant will not be 
   displayed.</li>
 </ul>
 <p class='text-center'>
   <img src="../../images/vcfFilters.png" alt="VCF options" width=600px>
 </p>
 <p>
 When you have finished making your configuration changes, click the <button>Submit</button> button 
 to return to the annotation track display page.</p>
 
 <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->