6840fda62e60999b2c5858609efa2c6f361ded2f
lrnassar
  Fri Oct 18 09:43:50 2019 -0700
Expanding blat FAQ, adding new entry for 1 base difference refs #24298

diff --git src/hg/htdocs/FAQ/FAQblat.html src/hg/htdocs/FAQ/FAQblat.html
index a5991f2..7a54f0b 100755
--- src/hg/htdocs/FAQ/FAQblat.html
+++ src/hg/htdocs/FAQ/FAQblat.html
@@ -12,30 +12,32 @@
 <ul>
 <li><a href="#blat1">BLAT vs. BLAST</a></li>
 <li><a href="#blat1b">Blat cannot find a sequence at all or not all expected matches</a></li>
 <li><a href="#blat1c">Blat or In-Silico PCR finds multiple matches such as chr_alt or chr_fix even though only one is
 expected</a></li>
 <li><a href="#blat2">BLAT use restrictions</a></li>
 <li><a href="#blat3">Downloading Blat source and documentation</a></li>
 <li><a href="#blat5">Replicating web-based Blat parameters in command-line version</a></li>
 <li><a href="#blat6">Using the <em>-ooc</em> flag</a></li>
 <li><a href="#blat4">Replicating web-based Blat percent identity and score calculations</a></li>
 <li><a href="#blat7">Replicating web-based Blat &quot;I'm feeling lucky&quot; search 
 results</a></li>
 <li><a href="#blat8">Using Blat for short sequences with maximum sensitivity</a></li>
 <li><a href="#blat9">Blat ALL genomes</a></li>
 <li><a href="#blat10">Blat ALL genomes: No matches found</a></li>
+<li><a href="#blat11">Approximating web-based Blat results using gfServer/gfClient</a></li>
+<li><a href="#blat12">Standalone or gfServer/gfClient result start positions off by one</a></li>
 
 </ul>
 <hr>
 <p>
 <a href="index.html">Return to FAQ Table of Contents</a></p>
 
 <a name="blat1"></a>
 <h2>BLAT vs. BLAST</h2>
 <h6>What are the differences between BLAT and BLAST?</h6>
 <p>
 BLAT is an alignment tool like BLAST, but it is structured differently. On DNA, BLAT works by 
 keeping an index of an entire genome in memory. Thus, the target database of BLAT is not a set of 
 GenBank sequences, but instead an index derived from the assembly of the entire genome. By default,
 the index consists of all non-overlapping 11-mers except for those heavily involved in repeats, and 
 it uses less than a gigabyte of RAM. This smaller size means that BLAT is far more easily 
@@ -148,37 +150,41 @@
 Blat source may be downloaded from 
 <a href="http://hgdownload.soe.ucsc.edu/admin/">http://hgdownload.soe.ucsc.edu/admin/</a> (located 
 at /kent/src/blat within the most recent jksrci*.zip source tree). For Blat executables, go to 
 <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">http://hgdownload.soe.ucsc.edu/admin/exe/</a> 
 and choose your machine type.</p>
 <p>
 Documentation on Blat program specifications is available 
 <a href="../goldenPath/help/blatSpec.html">here</a>. Note that the command-line BLAT 
 does not return matches to U nucleotides in the query sequence.</p>
 
 <a name="blat5"></a>
 <h2>Replicating web-based Blat parameters in command-line version</h2>
 <h6>I'm setting up my own Blat server and would like to use the same parameter values that the 
 UCSC web-based Blat server uses.</h6> 
 <p>
-We almost always expect small differences between the hgBlat/gfServer and the 
+We almost always <strong>expect small differences</strong> between the hgBlat/gfServer and the 
 stand-alone, command-line Blat. The best matches can be found using pslReps and pslCDnaFilter 
 utilities. The web-based Blat is tuned permissively with a minimum cut-off score of 20, which will 
 display most of the alignments. We advise deciding which 
 filtering parameters make the most sense for the experiment or analysis. Often these settings will 
 be different and more stringent than those of the web-based Blat. With that in mind, use the 
-following settings to replicate the search results of the web-based Blat:</p>
+following settings to approximate the search results of the web-based Blat:</p>
+<p>
+<strong>Note:</strong> There are cases where the gfServer/gfClient approach provides a better
+approximation of web results than standalone Blat. See the <a href="#blat11">example below</a>
+for an overview of this process.</p>
 <p>
 <em>standalone Blat</em>:</p>
 <ul>
   <li>Blat search:<br>
   &nbsp;&nbsp;&nbsp;<code>blat -stepSize=5 -repMatch=2253 -minScore=20 -minIdentity=0
   database.2bit query.fa output.psl </code><br></li>
   <li><strong>Note:</strong> To replicate web results, PSL output should be used. BLAT handles
   alternative output formats (such as blast8) slightly differently, and this can lead to minor
   differences in results; particularly for short alignments. Furthermore, the query sequence
   should have all U nucleotides converted to T nucleotides or have the "-q=rna" flag used
   to match the web-Blat.</li>
 </ul>
 <p>
 <em>faToTwoBit</em>:</p>
 <ul>
@@ -386,17 +392,65 @@
 After the index is built, the first step of alignment is to read the query (search) sequence, 
 extract all the 11-mers, and look those up in the genome 11-mer index currently in memory. 
 Matches found there represent the initial &quot;hits&quot; you see in the Blat All results page. 
 The next step is to look for hits that overlap or fall within a certain distance of each other, 
 and attempt to align the sequences between the hit locations in target and query.</p>
 
 <p>
 For example, if two 11-base tile hits align perfectly, it would result in a score of 22. This is 
 above the minimum required score of 20 (see <a href="#blat9">BLAT ALL genomes</a>), and would be 
 reported as an alignment. However, there are penalties for gaps and mismatches, as well as potential 
 overlap (see stepsize in <a href="../goldenPath/help/blatSpec.html">BLAT specifications</a>), all 
 of which could bring the score below 20. In that case, BLAT All would report 2 &quot;hits&quot;, 
 but clicking into the assembly would report no matches. This most often occurs when there are 
 only a few (1-3) hits reported by BLAT All.</p>
 
+<a name="blat11"></a>
+<h2>Approximating web-based Blat results using gfServer/gfClient</h2>
+
+<p>
+Using gfServer/gfClient often approximates the web-based Blat results more closely than
+standalone Blat does, and can sometimes reproduce them exactly. This approach mimics the Blat
+server behind the Genome Browser's web-based Blat. The following example shows how to set up an
+hg19 gfServer and then query it. First, download the utilities for your operating system and
+give them executable permissions:</p>
+<pre>
+# For Linux
+rsync -a rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/linux.x86_64/blat/ ./
+# For macOS
+rsync -a rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/macOSX.x86_64/blat/ ./
+
+chmod +x gfServer gfClient blat
+</pre>
+<p>
+Next, download the .2bit genome file (hg19 in this example) and run the gfServer utility with
+the web Blat parameters, using the local machine (127.0.0.1) and port 1234:</p>
+<pre>
+wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit
+./gfServer start 127.0.0.1 1234 -stepSize=5 hg19.2bit
+</pre>
+<p>
+After a few moments, gfServer will finish initializing and be ready to receive queries. To
+approximate web Blat, run gfClient with the following parameters, designating the input and
+output files:</p>
+<pre>
+./gfClient -minScore=20 -minIdentity=0 127.0.0.1 1234 . input.fa out.psl
+</pre>
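+<p>Here <code>.</code> is the directory, relative to the current one, that contains hg19.2bit, which gfClient also needs to read. The output file <code>out.psl</code> should contain results very similar to those of web-based Blat.</p>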
+
+<a name="blat12"></a>
+<h2>Standalone or gfServer/gfClient result start positions off by one</h2>
+<h6>My standalone Blat or gfServer/gfClient results have a start position that is
+one less than what I see in web Blat results.</h6>
+<p>
+This is due to how coordinates are stored internally in the Genome Browser. These internal
+coordinates, which standalone Blat and gfServer/gfClient report in PSL output, have a zero-based
+start and a one-based end. The default web Blat <strong>Output type</strong> of <strong>hyperlink</strong>
+instead displays one-based start positions. See the following <a target="_blank" href="/FAQ/FAQtracks#tracks1"
+>FAQ entry</a> for more information.</p>
+<p>
+If the <strong>Output type</strong> on web Blat is changed to <strong>psl</strong>, the results
+will show the same zero-based, half-open coordinates as the standalone Blat and gfServer/gfClient
+procedures.</p>
+
 <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->
 </body>
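The off-by-one difference described in the blat12 entry added by this commit can be checked with a short shell sketch. It assumes standard tab-separated PSL output, as written by standalone Blat or gfClient. In that format, tName is column 14, the zero-based tStart is column 16, and tEnd is column 17. The name psl_to_one_based is a hypothetical helper and not a UCSC utility. The function skips the PSL header lines and prints each alignment's target span with one added to the start. The result should match the start positions shown by web Blat's hyperlink output.

```shell
#!/bin/sh
# psl_to_one_based: hypothetical helper (not a UCSC tool).
# Prints "tName:start-end" for each alignment in a PSL file, converting the
# zero-based tStart (column 16) to a one-based start. tEnd (column 17) is
# already the one-based inclusive end, so it is printed unchanged.
# Assumes tab-separated PSL; header lines are skipped because their first
# field (matches) is not an integer.
psl_to_one_based() {
  awk -F'\t' '$1 ~ /^[0-9]+$/ {
    # %.0f avoids overflow in awk implementations whose %d is 32-bit
    printf "%s:%.0f-%s\n", $14, $16 + 1, $17
  }' "$1"
}
```

For example, `psl_to_one_based out.psl` on the gfClient output above lists one position per alignment. Per-block starts (tStarts, column 21) are also zero-based and are not converted by this sketch.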