6840fda62e60999b2c5858609efa2c6f361ded2f lrnassar Fri Oct 18 09:43:50 2019 -0700
Expanding blat FAQ, adding new entry for 1 base difference refs #24298
diff --git src/hg/htdocs/FAQ/FAQblat.html src/hg/htdocs/FAQ/FAQblat.html
index a5991f2..7a54f0b 100755
--- src/hg/htdocs/FAQ/FAQblat.html
+++ src/hg/htdocs/FAQ/FAQblat.html
@@ -12,30 +12,32 @@
Return to FAQ Table of Contents
BLAT is an alignment tool like BLAST, but it is structured differently. On DNA, BLAT works by keeping an index of an entire genome in memory. Thus, the target database of BLAT is not a set of GenBank sequences, but instead an index derived from the assembly of the entire genome. By default, the index consists of all non-overlapping 11-mers except for those heavily involved in repeats, and it uses less than a gigabyte of RAM. This smaller size means that BLAT is far more easily
@@ -148,37 +150,41 @@
Blat source may be downloaded from http://hgdownload.soe.ucsc.edu/admin/ (located at /kent/src/blat within the most recent jksrci*.zip source tree). For Blat executables, go to http://hgdownload.soe.ucsc.edu/admin/exe/ and choose your machine type.
Documentation on Blat program specifications is available here. Note that the command-line BLAT does not return matches to U nucleotides in the query sequence.
-We almost always expect small differences between the hgBlat/gfServer and the
+We almost always expect small differences between the hgBlat/gfServer and the
stand-alone, command-line Blat. The best matches can be found using pslReps and pslCDnaFilter utilities. The web-based Blat is tuned permissively with a minimum cut-off score of 20, which will display most of the alignments. We advise deciding which filtering parameters make the most sense for the experiment or analysis. Often these settings will be different and more stringent than those of the web-based Blat. With that in mind, use the
-following settings to replicate the search results of the web-based Blat:
+following settings to approximate the search results of the web-based Blat:
+
+Note: There are cases where the gfServer/gfClient approach provides a better
+approximation of web results than standalone Blat. See the example below
+for an overview of this process.
standalone Blat:
blat -stepSize=5 -repMatch=2253 -minScore=20 -minIdentity=0 database.2bit query.fa output.psl
faToTwoBit:
For example, if two 11-base tile hits align perfectly, it would result in a score of 22. This is above the minimum required score of 20 (see BLAT ALL genomes), and would be reported as an alignment. However, there are penalties for gaps and mismatches, as well as potential overlap (see stepsize in BLAT specifications), all of which could bring the score below 20. In that case, BLAT All would report 2 "hits", but clicking into the assembly would report no matches. This most often occurs when there are only a few (1-3) hits reported by BLAT All.
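The arithmetic in this example can be sketched in a few lines of Python. The formula below is an assumption based on the nucleotide `pslScore` calculation in the kent source tree (matches, plus half of the repeat matches, minus mismatches and the number of inserts on each side); the function name and the sample numbers are illustrative only and are not output from BLAT itself.

```python
MIN_SCORE = 20  # web BLAT minimum cut-off score quoted in the text


def psl_score(matches, rep_matches, mismatches, q_num_insert, t_num_insert):
    """Approximate nucleotide PSL score (assumed to mirror pslScore)."""
    return matches + (rep_matches // 2) - mismatches - q_num_insert - t_num_insert


# Two perfectly aligned 11-base tiles: 22 matching bases, no penalties.
perfect = psl_score(22, 0, 0, 0, 0)

# Hypothetical penalized case: 20 matches, 2 mismatches, 1 gap in the target.
penalized = psl_score(20, 0, 2, 0, 1)

print(perfect, perfect >= MIN_SCORE)      # 22 True
print(penalized, penalized >= MIN_SCORE)  # 17 False
```

The second case shows how a couple of mismatches and a single gap are enough to push a short hit below the cut-off, which is the situation the paragraph above describes.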
+
+
+Often, using gfServer/gfClient provides a closer approximation, or even an exact replica, of
+the web-based Blat results that standalone Blat cannot reproduce. This approach
+mimics the blat server used by the Genome Browser web-based Blat. The following example shows
+how to set up an hg19 gfServer and then make a query. First, download the appropriate utilities for
+the operating system and give them executable permissions:
+
+#For linux
+rsync -a rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/linux.x86_64/blat/ ./
+#For MacOS
+rsync -a rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/macOSX.x86_64/blat/ ./
+
+chmod +x gfServer gfClient blat
+
+Next, download the appropriate .2bit genome (hg19 in this example), and run the gfServer +utility with the web Blat parameters, designating the local machine and port 1234:
+
+wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit
+./gfServer start 127.0.0.1 1234 -stepSize=5 hg19.2bit
+
+After a few moments, the gfServer will initialize and be ready to receive queries. In order
+to approximate web Blat, we will use the gfClient with the following parameters, designating
+our input and output files:
++./gfClient -minScore=20 -minIdentity=0 127.0.0.1 1234 . input.fa out.psl ++
The output file out.psl
should have results very similar to web-based Blat.
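To compare the resulting file against the web results, the PSL rows can be loaded with a short script. This is a minimal sketch that assumes the standard 21-column, tab-separated PSL layout with an optional text header; `read_psl` is a hypothetical helper, and the example row is made up for illustration rather than taken from a real hg19 query.

```python
PSL_COLUMNS = 21  # standard PSL layout


def read_psl(lines):
    """Return a list of dicts for the alignment rows, skipping header lines."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Header and blank lines do not have 21 fields starting with a number.
        if len(fields) != PSL_COLUMNS or not fields[0].isdigit():
            continue
        rows.append({
            "matches": int(fields[0]),
            "misMatches": int(fields[1]),
            "strand": fields[8],
            "qName": fields[9],
            "tName": fields[13],
            "tStart": int(fields[15]),  # zero-based start
            "tEnd": int(fields[16]),    # end of the half-open interval
        })
    return rows


# Made-up example input: two header lines followed by one alignment row.
example = [
    "psLayout version 3\n",
    "\n",
    "\t".join(["25", "0", "0", "0", "0", "0", "0", "0", "+", "query1", "25",
               "0", "25", "chr1", "249250621", "999", "1024", "1",
               "25,", "0,", "999,"]) + "\n",
]

for hit in read_psl(example):
    print(hit["qName"], hit["tName"], hit["tStart"], hit["tEnd"])
```

For a real run, the same function could be called as `read_psl(open("out.psl"))`.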
+This is due to how we store internal coordinates in the Genome Browser. The default +blat Output type of hyperlink shows results in our +internal coordinate data structure. These internal coordinates have a zero-based start +and a one-based end. See the following FAQ entry for more information.
+
+If the Output type is changed to psl on web blat, the results show the same
+zero-based, half-open coordinates as the standalone blat and gfServer/gfClient
+procedures.
+
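The one-base difference can be illustrated with a small conversion. This is a sketch under the assumption that the position shown to users is one-based and fully closed, while PSL output is zero-based and half-open; the function name and the coordinates are hypothetical.

```python
def psl_to_display(chrom, t_start, t_end):
    """Convert a zero-based, half-open PSL interval to a one-based position string."""
    # Only the start shifts by one; the end value is numerically unchanged.
    return f"{chrom}:{t_start + 1}-{t_end}"


# Hypothetical PSL hit: tStart=999, tEnd=1024 (25 bases long).
print(psl_to_display("chr1", 999, 1024))  # chr1:1000-1024
print(1024 - 999)                          # 25, the length in half-open coordinates
```

In half-open coordinates the length is simply end minus start, which is one reason the PSL convention is convenient for downstream arithmetic.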