4f059beb788c0f7685ae2c95958d45e649f2f805 galt Mon Sep 19 14:10:25 2016 -0700
    updating webBlat docs and making consistent

diff --git src/webBlat/install.txt src/webBlat/install.txt
index 31b438a..5644488 100644
--- src/webBlat/install.txt
+++ src/webBlat/install.txt
@@ -17,93 +17,93 @@
 The input to faToTwoBit is one or more fasta format files each of which
 can contain multiple records. If the sequence contains repeat sequences,
 as is the case with vertebrates and many plants, the repeat sequences
 can be represented in lower case and the other sequence in upper case.
 The gfServer program can be configured to ignore the repeat sequences.
 The output of faToTwoBit is a file which is designed for fast random
 access and efficient storage. The output files store four bases per
 byte. They use a small amount of additional space to store the case of
 the DNA and to keep track of runs of N's in the input. Non-N ambiguity
 codes such as Y and U in the input sequence will be converted to N.

 Here's how a typical installation might create a mouse and a human
 genome database:
     cd /data/genomes
     mkdir twoBit
-    faToTwoBit human/hg16/*.fa twoBit/hg16.2bit
-    faToTwoBit mouse/mm4/*.fa twoBit/mm4.2bit
+    faToTwoBit human/hg19/*.fa twoBit/hg19.2bit
+    faToTwoBit mouse/mm10/*.fa twoBit/mm10.2bit
 There's no need to put all of the databases in the same directory, but
 it can simplify bookkeeping.

 The databases can also be in the .nib format which was used with blat
 and gfClient/gfServer until recently. The .nib format only packed 2
 bases per byte, and could only handle one record per nib file. Recent
 versions of blat and related programs can use .2bit files as well.

 CREATING IN-MEMORY INDICES WITH GFSERVER

 The gfServer program creates an in-memory index of a nucleotide
 sequence database. The index can either be for translated or
 untranslated searches. Translated indexes enable protein-based blat
 queries and use approximately two bytes per unmasked base in the
 database.
 Untranslated indexes are used for nucleotide-based blat queries as well
 as for In-silico PCR. An index for normal blat uses approximately 1/4
 byte per base. For blat on smaller (primer-sized) queries or for
 In-silico PCR a more thorough index that requires 1/2 byte per base is
 recommended. The gfServer is memory intensive but typically does not
 require a lot of CPU power. Memory permitting, multiple gfServers can
 be run on the same machine.

 A typical installation might go:
     ssh bigRamMachine
     cd /data/genomes/twoBit
-    gfServer start bigRamMachine 17779 hg16.2bit &
-    gfServer -trans -mask start bigRamMachine 17778 hg16.2bit &
+    gfServer start bigRamMachine 17779 hg19.2bit &
+    gfServer -trans -mask start bigRamMachine 17778 hg19.2bit &
 The -trans flag makes a translated index. It will take approximately
 15 minutes to build an untranslated index, and 45 minutes to build a
 translated index. To build an untranslated index to be shared with
 In-silico PCR do
-    gfServer -stepSize=5 start bigRamMachine 17779 hg16.2bit &
+    gfServer -stepSize=5 start bigRamMachine 17779 hg19.2bit &
 This index will be slightly more sensitive, noticeably so for small
 query sequences, with blat.

 EDITING THE WEBBLAT.CFG FILE

 The webBlat.cfg file tells the webBlat program where to look for
 gfServers and for sequence. The basic format of the .cfg file is line
 oriented with the first word of the line being a command. Blank lines
 and lines starting with # are ignored. The webBlat.cfg and webPcr.cfg
 files are similar. The webBlat.cfg commands are:
     gfServer  -  defines host and port an (untranslated) gfServer is
           running on, the associated sequence directory, and the name
           of the database to display in the webBlat web page.
     gfServerTrans - defines location of a translated server.
     background - defines the background image if any to display on
           web page
     company - defines company name to display on web page
     tempDir - where to put temporary files. This path is relative to
           where the web server executes CGI scripts.
           It is good to remove files that haven't been accessed for
           24 hours from this directory periodically, via a cron job
           or similar mechanism.
 The background and company commands are optional. The webBlat.cfg file
 must have at least one valid gfServer or gfServerTrans line, and a
 tempDir line.

 Here is a webBlat.cfg file that you might find at a typical
 installation:

 company Awesome Research Amalgamated
 background /images/dnaPaper.jpg
-gfServerTrans bigRamMachine 17778 /data/genomes/2bit Human Genome
-gfServer bigRamMachine 17779 /data/genomes/2bit Human Genome
-gfServerTrans mouseServer 17780 /data/genomes/2bit Mouse Genome
-gfServer mouseServer 17781 /data/genomes/2bit Mouse Genome
+gfServerTrans bigRamMachine 17778 /data/genomes/twoBit/hg19.2bit Human Genome
+gfServer bigRamMachine 17779 /data/genomes/twoBit/hg19.2bit Human Genome
+gfServerTrans bigRamMachine 17780 /data/genomes/twoBit/mm10.2bit Mouse Genome
+gfServer bigRamMachine 17781 /data/genomes/twoBit/mm10.2bit Mouse Genome
 tempDir ../tmp

 PUTTING WEBBLAT WHERE THE WEB SERVER CAN EXECUTE IT

 The details of this step vary widely from web server to web server. On
 a typical Apache installation it might be:
     ssh webServer
     cd kent/webBlat
     cp webBlat /usr/local/apache/cgi-bin
     cp webBlat.cfg /usr/local/apache/cgi-bin
 assuming that you've put the executable and config file in
 kent/webBlat. On OS X, instead of /usr/local/apache/cgi-bin you'll
 typically copy the files to /Library/WebServer/CGI-Executables. Unless
 you are administering your own computer, you will likely need to ask
 your local system administrators for help with this step.
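
Before copying webBlat.cfg into the CGI directory, it may help to check
it against the rules stated in the install text: at least one gfServer
or gfServerTrans line, plus a tempDir line, with blank lines and lines
starting with # ignored. The following is a minimal sketch of such a
check. The function name check_webblat_cfg is hypothetical and is not
part of the webBlat distribution. It only tests that the required
commands are present; it does not verify that a server line is
otherwise valid or that the server is reachable.

```shell
# Hypothetical helper, not shipped with webBlat: succeeds (exit 0) only
# if the given .cfg file has at least one gfServer or gfServerTrans
# line and at least one tempDir line.
check_webblat_cfg() {
    awk '
        /^[[:space:]]*$/ { next }   # skip blank lines
        /^#/             { next }   # skip comment lines
        $1 == "gfServer" || $1 == "gfServerTrans" { servers++ }
        $1 == "tempDir"  { temps++ }
        END { exit !(servers >= 1 && temps >= 1) }
    ' "$1"
}
```

One possible use is to gate the copy step on the check, for example
`check_webblat_cfg webBlat.cfg && cp webBlat.cfg /usr/local/apache/cgi-bin`.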