src/webBlat/install.txt 87a162ccdc3ac94525fc1011dec5bc10e2cf2171

87a162ccdc3ac94525fc1011dec5bc10e2cf2171
markd
  Thu Jul 2 16:12:57 2020 -0700
web blat working

diff --git src/webBlat/install.txt src/webBlat/install.txt
index 5644488..fa9e401 100644
--- src/webBlat/install.txt
+++ src/webBlat/install.txt
@@ -1,109 +1,141 @@
 INSTALLING WEBBLAT
 
 Installing A Web-Based Blat Server involves four major steps:
  1) Creating sequence databases.
  2) Running the gfServer program to create in-memory indexes of the databases.
  3) Editing the webBlat.cfg file to tell it what machine and port the gfServer(s)
     are running on, and optionally customizing the webBlat appearance to users.
  4) Copying the webBlat executable and webBlat.cfg to a directory where the web server can
     execute webBlat as a CGI.
 
 CREATING SEQUENCE DATABASES
 
 You create databases with the program faToTwoBit. Typically you'll create
 a separate database for each genome you are indexing.  Each database can
 contain up to four billion bases of sequence in an unlimited number of
 records.  The databases for webPcr and webBlat are identical.
 
 The input to faToTwoBit is one or more fasta format files each of which
 can contain multiple records.  If the sequence contains repeat sequences,
 as is the case with vertebrates and many plants, the repeat sequences can
 be represented in lower case and the other sequence in upper case.  The gfServer
 program can be configured to ignore the repeat sequences.  The output of
 faToTwoBit is a file which is designed for fast random access and efficient
 storage.  The output files store four bases per byte.  They use a small amount
 of additional space to store the case of the DNA and to keep track of runs of
 N's in the input.  Non-N ambiguity codes such as Y and U in the input sequence
 will be converted to N.
 
 Here's how a typical installation might create a mouse and a human genome database:
     cd /data/genomes
     mkdir twoBit
     faToTwoBit human/hg19/*.fa twoBit/hg19.2bit
     faToTwoBit mouse/mm10/*.fa twoBit/mm10.2bit
 There's no need to put all of the databases in the same directory, but it can
 simplify bookkeeping.  
 
 The databases can also be in the .nib format which was used with blat and
 gfClient/gfServer until recently.  The .nib format only packed 2 bases per
 byte, and could only handle one record per nib file.  Recent versions of blat
 and related programs can use .2bit files as well.
 
 CREATING IN-MEMORY INDICES WITH GFSERVER
 
 The gfServer program creates an in-memory index of a nucleotide sequence database.  
 The index can either be for translated or untranslated searches.  Translated indexes
 enable protein-based blat queries and use approximately two bytes per unmasked base 
 in the database.  Untranslated indexes are used nucleotide-based blat queries as well
 as for In-silico PCR.  An index for normal blat uses approximately 1/4 byte per base.  For
 blat on smaller (primer-sized) queries or for In-silico PCR a more thorough index that
 requires 1/2 byte per base is recommended.  The gfServer is memory intensive but typically
 doesn not require a lot of CPU power.  Memory permitting multiple gfServers can be
 run on the same machine.  
 
 A typical installation might go:
     ssh bigRamMachine
     cd /data/genomes/twoBit
     gfServer start bigRamMachine 17779 hg19.2bit &
     gfServer -trans -mask start bigRamMachine 17778 hg19.2bit &
 the -trans flag makes a translated index.   It will take approximately
 15 minutes to build an untranslated index, and 45 minutes to build a
 translate index.  To build an untranslated index to be shared with 
 In-silico PCR do
     gfServer -stepSize=5 bigRamMachine 17779 hg19.2bit &
 This index will be slightly more sensitive, noticeably so for small query sequences,
 with blat.
 
+
+CREATING A DYNAMIC GFSERVER
+
+To avoid having a larger number of gfServer processing sitting in memory,
+indices can be pre-generate as files on disk and then gfServer instances
+launched dynamically using xinetd.
+
+See gfServer/blat-xinetd.example for an example xinetd configuration file.
+Data files must be create using the following layout, where the $rootdir
+is the directory specified to gfServer in the xinetd file
+
+    $rootdir/$genome/$genome.2bit
+    $rootdir/$genome/$genome.untrans.gfidx
+    $rootdir/$genome/$genome.trans.gfidx
+
+The gfServer index command is used to generate the indices:
+
+  gfServer index $rootdir/$genome/$genome.untrans.gfidx $rootdir/$genome/$genome.2bit
+  gfServer index -trans $rootdir/$genome/$genome.trans.gfidx $rootdir/$genome/$genome.2bit
+
+
 EDITING THE WEBBLAT.CFG FILE
 
 The webBlat.cfg file tells the webBlat program where to look for gfServers and
 for sequence.  The basic format of the .cfg file is line oriented with the
 first word of the line being a command.  Blank lines and lines starting with #
 are ignored.  The webBlat.cfg and webPcr.cfg files are similar. The webBlat.cfg
 commands are:
    gfServer  -  defines host and port a (untranslated) gfServer is running on, the 
    		associated sequence directory, and the name of the database to display in
                 the webPcr web page.
    gfServerTrans - defines location of a translated server.
+   dynServer - define the location of a dynamic gfServer that runs under xinetd with
+               pre-build indexes.
    background - defines the background image if any to display on web page
    company    - defines company name to display on web page
    tempDir    - where to put temporary files.  This path is relative to where the web
                 server executes CGI scripts.  It is good to remove files that haven't
 		been accessed for 24 hours from this directory periodically, 
 		via a cron job or similar mechanism.
 The background and company commands are optional.  The webBlat.cfg file must
 have at least one valid gfServer or gfServerTrans line, and a tempDir line.
 Here is a webBlat.cfg file that you
 might find at a typical installation:
 
 company Awesome Research Amalgamated
 background /images/dnaPaper.jpg
-gfServerTrans bigRamMachine 17778 /data/genomes/twoBit/hg19.2bit Human Genome
-gfServer bigRamMachine 17779 /data/genomes/twoBit/hg19.2bit Human Genome
-gfServerTrans bigRamMachine 17780 /data/genomes/twoBit/mm10.2bit Mouse Genome
-gfServer bigRamMachine 17781 /data/genomestwoBit/mm10.2bit Mouse Genome
+gfServerTrans bigRamMachine 17778 /data/genomes/twoBit Human Genome
+gfServer bigRamMachine 17779 /data/genomes/twoBit Human Genome
+gfServerTrans bigRamMachine 17780 /data/genomes/twoBit Mouse Genome
+gfServer bigRamMachine 17781 /data/genomestwoBit Mouse Genome
 tempDir ../tmp
 
+
+For dynamic gfServer, which server both untranslated and translated request,
+must be given a genome name, multiple genomes can be on same port.
+The specification is in the form:
+
+dynServer mediumRamMachine 17799 hg19 /data/genomes/hg19 Human Genome
+dynServer mediumRamMachine 17799 mm10 /data/genomes/mm10 Mouse Genome
+
+
 PUTTING WEBBLAT WHERE THE WEB SERVER CAN EXECUTE IT
 
 The details of this step vary highly from web server to web server.  On
 a typical Apache installation it might be:
      ssh webServer
      cd kent/webBlat
      cp webBlat /usr/local/apache/cgi-bin
      cp webBlat.cfg /usr/local/apache/cgi-bin
 assuming that you've put the executable and config file in kent/webBlat.
 On OS-X, instead of /usr/local/apache/cgi-bin typically you'll copy stuff
 to /LibraryWebServer/CGI-Executables.  Unless you are administering your
 own computer you will likely need to ask your local system administrators
 for help with this step.