65f00dd6b049342f185de41542e9c968ec3e98d3 markd Wed Dec 9 21:31:03 2020 -0800 added gfServer design overview diff --git src/gfServer/design.txt src/gfServer/design.txt new file mode 100644 index 0000000..3aa03fc --- /dev/null +++ src/gfServer/design.txt @@ -0,0 +1,48 @@ +Dynamic gfServer design: + +The standard method of running gfServer is as an in-memory TCP/IP server. +This results in fast response times. However, this can result in a high cost +for many infrequently used genomes. With one pair of servers per genome, +frequently used genomes serialize requests while seldom used servers are +mostly idle. Indexing a genome on startup may take over five minutes, which is +prohibitive for on-demand startup. + +The dynamic mode of gfServer pre-indexes a genome and saves the index to a +file. When the server starts, the index is map into gfServer's virtual memory +and accessed directly. If the index has recently been accessed, it will still +be in memory. If not, it is paged into memory. + +Dynamic gfServer runs via the xinetd super-server. When a client connects to +xinetd, it starts a gfServer process to handle the request. The number of +parallel process can be limited in the xinetd configuration. The server is +given a root directory, and each request is passed a directory (genomeDataDir) +relative to the root and a genome name. The genomeDataDir contains the 2bit +file, translates and untranslated indices for the genome, and must follow this +naming convention: + + $rootdir/$genomeDataDir/$genome.2bit + $rootdir/$genomeDataDir/$genome.untrans.gfidx\n" + $rootdir/$genomeDataDir/$genome.trans.gfidx\n" + +All search parameters, such as tileSize, are store in the index file and can +not be changed at runtime. + +To allow the index file mapping into virtual memory at any address, they +contain offsets and arrays, not pointers. The file starts with a header, +defined by struct genoFindIndexFileHdr in jkOwnLib/genoFind.c. + +Untranslated indexes have a single set of hash arrays, and translated indexes +contain six sets of hash arrays. Each set of hash arrays starts with a section +defined by struct genoFindFileHdr. + +When an index is opened, the entire file is mapped into virtual memory using +the mmap system call, and a struct genoFindIndex object build with points into +the file's address space. A madvise system call informs the kernel that the +entire file will be accessed randomly. + +The static server protocol only handles one command per connection. A +translated DNA query is implemented as six requires to the server. To +implement the "I'm feeling luckly", hgBlats makes an addition status query. To +avoid loading the index multiple time for a given client task, gfServer +supports multiple commands over a single connection. +