9a1029b1c89bb9c8aa387cd20c2e83f8c21c546a markd Mon Mar 15 16:36:03 2021 -0700 make gfServer tests insensitive to version changes diff --git src/blat/version.doc src/blat/version.doc index 5609eff..c4eb896 100644 --- src/blat/version.doc +++ src/blat/version.doc @@ -1,248 +1,248 @@ 38: o (in 37x1) Added Dynamic BLAT using virtual memory mapped precreated indexes on disk. - This allows gfServe to be launched on-demand by xinetd. + This allows gfServer to be launched on-demand by xinetd. 37: o (in 36x1) Fixed problem with -fastMap option that made it put in gaps in the allignment sometimes when a short mismatch would be a better choice. o (in 36x2) Fixed problem with -maxIntron set to non-default value on protein query. o (in 36x3) Added ipv6 support. o (in 36x4) Added _alt support. o (in 36x5) If ipv6 disabled, retries with ipv4-only socket. o (in 36x6) Added noSimpRepMask option to blat gfServer isPcr to skip simple repeat masking. o (in 36x7) Fixed compiler optimization issue on mac 2020-06-17 in dlSort(). o (in 36x8) Added -timeout option to prevent hanging on blocking input for server read and write. o (in 36x9) Changed default -timeout to 90 seconds instead of 10. Improved handling skips further sending on socket after failure to avoid multiple long timeouts. 36: o (in 35x1) added repMatch default values for tileSizes 16 to 18 in genoFind.c 35: o (in 34x1) Making total query output reporting a 64 bit number to avoid overflow when people using more than 4 gig of query sequence. o (in 34x2) Fixed -out=blast to use +/- instead of -/+ for non-translated. o (in 34x3) Fixed -minScore, filter was not working when over half query-size. o (in 34x4) Made it convert u's to t's for RNA sequence stuff. o (in 34x5) Made gfServer calculate repMatch based on stepSize/tileSize combination the way blat does rather than just being good for stepSize 11. o (in 34x6) Fixed negative strand pcr psl output o (in 34x7) Made it check and error out if the same name is reused in the target database. o (in 34x8) Truncate in output PCR primers that dangle off target chrom ends. o (in 34x9) If -fastMap is used, check that query sizes do not exceed MAXSINGLEPIECESIZE. o (in 34x10) Added IP-logging option to gfServer. o (in 34x11) Fixed segfault when maxIntron <= 11 and tileSize <= 11. o (in 34x12) Fixed segfault when -fine -minIdentity=0 -minScore=0. o (in 34x13) out=sim4: Now filters on -minIdentity, and has alignments output in the right order. 34: o (in 33x1) Put in check for two bit files with more than 4 gig. o (in 33x2) Allowing -maxGap in gfServer. (Forgot to add it to options table.) o (in 33x3) Making -maxIntron work better. o (in 33x4) Fixed very rare bug that manifested doing protein/translated DNA alignments in gfClient (not blat) in some frames when the alignment extended all the way to the end of the DNA. o (in 33x4) Support .nib and .2bit:seq[:start-end] file types for query. o (in 33x5) Making polyA trimming leave two bases so as not to mess up stop codon in rna's that just contain coding region. o (in 33x6) Improving usage message on repMatch. o (in 33x7) Making blat not abort with error on empty target database. o (in 33x8) Fixing bug in translated DNA/protein case where a missing x3 in the ffAli to chain caused a disagreement between the chainer and the rest of the program on gap scores, sometimes causing good alignments to be thrown out. 33: o Fixed problem with out=blast8 and out=blast9 output using dnax or rnax options where coordinates on the minus strand were given relative to end of sequence rather than beginning. o Made translated blast output of protein vs. DNA give DNA (multiplied by three) coordinates for both start and end of DNA side of match. o Making translated blast output work better (Fan's work). o Improving splice sites handling a little in a case that effects about 1 in 500 alignments. o Fixed a divide by zero error that happened with minScore=0 in a few rare cases involving sequences with rows of N's. o Added checking of command line so that misspelled options result in an error message instead of being silently ignored. 32: o Dealing with out of order exons in a more sophisticated way so as to fix another problem introduced by last change in v29. o Fixed problem in gfClient where it was only looking at top three chains in a region as opposed to the top 16 chains that standalone blat finds. Gunnar at Decode found a case, AK090418 on 8p23.1 where this caused two good alignments to be dropped. o The banded extension phase used when the -fine flag is set now uses affine rather than fixed gap penalties. o PolyA trimming now accommodates a small amount of sequencing error. o Fixing another way that you could get out of order exons. (This came up aligning zebrafish bacEnd sequences to the zebrafish genome.) 31: o Fixed a pretty bad memory leak when doing blast type output. Also fixed a more subtle one that occured in some cases usually involving repeat-rich query sequences. o Fixed bug where blat of protein vs. translated DNA -q=prot -t=dnax would drop exons in some situations. This was a side effect of the last change in version 29. Slippery stuff this! 30: o Fixed bug in webBlat when using "Blat's Guess" and no sequence name. 29: o Created webBlat, a CGI script that blats and nicely displays the results without requiring the whole UCSC browser and database to work. o Made -out=blast9 and -out=blast8 report e values in format that will print 1.3e-123 instead of 0.00000. o Made non-psl output for gfClient show the chromosome name rather than the chromosome nib/2bit file name and subsequence. o Adding -stepSize to gfServer and blat, so that can have tiles overlap. o Adding PCR handling to gfServer. (Still need a separate pcr client.) o Put in some code to remove out of order exons that were generated in some rare situations. This was a side effect of the second change in version 26. 28: o Blat, gfClient, and gfServer now work with .2bit files. Like .nib files, .2bit files are a compressed binary random access format for DNA. There are two advantages to the .2bit files over .nibs. Most importantly a single .2bit file can hold multiple sequence records. Also .2bit files as the name implies, use just 2 bits per DNA base rather than the 4 bits/base found in a .nib file. The .2bit files still do allow for soft-masking by lower case, and for N characters. They do this by storing lists of masked areas and N's separately from the sequence. There are also faToTwoBit and twoBitToFa utilities to convert in and out of this format. o Making minScore work with other alignment types than psl. 27: o Made it so that gfClient does not abort when gfServer runs out of memory on a request. Instead it just prints a warning message. 26: o fixed pslPretty to allow '--' strands in axt output format o fixed -long flag in pslPretty which was swapping target and query sequence on long inserts o fixed bug that was causing blat to drop an exon in around 1% of genes. o put in error message when faToNib is used on fa files with more than one record. 25: o Made it so that gfServer does not abort when it runs out of memory. o Fixed bug in pslPretty in case where there is a small gap in both query and target sides of alignment. o Fixed bug in out=blast or out=wublast where in some cases negative strand alignments did not line up properly. o Fixed crash when aligning translated sequences of less than 3 bases. o Added extendThroughN flag. 24: o Tweaked gap penalty in trimFlakyEnds, which should make it so that with the -fine flag you get less small terminal exons with bad splice sites way away from the rest of the alignment. o Fixing a memory leak. o Adding sim4 compatable output. o Adding tabular blast compatable output (blast -m 8 and -m 9) as -out=blast8 and -out=blast9. 23: Made -maxIntron a command line option. Default is 750000. 22: Fixed bug in jiggle-small exons that had reverse strand consensus as ca/ct rather than ac/ct. Thanks to Par Engstrom for finding this. 21: Added -fine parameter. This will result in improved alignments for high quality mRNA sequences. When -fine is used a banded Smith-Waterman is used in the extension phase, and the program looks harder in various ways for small initial or terminal exons. Currently I don't recommend using -fine on EST sequences, as EST ends tend to be of low quality, and the short additional matches it finds are as likely to be by chance as real hit. For fully sequenced mRNAs though the -fine option is an advantage. It is only available currently in blat, not in gfClient. 20: o Recompiled without -o3 in Linux. This recent change to a makefile ended up causing problems. 19: o Bumping max size of intron that gets stitched to 750kb from 500kb. o Handle a rare case that could lead to zero sized exons. o Handled a rare case that could lead to backtracking. o Switched blat from cgiSpoof to optionHash so that it can be called from a web page more easily. o Adding -fastMap option for very fast high similarity DNA searches (blat only, not gfServer). o Making it able to take subsets of a nib file as input. o Incorperating Arnaldur's improvement where clumps are sorted by amount of query sequence covered rather than by hit count. This will help gfServer/gfClient performance in particular on sequences containing tandem repeats. o added maxAaSize and maxNtSize options to gfServer command line. 18: o Replaced super-stitcher dynamic program that generates a table of size N*N times the number of blocks with a kdTree based program one that is much more space efficient and a bit faster, especially in the worse cases. This eliminates the 'warning trimming xxxx blocks to 7000' messages and fixes a crash bug manifesting specifically on the alpha archetecture as well. It still uses the old routine when the number of blocks is less than 10. o Made it so that it will find perfect 20-mers with tileSize=10. o Fixed bug where it wasn't closing genome files, resulting in it only being able to process 250 files on the genome side on most systems. o Fixed bug in sort gfHitSort2. I'm amazed blat has worked as well as it has as long as it has with this bug! o Put a 'Database: ' line at the end of the blast output to make bioperl happy. o Fixed bug where you had to do -oneOff=1 rather than -oneOff in command line of blat. o Clarified makeOoc usage message. o Made it handle fasta files with more than 64k of sequence in a single line. 17 - Added -minScore -minIdentity and -prot command line options to gfClient. gfClient uses same protein default minScore as blat does. Made -mask=lower work with nib files in blat, and -mask do the same thing in gfServer (though gfClient still won't fill in repMatches separately from matches). Made 'stdin' and 'stdout' work properly as parameters in place of files in the blat command line. 16 - Fixed case where program would abort with 'Negative gap size' message on some legitimate protein/translated alignments in modes using blast, wublast, axt, or maf output. Fixed bug where nibFrag on minus strand would return usage message rather than running. Made blat store 'chr1' rather than '/long/path/to/chr1.nib' in name fields when input is a nib file. Made pslPretty handle record separator lines of up to 1023 characters rather than 511. 15 - Blast output seems to work. Also wublast, axt and maf output. 14 - Starting to hack in blast output format. Added -trimHardA flag. 13 - Made things so that BLAT could find perfect 21mer matches fairly reliably. 12 - Put in fix from Darren Platt that cured crashes on protein/translated DNA alignments on Solaris. 11 - Fixed a bug that led to translated protein alignments coming back in multiple pieces. This also caused small exons to sometimes be dropped. Added -noTrimA flag