src/blat/version.doc 1.74
1.74 2009/10/08 18:09:38 kent
Moving repMatch calculations from blat to library so can be shared with gfServer.
Index: src/blat/version.doc
===================================================================
RCS file: /projects/compbio/cvsroot/kent/src/blat/version.doc,v
retrieving revision 1.73
retrieving revision 1.74
diff -b -B -U 1000000 -r1.73 -r1.74
--- src/blat/version.doc 10 Feb 2009 21:59:11 -0000 1.73
+++ src/blat/version.doc 8 Oct 2009 18:09:38 -0000 1.74
@@ -1,219 +1,221 @@
35:
o (in 34x1) Making total query output reporting a 64 bit number to avoid
overflow when people using more than 4 gig of query sequence.
o (in 34x2) Fixed -out=blast to use +/- instead of -/+ for non-translated.
o (in 34x3) Fixed -minScore, filter was not working when over half query-size.
o (in 34x4) Made it convert u's to t's for RNA sequence stuff.
+ o (in 34x5) Made gfServer calculate repMatch based on stepSize/tileSize combination the way blat does
+ rather than just being good for stepSize 11.
34:
o (in 33x1) Put in check for two bit files with more than 4 gig.
o (in 33x2) Allowing -maxGap in gfServer. (Forgot to add it to
options table.)
o (in 33x3) Making -maxIntron work better.
o (in 33x4) Fixed very rare bug that manifested doing protein/translated
DNA alignments in gfClient (not blat) in some frames when the
alignment extended all the way to the end of the DNA.
o (in 33x4) Support .nib and .2bit:seq[:start-end] file types for query.
o (in 33x5) Making polyA trimming leave two bases so as not to mess up
stop codon in rna's that just contain coding region.
o (in 33x6) Improving usage message on repMatch.
o (in 33x7) Making blat not abort with error on empty target database.
o (in 33x8) Fixing bug in translated DNA/protein case where a missing x3
in the ffAli to chain caused a disagreement between the chainer
and the rest of the program on gap scores, sometimes causing
good alignments to be thrown out.
33:
o Fixed problem with out=blast8 and out=blast9 output using
dnax or rnax options where coordinates on the minus strand
were given relative to end of sequence rather than beginning.
o Made translated blast output of protein vs. DNA give DNA
(multiplied by three) coordinates for both start and end
of DNA side of match.
o Making translated blast output work better (Fan's work).
o Improving splice sites handling a little in a case that
effects about 1 in 500 alignments.
o Fixed a divide by zero error that happened with minScore=0
in a few rare cases involving sequences with rows of N's.
o Added checking of command line so that misspelled options
result in an error message instead of being silently
ignored.
32:
o Dealing with out of order exons in a more sophisticated way
so as to fix another problem introduced by last change
in v29.
o Fixed problem in gfClient where it was only looking at top
three chains in a region as opposed to the top 16 chains that
standalone blat finds. Gunnar at Decode found a case,
AK090418 on 8p23.1 where this caused two good alignments to
be dropped.
o The banded extension phase used when the -fine flag is set
now uses affine rather than fixed gap penalties.
o PolyA trimming now accommodates a small amount of sequencing
error.
o Fixing another way that you could get out of order exons.
(This came up aligning zebrafish bacEnd sequences to the
zebrafish genome.)
31:
o Fixed a pretty bad memory leak when doing blast type output.
Also fixed a more subtle one that occured in some cases usually
involving repeat-rich query sequences.
o Fixed bug where blat of protein vs. translated DNA
-q=prot -t=dnax would drop exons in some situations. This
was a side effect of the last change in version 29.
Slippery stuff this!
30:
o Fixed bug in webBlat when using "Blat's Guess" and no
sequence name.
29:
o Created webBlat, a CGI script that blats and nicely displays the
results without requiring the whole UCSC browser and database to
work.
o Made -out=blast9 and -out=blast8 report e values in format
that will print 1.3e-123 instead of 0.00000.
o Made non-psl output for gfClient show the chromosome name
rather than the chromosome nib/2bit file name and subsequence.
o Adding -stepSize to gfServer and blat, so that can have
tiles overlap.
o Adding PCR handling to gfServer. (Still need a separate
pcr client.)
o Put in some code to remove out of order exons that were generated
in some rare situations. This was a side effect of the second change
in version 26.
28:
o Blat, gfClient, and gfServer now work with .2bit files.
Like .nib files, .2bit files are a compressed binary
random access format for DNA. There are two advantages
to the .2bit files over .nibs. Most importantly a single
.2bit file can hold multiple sequence records. Also .2bit
files as the name implies, use just 2 bits per DNA base
rather than the 4 bits/base found in a .nib file. The .2bit
files still do allow for soft-masking by lower case, and
for N characters. They do this by storing lists of masked
areas and N's separately from the sequence. There are also
faToTwoBit and twoBitToFa utilities to convert in and out of
this format.
o Making minScore work with other alignment types than psl.
27:
o Made it so that gfClient does not abort when gfServer runs out
of memory on a request. Instead it just prints a warning
message.
26:
o fixed pslPretty to allow '--' strands in axt output format
o fixed -long flag in pslPretty which was swapping target and
query sequence on long inserts
o fixed bug that was causing blat to drop an exon in around
1% of genes.
o put in error message when faToNib is used on fa files with
more than one record.
25:
o Made it so that gfServer does not abort when it runs out
of memory.
o Fixed bug in pslPretty in case where there is a small
gap in both query and target sides of alignment.
o Fixed bug in out=blast or out=wublast where in some
cases negative strand alignments did not line up
properly.
o Fixed crash when aligning translated sequences of less than 3 bases.
o Added extendThroughN flag.
24:
o Tweaked gap penalty in trimFlakyEnds, which should make it so
that with the -fine flag you get less small terminal exons with bad
splice sites way away from the rest of the alignment.
o Fixing a memory leak.
o Adding sim4 compatable output.
o Adding tabular blast compatable output (blast -m 8 and -m 9) as
-out=blast8 and -out=blast9.
23:
Made -maxIntron a command line option. Default is 750000.
22:
Fixed bug in jiggle-small exons that had reverse strand
consensus as ca/ct rather than ac/ct. Thanks to Par
Engstrom for finding this.
21:
Added -fine parameter. This will result in improved alignments
for high quality mRNA sequences. When -fine is used a banded
Smith-Waterman is used in the extension phase, and the program
looks harder in various ways for small initial or terminal
exons. Currently I don't recommend using -fine on EST
sequences, as EST ends tend to be of low quality, and the
short additional matches it finds are as likely to be by
chance as real hit. For fully sequenced mRNAs though the -fine
option is an advantage. It is only available currently in
blat, not in gfClient.
20:
o Recompiled without -o3 in Linux. This recent change to a makefile
ended up causing problems.
19:
o Bumping max size of intron that gets stitched to 750kb from 500kb.
o Handle a rare case that could lead to zero sized exons.
o Handled a rare case that could lead to backtracking.
o Switched blat from cgiSpoof to optionHash so that it can
be called from a web page more easily.
o Adding -fastMap option for very fast high similarity DNA
searches (blat only, not gfServer).
o Making it able to take subsets of a nib file as input.
o Incorperating Arnaldur's improvement where clumps are
sorted by amount of query sequence covered rather than
by hit count. This will help gfServer/gfClient performance
in particular on sequences containing tandem repeats.
o added maxAaSize and maxNtSize options to gfServer command line.
18:
o Replaced super-stitcher dynamic program that generates
a table of size N*N times the number of blocks
with a kdTree based program one that is much more space efficient
and a bit faster, especially in the worse cases. This eliminates
the 'warning trimming xxxx blocks to 7000' messages and
fixes a crash bug manifesting specifically on the alpha
archetecture as well. It still uses the old routine when the
number of blocks is less than 10.
o Made it so that it will find perfect 20-mers with tileSize=10.
o Fixed bug where it wasn't closing genome files, resulting in
it only being able to process 250 files on the genome side
on most systems.
o Fixed bug in sort gfHitSort2. I'm amazed blat has worked
as well as it has as long as it has with this bug!
o Put a 'Database: ' line at the end of the blast output to make
bioperl happy.
o Fixed bug where you had to do -oneOff=1 rather than -oneOff
in command line of blat.
o Clarified makeOoc usage message.
o Made it handle fasta files with more than 64k of sequence
in a single line.
17 - Added -minScore -minIdentity and -prot command line options
to gfClient. gfClient uses same protein default minScore as
blat does. Made -mask=lower work with nib files in blat,
and -mask do the same thing in gfServer (though gfClient
still won't fill in repMatches separately from matches).
Made 'stdin' and 'stdout' work properly as parameters in
place of files in the blat command line.
16 - Fixed case where program would abort with 'Negative gap size'
message on some legitimate protein/translated alignments in modes
using blast, wublast, axt, or maf output. Fixed bug where
nibFrag on minus strand would return usage message rather than running.
Made blat store 'chr1' rather than '/long/path/to/chr1.nib' in
name fields when input is a nib file. Made pslPretty handle
record separator lines of up to 1023 characters rather than 511.
15 - Blast output seems to work. Also wublast, axt and maf output.
14 - Starting to hack in blast output format. Added -trimHardA flag.
13 - Made things so that BLAT could find perfect 21mer matches
fairly reliably.
12 - Put in fix from Darren Platt that cured crashes on
protein/translated DNA alignments on Solaris.
11 - Fixed a bug that led to translated protein alignments coming
back in multiple pieces. This also caused small exons
to sometimes be dropped. Added -noTrimA flag