155197c63e3259d0922c5c3a0e859f4812129db4
galt
  Sun May 17 17:52:03 2020 -0700
gfServer and blat and isPcr extended with new commandline option -noSimpRepMask to help with small genomes. refs #25477

diff --git src/blat/version.doc src/blat/version.doc
index fe10fbc..c0cc8a7 100644
--- src/blat/version.doc
+++ src/blat/version.doc
@@ -1,240 +1,241 @@
 37:
      o (in 36x1) Fixed problem with -fastMap option that made it put in gaps in the
        alignment sometimes when a short mismatch would be a better choice.
      o (in 36x2) Fixed problem with -maxIntron set to non-default value on protein query.
      o (in 36x3) Added ipv6 support.
      o (in 36x4) Added _alt support.
      o (in 36x5) If ipv6 disabled, retries with ipv4-only socket.
+     o (in 36x6) Added -noSimpRepMask option to blat, gfServer, and isPcr to skip simple repeat masking.
 
 36:
      o (in 35x1) added repMatch default values for tileSizes 16 to 18 in genoFind.c
 
 35:
      o (in 34x1) Making total query output reporting a 64 bit number to avoid 
        overflow when people use more than 4 gig of query sequence.
      o (in 34x2) Fixed -out=blast to use +/- instead of -/+ for non-translated.
      o (in 34x3) Fixed -minScore, filter was not working when over half query-size.
      o (in 34x4) Made it convert u's to t's for RNA sequence stuff.
      o (in 34x5) Made gfServer calculate repMatch based on stepSize/tileSize combination the way blat does
        rather than just being good for stepSize 11.
      o (in 34x6) Fixed negative strand pcr psl output
      o (in 34x7) Made it check and error out if the same name is reused in the target database.
      o (in 34x8) Truncate in output PCR primers that dangle off target chrom ends.
      o (in 34x9) If -fastMap is used, check that query sizes do not exceed MAXSINGLEPIECESIZE.
      o (in 34x10) Added IP-logging option to gfServer.
      o (in 34x11) Fixed segfault when maxIntron <= 11 and tileSize <= 11.
      o (in 34x12) Fixed segfault when -fine -minIdentity=0 -minScore=0.
      o (in 34x13) out=sim4: Now filters on -minIdentity, and has alignments output in the right order.
      
 
 34:
      o (in 33x1) Put in check for two bit files with more than 4 gig.
      o (in 33x2) Allowing -maxGap in gfServer.  (Forgot to add it to
        options table.)
      o (in 33x3) Making -maxIntron work better.
      o (in 33x4) Fixed very rare bug that manifested doing protein/translated
        DNA alignments in gfClient (not blat) in some frames when the
        alignment extended all the way to the end of the DNA.
      o (in 33x4) Support .nib and .2bit:seq[:start-end] file types for query.
      o (in 33x5) Making polyA trimming leave two bases so as not to mess up
        stop codon in rna's that just contain coding region.
      o (in 33x6) Improving usage message on repMatch.
      o (in 33x7) Making blat not abort with error on empty target database.
      o (in 33x8) Fixing bug in translated DNA/protein case where a missing x3
        in the ffAli to chain caused a disagreement between the chainer
        and the rest of the program on gap scores, sometimes causing 
        good alignments to be thrown out.
 
 33:
      o Fixed problem with out=blast8 and out=blast9 output using
        dnax or rnax options where coordinates on the minus strand
        were given relative to end of sequence rather than beginning.
      o Made translated blast output of protein vs. DNA give DNA
        (multiplied by three) coordinates for both start and end
        of DNA side of match.
      o Making translated blast output work better (Fan's work).
      o Improving splice sites handling a little in a case that
        affects about 1 in 500 alignments.
      o Fixed a divide by zero error that happened with minScore=0
        in a few rare cases involving sequences with rows of N's.
      o Added checking of command line so that misspelled options
        result in an error message instead of being silently
        ignored.
 
 32:
      o Dealing with out of order exons in a more sophisticated way
        so as to fix another problem introduced by last change
        in v29.
      o Fixed problem in gfClient where it was only looking at top
        three chains in a region as opposed to the top 16 chains that
        standalone blat finds.  Gunnar at Decode found a case,
        AK090418 on 8p23.1 where this caused two good alignments to
        be dropped.
      o The banded extension phase used when the -fine flag is set
        now uses affine rather than fixed gap penalties.
      o PolyA trimming now accommodates a small amount of sequencing
        error.
      o Fixing another way that you could get out of order exons.
        (This came up aligning zebrafish bacEnd sequences to the
        zebrafish genome.)
 
 31:
      o Fixed a pretty bad memory leak when doing blast type output.
        Also fixed a more subtle one that occurred in some cases usually
        involving repeat-rich query sequences.
      o Fixed bug where blat of protein vs. translated DNA
        -q=prot -t=dnax would drop exons in some situations.  This
        was a side effect of the last change in version 29.  
        Slippery stuff this!
 
 30:
      o Fixed bug in webBlat when using "Blat's Guess" and no
        sequence name.
 
 29:
      o Created webBlat, a CGI script that blats and nicely displays the
        results without requiring the whole UCSC browser and database to
        work.
      o Made -out=blast9 and -out=blast8 report e values in format
        that will print 1.3e-123 instead of 0.00000.
      o Made non-psl output for gfClient show the chromosome name
        rather than the chromosome nib/2bit file name and subsequence.
      o Adding -stepSize to gfServer and blat, so that tiles can
        overlap.
      o Adding PCR handling to gfServer.  (Still need a separate
        pcr client.)
      o Put in some code to remove out of order exons that were generated
        in some rare situations.  This was a side effect of the second change
        in version 26.
 
 28:
      o Blat, gfClient, and gfServer now work with .2bit files.
        Like .nib files, .2bit files are a compressed binary
        random access format for DNA.  There are two advantages
        to the .2bit files over .nibs.  Most importantly a single
        .2bit file can hold multiple sequence records.  Also .2bit
        files as the name implies, use just 2 bits per DNA base
        rather than the 4 bits/base found in a .nib file.  The .2bit
        files still do allow for soft-masking by lower case, and
        for N characters.  They do this by storing lists of masked
        areas and N's separately from the sequence.  There are also
        faToTwoBit and twoBitToFa utilities to convert in and out of
        this format.
      o Making minScore work with other alignment types than psl.
 
 27:
      o Made it so that gfClient does not abort when gfServer runs out
        of memory on a request.  Instead it just prints a warning
        message.
 
 26:
      o fixed pslPretty to allow '--' strands in axt output format
      o fixed -long flag in pslPretty which was swapping target and
        query sequence on long inserts
      o fixed bug that was causing blat to drop an exon in around
        1% of genes.
      o put in error message when faToNib is used on fa files with
        more than one record.
 
 25:
      o Made it so that gfServer does not abort when it runs out
        of memory.
      o Fixed bug in pslPretty in case where there is a small
        gap in both query and target sides of alignment.
      o Fixed bug in out=blast or out=wublast where in some
        cases negative strand alignments did not line up
        properly.
      o Fixed crash when aligning translated sequences of less than 3 bases.
      o Added extendThroughN flag.
 
 24:
      o Tweaked gap penalty in trimFlakyEnds, which should make it so
        that with the -fine flag you get fewer small terminal exons with bad
        splice sites way away from the rest of the alignment.
      o Fixing a memory leak.
      o Adding sim4 compatible output.
      o Adding tabular blast compatible output (blast -m 8 and -m 9) as
        -out=blast8 and -out=blast9.
 
 23:
      Made -maxIntron a command line option.  Default is 750000.
 
 22:
      Fixed bug in jiggle-small exons that had reverse strand
      consensus as ca/ct rather than ac/ct.  Thanks to Par
      Engstrom for finding this.
 
 21:
      Added -fine parameter.  This will result in improved alignments
      for high quality mRNA sequences.  When -fine is used a banded
      Smith-Waterman is used in the extension phase,  and the program
      looks harder in various ways for small initial or terminal
      exons.   Currently I don't recommend using -fine on EST
      sequences, as EST ends tend to be of low quality,  and the
      short additional matches it finds are as likely to be by
      chance as real hits.  For fully sequenced mRNAs though the -fine
      option is an advantage.  It is only available currently in
      blat, not in gfClient.   
 
 20:
      o  Recompiled without -o3 in Linux.  This recent change to a makefile
         ended up causing problems.
 19: 
      o  Bumping max size of intron that gets stitched to 750kb from 500kb.
      o  Handle a rare case that could lead to zero sized exons.
      o  Handled a rare case that could lead to backtracking.
      o  Switched blat from cgiSpoof to optionHash so that it can
         be called from a web page more easily.
      o  Adding -fastMap option for very fast high similarity DNA
         searches (blat only, not gfServer).
      o  Making it able to take subsets of a nib file as input.
      o  Incorporating Arnaldur's improvement where clumps are
         sorted by amount of query sequence covered rather than
 	by hit count.  This will help gfServer/gfClient performance
 	in particular on sequences containing tandem repeats.
      o  added maxAaSize and maxNtSize options to gfServer command line.
 18: 
      o  Replaced super-stitcher dynamic program that generates 
         a table of size N*N times the number of blocks
         with a kdTree based one that is much more space efficient
         and a bit faster, especially in the worst cases.  This eliminates
         the 'warning trimming xxxx blocks to 7000' messages and
         fixes a crash bug manifesting specifically on the alpha
         architecture as well.  It still uses the old routine when the
 	number of blocks is less than 10.
      o  Made it so that it will find perfect 20-mers with tileSize=10.
      o  Fixed bug where it wasn't closing genome files, resulting in
         it only being able to process 250 files on the genome side
 	on most systems.
      o  Fixed bug in sort gfHitSort2.  I'm amazed blat has worked
         as well as it has as long as it has with this bug!
      o  Put a 'Database: ' line at the end of the blast output to make
         bioperl happy.
      o  Fixed bug where you had to do -oneOff=1 rather than -oneOff
         in command line of blat.  
      o  Clarified makeOoc usage message.
      o  Made it handle fasta files with more than 64k of sequence
         in a single line.  
 17 - Added -minScore -minIdentity and -prot command line options
      to gfClient.  gfClient uses same protein default minScore as 
      blat does. Made -mask=lower work with nib files in blat, 
      and -mask do the same thing in gfServer (though gfClient
      still won't fill in repMatches separately from matches).
      Made 'stdin' and 'stdout' work properly as parameters in
      place of files in the blat command line.
 16 - Fixed case where program would abort with 'Negative gap size'
      message on some legitimate protein/translated alignments in modes
      using blast, wublast, axt, or maf output.   Fixed bug where
      nibFrag on minus strand would return usage message rather than running.
      Made blat store 'chr1' rather than '/long/path/to/chr1.nib' in
      name fields when input is a nib file.   Made pslPretty handle
      record separator lines of up to 1023 characters rather than 511.
 15 - Blast output seems to work.  Also wublast, axt and maf output.
 14 - Starting to hack in blast output format. Added -trimHardA flag.
 13 - Made things so that BLAT could find perfect 21mer matches
      fairly reliably.
 12 - Put in fix from Darren Platt that cured crashes on
      protein/translated DNA alignments on Solaris.
 11 - Fixed a bug that led to translated protein alignments coming
      back in multiple pieces.  This also caused small exons
      to sometimes be dropped.  Added -noTrimA flag