835a903827681f7295594708dccaec0cbf2e20fa
kent
  Mon Aug 12 21:37:48 2019 -0700
Adding new built-in function fix() for doing constant/constant substitutions. Allowing negative indexes in split() and separate() that work like negative array indexes do.

diff --git src/lib/strex.doc src/lib/strex.doc
index 29dbd95..542ab3b 100644
--- src/lib/strex.doc
+++ src/lib/strex.doc
@@ -1,52 +1,75 @@
 The strex language is a small string expression evaluation language.  
 This document describes its built in functions and operators:
 
 +  - returns the concatenation of the surrounding strings.  Will convert a number to a string
 
-[ix] - selects a character from string given an integer zero based index. As in Python
-       if ix is negative it selects characters from the end of the string.  -1
+[index] - selects a character from string given an integer zero based index. As in Python
+       if index is negative it selects characters from the end of the string.  -1
        corresponds to the last character of string, as 0 corresponds to first.
+       Returns empty string if index out of range.
       
 between(prefix, string, suffix) - returns the part of string found between prefix and suffix
    example:   between("abc", "01234abcHelloxyz56789", "xyz")  fetches just "Hello"
    If there are multiple places the prefix occurs, it will choose the first one, and the
    then the first place the suffix matches after that.  The biologist might think of it as
    a text oriented PCR, though the primer prefix and suffixes are not included in the output.
    The prefix "" corresponds to beginning of string and the suffix "" corresponds to end.
    Returns empty string if nothing found.
 
 split(string, index)  - white space separated word from string of given 0 based index.
    Returns empty string if index is too large.
+   A negative input will work much like an array index does,  with -1 selecting the
+   last word in the string.  Returns empty string if index out of range.
 
-separate(string, splitter, index) - separates string with splitter character. Returns string of given index.
-   Returns empty string if index is too large.
+separate(string, splitter, index) - separates string with splitter character. Returns 
+   string of given index.  Index can be negative as in split to pick off of end rather than start.
+   Returns empty string if index is too out of rnge.
 
 [start:end] - Use a colon in an array index to select a range of a string.  This follows 
               Python conventions, where start is zero based and end is one past the end of 
 	      this string (or equivalently one based).  If start is left out the start of 
 	      the string is implied.  If end is left out the end of the string is emplied.  
 	      This leads to some curious but useful constructs such as these shown with an 
 	      example applied to the string "0123456789"
 	           [:3] = "012"       - first three characters of string
 		   [3:] = "3456789"    - everything past the first three characters
 		   [3:5] = "34"       - two characters from the fourth up through the fifth
 		   [5:3] = ""         - you get an empty string if the start is bigger than the end
 		   [3] = "3"          - the fourth character  (the fun of zero based indexes
 		   [-3] = "7"         - the third character from the end
 		   [-3:] = "789"      - last three characters of string
 		   [:-3] = "0123456"  - everything up to the last three
 	      Python actually goes further than this and allows a third, step, specification that
 	      strex has not implemented.  
 
 untsv(string, index) - separate by tab. Synonym for separate(string, '\t', index)
 
 uncsv(string, index) - do comma separated value extraction of string. Includes quote escaping.
 
 trim(string) - returns copy of string with leading and trailing spaces removed
 
-replace(string, old, new) - returns string with all instances of old replaced by new
+strip(string, toRemove) - remove all occurrences of any character in toRemove from string
+
+replace(string, oldPart, newPart) - returns string with all instances of old replaced by new.
+	    The cases where either old or new are empty string are useful special cases.
+	    If new is "", then all instances of the old string will be deleted.
+            If old is "", then empty strings will be replaced by new strings, useful in setting
+	    a default value for a field.  
+               
+fix(string, target, newString) - similar to replace but works at the whole string level.
+            Returns string unchanged except for the case where string matches target exactly.
+	    In that case it returns newString instead.  The name "fix" comes from it
+	    being used generally to replace one constant, fixed, string with another.
+	    Also, a lot of the time when you do this it is to fix a small inconsistency
+	    in the metadata.  In general fix is faster to execute and quicker to type than
+	    replace and the effects are more specific.
+    example to help clean up minor variations in vocabulary
+            fix(fix(fix(fix( sex, "M","male"),  "F","female"),  "Male","male")  "Female","female")
+    example to give something a value if non is present
+	    fix(requiredField, "", "reasonable default value")
 
-md5(string) - returns an MD5 sum digest/hash of string. Useful for creating IDs
+md5(string) - returns an MD5 sum digest/hash of string. Useful for creating IDs out of large or
+            merged fields. 
 
 now() - returns current time and date in a really aweful unix ctime(2) format.  We will improve it.