e53a538138cc2e88a4c823a1442aef461cc0824b kent Wed Aug 14 22:37:11 2019 -0700
Adding the ternary conditional operator, as well as built in boolean functions in() same() starts() and ends()
diff --git src/lib/strex.doc src/lib/strex.doc
index 8990903..a6fd423 100644
--- src/lib/strex.doc
+++ src/lib/strex.doc
@@ -63,52 +63,68 @@
     [3] = "3" - the fourth character (the fun of zero based indexes
     [-3] = "7" - the third character from the end
     [-3:] = "789" - last three characters of string
     [:-3] = "0123456" - everything up to the last three
 Python actually goes further than this and allows a third, step, specification that strex
 has not implemented.
 
 untsv(string, index) - separate by tab. Synonym for separate(string, '\t', index)
 
 uncsv(string, index) - do comma separated value extraction of string. Includes quote escaping.
 
 trim(string) - returns copy of string with leading and trailing spaces removed
 
 strip(string, toRemove) - remove all occurrences of any character in toRemove from string
 
+upper(string) - returns all upper case version of string
+
+lower(string) - returns all lower case version of string
+
+md5(string) - returns an MD5 sum digest/hash of string.
+
 symbol(prefix, string) - turn string into a computer usable symbol that starts with the given
     prefix. To create the rest of the symbol, the string is mangled. First the spaces, tabs,
     and newlines are all turned into _ chars, then any remaining characters that aren't ascii
     letters or numerical digits are removed. If the result is 32 characters or less it's used,
     but if it's longer it's converted into an MD5 sum.
 
 replace(string, oldPart, newPart) - returns string with all instances of old replaced by new.
     The cases where either old or new are empty string are useful special cases. If new is "",
     then all instances of the old string will be deleted. If old is "", then empty strings
     will be replaced by new strings, useful in setting a default value for a field.
 fix(string, target, newString) - similar to replace but works at the whole string level.
     Returns string unchanged except for the case where string matches target exactly. In that
     case it returns newString instead. The name "fix" comes from it being used generally to
     replace one constant, fixed, string with another. Also, a lot of the time when you do this
     it is to fix a small inconsistency in the metadata. In general fix is faster to execute and
     quicker to type than replace and the effects are more specific.
     example to help clean up minor variations in vocabulary
         fix(fix(fix(fix( sex, "M","male"), "F","female"), "Male","male") "Female","female")
     example to give something a value if non is present
         fix(requiredField, "", "reasonable default value")
 
-pick(query, key1:val1, key2:val2, ... keyN:valN)
+pick(query ? key1:val1, key2:val2, ... keyN:valN)
     Looks through keys for one that matches query, and returns associated value if it
     finds it. Otherwise returns empty string. Can be used to apply different expressions
     to parsing in different conditions:
         pick(species, "human" : ethnicity, "mouse" : strain )
 
+(boolean ? trueVal : falseVal)
+    This is the trinary conditional expression found in C, Python and many other languages.
+    If the boolean before the question mark is true, then the result is the trueVal before the
+    colon, otherwise it's the falseVal after the colon. Empty strings and zeros as booleans are
+    considered false, other strings and numbers true.
+
+in(string, query) - returns true if query is a substring of string, false otherwise
+
+same(a, b) - returns true if the two arguments are the same, false otherwise
+
+starts(prefix, string) - returns true if string starts with prefix
 
-md5(string) - returns an MD5 sum digest/hash of string. Useful for creating IDs out of large or
-    merged fields.
+ends(string, suffix) - returns true if string ends with suffix
 
 now() - returns current time and date in a really aweful unix ctime(2) format. We will improve it.
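
Illustration of the semantics documented in this change: the Python sketch below mimics the behavior described for the conditional expression, for in(), same(), starts() and ends(), and for pick(). It is not the strex implementation. The helper names truthy, conditional and strex_in are hypothetical, and two details are assumptions the doc text does not settle: a non-empty string such as "0" is treated as true, and same() is modeled as plain equality. Argument order follows the doc as written, so starts() takes the prefix first while ends() takes the suffix last.

```python
def truthy(value):
    """Doc rule: empty strings and zeros are false, other strings and numbers true."""
    if isinstance(value, str):
        return value != ""      # assumption: the string "0" counts as true
    return value != 0


def conditional(boolean, true_val, false_val):
    """Models (boolean ? trueVal : falseVal)."""
    return true_val if truthy(boolean) else false_val


def strex_in(string, query):
    """Models in(string, query): is query a substring of string?"""
    return query in string


def same(a, b):
    """Models same(a, b); plain equality is an assumption."""
    return a == b


def starts(prefix, string):
    """Models starts(prefix, string): note the prefix comes first."""
    return string.startswith(prefix)


def ends(string, suffix):
    """Models ends(string, suffix): note the suffix comes last."""
    return string.endswith(suffix)


def pick(query, pairs):
    """Models pick: value for the matching key, else the empty string."""
    return pairs.get(query, "")


if __name__ == "__main__":
    species = "mouse"
    print(conditional(same(species, "mouse"), "strain", "ethnicity"))
    print(pick(species, {"human": "ethnicity", "mouse": "strain"}))
```

Keeping truthy() separate from the conditional makes the false cases (empty string, zero) explicit in one place, which is where an actual implementation would most likely differ from this sketch.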