src/lib/asParse.c 2866dd07e3cc9012e19faf41c18529ea2f1d8e08

2866dd07e3cc9012e19faf41c18529ea2f1d8e08
galt
  Fri Apr 13 17:08:01 2012 -0700
This is a squashed merge to make git-reports code-review simpler.
The main thing is that there is a new shared validator routine in lib/basicBed.c
which uses asParse.c to handle bedPlus.

This validator is shared among validateFiles, bedToBigBed, hgLoadBed, and customTracks.

Some effort has been made to standardize commandline options,
and vf has been simplified a little by removing some debugging options.

vf has also recently gained the ability to validate native bigBed format,
via some new code in linefile.c for attaching to a bigBed.

ct: use of the new validator is controlled by an hg.conf flag that
can be turned on and if needed turned off again.  It will be off
by default for now. As soon as we are happy with the code
and it has been established, we can remove the switch.

Code has been added to compare .as files,
and it is here used to compare against the library standard BED.
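
To illustrate the comparison idea, here is a minimal, self-contained sketch.
It is not the library code: struct miniColumn and miniCompareColumns are
hypothetical stand-ins for struct asColumn and asCompareObjs, and only the
name and type are compared. Unlike asCompareObjs, the sketch returns false
rather than aborting when either list runs out of columns early.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an autoSql column.  The real
 * struct asColumn in asParse.h carries more fields. */
struct miniColumn
    {
    struct miniColumn *next;   /* next column in list */
    const char *name;          /* column name */
    const char *typeName;      /* low-level type name, e.g. "uint" */
    };

bool miniCompareColumns(const struct miniColumn *a, const struct miniColumn *b,
                        int numToCheck, int *retNumSame)
/* Walk both lists in step and stop at the first mismatch.  Return true only
 * if the first numToCheck columns agree in name and type.  If retNumSame is
 * not NULL, it receives the count of leading columns that matched. */
{
int same = 0;
while (a != NULL && b != NULL && same < numToCheck)
    {
    if (strcmp(a->name, b->name) != 0 || strcmp(a->typeName, b->typeName) != 0)
        break;
    ++same;
    a = a->next;
    b = b->next;
    }
if (retNumSame != NULL)
    *retNumSame = same;
return same == numToCheck;
}
```

A caller could pass the standard BED definition as one list and a user
definition as the other, then use the returned count to report how many
leading columns still follow the standard.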

As an experiment I am leaving in the list of squashed commit messages below:

Squashed commit of the following:

commit a55eb050055911c699120432cc98e33cefa5fffc
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Fri Apr 13 17:00:30 2012 -0700

fixing freeMem bug; making better option-combination checking, fixing as, adding test bed6

commit ac4b98f41e89875bc100cd2f1c1fc3825cafa57d
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Fri Apr 13 12:56:32 2012 -0700

unless we add back in everywhere the -zerosOk option, we must tolerate them for SNP type objects

commit efa2c269a6df4ea0630084670d152b4d518e232b
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Thu Apr 12 16:16:46 2012 -0700

adding bed15 example input

commit f2b5af00e2786283e785ef501caa036e5a945dc7
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Thu Apr 12 11:09:18 2012 -0700

increasing maximum row length buffer automatically in lineFile on bigbed

commit 1bb07ce21fcd4ce1d55752b0b61997ffb71a159e
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Thu Apr 12 11:03:59 2012 -0700

increasing maximum row length buffer automatically

commit 59e831df690c5b8ac68afc9648443b3cb5dfd51b
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Thu Apr 12 10:45:22 2012 -0700

increasing maximum row length buffer

commit 711db7c2edf0a6f7b710fde1929af1ba388cb81d
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Apr 11 23:45:18 2012 -0700

adding lineFileOnBigBed, using it to add bigBed validation to validateFiles.

commit 1b5e5e1eaba802c25c94a8a3dd000898d2fb3150
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Apr 11 17:01:49 2012 -0700

for consistency with basicBed.c

commit ca8e7f93af179803f4a6fe2d073703f379e271a3
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Apr 11 16:55:26 2012 -0700

standardizing - have to call the field "reserved" so that .sql will contain the right name and existing trackhandlers will work

commit fea0cb3bb7163353496c3d092d3716f1b5c30e53
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Apr 11 12:12:25 2012 -0700

renaming option -tabs to -tab to be consistent with hgLoadBed

commit 6452f634c4cc7730b521869cdfb11f4253c59ff5
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Tue Apr 10 18:44:11 2012 -0700

trap aborts from weird errs reading as files

commit 9800e8416b5959d7a80f3785e116987569e7273c
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Tue Apr 10 18:14:05 2012 -0700

oops

commit f9f1b7f9d9e136864df4b908cd32773e1a04a260
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Tue Apr 10 18:08:13 2012 -0700

adding asCompare utility for comparing a given .as against many others

commit 605f9a9da14b3a168821519eba737b7f4cd48163
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Tue Apr 10 15:17:11 2012 -0700

add parameter to return the number of columns that did match the given .as, even if the entire match might fail

commit 86fb68307dfbefd2960c9173fbbae1c4d4c49b11
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Tue Apr 10 12:00:24 2012 -0700

adding a handy standard bed12 .as file for testing

commit 79bd2364e383e133c886d034a9176c6e636181cc
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Tue Apr 10 11:27:09 2012 -0700

added support for linked-Size in .as validation so that list sizes get validated

commit 67ee8435b668f779fce28744519e5e76d6ae3433
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Tue Apr 10 09:53:19 2012 -0700

oops signed flag was backwards

commit ebae2bef410150aefd39b7b01918fbc8c18e6785
Merge: ab0cee9 8f806f5
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Mon Apr 9 16:28:10 2012 -0700

Merge commit 'origin/master' into validateFiles

commit ab0cee922869996045074c6076c38abd8487d7c2
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Mon Apr 9 16:15:53 2012 -0700

fixing rgb

commit 4187a167aeeb69eb20ad9df8021db9c8f71d9193
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Mon Apr 9 14:42:27 2012 -0700

updated testing files

commit 0ee0c51132cb09a4e9909caa9718bdf7aa5af9ff
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Mon Apr 9 14:33:42 2012 -0700

fix err msg bug where colors field had already been chopped up by the parser

commit edeed9458c345d1d8629d1e1b5230eb6b9e424a8
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Mon Apr 9 14:16:27 2012 -0700

adding -tabs option but making whitespace the default. this is to make it like b2bb and hglb

commit 239ca9b99d04b7ff6817867160b34efe8af1d25a
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Mon Apr 9 12:35:06 2012 -0700

more printf fixes %d ==> %u

commit 57745b3096c096d73b02a651cba0060210acad07
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Mon Apr 9 12:19:46 2012 -0700

fixing some %d to %u for correct sign of bed struct members in printf

commit 7c3f3f24402f01f2f152f6b839b9efb01fa9c5c1
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Mon Apr 9 09:33:30 2012 -0700

oops need to use FromDatabase with chromDb option

commit 237ab4c1b707ba19ba12079483365269db3b6119
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Mon Apr 9 09:26:43 2012 -0700

little fix removing unused option maxErrors

commit 689c10edba73f49658f315a14201468864e5d83c
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Sun Apr 8 12:07:04 2012 -0700

reducing redundancy by making allInts both polymorphic and fast; added checking of .as fields against BED standard for the first bedN columns

commit c6bd5e6eec93c4ca6e947e7028907110e4e73d0e
Merge: 4d09bfd b7b8f22
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Apr 4 12:58:34 2012 -0700

Merge commit 'origin/master' into validateFiles

commit 4d09bfdd35087b6637fa638297be9c07b45cb450
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Tue Apr 3 14:31:32 2012 -0700

because tabs may be used, cannot confirm here that the strings are non-empty, because they might be, so removing the check

commit cfaa756790ee029e480a045f05a51733ebef4e4b
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Tue Apr 3 14:25:03 2012 -0700

adding back the check for chrHash (chromDb or chromInfo) that was lost when I reverted the tabs option deletion

commit 5531303fe2ee39d28cec7c9f4baa472ce0ab5eba
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Tue Apr 3 14:20:27 2012 -0700

Revert "cleaned up unneeded options, using chopByWhite instead of chopByTab."

This reverts commit 458a52f976edade78177908bc9f5886c81e7b6ab.

commit 0ea039f35541360ca1d64d754af3f9dec6621a38
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Tue Apr 3 14:15:45 2012 -0700

resolved reversion of fa3b343f1ff1eb7a50df9029e71141484f260a22

commit 44e7cbc4dbf3b5a955f7fa1eea6a65a7d8570528
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Thu Mar 29 16:34:47 2012 -0700

removed unneeded errs variable

commit 404c095271351b38233124e74cd35cbc57b8aad4
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Thu Mar 29 16:10:27 2012 -0700

removed line count variable

commit 5561bd6a7b85e6115f8162eb2e597de0f7c2c284
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Thu Mar 29 16:06:03 2012 -0700

removed unneeded flags printFailLines printOkLines, and quick

commit 5a6eae96483d91baa7690df436d4becc13cdbf2f
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Thu Mar 29 15:10:13 2012 -0700

jk prefers brackets to curly-braces

commit 7e431a76a215c1402c5a6a37816f6fc7ca234363
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Thu Mar 29 15:01:45 2012 -0700

adding version # for b2bb, by crickets request

commit fa3b343f1ff1eb7a50df9029e71141484f260a22
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Mar 28 16:43:33 2012 -0700

removed -tabs option, using chopByWhite instead of chopByTab because of bed definition according to JK

commit 458a52f976edade78177908bc9f5886c81e7b6ab
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Mar 28 16:40:10 2012 -0700

cleaned up unneeded options, using chopByWhite instead of chopByTab.

commit 7e179aab9da9961221bf864fdd136ed8bbc0392b
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Mar 28 16:13:30 2012 -0700

removing zeroSizeOk option

commit 8c8ad8ca09239b7a7a8cc0de693b8a7c7d49a669
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Mar 28 15:51:45 2012 -0700

improving the wording of help

commit 1bf63d9d7e7fa89e060eab26a9751acd3c8ad8e4
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Mar 28 12:37:37 2012 -0700

changing edge-case definition slightly for chromEnd

commit 7e5c151234e12cd8b55d10e79042d7f92b3fcb5a
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Mar 28 12:04:29 2012 -0700

incrementing version

commit c0e129d917eaac888e6a9c521c4407bf9380b4b0
Merge: f6ce988 68a21eb
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Mar 28 11:16:45 2012 -0700

Merge commit 'origin/master' into validateFiles

commit f6ce988fb6f76bb74439c77c4094483f26465c90
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Feb 8 15:39:54 2012 -0800

oops

commit 3dfdebd293e93073354f9af486a7d9407f0b9471
Merge: cf104e3 75c9145
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Feb 8 14:59:58 2012 -0800

Merge commit 'origin/master' into validateFiles

commit cf104e362325a5832911022865bb69241e1e472d
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Feb 8 14:55:23 2012 -0800

adding hg.conf switch to activate new validator use

commit ef5806814a6e9730682ea56a0a0e45ab962a9ee1
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Feb 8 14:26:43 2012 -0800

various cleanup and consistency

commit 3e4e92b25cf414129ba0339a91023380c2f8f974
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Fri Feb 3 17:21:51 2012 -0800

adding chromDb, and chromInfo options to hgLoadBed

commit 05ed14c1e92684861789163e3c936bf07778acb2
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Feb 1 21:06:55 2012 -0800

added optional validation support for bed and bedPlus to hgLoadBed

commit 7445f19ac4009eed47b989999de35f92b550fac5
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Feb 1 20:09:54 2012 -0800

tested bedPlus, and extended checking for more types/cases, e.g. string~ and unsigned numbers

commit 31b44ec97ecd57101b3c65096288347592276d5d
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Tue Jan 31 11:11:07 2012 -0800

adding support for context during array-list parsing

commit 5ffcc5974b59f00551940f0dc20b56f1e7990f12
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Mon Jan 30 11:15:12 2012 -0800

re-working things, adding better checking

commit 46e10b9ff5d32793656dfd1d136687f2a0733b4e
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Fri Jan 20 17:28:41 2012 -0800

validateFiles now supports bed, bedPlus using the shared validator lib function

commit d2adfda8e080b562c93084f956ddcdcf47f06974
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Fri Jan 20 02:33:09 2012 -0800

cleanup, handling ct differently than the others which care only for validation but not the actual bed results

commit 1fe9aa5028374750a491a0a3f836f4657f9b9a43
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Thu Jan 19 16:54:12 2012 -0800

added support for bedPlus via .as object

commit 7b32efb4152f57038ae56d3dca15fd700398ea5a
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Thu Jan 19 16:13:40 2012 -0800

moved validation code from customFactory.c to basicBed.c, added validation support to b2bb

commit 64b293690baeb50ce375f34e01493dbc9a50cbf5
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Tue Jan 17 14:06:35 2012 -0800

ok, have bed 12 linked-features validation working

commit f0a6d2c2a749fffedadc8189ff7ba86583003128
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Jan 11 11:37:10 2012 -0800

adding README describing how some bed tests are made

commit 46400cceb3a602423ae2a5656bc442d9969e63df
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Wed Jan 11 11:26:13 2012 -0800

fixed bigwig tests; added bed12ok test

commit 305e5c0e07210d278aa257d7c859bdd47abc77c5
Author: Galt Barber <galt@soe.ucsc.edu>
Date:   Tue Jan 10 17:05:13 2012 -0800

added support for bed files (not including bedPlus); also began working on re-organizing the make test cases

diff --git src/lib/asParse.c src/lib/asParse.c
index af82459..0505d4a 100644
--- src/lib/asParse.c
+++ src/lib/asParse.c
@@ -1,379 +1,518 @@
 /* asParse - parse out an autoSql .as file. */
 
 #include "common.h"
 #include "linefile.h"
 #include "tokenizer.h"
 #include "dystring.h"
 #include "asParse.h"
 
 
 /* n.b. switched double/float from %f to %g to partially address losing
  * precision.  Values like 2e-12 were being rounded to 0.0 with %f.  While %g
  * doesn't match the precision of the database fields, specifying a larger
  * precision with %g resulted in numbers like 1.9999999999999999597733e-12,
- *  which might impact load time.  THis issue needs more investigation.*/
+ *  which might impact load time.  This issue needs more investigation.*/
 struct asTypeInfo asTypes[] = {
-    {t_double,  "double",  FALSE, FALSE, "double",           
-            "double",        "Double", "Double", "%g", "FloatField"},
-    {t_float,   "float",   FALSE, FALSE, "float",            
-            "float",         "Float",  "Float",  "%g", "FloatField"},
-    {t_char,    "char",    FALSE, FALSE, "char",             
-            "char",          "Char",   "Char",   "%c", "CharField"},
-    {t_int,     "int",     FALSE, FALSE, "int",              
-            "int",           "Signed", "Signed", "%d", "IntegerField"},
-    {t_uint,    "uint",    TRUE,  FALSE, "int unsigned",
-            "unsigned",      "Unsigned","Unsigned", "%u", "PositiveIntegerField"},
-    {t_short,   "short",   FALSE, FALSE, "smallint",         
-            "short",         "Short",  "Signed", "%d", "SmallIntegerField"},
-    {t_ushort,  "ushort",  TRUE,  FALSE, "smallint unsigned",
-            "unsigned short","Ushort", "Unsigned", "%u", "SmallPositiveIntegerField"},
-    {t_byte,    "byte",    FALSE, FALSE, "tinyint",          
-            "signed char",   "Byte",   "Signed", "%d", "SmallIntegerField"},
-    {t_ubyte,   "ubyte",   TRUE,  FALSE, "tinyint unsigned",
-            "unsigned char", "Ubyte",  "Unsigned", "%u", "SmallPositiveIntegerField"},
-    {t_off,     "bigint",  FALSE,  FALSE,"bigint",           
-            "long long",     "LongLong", "LongLong", "%lld", "BigIntegerField"},
-    {t_string,  "string",  FALSE, TRUE,  "varchar(255)",     
-            "char *",        "String", "String", "%s", "CharField"},
-    {t_lstring,    "lstring",    FALSE, TRUE,  "longblob",   
-            "char *",        "String", "String", "%s", "TextField"},
-    {t_enum,    "enum",    FALSE, FALSE, "enum",             
-            "!error!",       "Enum",   "Enum", NULL, "CharField"},
-    {t_set,     "set",     FALSE, FALSE, "set",              
-            "unsigned",      "Set",    "Set", NULL, NULL},
-    {t_object,  "object",  FALSE, FALSE, "longblob",         
-            "!error!",       "Object", "Object", NULL, "TextField"},
-    {t_object,  "table",   FALSE, FALSE, "longblob",         
-            "!error!",       "Object", "Object", NULL, "TextField"},
-    {t_simple,  "simple",  FALSE, FALSE, "longblob",         
-            "!error!",       "Simple", "Simple", NULL, "TextField"},
+    {t_double,  "double",  FALSE, FALSE, "double",            "double",        "Double",   "Double",   "%g",   "FloatField"},
+    {t_float,   "float",   FALSE, FALSE, "float",             "float",         "Float",    "Float",    "%g",   "FloatField"},
+    {t_char,    "char",    FALSE, FALSE, "char",              "char",          "Char",     "Char",     "%c",   "CharField"},
+    {t_int,     "int",     FALSE, FALSE, "int",               "int",           "Signed",   "Signed",   "%d",   "IntegerField"},
+    {t_uint,    "uint",    TRUE,  FALSE, "int unsigned",      "unsigned",      "Unsigned", "Unsigned", "%u",   "PositiveIntegerField"},
+    {t_short,   "short",   FALSE, FALSE, "smallint",          "short",         "Short",    "Signed",   "%d",   "SmallIntegerField"},
+    {t_ushort,  "ushort",  TRUE,  FALSE, "smallint unsigned", "unsigned short","Ushort",   "Unsigned", "%u",   "SmallPositiveIntegerField"},
+    {t_byte,    "byte",    FALSE, FALSE, "tinyint",           "signed char",   "Byte",     "Signed",   "%d",   "SmallIntegerField"},
+    {t_ubyte,   "ubyte",   TRUE,  FALSE, "tinyint unsigned",  "unsigned char", "Ubyte",    "Unsigned", "%u",   "SmallPositiveIntegerField"},
+    {t_off,     "bigint",  FALSE, FALSE, "bigint",            "long long",     "LongLong", "LongLong", "%lld", "BigIntegerField"},
+    {t_string,  "string",  FALSE, TRUE,  "varchar(255)",      "char *",        "String",   "String",   "%s",   "CharField"},
+    {t_lstring, "lstring", FALSE, TRUE,  "longblob",          "char *",        "String",   "String",   "%s",   "TextField"},
+    {t_enum,    "enum",    FALSE, FALSE, "enum",              "!error!",       "Enum",     "Enum",     NULL,   "CharField"},
+    {t_set,     "set",     FALSE, FALSE, "set",               "unsigned",      "Set",      "Set",      NULL,   NULL},
+    {t_object,  "object",  FALSE, FALSE, "longblob",          "!error!",       "Object",   "Object",   NULL,   "TextField"},
+    {t_object,  "table",   FALSE, FALSE, "longblob",          "!error!",       "Object",   "Object",   NULL,   "TextField"},
+    {t_simple,  "simple",  FALSE, FALSE, "longblob",          "!error!",       "Simple",   "Simple",   NULL,   "TextField"},
 };
 
 static struct asTypeInfo *findLowType(struct tokenizer *tkz)
 /* Return low type info.  Squawk and die if s doesn't
  * correspond to one. */
 {
 char *s = tkz->string;
 int i;
 for (i=0; i<ArraySize(asTypes); ++i)
     {
     if (sameWord(asTypes[i].name, s))
 	return &asTypes[i];
     }
 tokenizerErrAbort(tkz, "Unknown type '%s'", s);
 return NULL;
 }
 
 static void sqlSymDef(struct asColumn *col, struct dyString *dy)
 /* print symbolic column definition for sql */
 {
 dyStringPrintf(dy, "%s(", col->lowType->sqlName);
 struct slName *val;
 for (val = col->values; val != NULL; val = val->next)
     {
     dyStringPrintf(dy, "\"%s\"", val->name);
     if (val->next != NULL)
         dyStringAppend(dy, ", ");
     }
 dyStringPrintf(dy, ") ");
 }
 
 struct dyString *asColumnToSqlType(struct asColumn *col)
 /* Convert column to a sql type spec in returned dyString */
 {
 struct asTypeInfo *lt = col->lowType;
 struct dyString *type = dyStringNew(32);
 if ((lt->type == t_enum) || (lt->type == t_set))
     sqlSymDef(col, type);
 else if (col->isList || col->isArray)
     dyStringPrintf(type, "longblob");
 else if (lt->type == t_char)
     dyStringPrintf(type, "char(%d)", col->fixedSize ? col->fixedSize : 1);
 else
     dyStringPrintf(type, "%s", lt->sqlName);
 return type;
 }
 
 static struct asColumn *mustFindColumn(struct asObject *table, char *colName)
 /* Return column or die. */
 {
 struct asColumn *col;
 
 for (col = table->columnList; col != NULL; col = col->next)
     {
     if (sameWord(col->name, colName))
 	return col;
     }
 errAbort("Couldn't find column %s", colName);
 return NULL;
 }
 
 static struct asObject *findObType(struct asObject *objList, char *obName)
 /* Find object with given name. */
 {
 struct asObject *obj;
 for (obj = objList; obj != NULL; obj = obj->next)
     {
     if (sameWord(obj->name, obName))
 	return obj;
     }
 return NULL;
 }
 
 static void asParseColArraySpec(struct tokenizer *tkz, struct asObject *obj,
                                 struct asColumn *col)
 /* parse the array length specification for a column */
 {
 if (col->lowType->type == t_simple)
     col->isArray = TRUE;
 else
     col->isList = TRUE;
 tokenizerMustHaveNext(tkz);
 if (isdigit(tkz->string[0]))
     {
     col->fixedSize = atoi(tkz->string);
     tokenizerMustHaveNext(tkz);
     }
 else if (isalpha(tkz->string[0]))
     {
 #ifdef OLD
     if (obj->isSimple)
         tokenizerErrAbort(tkz, "simple objects can't include variable length arrays\n");
 #endif /* OLD */
     col->linkedSizeName = cloneString(tkz->string);
     col->linkedSize = mustFindColumn(obj, col->linkedSizeName);
     col->linkedSize->isSizeLink = TRUE;
     tokenizerMustHaveNext(tkz);
     }
 else
     tokenizerErrAbort(tkz, "must have column name or integer inside []'s\n");
 tokenizerMustMatch(tkz, "]");
 }
 
 static void asParseColSymSpec(struct tokenizer *tkz, struct asObject *obj,
                               struct asColumn *col)
 /* parse the enum or set symbolic values for a column */
 {
 tokenizerMustHaveNext(tkz);
 while (tkz->string[0] != ')')
     {
     slSafeAddHead(&col->values, slNameNew(tkz->string));
     /* look for `,' or `)', but allow `,' after last token */
     tokenizerMustHaveNext(tkz);
     if (!((tkz->string[0] == ',') || (tkz->string[0] == ')')))
         tokenizerErrAbort(tkz, "expected `,' or `)' got `%s'", tkz->string);
     if (tkz->string[0] != ')')
         tokenizerMustHaveNext(tkz);
     }
 tokenizerMustMatch(tkz, ")");
 slReverse(&col->values);
 }
 
 static void asParseColDef(struct tokenizer *tkz, struct asObject *obj)
 /* Parse a column definintion */
 {
 struct asColumn *col;
 AllocVar(col);
 
 col->lowType = findLowType(tkz);
 tokenizerMustHaveNext(tkz);
 
 if (col->lowType->type == t_object || col->lowType->type == t_simple)
     {
     col->obName = cloneString(tkz->string);
     tokenizerMustHaveNext(tkz);
     }
 
 if (tkz->string[0] == '[')
     asParseColArraySpec(tkz, obj, col);
 else if (tkz->string[0] == '(')
     asParseColSymSpec(tkz, obj, col);
 
 col->name = cloneString(tkz->string);
 tokenizerMustHaveNext(tkz);
 tokenizerMustMatch(tkz, ";");
 col->comment = cloneString(tkz->string);
 tokenizerMustHaveNext(tkz);
 if (col->lowType->type == t_char && col->fixedSize != 0)
     col->isList = FALSE;	/* It's not really a list... */
 slAddHead(&obj->columnList, col);
 }
 
 static struct asObject *asParseTableDef(struct tokenizer *tkz)
 /* Parse a table or object definintion */
 {
 struct asObject *obj;
 AllocVar(obj);
 if (sameWord(tkz->string, "table"))
     obj->isTable = TRUE;
 else if (sameWord(tkz->string, "simple"))
     obj->isSimple = TRUE;
 else if (sameWord(tkz->string, "object"))
     ;
 else
     tokenizerErrAbort(tkz, "Expecting 'table' or 'object' got '%s'", tkz->string);
 tokenizerMustHaveNext(tkz);
 obj->name = cloneString(tkz->string);
 tokenizerMustHaveNext(tkz);
 obj->comment = cloneString(tkz->string);
 
 /* parse columns */
 tokenizerMustHaveNext(tkz);
 tokenizerMustMatch(tkz, "(");
 while (tkz->string[0] != ')')
     asParseColDef(tkz, obj);
 slReverse(&obj->columnList);
 return obj;
 }
 
 static void asLinkEmbeddedObjects(struct asObject *obj, struct asObject *objList)
 /* Look up any embedded objects. */
 {
 struct asColumn *col;
 for (col = obj->columnList; col != NULL; col = col->next)
     {
     if (col->obName != NULL)
         {
         if ((col->obType = findObType(objList, col->obName)) == NULL)
             errAbort("%s used but not defined", col->obName);
         if (obj->isSimple)
             {
             if (!col->obType->isSimple)
                 errAbort("Simple object %s with embedded non-simple object %s",
                     obj->name, col->name);
             }
         }
     }
 }
 
 static struct asObject *asParseTokens(struct tokenizer *tkz)
 /* Parse file into a list of objects. */
 {
 struct asObject *objList = NULL;
 struct asObject *obj;
 
 while (tokenizerNext(tkz))
     {
     obj = asParseTableDef(tkz);
     if (findObType(objList, obj->name))
         tokenizerErrAbort(tkz, "Duplicate definition of %s", obj->name);
     slAddTail(&objList, obj);
     }
 
 for (obj = objList; obj != NULL; obj = obj->next)
     asLinkEmbeddedObjects(obj, objList);
 
 return objList;
 }
 
+char *asTypesIntSizeDescription(enum asTypes type)
+/* Return description of integer size.  Do not free. */
+{
+int size = asTypesIntSize(type);
+switch (size)
+    {
+    case 1:
+	return "byte";
+    case 2:
+	return "short integer";
+    case 4:
+	return "integer";
+    case 8:
+	return "long long integer";
+    default:
+        errAbort("Unexpected error in asTypesIntSizeDescription: expecting integer type size of 1, 2, 4, or 8.  Got %d.", size);
+	return NULL; // happy compiler, never gets here
+    
+    }
+}
+
+int asTypesIntSize(enum asTypes type)
+/* Return size in bytes of any integer type - short, long, unsigned, etc. */
+{
+switch (type)
+    {
+    case t_int:
+    case t_uint:
+	return 4;
+    case t_short:
+    case t_ushort:
+	return 2;
+    case t_byte:
+    case t_ubyte:
+	return 1;
+    case t_off:
+	return 8;
+    default:
+        errAbort("Unexpected error in asTypesIntSize: expecting integer type.  Got %d.", type);
+	return 0; // happy compiler, never gets here
+    }
+}
+
+boolean asTypesIsUnsigned(enum asTypes type)
+/* Return TRUE if it's any unsigned integer type - uint, ushort, or ubyte. */
+{
+switch (type)
+    {
+    case t_uint:
+    case t_ushort:
+    case t_ubyte:
+       return TRUE;
+    default:
+       return FALSE;
+    }
+}
+
 boolean asTypesIsInt(enum asTypes type)
 /* Return TRUE if it's any integer type - short, long, unsigned, etc. */
 {
 switch (type)
    {
    case t_int:
    case t_uint:
    case t_short:
    case t_ushort:
    case t_byte:
    case t_ubyte:
    case t_off:
        return TRUE;
    default:
        return FALSE;
    }
 }
 
 boolean asTypesIsFloating(enum asTypes type)
 /* Return TRUE if it's any floating point type - float or double. */
 {
 switch (type)
    {
    case t_float:
    case t_double:
        return TRUE;
    default:
        return FALSE;
    }
 }
 
 static struct asObject *asParseLineFile(struct lineFile *lf)
 /* Parse open line file.  Closes lf as a side effect. */
 {
 struct tokenizer *tkz = tokenizerOnLineFile(lf);
 struct asObject *objList = asParseTokens(tkz);
 tokenizerFree(&tkz);
 return objList;
 }
 
 
 void asColumnFree(struct asColumn **pAs)
 /* free a single asColumn */
 {
 struct asColumn *as = *pAs;
 if (as != NULL)
     {
     freeMem(as->name);
     freeMem(as->comment);
     freez(pAs);
     }
 }
 
 
 void asColumnFreeList(struct asColumn **pList)
 /* free a list of asColumn */
 {
 struct asColumn *el, *next;
 
 for (el = *pList; el != NULL; el = next)
     {
     next = el->next;
     asColumnFree(&el);
     }
 *pList = NULL;
 }
 
 void asObjectFree(struct asObject **pAs)
 /* free a single asObject */
 {
 struct asObject *as = *pAs;
 if (as != NULL)
     {
     freeMem(as->name);
     freeMem(as->comment);
     asColumnFreeList(&as->columnList);
     freez(pAs);
     }
 }
 
 
 void asObjectFreeList(struct asObject **pList)
 /* free a list of asObject */
 {
 struct asObject *el, *next;
 
 for (el = *pList; el != NULL; el = next)
     {
     next = el->next;
     asObjectFree(&el);
     }
 *pList = NULL;
 }
 
 struct asObject *asParseFile(char *fileName)
 /* Parse autoSql .as file. */
 {
 return asParseLineFile(lineFileOpen(fileName, TRUE));
 }
 
 
 struct asObject *asParseText(char *text)
 /* Parse autoSql from text (as opposed to file). */
 {
 char *dupe = cloneString(text);
 struct lineFile *lf = lineFileOnString("text", TRUE, dupe);
 struct asObject *objList = asParseLineFile(lf);
 freez(&dupe);
 return objList;
 }
 
+
+boolean asCompareObjs(char *name1, struct asObject *as1, char *name2, struct asObject *as2, int numColumnsToCheck,
+ int *retNumColumnsSame, boolean abortOnDifference)
+/* Compare as-objects as1 and as2 making sure several important fields show they are the same name and type.
+ * If difference found, print it to stderr.  If abortOnDifference, errAbort.
+ * Otherwise, return TRUE if the objects' columns match through the first numColumnsToCheck fields.
+ * If retNumColumnsSame is not NULL, then it will be set to the number of contiguous matching columns. */
+{
+boolean differencesFound = FALSE;
+struct asColumn *col1 = as1->columnList, *col2 = as2->columnList;
+int checkCount = 0;
+int verboseLevel = 2;
+if (abortOnDifference)
+    verboseLevel = 1;
+if (as1->isTable != as2->isTable)
+    {
+    verbose(verboseLevel,"isTable does not match: %s=[%d]  %s=[%d]\n", name1, as1->isTable, name2, as2->isTable);
+    differencesFound = TRUE;
+    }
+else if (as1->isSimple != as2->isSimple)
+    {
+    verbose(verboseLevel,"isSimple does not match: %s=[%d]  %s=[%d]\n", name1, as1->isSimple, name2, as2->isSimple);
+    differencesFound = TRUE;
+    }
+else
+    {
+    if (!as1->isTable)
+	{
+	errAbort("asCompareObjs only supports Table .as objects at this time.");
+	}
+    for (col1 = as1->columnList, col2 = as2->columnList; 
+	 col1 != NULL && col2 != NULL && checkCount < numColumnsToCheck; 
+	 col1 = col1->next, col2 = col2->next, ++checkCount)
+	{
+	if (!sameOk(col1->name, col2->name))
+	    {
+	    verbose(verboseLevel,"column #%d names do not match: %s=[%s]  %s=[%s]\n"
+		, checkCount+1, name1, col1->name, name2, col2->name);
+	    differencesFound = TRUE;
+	    break;
+	    }
+	else if (col1->isSizeLink != col2->isSizeLink)
+	    {
+	    verbose(verboseLevel,"column #%d isSizeLink do not match: %s=[%d]  %s=[%d]\n"
+		, checkCount+1, name1, col1->isSizeLink, name2, col2->isSizeLink);
+	    differencesFound = TRUE;
+	    break;
+	    }
+	else if (col1->isList != col2->isList)
+	    {
+	    verbose(verboseLevel,"column #%d isList do not match: %s=[%d]  %s=[%d]\n"
+		, checkCount+1, name1, col1->isList, name2, col2->isList);
+	    differencesFound = TRUE;
+	    break;
+	    }
+	else if (col1->isArray != col2->isArray)
+	    {
+	    verbose(verboseLevel,"column #%d isArray do not match: %s=[%d]  %s=[%d]\n"
+		, checkCount+1, name1, col1->isArray, name2, col2->isArray);
+	    differencesFound = TRUE;
+	    break;
+	    }
+	else if (!sameOk(col1->lowType->name, col2->lowType->name))
+	    {
+	    verbose(verboseLevel,"column #%d type names do not match: %s=[%s]  %s=[%s]\n"
+		, checkCount+1, name1, col1->lowType->name, name2, col2->lowType->name);
+	    differencesFound = TRUE;
+	    break;
+	    }
+	else if (col1->fixedSize != col2->fixedSize)
+	    {
+	    verbose(verboseLevel,"column #%d fixedSize do not match: %s=[%d]  %s=[%d]\n"
+		, checkCount+1, name1, col1->fixedSize, name2, col2->fixedSize);
+	    differencesFound = TRUE;
+	    break;
+	    }
+	else if (!sameOk(col1->linkedSizeName, col2->linkedSizeName))
+	    {
+	    verbose(verboseLevel,"column #%d linkedSizeName do not match: %s=[%s]  %s=[%s]\n"
+		, checkCount+1, name1, col1->linkedSizeName, name2, col2->linkedSizeName);
+	    differencesFound = TRUE;
+	    break;
+	    }
+	}
+    if (!differencesFound && checkCount < numColumnsToCheck)
+	errAbort("Unexpected error in asCompareObjs: asked to compare %d columns in %s and %s, but only found %d in one or both asObjects."
+	    , numColumnsToCheck, name1, name2, checkCount);
+    }
+if (differencesFound)
+    {
+    if (abortOnDifference)
+    	errAbort("asObjects differ.");
+    else
+    	verbose(verboseLevel,"asObjects differ. Matching field count=%d\n", checkCount);
+    }
+if (retNumColumnsSame)
+    *retNumColumnsSame = checkCount;
+return (!differencesFound);
+}
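
The integer helpers added earlier in this diff (asTypesIntSize and asTypesIsUnsigned) expose a type's byte size and signedness. One plausible use, sketched below, is deriving the legal value range for a field. This is an assumption for illustration only; how the shared validator actually uses these helpers is not shown in this file. The names miniIntType, miniIntSize, miniIsUnsigned, and miniFitsInType are hypothetical, and the 8-byte bigint case is left out to keep the arithmetic within long long.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the 1-, 2- and 4-byte integer members of
 * enum asTypes. */
enum miniIntType {mi_byte, mi_ubyte, mi_short, mi_ushort, mi_int, mi_uint};

static int miniIntSize(enum miniIntType type)
/* Size in bytes, mirroring the mapping used by asTypesIntSize. */
{
switch (type)
    {
    case mi_byte:
    case mi_ubyte:
        return 1;
    case mi_short:
    case mi_ushort:
        return 2;
    default:
        return 4;
    }
}

static bool miniIsUnsigned(enum miniIntType type)
/* True for the unsigned variants only. */
{
return type == mi_ubyte || type == mi_ushort || type == mi_uint;
}

bool miniFitsInType(long long value, enum miniIntType type)
/* Return true if value is representable in the given integer type. */
{
int bits = 8 * miniIntSize(type);
long long lo, hi;
if (miniIsUnsigned(type))
    {
    lo = 0;
    hi = (1LL << bits) - 1;        /* bits is at most 32 here, so no overflow */
    }
else
    {
    hi = (1LL << (bits - 1)) - 1;
    lo = -hi - 1;
    }
return value >= lo && value <= hi;
}
```

With this shape, a single range check covers every integer width, so the per-type branching stays in the two small lookup helpers.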