src/hg/autoSql/autoSql.doc aeb54d14c416a7b9f87ae8b4e34866021fcf8fd5

aeb54d14c416a7b9f87ae8b4e34866021fcf8fd5
kent
  Fri Apr 19 16:41:40 2013 -0700
Correcting English a little.
diff --git src/hg/autoSql/autoSql.doc src/hg/autoSql/autoSql.doc
index 43109d6..364d72b 100644
--- src/hg/autoSql/autoSql.doc
+++ src/hg/autoSql/autoSql.doc
@@ -1,346 +1,346 @@
 OVERVIEW
 
 AutoSQL is a program which automatically generates SQL and C
 code for saving and loading objects to a database based on
 an object specification.  The specification langauge is
 a bit quirky, but it has proven effective for many jobs.
 
 A SIMPLE EXAMPLE
 
 Imagine a very simple address book that just stored name,
 street address, zip-code and state.  A specification for
 this would be:
 
 table addressBook
 "A simple address book"
     (
     string name;  "Name - first or last or both, we don't care"
     lstring address;  "Street address"
     string city;  "City"
     uint zipCode;  "A zip code is always positive, so can be unsigned"
     char[2] state;  "Just store the abbreviation for the state"
     )
 
 This looks like a bit of a hybrid between a C structure and a
 SQL table definition. This is because I was switching between C and
 SQL when I made the AutoSQL language. My mind is incapable 
 of holding three things in it at once. When it tries to hold 
 two things, it mixes them up!
 
 This produces the SQL table definition:
 
     #A simple address book
     CREATE TABLE addressBook (
 	name varchar(255) not null,	# Name - first or last or both, we don't care
 	address longblob not null,	# Street address
 	city varchar(255) not null,	# City
 	zipCode int unsigned not null,	# A zip code is always positive, so can be unsigned
 	state char(2) not null,	# Just store the abbreviation for the state
 		  #Indices
 	PRIMARY KEY(name)
     );
 
 and the following C structure definition:
 
 struct addressBook
 /* A simple address book */
     {
     struct addressBook *next;  /* Next in singly linked list. */
     char *name;	/* Name - first or last or both, we don't care */
     char *address;	/* Street address */
     char *city;	/* City */
     unsigned zipCode;	/* A zip code is always positive, so can be unsigned */
     char state[3];	/* Just store the abbreviation for the state */
     };
 
 Typically in C you access a single row of a SQL database at
 a time.  The row is returned as an array of strings.  It is
 up to the C program to convert the ascii representation of
 numbers to binary numbers.  This is not hard work, but after
 you've typed in twenty or thirty lines that look something
 like:
       point->x = atoi(row[1]);
       point->y = atoi(row[2]);
 you'll appreciate the following two routines that AutoSQL
 generates for you:
 
     void addressBookStaticLoad(char **row, struct addressBook *ret);
     /* Load a row from addressBook table into ret.  The contents of ret will
      * be replaced at the next call to this function. */
 
     struct addressBook *addressBookLoad(char **row);
     /* Load a addressBook from row fetched with select * from addressBook
      * from database.  Dispose of this with addressBookFree(). */
 
 The first routine is typically used when you just want to process one
 item at a time.  It doesn't allocate any dynamic memory, and so is quite
 fast.  The second routine saves the structure to dynamic memory.  Since
 the C structure always includes a "next" field, you easily use this
 routine to build a list of address book entries.
 
 The only problem with using dynamic memory, is that you have to remember
 to free it.  While AutoSQL can't remember to free things for you, it
 can generate routines to free a single dynamically allocated structure,
 or a list of dynamically allocates structures.  That's what the next
 two routines do:
 
     void addressBookFree(struct addressBook **pEl);
     /* Free a single dynamically allocated addressBook such as created
      * with addressBookLoad(). */
 
     void addressBookFreeList(struct addressBook **pList);
     /* Free a list of dynamically allocated addressBook's */
 
 Reading structures without having to write code to load them up a
 field at a time is nice, but sometimes you need to write structures
 too.  AutoSQL assumes that you'll either want to save the structure
 in a tab-delimited or in a comma-delimited format.  It generates a
 routine that can do either:
 
     void addressBookOutput(struct addressBook *el, FILE *f, char sep, char lastSep);
     /* Print out addressBook.  Separate fields with sep. Follow last field with lastSep. */
 
 and macros that make it convenient to do commas or tabs:
 
     #define addressBookTabOut(el,f) addressBookOutput(el,f,'\t','\n');
     /* Print out addressBook as a line in a tab-separated file. */
 
     #define addressBookCommaOut(el,f) addressBookOutput(el,f,',',',');
     /* Print out addressBook as a comma separated list including final comma. */
 
 The last routine AutoSQL generates reads comma separated lists.  While you
 are unlikely to call this routine directly yourself,  fields more complicated
 than simple strings or integers get saved in the database as comma separated
 lists.  This routine allows AutoSQL to have objects that contain other objects.
 
 TYPES OF OBJECTS
 
 AutoSQL has three types of objects:  
 	object - objects that appear in lists.  A next
 	         pointer is automatically inserted as the first field in
 		 the C structure corresponding to an object.  What look
 		 like arrays in the .as file will be linked lists in memory.
 	simple - objects that don't appear in lists.  Arrays in the .as file
 	         will also be arrays in memory.  At one point simple objects
 		 could not include strings or variable-sized arrays, but this
 		 limitation has been lifted.
 	table  - like objects, but the program generates a SQL as well
 	         as a C definition.
 
 Simple objects differ from other objects in how the program treats
 array declarations.  In the field declaration:
        simple point[3] triangle;  "A three sided figure"
 the three points are stored in memory as a C array.  If this were
 declared instead as
        object point[3] triangle;  "A three sided figure"
 the three points would be stored in memory as a singly linked list.
 
 TYPES OF FIELDS
 
 The following basic field are supported.  
     int - 32 bit signed integer
     uint - 32 bit unsigned integer
     short - 16 bit signed integer
     ushort - 16 bit unsigned integer
     byte - 8 bit signed integer
     ubyte - 8 bit unsigned integer
     bigint - 64 bit integer
     float - single precision IEEE floating point
     char - 8 bit character (can only be used in an array)
     string - variable length string up to 255 bytes long.
     lstring - variable length string up to 2 billion bytes long.
     enum - enumerated type that can have a single symbolic value.
     set - set type column that can have multiple symbolic value.
 
 In addition the simple, object, and table types can be used as
 fields.
 
 FIXED LENGTH AND VARIABLE ARRAY DECLARATIONS
 
 An array can be declared as either fixed size or variable size.  
 A variable sized array is declared by putting a field name
 inside of the brackets in the array declaration.  This field
 must be defined before the array.   
 
 A MORE COMPLICATED EXAMPLE
 
 Imagine that you've just built an amazing 3D modeling program.
 The only problem is that now you need to save the structures in
 a database.  Here is a way you might build the database with
 AutoSQL:
 
 simple point
 "A three dimensional point"
     (
     float x;  "Horizontal coordinate"
     float y;  "Vertical coordinate"
     float z;  "In/out of screen coordinate"
     )
 
 simple color
 "A red/green/blue format color"
     (
     ubyte red;  "Red value 0-255"
     ubyte green; "Green value 0-255"
     ubyte blue;  "Blue value 0-255"
     )
 
 object face
 "A face of a three dimensional solid"
     (
     simple color color;  "Color of this face"
     int pointCount;    "Number of points in this polygon"
     uint[pointCount] points;   "Indices of points that make up face in polyhedron point array"
     )
 
 table polyhedron
 "A solid three dimensional object"
     (
     int faceCount;  "Number of faces"
     object face[faceCount] faces; "List of faces"
     int pointCount; "Number of points"
     simple point[pointCount] points; "Array of points"
     )
 
 Saving this as "threeD.as"  and running:
    autoSql threeD.as threeD
 would end up generating 393 lines of bug free (I think!) C code and
 14 lines of SQL for the investment of 33 lines of specification.
 Considering that autoSQL itself is about 1200 lines long, I'll only
 have to use it on 4 projects to break even!  For you on the other
 hand it's a very good deal indeed....
 
 ENUM AND SET FIELDS:
 
 The enum and set fields generate SQL enum and set fields, which provide
 efficient symbolic values.  The symbolic values must be a valid C
 identifier.  When loaded from a string, the set value is a comma separated
 string of the values in the set.  C enums are generated for the values defined
 in the fields.  For example:
 
 table symbolCols
 "example of enum and set symbolic columns"
     (
     int id;                                          "unique id"
     enum(male, female) sex;                          "enumerated column"
     set(cProg,javaProg,pythonProg,awkProg) skills;   "set column"
     )
 
 Generates the following types and fields:
 enum symTestSex
     {
     symTestMale = 0,
     symTestFemale = 1,
     };
 enum symTestSkills
     {
     symTestCProg = 0x0001,
     symTestJavaProg = 0x0002,
     symTestPythonProg = 0x0004,
     symTestAwkProg = 0x0008,
     };
 struct symTest
 /* test of enum and set symbolic columns */
     {
     struct symTest *next;  /* Next in singly linked list. */
     int id;	/* unique id */
     enum symTestSex sex;	/* enumerated column */
     unsigned skills;	/* set column */
     };
 
 INDEXES and AUTOINCREMENT
 
 You can explicitly add indexes to fields in AutoSQL.  If you don't explicitly add an index
 it will assume the first field is the primary key and index it as such.  To add an index
 include one of the keywords 'primary' (for primary key) 'unique' (for an index on a field
 where all values are unique) or 'index' (for fields that may contain duplicate values).  It is 
 possible to just index the first part of a field by including the number of characters.  
 
 An integer (signed or unsigned, small or large) can be make an auto-increment column,  where
 the database automatically assigns an incrementing numerical value to it, by including the
 'auto' key word after the index if any.
 
-Some examples with indexes and autoincrement:
+An example with indexes and autoincrement:
 
 table addressBook
 "A simple address book"
     (
     uint id primary auto; "Autoincrementing primary key for this record."
     string name unique;  "Name - first or last or both, we don't care, except it must be unique"
     lstring address;  "Street address"
     string city index[12];  "City - indexing just first 12 character"
     uint zipCode index;  "A zip code is always positive, so can be unsigned"
     char[2] state index;  "Just store the abbreviation for the state"
     )
 
 
 GRAMMER
 
 The grammer for AutoSQL is:
 
 declarationList:   
 	declaration
         declarationList declaration
 	
 declaration:  
 	declareType declareName comment '(' fieldList ')'
 
 declareType:  
 	'simple'
 	'object'
 	'table'
 
 declareName:  
 	name 
         name indexType
         name 'auto'
         name indexType 'auto'
 
 indexType:
         'primary'
         'index'
         'unique'
 
 comment: 
 	quotedString
 
 fieldList:
 	field
 	fieldList field
 
 field:
         fieldType fieldName ';' comment
         fieldType '[' fieldSize ']'  name ';' comment
         fieldType '(' fieldValues ')'  name ';' comment
 
 fieldName:
 	name
 
 fieldValues:
         name, ..., name
 
 fieldType:
         'int'
 	'uint'
 	'short'
 	'ushort'
 	'byte'
 	'ubyte'
 	'float'
 	'char'
 	'string'
 	'lstring'
 	'enum'
 	'set'
 	declareType declareName
 
 fieldSize:
 	number
 	fieldName
 
 name:  A series of letters, digits, or '_', starting with a letter.
 
 number: A series of digits
 
 quotedString: Any text that is surrounded by double quotes