d8feabb353b3c2650facea4afd08c86bb56e5549 kent Fri Apr 6 16:51:54 2012 -0700 Moving autoSql and autoDtd and autoXml back to just under hg. A little autoSql -django fix. diff --git src/utils/autoXml/autoXml.doc src/utils/autoXml/autoXml.doc deleted file mode 100644 index 5d698b1..0000000 --- src/utils/autoXml/autoXml.doc +++ /dev/null @@ -1,190 +0,0 @@ -AUTOXML OVERVIEW - -AutoXML generates C code for an XML parser given -a XML DTD file. It will generate a structure -for each 'element' in the DTD, and populate the -structure with fields for each attribute of the structure. -By default it will generate a parser that ignores -elements and attributes not in the DTD, but otherwise -is a 'validating' parser. If you use the -picky -flag it will be fully validating. - -The AutoXML parser will load the entire file into -memory. If this is a problem you'll have to resort -to the lower level 'xap' parser, which is much like -the commonly used 'expat' parser, but a bit faster. - -A SHORT XML AND DTD TUTORIAL - -If you find yourself befuddled by all the acronyms -so far you're probably new to XML. Here's a brief -description. XML stands for eXtensible Markup Language. -It's a tag-based format. A simple example of -an XML doc might be: - <POLYGON id="square"> - <DESCRIPTION>This is soooo square man</DESCRIPTION> - <POINT x="0" y="0" -> - <POINT x="0" y="1" -> - <POINT x="1" y="1" -> - <POINT x="1" y="0" -> - </POLYGON> -Everything in XML lives between <TAG></TAG> pairs. -A tag may have associated text, attributes and subtags. -In the example above POLYGON has the subtags DESCRIPTION -and POINT, the attribute id, and no text. DESCRIPTON has -the text "This is soooo square man" and no subtags or -attributes. POINT has the attributes x and y. POINT -also illustrates a little XML shortcut - tags containing -only attributes can be written <TAG att="something" -> -as a shortcut for <TAG att="something"></TAG>. - -XML is much like HTML, but has significant differences as -well. All attributes must be enclosed in quotes in XML, -while quotes are optional in HTML. Tags must strictly -nest in XML, while HTML allows tags to be opened but not -closed. The tags in HTML are predefined. In XML the -definition of tags is up to you. - -Tags can be defined two ways in XML - by a DTD file or -by an XML schema. There are pros and cons of each -method. DTD files are relatively simple, and are recognized -by a wide variety of parsers and XML browsers. On the -other hand DTD files can't express that a certain attribute -has to be numerical. XML schemas are more complex. They -are themselves written in a type of XML, which is nice in -some ways. They are not as widely supported yet. Currently -autoXml only works with DTD files with some modest extensions. - -Here is an DTD file which would describe the POLYGON format -above: - -<!ELEMENT POLYGON (DESCRIPTION? POINT+)> -<!ATTLIST POLYGON id CDATA #REQUIRED> - -<!ELEMENT DESCRIPTION (#PCDATA)> - -<!ELEMENT POINT> -<!ATTLIST POINT x CDATA #REQUIRED> -<!ATTLIST POINT y CDATA #REQUIRED> -<!ATTLIST POINT z CDATA "0"> - -The DTD has two major types of definitions - ELEMENTs and ATTLISTs -(or attributes). An element definition includes the name of -the element and an optional parethesized list of subelements. -The subelements must be defined elsewhere in the DTD with the -exception of the #PCDATA subelement, which is used to indicate -that the element can have text between it's tags. Each subelement -may be followed by one of the following characters: - ? - the subelement is optional - + - the subelement occurs at least once - * - the subelement occurs 0 or more times -if there is no following character the subelement occurs exactly -once. - -The ATTLIST defines an attribute and associates it with an element. -It is good style to keep ATTLISTs together with their ELEMENT. -Here are the fields in an ATTLIST: - element - name of element this is associated with - name - name of this attribute - type - generally CDATA. Can be a reference or date, but these - are not supported by autoXml. - default - this contains a default value to be used if the attribute - is not present. The keyword #REQUIRED in this field means - that the attribute must be present. The keyword #IMPLIED - means that it's ok for this attribute to be missing (in which - case it will have a NULL or zero value after it is read by - autoXML). - -There's a third type of tag in a DTD file, the ENTITY. This tag is -used more or less as a macro definition. An example of a ENTITY is - <!ENTITY % address "street,city,state,zip"> -After this ENTITY is defined you can type %address; with the same effect -as typing street,city,state,zip. - - -AUTOXML EXTENSIONS AND LIMITS - -One disadvantage of DTDs is that all types are strings. This is not -convenient for a language like C where numerical and string types are -handled very differently, and indeed where numerical types can be handled -much more effiently than string types. To get around this we make use -of some predefined XML entities. In an ATTLIST you can use the -entities %INT; and %FLOAT; which will map to C int and double types. -Instead of #PCDATA you can use the entities %INTEGER; and %REAL; -for the same effect. The %INTEGER; and %REAL; entities are used by -NCBI as well as UCSC. As far as I can tell NCBI doesn't have definitions -for numerical attributes. - -Currently AutoXML can't handle external DTDs or DTDs that reference -other DTDs. - - -AUTOXML CODE GENERATION - -The polygon.dtd file here: - -<!ELEMENT POLYGON (DESCRIPTION? POINT+)> -<!ATTLIST POLYGON id CDATA #REQUIRED> -<!ELEMENT DESCRIPTION (#PCDATA)> -<!ELEMENT POINT> -<!ATTLIST POINT x %FLOAT; #REQUIRED> -<!ATTLIST POINT y %FLOAT; #REQUIRED> -<!ATTLIST POINT z %FLOAT; "0"> - -and the command line: - autoXml polygon.dtdx poly - -Generates poly.h as follows - -/* poly.h autoXml generated file */ -#ifndef POLY_H -#define POLY_H - -struct polyPolygon - { - struct polyPolygon *next; - char *id; /* Required */ - struct polyDescription *polyDescription; /** Optional (may be NULL). **/ - struct polyPoint *polyPoint; /** Non-empty list required. **/ - }; - -void polyPolygonSave(struct polyPolygon *obj, int indent, FILE *f); -/* Save polyPolygon to file. */ - -struct polyPolygon *polyPolygonLoad(char *fileName); -/* Load polyPolygon from file. */ - -struct polyDescription - { - struct polyDescription *next; - char *text; - }; - -void polyDescriptionSave(struct polyDescription *obj, int indent, FILE *f); -/* Save polyDescription to file. */ - -struct polyDescription *polyDescriptionLoad(char *fileName); -/* Load polyDescription from file. */ - -struct polyPoint - { - struct polyPoint *next; - double x; /* Required */ - double y; /* Required */ - double z; /* Defaults to 0 */ - }; - -void polyPointSave(struct polyPoint *obj, int indent, FILE *f); -/* Save polyPoint to file. */ - -struct polyPoint *polyPointLoad(char *fileName); -/* Load polyPoint from file. */ - -#endif /* POLY_H */ - -It generates a corresponding .c file as well. Each XML file has -to have a root object. In this case the root object is POLYGON -(our DTD as is won't let us have more than one polygon per file). -You can read an XML file that respects this DTD using the -polyPolygonLoad() function, and save it back out using the -polyPolygonSave.