src/README 056e69e891aeedd0c826cb2652ec8535955977bb

056e69e891aeedd0c826cb2652ec8535955977bb
kent
  Fri Apr 6 16:18:00 2012 -0700
Adding more information about source tree structure and revisiting list of useful library modules.
diff --git src/README src/README
index d1aec86..68e614f 100644
--- src/README
+++ src/README
@@ -1,51 +1,69 @@
-MAJOR MODULES
-
-Here is a list of some of the more useful modules in
-the library.  Unless noted the module is a .h file
-in the inc directory and a .c file in the lib
-directory.
+This file describes the organization of the source tree, the most commonly used
+library modules, and the code conventions followed throughout the code base in
+and under this directory.
+
+SOURCE TREE ORGANIZATION
+
+The major subdirectories of this source tree are:
+o - lib - General purpose library routines, some with a biological bent,
+    many just generally useful for computing.  
+o - inc - Interfaces to the library modules.
+o - utils - Command line utility programs. Like the library, a mix of
+    bioinformatically motivated and general purpose programs.
+o - hg - Stuff developed for the Human Genome Project and its successors.
+    Much of the code in this directory requires MySQL.
+o - hg/lib - Human Genome Project specific libraries.
+o - hg/inc - Interfaces to the hg/lib libraries.
+o - hg/hgTracks - The part of the UCSC Genome Browser that displays 
+    annotation tracks graphically.
+o - hg/hgc - The part of the Genome Browser that responds to a click
+    on an item in a track.
+o - hg/hgTrackUi - The part of the Genome Browser that allows users to configure
+    a particular track.
+o - hg/hgTables - The UCSC Table Browser.
+o - jkOwnLib - Libraries that support blat, isPcr, gfClient, and gfServer.
+
+In general, each program, whether a command line utility or a web CGI, has its source
+in its own subdirectory.  Simple programs, such as those in utils, often consist of a
+single C module linked with the libraries; complex ones like hgTracks may have several.
+
+COMMONLY USED LIBRARY MODULES
 
 o - common  - String handling, singly-linked list handling. 
     Other basic stuff every other module uses.
 o - hash - Simple but effective hash table routines.
 o - linefile - Line oriented file input, on some systems
     much faster than fgets().
+o - dystring - Dynamically sized strings in C.
 o - cheapcgi - Parses out cgi variables for scripts called
     from web pages.
 o - htmshell - Helps generate HTML output for scripts that
     are called from web pages or just want to make web
     pages.
+o - htmlPage - Reads HTML pages and programmatically submits HTML forms.
 o - memgfx - Creates a 256 color image in memory which
     can be drawn on, then saved as a .GIF file which
     can be incorporated into a web page.
-o - fuzzyFind - Align two pieces of DNA that are 
-    relatively similar (~80% base identity or better).
-    Works best when one sequence is less than 30,000
-    bases and the other less than 100,000 bases.
-o - patSpace and supStitch - Align longer pieces of
-    DNA.
-o - xensmall - Align two small pieces of dissimilar DNA.
-    (7 State Pairwise HMM)
-o - xenbig - Align two large pieces of dissimilar DNA.
-o - jksql - Interface to mySQL that frees resources on
-    exit and error conditions.
 o - dnautils and dnaseq - Simple utilities on DNA.
 o - fa - Read/write fasta format files.
-o - serv* and port* - Adapt the code to the peculiarities of
-    various web servers.
-
+o - basicBed - Functions for working with BED format files.
+o - psl - Functions for working with PSL (blat) format files.
+o - twoBit - Functions for working with twoBit DNA files.
+o - bPlusTree - Create/use B+ Tree indexes, the backbone of
+    many databases.
+o - udc - URL Data Cache - code to locally cache remote files.
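The dystring module listed above provides dynamically sized strings. The following is a minimal, self-contained sketch of that idea written in this code base's brace style; it is not the module's actual code, and `struct dynStr`, `dynStrNew`, `dynStrAppend`, and `dynStrFree` are hypothetical names whose interface may differ from the real module's. The buffer doubles whenever an append would overflow it, so a long run of appends costs amortized constant time per character.

```c
#include <stdlib.h>
#include <string.h>

struct dynStr
/* Hypothetical growable string; the buffer is always NUL-terminated. */
    {
    char *string;     /* Current contents. */
    size_t size;      /* Characters in use, not counting the NUL. */
    size_t bufSize;   /* Bytes allocated for string. */
    };

struct dynStr *dynStrNew(size_t initialBufSize)
/* Allocate an empty string with room for initialBufSize bytes
 * (16 if zero).  Return NULL if out of memory. */
{
struct dynStr *ds = malloc(sizeof(*ds));
if (ds == NULL)
    return NULL;
if (initialBufSize == 0)
    initialBufSize = 16;
ds->string = malloc(initialBufSize);
if (ds->string == NULL)
    {
    free(ds);
    return NULL;
    }
ds->string[0] = '\0';
ds->size = 0;
ds->bufSize = initialBufSize;
return ds;
}

int dynStrAppend(struct dynStr *ds, const char *s)
/* Append s, doubling the buffer as needed.
 * Return 0 on success, -1 if out of memory. */
{
size_t len = strlen(s);
size_t needed = ds->size + len + 1;
if (needed > ds->bufSize)
    {
    size_t newSize = ds->bufSize;
    char *newBuf;
    while (newSize < needed)
        newSize *= 2;
    newBuf = realloc(ds->string, newSize);
    if (newBuf == NULL)
        return -1;
    ds->string = newBuf;
    ds->bufSize = newSize;
    }
memcpy(ds->string + ds->size, s, len + 1);   /* Copies the NUL too. */
ds->size += len;
return 0;
}

void dynStrFree(struct dynStr **pDs)
/* Free the string and set the caller's pointer to NULL. */
{
struct dynStr *ds = *pDs;
if (ds != NULL)
    {
    free(ds->string);
    free(ds);
    *pDs = NULL;
    }
}
```

Freeing through a pointer to the caller's pointer, as `dynStrFree(&ds)` does, leaves the caller's variable set to NULL so a stale pointer cannot be reused by accident; this is a choice made for the sketch.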
 
 CODE CONVENTIONS
 
 INDENTATION AND SPACING:
 
 The code follows an indentation convention that is a bit
 unusual for C.  Opening and closing braces are on
 a line by themselves and are indented at the same
 level as the block they enclose:
     if (someTest)
 	{
 	doSomething();
 	doSomethingElse();
 	}
 Each block of code is indented by 4 from the previous block.
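As a fuller illustration, here is a small compilable function laid out with this brace rule, each nested block indented 4 from the one enclosing it. The function `countGc` is a hypothetical example. Placing the function's own braces at column 0, with a descriptive comment between the signature and the opening brace, is an assumption about the rest of the convention based on common practice in this code base, not something stated in the text above.

```c
#include <ctype.h>

int countGc(const char *dna)
/* Return how many bases in dna are G or C, ignoring case. */
{
int gcCount = 0;
for (; *dna != '\0'; ++dna)
    {
    char base = (char)tolower((unsigned char)*dna);
    if (base == 'g' || base == 'c')
        {
        gcCount++;
        }
    }
return gcCount;
}
```

For example, `countGc("ACGTgcNN")` counts the C, G, g, and c and returns 4.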
@@ -73,33 +91,32 @@
 can be avoided by simplifying logic and by moving blocks into their own 
 functions. These are just some ways of avoiding long lines.
 
 NAMES
 
 Symbol names generally begin with a lower-case letter.  The second 
 and subsequent words in a name begin with a capital letter 
 to help visually separate the words.  Abbreviation of words 
 is strongly discouraged.  Words of five letters and less should
 generally not be abbreviated. If a word is abbreviated in 
 general it is abbreviated to the first three letters:
    tabSeparatedFile -> tabSepFile
 In some cases, for local variables abbreviating
 to a single letter for each word is ok:
    tabSeparatedFile -> tsf
-In rare, complex, cases you may treat the
-abbreviation itself as a word, and only the
-first letter is capitalized.
+In complex cases you may treat the abbreviation itself as a word,
+capitalizing only its first letter:
    genscanTabSeparatedFile -> genscanTsf
 Numbers are considered words.  You would
 represent "chromosome 22 annotations"
 as "chromosome22Annotations" or "chr22Ann."
 Note the capitalized 'A' after the 22.  Since both numbers and
 single letter words (or abbreviations) disrupt the visual flow
 of the word separation by capitalization, it is better to avoid
 these except at the end of the name.
 
 These naming rules apply to variables, constants, functions, fields,
 and structures.  They generally are used for file names, database tables,
 database columns, and C macros as well, though there is a bit less
 consistency there in the existing code base.
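The declarations below apply these naming rules to a small, compilable example. All of the names (`tabSepFile`, `countTabSepFields`, `fillTabSepFile`) are hypothetical illustrations, and laying out the struct with its braces indented under the declaration is an assumption about local practice rather than something stated above.

```c
/* All names here are hypothetical, chosen only to illustrate the rules. */

struct tabSepFile
/* Later words start with a capital; "Separated" is cut to "Sep". */
    {
    const char *fileName;   /* Short words such as "file" are spelled out. */
    int fieldCount;         /* Fields on the first line of the file. */
    };

int countTabSepFields(const char *line)
/* Return the number of tab-separated fields in line.  A line with no
 * tabs counts as a single field. */
{
int fieldCount = 1;
for (; *line != '\0'; ++line)
    {
    if (*line == '\t')
        fieldCount++;
    }
return fieldCount;
}

void fillTabSepFile(struct tabSepFile *tsf, const char *fileName,
    const char *firstLine)
/* Fill in tsf.  Because tsf is a short-lived local name, tabSepFile is
 * abbreviated to one letter per word. */
{
tsf->fileName = fileName;
tsf->fieldCount = countTabSepFields(firstLine);
}
```

A caller reading Genscan output might then declare `struct tabSepFile genscanTsf;` and call `fillTabSepFile(&genscanTsf, "genscan.tab", firstLine);`, treating the abbreviation `Tsf` as a word; the file name here is likewise hypothetical.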
 
 ERROR HANDLING AND MEMORY ALLOCATION