src/README 1.23
1.23 2010/05/05 23:34:03 kent
Getting explicit about not abbreviating 5 letter words.
Index: src/README
===================================================================
RCS file: /projects/compbio/cvsroot/kent/src/README,v
retrieving revision 1.22
retrieving revision 1.23
diff -b -B -U 1000000 -r1.22 -r1.23
--- src/README 8 Jul 2008 21:23:25 -0000 1.22
+++ src/README 5 May 2010 23:34:03 -0000 1.23
@@ -1,324 +1,324 @@
CONTENTS AND COPYRIGHT
This directory contains the entire source tree for Jim Kent and the
UCSC Genome Bioinformatics Group's suite of biological analysis
and web display programs. All files are copyrighted, but license
is hereby granted for personal, academic, and non-profit use.
A license is also granted for the contents of the top level lib
directory for commercial users. Commercial users should contact
kent@soe.ucsc.edu for access to other modules.
Most users will only be interested in the inc and lib
directories, which contain the interfaces and implementations
to the library routines, and in a few specific applications.
The applications are scattered in other directories.
Many of them are web based. A few of them expect
the MySQL database to be around.
GENERAL INSTALL INSTRUCTIONS
1. Get the code. The best way to do this now for
Unix users is via CVS following the instructions at:
http://genome.ucsc.edu/admin/cvs.html
Or, fetch the entire source in a single file:
http://hgdownload.cse.ucsc.edu/admin/jksrc.zip
Note futher documentation for the build process in your
unpacked source tree in src/product/README.*
Especially note README.building.source and the "Known problems"
for typical situations you may encounter.
2. Check that the environment variable MACHTYPE
exists on your system. It should exist on Unix/Linux.
(And making this on non-Unix systems is beyond
the scope of this README). The default MACHTYPE is often a
long string: "i386-redhat-linux-gnu"
which will not function correctly in this build environment.
It needs to be something simple such as one of:
i386 i686 sparc alpha x86_64 ppc etc ...
with no other alpha characters such as: -
To determine what your system reports itself as, try the
uname options: 'uname -m' or 'uname -p' or 'uname -a'
on your command line. If necessary set this environment variable.
Do this under the bash shell as so:
MACHTYPE=something
export MACHTYPE
or under tcsh as so:
setenv MACHTYPE something
and place this setting in your home directory .bashrc or .tcshrc
environment files so it will be set properly the next time you
login. Remember to "export" it as show here for the bash shell.
3. Make the directory ~/bin/$MACHTYPE which is
where the (non-web) executables will go.
Add this directory to your path.
4. Go to the jksrc/lib directory. If it doesn't
already exist do a mkdir $MACHTYPE.
5. Type make. On Alphas there will be
some warning messages about "crudeAli.c"
otherwise it should compile cleanly.
It's using gcc.
6. Go to jksrc/jkOwnLib and type make.
7. Go to the application you want to make and type
make. (If you're not sure, as a simple test
go to jksrc/utils/fixcr and type make,
then 'rehash' if necessary so your shell
can find the fixcr program in ~/bin/$(MACHTYPE).
The fixcr program changes Microsoft style
<CR><LF> line terminations to Unix style
<LF> terminations. Look at the "gotCr.c"
file in the fixCr directory, and then
do a "fixcr gotCr.c" on it.
INSTALL INSTRUCTIONS FOR BLAT
1. Follow the general install instructions above.
2. If you're on an alpha system do a:
setenv SOCKETLIB -lxnet
on Solaris do
setenv SOCKETLIB "-lsocket -lnsl"
on SunOS do
setenv SOCKETLIB "-lsocket -lnsl -lresolv"
on Linux you can skip this step.
3. Execute make in each of the following directories:
jksrc/gfServer
jksrc/gfClient
jksrc/blat
jksrc/utils/faToNib
INSTALL INSTRUCTIONS FOR CODE USING THE BROWSER DATABASE
(and other code in the jkSrc/hg subdirectory)
1. Follow the general install instructions above.
2. Make the environment variable MYSQLINC point to
where MySQL's include files are. (On my
system they are at /usr/include/mysql.)
While you're at it set the MYSQLLIBS
variable to point to something like
/usr/lib/mysql/libmysqlclient.a -lz
When available, the commands: mysql_config --include
and mysql_config --libs
will display the required arguments for these environment settings.
3. Execute make in jksrc/hg/lib
4. Execute make in the directory containing the
application you wish to build.
5. See also: http://genome.ucsc.edu/admin/jk-install.html
and more documentation in this source tree about setting up
a working browser in README files:
jksrc/product/README.building.source
jksrc/product/README.cvs.access
jksrc/product/README.mysql.setup
jksrc/product/README.install
jksrc/product/README.trackDb
jksrc/hg/makeDb/trackDb/README
There are numerous README files in the source tree describing
functions or modules in that area of the source tree.
MAJOR MODULES
Here is a list of some of the more useful modules in
the library. Unless noted the module is a .h file
in the inc directory and a .c file in the lib
directory.
o - common - String handling, singly-linked list handling.
Other basic stuff every other module uses.
o - hash - Simple but effective hash table routines.
o - linefile - Line oriented file input, on some systems
much faster than fgets().
o - cheapcgi - Parses out cgi variables for scripts called
from web pages.
o - htmshell - Helps generate HTML output for scripts that
are called from web pages or just want to make web
pages.
o - memgfx - Creates a 256 color image in memory which
can be drawn on, then saved as a .GIF file which
can be encorperated into a web page.
o - fuzzyFind - Align two pieces of DNA that are
relatively similar (~80% base identity or better).
Works best when one sequence is less than 30,000
bases and the other less than 100,000 bases.
o - patSpace and supStitch - Align longer pieces of
DNA.
o - xensmall - Align two small pieces of dissimilar DNA.
(7 State Pairwise HMM)
o - xenbig - Align two large pieces of dissimilar DNA.
o - jksql - Interface to mySQL that frees resources on
exit and error conditions.
o - dnautils and dnaseq - Simple utilities on DNA.
o - fa - Read/write fasta format files.
o - serv* and port* - Adapt the code to the peculiarities of
various web servers.
CODE CONVENTIONS
INDENTATION:
The code follows an indentation convention that is a bit
unusual for C. Opening and closing braces are on
a line by themselves and are indented at the same
level as the block they enclose:
if (someTest)
{
doSomething();
doSomethingElse();
}
Tab stops are set to 8. Each block of code is
indented by 4 from the previous block. (In the
vi editor set ts=8 set sw=4)
NAMES
Symbol names begin with a lower-case letter. The second
and subsequent words in a name begin with a capital letter
to help visually separate the words. Abbreviation of words
-is strongly discouraged. If a word is abbreviated in
+is strongly discouraged. Words of five letters and less should
+generally not be abbreviated. If a word is abbreviated in
general it is abbreviated to the first three letters:
tabSeparatedFile -> tabSepFile
In some cases, for local variables abbreviating
to a single letter for each word is ok:
tabSeparatedFile -> tsf
In rare, complex, cases you may treat the
abbreviation itself as a word, and only the
first letter is capitalized.
genscanTabSeparatedFile -> genscanTsf
Numbers are considered words. You would
represent "chromosome 22 annotations"
as "chromosome22Annotations" or "chr22Ann."
Note the capitalized 'A" after the 22.
These naming rules apply to variables, constants, functions, fields,
and structures. They generally are used for file names, database tables,
database columns, and C macros as well, though there is a bit less
consistency there in the existing code base.
-
ERROR HANDLING AND MEMORY ALLOCATION
Another convention is that errors are reported
at a fairly low level, and the programs simply
print an error message and abort. If you
need to catch errors underneath you see the
file errAbort.h and install an "abort handler".
Memory is generally allocated through "needMem"
(which aborts on failure to allocate) and the
macros "AllocVar" and "AllocArray". This
memory is initially set to zero, and the programs
very much depend on this fact.
COMMENTING
Every module should have a comment at the start of
a file that explains concisely what the module
does. Explanations of algorithms also belong
at the top of the file in most cases. Comments should
be of the /* */ form rather than the // form, which
is not yet portable across all C compilers in all platforms.
Structures should be commented following the pattern of this
example:
struct dyString
/* Dynamically resizable string that you can do formatted
* output to. */
{
struct dyString *next; /* Next in list. */
char *string; /* Current buffer. */
int bufSize; /* Size of buffer. */
int stringSize; /* Size of string. */
};
That is there is a comment describing the overall purpose
of the object between the struct name, and the opening brace,
and there is a short comment by each field. In many cases
these may not say much more than well-chosen field names,
but that's ok.
Almost any structure with more than three or four
elements includes a "next" pointer as its first
member, so that it can be part of a singly-linked
list. There's a whole set of routines (see
common.c and common.h) which work on singly-linked
lists where the next field comes first. Their
names all start with "sl."
Functions which work on a structure by convention begin with
the name of the structure, simulating an object-oriented
coding style. In general these functions are all grouped
in a file, in this case in dyString.c. Static functions in
this file need not have the prefix, though they may. Functions
have a comment between their prototype and the opening brace
as in this example:
char dyStringAppendC(struct dyString *ds, char c)
/* Append char to end of string. */
{
char *s;
if (ds->stringSize >= ds->bufSize)
dyStringExpandBuf(ds, ds->bufSize+256);
s = ds->string + ds->stringSize++;
*s++ = c;
*s = 0;
return c;
}
For short functions like this, the opening comment may be the only
comment. Longer functions should be broken into logical 'paragraphs'
with a comment at the start of each paragraph and blank lines
between paragraphs as in this example:
struct twoBit *twoBitFromDnaSeq(struct dnaSeq *seq, boolean doMask)
/* Convert dnaSeq representation in memory to twoBit representation.
* If doMask is true interpret lower-case letters as masked. */
{
int ubyteSize = packedSize(seq->size);
UBYTE *pt;
struct twoBit *twoBit;
DNA last4[4]; /* Holds few bases. */
DNA *dna;
int i, end;
/* Allocate structure and fill in name. */
AllocVar(twoBit);
pt = AllocArray(twoBit->data, ubyteSize);
twoBit->name = cloneString(seq->name);
twoBit->size = seq->size;
/* Convert to 4-bases per byte representation. */
dna = seq->dna;
end = seq->size - 4;
for (i=0; i<end; i += 4)
{
*pt++ = packDna4(dna+i);
}
/* Take care of conversion of last few bases. */
last4[0] = last4[1] = last4[2] = last4[3] = 'T';
memcpy(last4, dna+i, seq->size-i);
*pt = packDna4(last4);
/* Deal with blocks of N. */
twoBit->nBlockCount = countBlocksOfN(dna, seq->size);
if (twoBit->nBlockCount > 0)
{
AllocArray(twoBit->nStarts, twoBit->nBlockCount);
AllocArray(twoBit->nSizes, twoBit->nBlockCount);
storeBlocksOfN(dna, seq->size, twoBit->nStarts, twoBit->nSizes);
}
/* Deal with masking */
if (doMask)
{
twoBit->maskBlockCount = countBlocksOfLower(dna, seq->size);
if (twoBit->maskBlockCount > 0)
{
AllocArray(twoBit->maskStarts, twoBit->maskBlockCount);
AllocArray(twoBit->maskSizes, twoBit->maskBlockCount);
storeBlocksOfLower(dna, seq->size,
twoBit->maskStarts, twoBit->maskSizes);
}
}
return twoBit;
}
====================================================================
This file last updated: $Date$