src/hg/utils/tdbQuery/tdbQuery.doc 1.2
1.2 2009/12/03 21:04:30 kent
Adding to documentation.
Index: src/hg/utils/tdbQuery/tdbQuery.doc
===================================================================
RCS file: /projects/compbio/cvsroot/kent/src/hg/utils/tdbQuery/tdbQuery.doc,v
retrieving revision 1.1
retrieving revision 1.2
diff -b -B -U 4 -r1.1 -r1.2
--- src/hg/utils/tdbQuery/tdbQuery.doc 3 Dec 2009 20:28:58 -0000 1.1
+++ src/hg/utils/tdbQuery/tdbQuery.doc 3 Dec 2009 21:04:30 -0000 1.2
@@ -9,4 +9,60 @@
in a record, which if it exists specifies the parent track. A third type of inheritance
also exists, the "view" level. The trackDb system is explained in more detail in the
README file in the trackDb root directory.
+TdbQuery is a convenient way to see what's in this system, which has grown to the point where
+it is difficult to scan by eye. It procedes in three steps. First it parses the relevant
+.ra files, second it applies all of the inheritances so that a record includes all the fields
+from it's parents as well as the fields defined in the record itself, third it applies the
+query language statement to each record, printing the relevant part of each matching record.
+
+The query language, RQL, is much like SQL. A statement in this language is made up of multiple
+clauses: select, from, where, and limit. Some of the clauses are optional. An example of
+a statement with all the clauses is:
+ select track,shortLabel,type from hg18,hg19 where type = 'bedGraph 4' limit 10
+The select clause comes first, and specifies which fields should be printed. It is possible to use the wildcard characters * and ? in the select clause, which work like they do in the Unix file
+system. The statement
+ select * from hg18 where track='knownGene'
+selects all fields from the knownGene track. Unlike SQL, you can used more nuanced wildcards in
+this clause, so that
+ select *Label from hg18 where track='knownGene'
+will print all fields ending with Label. As with SQL you can count the number of records with
+ select count(*) from hg18
+and it is possible to add a where clause with count(*) as well.
+
+The from clause in RQL is a list of genome databases to query. This is quite different from SQL in
+that the different items in the where clause are just searched one after the other, rather than
+"joined" together. Wildcards are permitted in the from clause as well. The statement
+ select count(*) from *
+will count all records in all databases.
+
+The where clause is optional, but it can be complex. If present the "where" is followed by a
+logical expression. The select statement is only applied to records where the where clause
+evaluates to true. The items in the expression can be number, strings, and fields from the
+record. Numbers evaluate to true if they are non-zero. Strings evaluate to true if they are
+non-empty. Fields evaluate to TRUE if they exist in the record. This last is particularly
+useful in practice. The statement
+ select * from hg18 where autoScale
+for instance prints out all records where there is an autoScale field.
+
+Items in a where clause can be combined with comparison and logical operations. The comparison
+operators are = != > < >= <= and can be used with numbers and strings. For strings the greater
+than and less than operations apply to the alphabetical order (case sensitive) of the strings.
+String can also be compared to wildcards using the 'like' and 'not like' operations. As in
+SQL the wildcards are '%' for match any number of characters (the equivalent to '*' in
+field and file wildcards) and '_' to match a single character (the equivalent to '?'). The
+statement
+ select * from hg18 where shortLabel like "%Gene"
+returns all fields from all records where the shortLabel ends in Gene. The logical operators are
+"or" and "and" and "not" which have the usual meanings. The or operator has, as is traditional,
+lower precedence than the and operator. Because of this the following two statements are
+equivalent
+ select * from hg18 where a > 0 and a < 10 or b > 0 and b < 10
+ select * from hg18 where (a > 0 and a < 10) or (b > 0 and b < 10)
+The where clause also supports array indexes. These are used to fetch whitespace delimited
+words within a string. A common use is to specify the first word in the type line:
+ select * from hg18 where type[0] = "wig"
+
+The limit clause is very straightforward, it is just the word "limit" followed by a positive number
+which specifies the maximum number of records to return.
+