31dcb07904fb264e64f184905d0f5ad7ccc94f44 mmaddren Wed Sep 21 16:06:55 2011 -0700
more documentation for ucscgenomics

diff --git python/lib/ucscgenomics/ra.py python/lib/ucscgenomics/ra.py
index 9abc7dc..1bcb7c9 100644
--- python/lib/ucscgenomics/ra.py
+++ python/lib/ucscgenomics/ra.py
@@ -23,31 +23,32 @@ files the following holds true:

         somestanza.name = somestanza['metaObject'] = 'wgEncodeSomeStanzaName'

     Although the above is useful if you want one thing, it's usually more
     helpful to be able to loop and query on the stanza. To add a term named
     'foobar' to every stanza in a ra file:

         for stanza in rafile.values():
             stanza['foobar'] = 'some value'

     Note that I iterated over values. It can also be useful to iterate over
     a stanza's keys:

         for key in rafile.keys():
             print key

     Note that ra files are order preserving. Added entries are appended to the
-    end of the file, and
+    end of the file. This allows you to print out a ra file easily:

+        print rafile

     Most of the time you don't want to do something with all stanzas though,
     instead you want to filter them. The included filter method allows you to
     specify two functions (or lambda expressions). The first is the 'where'
     predicate, which must take in one stanza, and return true/false depending
     on whether you want to take that stanza. The second is the 'select'
     predicate, which takes in the stanza, and returns some subset or superset
     of the stanza as a list. Using filter is preferable to for loops where
     there are no side effects, or to filter data before iterating over it as
     opposed to using if statements in the loop. To get all stanzas with one
     experiment ID for instance, we would do something like this:

         stanzas = rafile.filter(lambda s: s['expId'] == '123', lambda s: s)

     Note that you don't have to ensure 'expId' is in the stanza, it will
     silently fail. Let's look at another example, say you want to find all