8a484935dc91f2c3859e386f97a9ea1773302b72 mmaddren Fri Aug 26 14:48:31 2011 -0700
added minor fix for rafile

diff --git python/programs/mkGeoPkg/README python/programs/mkGeoPkg/README
index cfbb510..e14c00e 100644
--- python/programs/mkGeoPkg/README
+++ python/programs/mkGeoPkg/README
@@ -1,24 +1,15 @@
 mkGeoPkg

 First an overview for programmers who have no idea what all this means:
 GEO collects data for permanant archival. The data we submit to them is contained as one Series. A Series is a set of Samples. A Sample corresponds to a small set of our Stanzas, or corresponds 1:1 with our expIds. A Series directly corresponds to our Composite. So one file in the MDB = one Composite (our term) = one Series (in the soft file) = at least 1 submission to GEO.

 Submit data to GEO for processing. This will generate a SOFT file for part or all of a composite, and a script with the aspera command to submit all the required files. The SOFT file will not be complete, and you will have to manually curate some data. To do this, replace all values with the value '[REPLACE]'.

 This script takes data from:
 -the composite's RA file in the mdb
 -the CV in alpha
 -the trackDb ra file
 -the MD5 sums from the downloads directory

 There is also a lot of hand-curated data in the top of the program, namely the DataType, which you might want to modify.

-Usage:
-mkGeoPkg database composite [expIdStart expIdLength]
-
-database: typically hg19 or potentially mm9, the database to look in
-composite: the name of the composite we wish to submit
-expIdStart: if you want to submit a part of the composite, this is the first expId to submit
-expIdLength:
-
-mkGeoPkg hg19 wgEncodeCshlLongRnaSeq 143 10
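The README maps one composite to one GEO Series and says a Sample corresponds 1:1 with an expId; the usage text removed here also let you submit only part of a composite through expIdStart and expIdLength. Below is a minimal Python sketch of that grouping and range selection. It is not mkGeoPkg's actual code: the stanza dictionaries, the `fileName` key, and the helper names `group_samples` and `select_exp_ids` are hypothetical, and it assumes expIdLength counts consecutive expIds starting at expIdStart (so `143 10` would cover expIds 143 through 152).

```python
def group_samples(stanzas):
    """Group metadata stanzas into GEO Samples keyed by expId.

    Hypothetical input: each stanza is a dict with at least an 'expId' key,
    standing in for one stanza parsed from the composite's RA file in the mdb.
    """
    samples = {}
    for stanza in stanzas:
        exp_id = int(stanza['expId'])
        # Several stanzas (e.g. one per file) can share an expId -> one Sample.
        samples.setdefault(exp_id, []).append(stanza)
    # Sort by expId so a contiguous range can be selected later.
    return dict(sorted(samples.items()))


def select_exp_ids(samples, exp_id_start=None, exp_id_length=None):
    """Return the Samples to include in this submission.

    With no range, the whole composite (one Series) is returned. Otherwise
    only expIds in [exp_id_start, exp_id_start + exp_id_length) are kept;
    this reading of expIdLength is an assumption.
    """
    if exp_id_start is None:
        return samples
    end = exp_id_start + exp_id_length
    return {k: v for k, v in samples.items() if exp_id_start <= k < end}


# Hypothetical stanzas for illustration only.
stanzas = [
    {'expId': '143', 'fileName': 'a.bam'},
    {'expId': '143', 'fileName': 'a.bigWig'},
    {'expId': '150', 'fileName': 'b.bam'},
    {'expId': '153', 'fileName': 'c.bam'},
]
samples = group_samples(stanzas)
subset = select_exp_ids(samples, 143, 10)
print(list(subset))  # [143, 150]
```

Keying Samples on expId keeps the 1:1 Sample/expId correspondence explicit, and sorting the keys makes a start-plus-length range well defined even when expIds have gaps.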