ee4112cdcbb55495fac87b26cb9c77ab57baea79 ceisenhart Thu May 14 11:55:50 2015 -0700 Some basic notes about the assignment, I am piping stats directly into the file, and worry it will be overwritten. Hence, git diff --git cirmNotes.txt cirmNotes.txt new file mode 100644 index 0000000..279cf26 --- /dev/null +++ cirmNotes.txt @@ -0,0 +1,18 @@ +There is one file per cell, this many cells + 1,047 1047 52528 +there are this many transcripts per file + 173,260 866300 5352029 A43D1_1_1_L13_C17_IL3545-708-503/abundance.txt +Therefore an expected 181,403,220 exp scores +there are this many lines total with a actual expression score + 14,794,718 88768308 1244772164 allNonNull.txt +around 8 % of the scores are non 0 + +Need to identify a common intersection of the 173,260 transcripts across the 1,047 cell +lines + +Each cell line has a .txt file that stores the information as a tab +deliminated table. Each row corresponds to a single transcript +GOAL: Identify the rows that have a score across all cell lines +Picking a random file it has this many rows with 0 in them, +172756 +and therefore this many rows without 0, 504