0f25fca4ed573b955b6a34b8df5c3dcb92331058
hiram
Thu Jul 13 18:30:47 2017 -0700
correct typo refs #18969

diff --git src/utils/crisprKmers/crisprKmers.c src/utils/crisprKmers/crisprKmers.c
index c9793ae..ebbc08b 100644
--- src/utils/crisprKmers/crisprKmers.c
+++ src/utils/crisprKmers/crisprKmers.c
@@ -1,22 +1,22 @@
 /* crisprKmers - find and annotate crispr sequences. */
 
 /* Copyright (C) 2017 The Regents of the University of California
  * See README in this or parent directory for licensing information. */
 
 /* Theory of operation:
-  a. scan given sequence (2bit of fa or fa.gz file)
+  a. scan given sequence (2bit or fa or fa.gz file)
   b. record all quide sequences, both positive and negative strands,
      on a linked list structure, 2bit encoding of the A C G T bases,
      with PAM sequence, strand and start coordinates, one linked list
      for each chromosome name.
   c. if a 'ranges' bed3 file is given, then divide up the linked list
      guide sequences into a 'query' list and a 'target' list.
      The 'query' list of guide sequences are those that have any overlap
      with the 'ranges' bed3 items. The 'target' list is an exclusive
      set of all the other guide sequences.
   d. Without 'ranges', the full list of sequences can be considerd as
      the 'query' sequences.
   e. Convert the linked list structures into memory arrays, get all
      the sequence data and start coordinates into arrays. This is
      much more efficient to work with the arrays than trying to run
      through the linked lists. The data happens to become duplicated as it