9fc4a45a0bc85ef2d4c397a557fd1d9cd7dee535
kent
  Thu Oct 29 11:11:51 2020 -0700
Documenting 'unroll' stanzas

diff --git src/tabFile/tabToTabDir/tabToTabDir.doc src/tabFile/tabToTabDir/tabToTabDir.doc
index f4570ea..7c04a71 100644
--- src/tabFile/tabToTabDir/tabToTabDir.doc
+++ src/tabFile/tabToTabDir/tabToTabDir.doc
@@ -69,23 +69,37 @@
 If a more than one row of the input generates the same key in the output that is ok so long
 as all of the other fields that are generated agree as well. An exception for this is made for
 summary expressions. Summary expression all begin with the character '$'. The allowed summary
 expressions are
     $count - counts up number of input rows that yield this row
     $stats sourceExpression - creates comma separated list of all values and some statistics
     $list sourceExpression - creates comma separated list of unique values of sourceExpression
 
 If the source field starts with '@' then it is followed by a table name and is intepreted
 as the same value as the key field in the this table
 
 If there is a '?' in front of the column name it is taken to mean an optional field.
 if the corresponding source field does not exist then there's no error (and no output)
 for that column
 
-In addition to the table stanza there can be a 'define' stanza that defines variables
-that can be used in sourceFields for tables. This looks like:
+In addition to the table stanza there can be a 'define' stanza at the start of the file
+that defines variables that can be used in sourceFields for tables. This looks like:
     define
         variable1 sourceField1
         variable2 sourceField2
 The defines can be useful particularly when multiple tables of output want the same field.
 Though tabToTabDir encourages normalization, realistically it is used to fill in things for
 some pretty redundant formats.
+
+There is also an 'unroll' stanza that can be used to make up a table that unrolls comma-separated
+list fields into a table instead. The format is
+    unroll tableName id
+        field1 [expression1]
+        field2 [expression2]
+        ...
+        fieldN [expressionN]
+where the expression rules follow the same logic as they do for table stanzas. The
+expressions for an unroll need to evaluate to the same comma separated list for each row
+in the input table, and all fields must have the same number of values. Thus the
+unroll stanza only works on a small subset of input fields. Nonetheless it is useful for
+unpacking author lists and in some other cases as well.
+
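
To illustrate what the documentation above means by unrolling, here is a minimal Python sketch. It is not tabToTabDir's implementation and does not parse the stanza syntax; the function name `unroll_fields` and the sample field names are hypothetical. It only shows the general idea of turning parallel comma-separated lists into one output row per list position, and of rejecting fields whose value counts differ.

```python
def unroll_fields(fields):
    """Turn {name: "a,b,c", ...} into one dict per list position.

    Hypothetical helper for illustration only: every field is assumed to
    hold a plain comma-separated list with no quoting or trailing comma.
    """
    # Split each field's value on commas.
    split = {name: value.split(",") for name, value in fields.items()}

    # All fields must carry the same number of values.
    lengths = {len(values) for values in split.values()}
    if len(lengths) > 1:
        raise ValueError(f"fields have differing value counts: {sorted(lengths)}")

    count = lengths.pop() if lengths else 0
    # Emit one row per position, pairing up the i-th value of every field.
    return [{name: values[i] for name, values in split.items()}
            for i in range(count)]


# Sample data made up for this sketch.
rows = unroll_fields({"author": "Ada,Grace,Alan", "lab": "L1,L2,L3"})
for row in rows:
    print(row["author"], row["lab"], sep="\t")
```

A field list such as `{"author": "Ada,Grace", "lab": "L1"}` raises `ValueError` in this sketch, mirroring the requirement that all fields have the same number of values.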