e59a222a6acfae7632d0de9f11cf72b787d890e7 kent Wed Sep 4 09:06:27 2019 -0700 Adding a doc file to this new utility. diff --git src/hg/sqlUpdateRelated/sqlUpdateRelated.doc src/hg/sqlUpdateRelated/sqlUpdateRelated.doc new file mode 100644 index 0000000..37ba03cb --- /dev/null +++ src/hg/sqlUpdateRelated/sqlUpdateRelated.doc @@ -0,0 +1,113 @@ +This utility, sqlUpdateRelated, was developed as a bridge between a database with +foreign keys and relationship tables and our usual local universe of tab-separated-files. +It takes as input a series of tab-separated files, with the fields that involve foreign +keys including some extra @ signs in their labels on the first line. The output is +an update to the database done in a fairly careful and controlled fashion. This output +generally will involve more fields and more tables than in the input. + +FOREIGN KEYS in a table + +Consider the case of a foreign key. In the database you might have a table +that looks like so + +table shoe_types +#id value +1 sandles +2 sneakers +3 loafers + +table shoe +#id type_id name +1 2 soft soul sneakers +2 1 havaina luxury thongs +3 3 urban polish + +We'd like to be able to specify new data for these two related tables with +something like: + +#type name +sneakers soft sole sneakers +sandles havaina luxury thongs +loafers urban polish + +We'd like to do this, because hey, we aren't in the database, we don't know what all the id's are, +and also one table is easir to write than two. With sqlUpdateRelated it's not quite as simple, but not much harder. We just have to encode a little info about the foreign key relationship in the +field name, which we do with @ signs: + + +#@type_id@shoe_types@value@id name +sneakers soft sole sneakers +sandles havaina luxury thongs +loafers urban polish + +The foreign key field starts with a single @ sign so we know it's not a normal field. +The remaining @ signs delimit the following parts of the extended field name: + 1 - field name in the main table + 2 - name of foreign table + 3 - field to look up in the foreign table + 4 - primary key (to output in main table in place of value in #3) in foreign table. + +MULTIPLE-MULTIPLE RELATIONSHIPS + +The case for handling multiple-multiple relationships is even more complicated because +rather than just looking up a key to use in place of a more descriptive name, you have +to do multiple lookups and in the end update the relationship table as well as the main +table. We need even more @'s to do this. Consider the case where we are trying to +track labs and people. Since people *can* belong to more than one lab, and labs generally +have more than one person in them, it is a multi=multi relationship. We might represent it +in the database as so + +table people +#id name email +1 Jane Doe janedoe@gmail.com +2 Jim Kent jimkent@ucsc.edu +3 Joe Postdoc igottaphd@yahoo.com + +table lab +#id lab_name institution +1 kent_lab_gi UC Santa Cruz +2 doe_lab_cte California Institute of Technology + +table lab_people_relationship +#id lab_id people_id +1 1 2 +2 1 3 +3 2 1 +3 2 3 + +Figuring all the little numbers to put in the lab_people_relationship table is again +not something that can be done outside of code that is actually loading the database, +but we want a generic loader. So, more @ signs to the rescue. We'll construct +fields that look like so: + @@lab_members@id@lab_people_relationship@lab_id@people_id@people@email@id +Where the initial @@ marks it as a multi-multi field, and the @ separated values are: + 1 - nativeFieldName - name you'd like to assign to multi-multi field (it isn't actually in db) + 2 - nativeKeyName - the field containing the primary key in the main table + 3 - relationalTable - the name of the relational table + 4 - relationalNativeField - the name of the field that keys into main table in relational table + 5 - relationalForeignField - the field that keys into foreign table in relational table + 6 - foreignTable - the foreign table + 7 - foreignFindName = the field to search in the foreign table + 8 - foreignKeyName = the primary key field in the foreign table + +We could update the lab table then like so: + +#lab_name institution @@lab_members@id@lab_people_relationship@lab_id@people_id@people@email@id +kent_lab_gi UC Santa Cruz jimkent@ucsc.edu,igottaphd@yahoo.com +doe_lab_cte California Institute of Technology janedoe@gmail.com,igottaphd@yahoo.com + +There are enough @'s that it is a bit awkward to remember them all and their order, but +at least the program will check that the tables and fields all exist for you before +proceeding which catches most mixups. + +CONDITIONAL UPDATES + +There's many times when you only want to insert a new row in the database if it doesn't +already exist. For this purpose we just need to put a ? in front of some unique field name. +In the above example, we might alter the first label slightly to get this effect, changing +lab_name into ?lab_name. In this case the lab (and the relationship) table would only be +updated for labs that did not already exist. + +EXAMPLES + +Please see the tests subdirectory and the doTest.csh script for an example.