8e151d5c596f18029f8b0f027902b66f720b76e1
galt
  Thu Jun 23 16:49:27 2016 -0700
adding a backup to hive of non-git-tracked files as suggested by Max.

diff --git src/hg/geoIp/README src/hg/geoIp/README
index 4044c6e..3a96a9f 100644
--- src/hg/geoIp/README
+++ src/hg/geoIp/README
@@ -1,193 +1,199 @@
 NEW genome-asia!
 
 Our previous update was from 2011 when genome-euro was first made,
 so we are going to update everything with fresh data.
 
 RESTARTING update from ground zero,
 all tables backed up and cleared out on hgFixed
 
 wget 'http://dev.maxmind.com/static/csv/codes/iso3166.csv' -O country.csv
 
 To convert csv to tab-separated (removing quotes):
 cat country.csv | gawk -F "," '{print $1 "\t" $2}' | sed 's/"//g' > country.tab
 
 
 wget 'http://dev.maxmind.com/static/csv/codes/country_continent.csv' -O countryToContinent.csv
 To convert csv to tab-separated:
 cat countryToContinent.csv | gawk -F "," '{print $1 "\t" $2}' > countryToContinent.tab
 
 https://gist.github.com/nobuti/3816985
 Manually made continent.csv
 It seems odd that nobody has an easy to download cvs or tab-delimited continents list.
 To convert csv to tab-separated:
 cat continent.csv | gawk -F "," '{print $1 "\t" $2}' > continent.tab
 
 I have my data, so now I load the tables:
 
 This is a little script that runs autoSql on each .as to get the .sql files
 However these .sql files are not checked in to git since they are 
 so easily generated.
 
 ./make-sql
 
 took the old gbNode.tab with US and euro,
 and added the japan record:
 vi gbNode.tab
 | 3    | genome-asia.ucsc.edu | JP          | Asian node of UCSC Genome Browser at the RIKEN institute of Japan | Asian Server  |
 changed "\" to tab char.
 
 made a script to create and load the .as tables
 some of them actually do not have any data yet,
 since the data is created by joins of other tables
 
 ./load-tables
 
 note that because the script is so simple
 and some of the tables do not have .tab files,
 they will give errors. We will fill the empty
 tables later by inserting from a join of other tables.
 
 gbNode
 geoIpNode
 geoIpCountry
 country
 continent
 countryToContinent
 
 Check the integrity of the continent/country tables:
 
 forgot to strip off the 1st row of the csv file.
 Cleanup left-over header row.
 delete from countryToContinent where  continentId = '"continent code"';
 
 DONE maybe I need to delete A1 and A2 which are weird special categories
 that are not real countries and do not have an assigned continent which is given the value "--".
 | A1 | Anonymous Proxy      |
 | A2 | Satellite Provider   |
 These turn out not to be a problem.
 
 
 mysql> select * from countryToContinent where countryId not in (select id from country);
 +-----------+-------------+
 | countryId | continentId |
 +-----------+-------------+
 | AN        | NA          |
 | FX        | EU          |
 +-----------+-------------+
 
 AN Netherlands Antilles
 
 insert into country values ('AN', 'Netherlands Antilles');
 
 insert into country values ('FX', 'France, Metropolitan');
 
 mysql> select * from countryToContinent where continentId not in (select id from continent);
 +-----------+-------------+
 | countryId | continentId |
 +-----------+-------------+
 | A1        | --          |
 | A2        | --          |
 | O1        | --          |
 +-----------+-------------+
 
 insert into continent values ('--','not applicable');
 
 mysql> select count(*) from geoIpCountry where countryId not in (select id from country);
 +----------+
 | count(*) |
 +----------+
 |        0 |
 +----------+
 
 --
 
 start off with crude continent-level assignments:
 
 make everything 1 by default
 
 insert into continentToNode select id, 1 from continent;
 
 Europe
 update continentToNode set node=2 where continentId = "EU";
 
 Asia
 update continentToNode set node=3 where continentId = "AS";
 
 Now we just can make a join
 
 insert into countryToNode (select countryId, node from countryToContinent cc, continentToNode cn where cc.continentId = cn.continentId);
 
 # fix ones requested by Bob to be included in euro instead of defaulting to america.
 # these are mostly around the middle east.
 update countryToNode set node=2 where countryId in ('AE','AM','BH','CY','GE','IL','IQ','JO','KW','LB','OM','PS','QA','SA','SY','YE');
 
 I compared the before and after updating with public_html/geoMap/test.csh at
  http://hgwdev.cse.ucsc.edu/~galt/geoMap/countryToNodeMap.html
 And it looked identical. So we did not lose anything.
 
 mysql> select distinct countryId from geoIpCountry where countryId not in (select distinct countryId from countryToNode);
 +-----------+
 | countryId |
 +-----------+
 | SS        |
 | CW        |
 | SX        |
 | BQ        |
 +-----------+
 
 mysql> select * from country where id in ('SS','CW','SX','BQ');
 +----+--------------+
 | id | name         |
 +----+--------------+
 | BQ | Bonaire      |
 | CW | Curacao      |
 | SS | South Sudan  |
 | SX | Sint Maarten |
 +----+--------------+
 
 
 So, what happened to these countries, and why are they not in countryToNode?
 Shall we add them?
 
 insert into countryToContinent values ('SS', 'AF');
 insert into countryToContinent values ('SX', 'NA');
 insert into countryToContinent values ('CW', 'NA');
 insert into countryToContinent values ('BQ', 'NA');
 
 insert into countryToNode values ('SS', '1');
 insert into countryToNode values ('SX', '1');
 insert into countryToNode values ('CW', '1');
 insert into countryToNode values ('BQ', '1');
 
 
 mysql> select distinct countryId from geoIpCountry where countryId not in (select distinct countryId from countryToNode);
 Empty set (0.13 sec)
 
 Re-ran test.csh and checked again:
 http://hgwdev.cse.ucsc.edu/~galt/geoMap/countryToNodeMap.html
 
 This means we should have full integrity.
 
 insert into geoIpNode (select ipStart, ipEnd, node from geoIpCountry gic, countryToNode ctn where gic.countryId=ctn.countryId);
 
 Query OK, 164390 rows affected (1.61 sec)
 Records: 164390  Duplicates: 0  Warnings: 0
 
 NOW IT IS READY!
 
 Note that in the future we may be putting other contries with other servers,
 like we did with the middle-east countries, overriding their default continent mappings
 in countryToNode.
 
 
 Not sure yet if I am going to do this again.
 It was what I did last time.
 The idea was to keep a backup of these important tables.
 That way perhaps hgFixed could be used when generating
 other sets of testing tables for redirect.
 
 TODO at the end:
 rename tables to real:
 gbNodeReal
 geoIpNodeReal
 
+Make an extra backup of just the git untracked files just in case:
+
+mkdir /hive/data/outside/geoIp/
+gitu | xargs -I X cp X /hive/data/outside/geoIp/
+
+