2d05d30ed4df1612d72ba84c812d004de935b122
angie
Fri May 17 16:08:54 2024 -0700
Add lib module mmHash (memory-mapped hash), util tabToMmHash, and hgPhyloPlace support for using mmHash files instead of tab-separated files for metadata and name lookup. Using mmHash for name lookup saves about 50-55 seconds for SARS-CoV-2 hgPhyloPlace name/ID queries.

diff --git src/lib/tests/input/mmHashTest.txt src/lib/tests/input/mmHashTest.txt
new file mode 100644
index 0000000..1557a38
--- /dev/null
+++ src/lib/tests/input/mmHashTest.txt
@@ -0,0 +1,22 @@
+one potato
+This is a key that is pretty long for a key, like longer than 256 bytes, just because sometimes names of things get longer than we ever expected them to. I made keyLen four bytes instead of two just because I had a nagging suspicion that using two bytes would bite us someday. So now I have to test that. I'm not going to test the use of ... wait... there is no keyLen, never mind that, it's just null-terminated strings because I'm not worrying about alignment for values. Anyway, to go with this really long key, guess what the value will be? Nothing! That's right, just the empty string. :)
+two potato
+three tomato
+four
+AUS/VIC5319/2020|MT971277.1|2020-07-19 MT971277.1 2020-07-19 Australia 29788 20F D.2 20F D.2
+Denmark/DCGC-137575/2021|OW251480.1|2021-07-23 OW251480.1 2021-07-23 Denmark 29890 21I (Delta) AY.75 21I (Delta) AY.75
+England/ALDP-361CAC8/2022|OV968409.1|2022-02-07 OV968409.1 2022-02-07 England 29873 21K (BA.1) BA.1.17.2 21K (BA.1) BA.1.17.2
+Denmark/DCGC-334019/2022|OW954385.2|2022-01-21 OW954385.2 2022-01-21 Denmark 29850 21L (BA.2) BA.2 21L (BA.2) BA.2
+Denmark/DCGC-633156/2022|OY803582.1|2022-12-27 OY803582.1 2022-12-27 Denmark 29870 22E (BQ.1) BQ.1 22E (BQ.1) BQ.1_17039
+England/DHSC-CY8EDR1/2022|2022-07-01 2022-07-01 England 22B (BA.5) BA.5.2 22B (BA.5) BA.5.2
+Denmark/DCGC-644770/2023|OY796809.1|2023-03-25 OY796809.1 2023-03-25 Denmark 29876 recombinant XBK.1 22D (BA.2.75) XBK.1
+Germany/IMS-10370-CVDP-25F60D9B-720D-4A01-BD21-83A5659036E6/2021|OU091586.1|2021-03-05 OU091586.1 2021-03-05 Germany 29695 20I (Alpha) B.1.1.7 20I (Alpha) B.1.1.7
+CHE/P1784_USZ22_Teilnehmer1/2021|ON982615.1|2021-11-16 ON982615.1 2021-11-16 Switzerland 29823 21J (Delta) AY.129 21J (Delta) AY.129
+Germany/IMS-10116-CVDP-BC831F0D-BD49-41F0-9113-79A2469882A0/2022|OY124840.1|2022-10-25 OY124840.1 2022-10-25 Germany 29626 22B (BA.5) BF.7 22B (BA.5) BF.7
+FRA/IHUCOVID-035731_Nova1/2020|ON278900.1|2020-08-31 ON278900.1 2020-08-31 France 29702 20A B.1.160 20A B.1.160
+Germany/IMS-10013-CVDP-95CDA5FF-EAF4-4BDB-8651-144B54ADB4D5/2022|OY110982.1|2022-02-23 OY110982.1 2022-02-23 Germany 29844 21K (BA.1) BA.1.1 21K (BA.1) BA.1
+Germany/IMS-10004-CVDP-052F898E-1CA5-4D55-A4CB-77BB224E7C56/2021|OV495278.1|2021-12-16 OV495278.1 2021-12-16 Germany 29768 21K (BA.1) BA.1.15 21K (BA.1) BA.1.15
+AUS/VIC5410/2020|MT971396.1|2020-07-06 MT971396.1 2020-07-06 Australia 29813 20F D.2 20F D.2
+Germany/IMS-10267-CVDP-F30CF4DB-2AF3-47E3-B6CA-C570C8D1B1EB/2022|OY134271.1|2022-06-24 OY134271.1 2022-06-24 Germany 29831 22A (BA.4) BA.4.6 22A (BA.4) BA.4.6
+FRA/IHUCOVID-074561/2022|OQ899545.1|2022-03 OQ899545.1 2022-03 France 29762 21L (BA.2) BA.2.3 21L (BA.2) BA.2.3
+Germany/IMS-10013-CVDP-16B14EBE-784E-4D6A-8DA5-349F6A606A50/2022|OX967280.1|2022-04-02 OX967280.1 2022-04-02 Germany 29820 21L (BA.2) BA.2 21L (BA.2) BA.2