Commits for angie

49fafa484293682f56ca08a0d902d8366fc476f7 Thu Sep 2 17:43:32 2021 -0700

Link to client-side UShER implementation ShUShER.
- src/hg/hgPhyloPlace/hgPhyloPlace.c - lines changed 5

dcdb342ff4f255973940339af65b1b7266684ef9 Thu Sep 2 17:45:48 2021 -0700

Add DRC -> Democratic Republic of the Congo.
- src/hg/utils/otto/sarscov2phylo/gisaidNameToCountry.pl - lines changed 1

86c9c0072ad12699dfdcf8232ad812f1982d23d7 Thu Sep 2 17:47:47 2021 -0700

NCBI Datasets now includes a biosample.jsonl file -- when it's stable we won't need EUtils anymore, yay!
- src/hg/utils/otto/sarscov2phylo/bioSampleJsonToTab.py - lines changed 185
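A minimal sketch of the kind of conversion a JSON Lines to tab-separated script performs, assuming each line of biosample.jsonl holds one JSON object and that the wanted columns can be named by dotted key paths into nested objects. The function names, the dotted-path convention, and the example keys are hypothetical; this is not the code in bioSampleJsonToTab.py and does not describe NCBI's actual schema.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def lookup(record: Dict[str, Any], path: str) -> str:
    """Follow a dotted key path (e.g. "owner.name") into nested dicts.

    Returns "" if any key along the path is missing or the value is null.
    """
    value: Any = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return ""
        value = value[key]
    if value is None:
        return ""
    # Tabs or newlines inside a value would break the tab-separated layout.
    return str(value).replace("\t", " ").replace("\n", " ")


def jsonl_to_tsv_rows(lines: Iterable[str], fields: List[str]) -> Iterator[str]:
    """Yield one tab-separated row per non-blank JSON Lines record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines such as a trailing newline
        record = json.loads(line)
        yield "\t".join(lookup(record, field) for field in fields)


if __name__ == "__main__":
    # Made-up records with hypothetical keys, for illustration only.
    example = [
        '{"accession": "EXAMPLE_1", "owner": {"name": "Example Lab"}}',
        '',
        '{"accession": "EXAMPLE_2"}',
    ]
    for row in jsonl_to_tsv_rows(example, ["accession", "owner.name"]):
        print(row)
```

Missing keys become empty columns rather than errors, which keeps every row the same width when some records are incomplete.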

7f0ad33bd004bc5616a9214534d74371f9f51940 Thu Sep 2 17:48:56 2021 -0700

Greatly increase tolerance for missing BioSample records. I reported this to NCBI; sounds like that's just the way it is with importing delays.
- src/hg/utils/otto/sarscov2phylo/gbMetadataAddBioSample.pl - lines changed 1

78a15f21fb3bfad23eac7217e6d831810558304e Thu Sep 2 17:50:15 2021 -0700

Adding note about COG-UK credits in BioSample text (although I should repeat with the new biosample.jsonl file).
- src/hg/utils/otto/sarscov2phylo/publicCredits.sh - lines changed 14

eeafd7b8b2ea5f34ee78bfe73b51ea7a1be5bce4 Thu Sep 2 17:51:46 2021 -0700

Add the same filters that we've been using on the combined tree because when GISAID samples are removed, some branches become very long. (Russ & Yatish request)
- src/hg/utils/otto/sarscov2phylo/extractPublicTree.sh - lines changed 2

d49c83bd184000554d36f1ffd9c07a89fb856352 Thu Sep 2 17:53:52 2021 -0700

Use the new bioSampleJsonToTab.py script but keep the EUtils stuff for now as a backup until biosample.jsonl looks stable and complete. Make a separate log file for gbMetadataAddBioSample.pl output for easier extraction of IDs to ping submitters / NCBI about. Watch out for duplicated sequences in fasta.
- src/hg/utils/otto/sarscov2phylo/getNcbi.sh - lines changed 15
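One hedged way to watch out for duplicated sequences in a FASTA file, assuming a duplicate means a repeated sequence ID (the first word after '>') and that the first copy should be kept. The function name is hypothetical; this is a sketch of the idea, not the code in getNcbi.sh.

```python
from typing import Iterable, Iterator, Set


def drop_duplicate_fasta_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose ID was already seen.

    The ID is taken to be the first whitespace-delimited word after '>'.
    Sequence lines are kept or dropped along with their header.
    """
    seen: Set[str] = set()
    keep = True
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            rec_id = fields[0] if fields else ""
            keep = rec_id not in seen
            seen.add(rec_id)
        if keep:
            yield line


if __name__ == "__main__":
    fasta = [">a", "ACGT", ">b", "GG", ">a", "TTTT"]
    print(list(drop_duplicate_fasta_records(fasta)))
```

Keeping the first copy is an arbitrary choice here; a pipeline might instead prefer the longest or most recent sequence for a repeated ID.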

1f64522294183ac3613b48b2d937fa8a666c9afa Thu Sep 2 17:55:45 2021 -0700

Move the --max-parsimony post-filtering into usher's new --max-parsimony-per-sample option. Add mechanism to exclude sequences that sneak past quality filters by faking reference alleles where they should have Ns.
- src/hg/utils/otto/sarscov2phylo/updateCombinedTree.sh - lines changed 5

c28af7c6bbe39c880c7e6dcc0f1c3ff2ee5b427c Fri Sep 3 12:11:22 2021 -0700

Grab lab from obscure location found by Nextstrain team in ncov-ingest PR#208.
- src/hg/utils/otto/sarscov2phylo/bioSampleJsonToTab.py - lines changed 11

ce891d7726cb07920b5072bb5b4ca9faf44c6e61 Fri Sep 3 16:39:47 2021 -0700

Translating protein coords to CDS coords might result in coords that fall off the end of the CDS but not off the end of the transcript. Avoid showing 3'UTR results for a protein search by requiring results in the coding region for protein searches. refs #28107
- src/hg/lib/hgHgvs.c - lines changed 8
- src/hg/lib/tests/expected/hgvs/validTerms.txt - lines changed 2
- src/hg/lib/tests/hgvsTester.c - lines changed 8
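The coding-region check described in this commit can be sketched as follows. This is a Python illustration, not the C code in hgHgvs.c; the function name and the 0-based, half-open coordinate convention are assumptions. A 1-based protein position p covers CDS nucleotides 3(p-1) through 3p, so a position past the last codon lands in the 3' UTR even when it is still inside the transcript.

```python
from typing import Optional, Tuple


def protein_pos_to_tx_range(aa_pos: int, cds_start: int,
                            cds_end: int) -> Optional[Tuple[int, int]]:
    """Map a 1-based amino-acid position to its codon's transcript coordinates.

    cds_start and cds_end are assumed to be 0-based, half-open CDS bounds
    within the transcript. Returns None when the codon would not lie
    entirely inside the CDS, even if it would still fit in the transcript
    (for example, in the 3' UTR).
    """
    if aa_pos < 1:
        return None
    start = cds_start + (aa_pos - 1) * 3
    end = start + 3
    if end > cds_end:
        return None  # past the last codon: 3' UTR or beyond, so reject
    return (start, end)


if __name__ == "__main__":
    # Hypothetical transcript: 100 bases, CDS at [10, 40) = 10 codons.
    print(protein_pos_to_tx_range(10, 10, 40))  # (37, 40)
    print(protein_pos_to_tx_range(11, 10, 40))  # None
```

Position 11 is rejected because its codon would occupy bases 40 to 43, outside the CDS, even though a 100-base transcript still contains those bases; checking only against the transcript length would have accepted it.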