6db87360227e298279ec6088c5e070f8c42e51b8
angie
Tue Nov 14 17:11:45 2023 -0800
Masking some additional locations in BA.2 (& onward), BA.2.86 and XCK.

diff --git src/hg/utils/otto/sarscov2phylo/branchSpecificMask.yml src/hg/utils/otto/sarscov2phylo/branchSpecificMask.yml
index 0e3e5d7..1a51e1c 100644
--- src/hg/utils/otto/sarscov2phylo/branchSpecificMask.yml
+++ src/hg/utils/otto/sarscov2phylo/branchSpecificMask.yml
@@ -55,30 +55,33 @@
         [ 221, 225 ], [ 230, 233 ], [ 241, 243 ], [ 245, 246 ],
         # deletions
         [ 11288, 11296 ], [ 21633, 21641 ], [ 28361, 28371 ], [ 29734, 29759 ],
         # 28877 and 28878 together are highly homoplasic in all of B.1.1 (28881-28883).
         # They seem to be found very consistently in P.1*, but pop up in many places in Alpha
         # and Omicrons. I haven't looked closely at the Alpha instances but they caused
         # some mini-Omicrons (https://github.com/cov-lineages/pango-designation/issues/988).
         # Possibly could also mask in B.1.1.7 and BA.1 but those are old news.
         [ 28877, 28878 ],
         # 3'UTR
         [ 29769, 29779 ], [ 29781, 29782 ]
     ]
     sites: [
         # 5'UTR
         103, 110, 119, 121, 154, 162, 164, 214, 228, 239,
         # amplicon dropout (there are so many more; omitting to avoid recombinant trouble)
         22786, 22882, 23854,
+        # Recurrent multi-muts / misaligned insertions
+        # https://github.com/cov-lineages/pango-designation/issues/2327#issuecomment-1763481773
+        28245, 28251, 28254,
         # 3' UTR
         29760, 29762, 29764, 29766,
         # only Luxembourg, made a mini-BA.2
         29767,
         29784, 29786, 29793, 29800, 29803
     ]

 BA.2.75:
     # Inherits BA.2 masking but has so many problems with false reversions I'm adding a ton here.
     representative: India/WB-INSACOG-1931503209307/2022
     reversions: [
         # BA.2-level but not masked BA.2-wide to avoid messing up recombinants:
         G670T, T2790C, T3037C, T4321C, G9424A, T9534C, T9866C, T10029C, T10198C, G18163A,
         T19955C, G20055A, T21618C, G22200T, A22578G, T22674C, C22679T, T22686C, G22688A,
         A22775G, T22813G, A22992G, A22995C, C23013A, G23055A, T23063A, C23075T, G23403A,
         T23525C, G23599T, A23604C, T23948G, T24424A, A24469T, T25000C, T26270C, G26577C,
         T27807C, T28271A, C29510A,
         # BA.2.75-defining, very dropout-prone:
@@ -86,30 +89,40 @@
         # BA.2.75-defining, bad but maybe not quite as bad:
         T3796C, T3927C, T5183C, G12444A, A15451G, G22190A, A22331G, A22898G, G22942T, C23013A
     ]
     # False muts in re-placed (from BA.5) recombinants:
     # XBD: A26275G
     # XBP: G22331A, G22577C, G22898A, A26275G
     # XBR: A22190G, G22331A, G22577C, G22898A, A26275G
     # XBS: A22190G, G22331A, G22577C, G22898A, A26275G

 BN.1.2.3:
     # Inherits from BA.2.75
     representative: England/QEUH-326228D4/2022
     sites: [
         337 # https://github.com/cov-lineages/pango-designation/issues/2016#issuecomment-1626159006
     ]

+BA.2.86:
+    # Inherits from BA.2
+    # @Over-There-Is requested 21610 - very messy indeed.
+    # https://github.com/sars-cov-2-variants/lineage-proposals/issues/606#issuecomment-1801095482
+    # @aviczhl2 pointed out some recurring reversions:
+    # https://github.com/sars-cov-2-variants/lineage-proposals/issues/1072
+    representative: OY747147.1
+    sites: [ 21610 ]
+    reversions: [ T21711C, C22032T, A22033C, G22034A, A23012G, G26610A ]
+
 BA.4:
     # BA.4 is placed on the BA.2 branch so it inherits all the BA.2 sites.
     representative: SouthAfrica/NICD-N41664/2022
     ranges: [ [ 686, 694 ], [ 21765, 21770 ] ]

 BA.5:
     # BA.5 is placed on the BA.2 branch so it inherits all the BA.2 sites.
     representative: England/PHEP-YYFJPAM/2022
     ranges: [ [ 21765, 21770 ] ]
     # Some of these should be reverted in recombinants, but we're pretty much past the point of
     # simultaneous Delta/Omicron and the noise from false reversions is so intolerable that we'll
     # just have to watch out for missing reversions when working with recombinants.
     # False muts in recombinants that were later re-placed in BA.2.75:
     # XBD: G12160A, T22917G, T23018G
     # XBP: G12160A,
@@ -157,15 +170,24 @@
     representative: England/LSPA-32578111/2022
     reversions: [ T22317G ]

 XBB.1.5:
     # Inherits from XBB.1
     # Don't believe reversions on 27915 once we're as far as XBB.1.*
     representative: England/BRBR-32671539/2022
     reversions: [ T27915G ]

 XBC:
     # Inherits nothing! Should find out its deletions.
     # Cornelius Roemer requested to mask several reversions in
     # https://github.com/cov-lineages/pango-designation/issues/1100#issuecomment-1426502678
     representative: Philippines/PH-VUI-142736/2022
     reversions: [ G5584A, T13019C, T22329C, T25000C, C27718T, T28271A ]
+
+XCK:
+    # Inherits from XBB.1.5
+    # @FedeGueli pointed out that the usher tree had a very flaky 29729. TL;DR mafft is counfounding
+    # two nearby deletions, a new 29726 and the old 29734-29759, and making a false subst by getting
+    # the deletion boundaries wrong. Mask 29729 here.
+    representative: USA/TX-CDC-QDX84451512/2023
+    representativeBacktrack: 2
+    sites: [ 29729 ]
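
For readers unfamiliar with the `sites` and `ranges` keys touched by this commit, the following is a minimal sketch of how such entries could be interpreted. It is not the pipeline's actual code: the `MASK` dict and the `is_masked` helper are hypothetical, the values are copied by hand from the diff above, ranges are assumed to be inclusive on both ends, and inheritance between lineages (for example XCK inheriting from XBB.1.5) is deliberately not modelled.

```python
# Hypothetical in-memory copy of three entries from the diff above.
# Only the keys needed for the illustration are included.
MASK = {
    "BA.2.86": {"sites": [21610], "ranges": []},
    "BA.4": {"sites": [], "ranges": [[686, 694], [21765, 21770]]},
    "XCK": {"sites": [29729], "ranges": []},
}


def is_masked(lineage, pos, mask=MASK):
    """Return True if pos is listed for lineage, as a single site or within a range.

    Assumption: ranges are inclusive on both ends. Inherited masking from
    parent lineages is not considered here.
    """
    entry = mask.get(lineage, {})
    if pos in entry.get("sites", []):
        return True
    return any(start <= pos <= end for start, end in entry.get("ranges", []))


if __name__ == "__main__":
    print(is_masked("XCK", 29729))      # site added by this commit
    print(is_masked("BA.4", 690))       # falls inside [686, 694]
    print(is_masked("BA.2.86", 29729))  # not listed for BA.2.86 itself
```

A lookup keyed by lineage keeps the sketch close to the layout of the YAML file; a real consumer would also have to resolve the "Inherits from" relationships described in the comments.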