3d835ba54509284ec08b63dc527bd114c9467afc
angie
Tue Nov 12 10:55:41 2019 -0800
Update counts of diffMajor, refIs{Minor,Rare,Singleton}, freqNotRefAlt after fixing VCF allele normalization in dbSnpJsonToTab. refs #23283

diff --git src/hg/makeDb/trackDb/human/dbSnp153Composite.html src/hg/makeDb/trackDb/human/dbSnp153Composite.html
index 9c771d1..4df4a35 100644
--- src/hg/makeDb/trackDb/human/dbSnp153Composite.html
+++ src/hg/makeDb/trackDb/human/dbSnp153Composite.html
@@ -216,32 +216,32 @@
commonAll 12178426 12430253 Variant is "common", i.e. has a Minor Allele Frequency of at least 1% in all projects reporting frequencies.
commonSome 20534330 20893174 Variant is "common", i.e. has a Minor Allele Frequency of at least 1% in some, but not all, projects reporting frequencies.
diffMajor
- 1378125
- 1399317
+ 1377402
+ 1398591
Different frequency sources have different major alleles.
overlapDiffClass 106940656 109838613 This variant overlaps another variant with a different type/class.
overlapSameClass 16890303 17228657 This variant overlaps another with the same type/class but different start/end.
@@ -271,44 +271,44 @@

while others may indicate that the reference genome contains a rare variant or sequencing issue:

keyword in data file (dbSnp153.bb)   # in hg19   # in hg38   description
refIsAmbiguous 101 111 The reference genome allele contains an IUPAC ambiguous base (e.g. 'R' for 'A or G', or 'N' for 'any base').
refIsMinor
- 3277722
- 3364788
+ 3269451
+ 3356557
The reference genome allele is not the major allele in at least one project.
refIsRare
- 142937
- 166192
+ 135265
+ 158562
The reference genome allele is rare (i.e. allele frequency < 1%).
refIsSingleton
- 44382
- 56491
+ 36709
+ 48859
The reference genome allele has never been observed in a population sequencing project reporting frequencies.
refMismatch 4 33 The reference genome allele reported by dbSNP differs from the GenBank assembly sequence. This is very rare and in all cases observed so far, the GenBank assembly has an 'N' while the RefSeq assembly used by dbSNP has a less ambiguous character such as 'R'.

and others may indicate an anomaly or problem with the variant data:

@@ -333,32 +333,32 @@
clusterError 113678 126973 This variant has the same start, end and class as another variant; they probably should have been merged into one variant.
freqIsAmbiguous 7649 7749 At least one allele reported by at least one project that reports frequencies contains an IUPAC ambiguous base.
freqNotRefAlt
- 25413
- 39038
+ 16950
+ 30615
At least one allele reported by at least one project that reports frequencies does not match any of the reference or alternate alleles listed by dbSNP.
multiMap 561309 132015 This variant has been mapped to more than one distinct genomic location.

Data Sources and Methods

dbSNP has collected genetic variant reports from researchers worldwide for