54452ec022a6073410955c04e110a1784f71fb57
angie
Wed Nov 13 17:37:34 2019 -0800
dbSnp153: add new ucscNote otherMapErr for mappings with the same rs# as a mapping w/inconsistent SPDI in BadCoords/Map Err subtrack. refs #23283

diff --git src/hg/makeDb/trackDb/human/dbSnp153Composite.html src/hg/makeDb/trackDb/human/dbSnp153Composite.html
index b5f89b0..a2dba17 100644
--- src/hg/makeDb/trackDb/human/dbSnp153Composite.html
+++ src/hg/makeDb/trackDb/human/dbSnp153Composite.html
@@ -40,33 +40,36 @@
 There are some exceptions in which a variant is mapped to more than one reference sequence,
 but not culled into this set:
 <ul>
 <li>A variant may appear in both X and Y pseudo-autosomal regions (PARs) without being included
 in this set.
 </li>
 <li>A variant may also appear in a main chromosome as well as an alternate haplotype or
 fix patch sequence assigned to that chromosome.
 </li>
 </ul>
 </li>
 </ul>
 </p>
 <p>
 A fifth subtrack highlights coordinate ranges to which dbSNP mapped a variant but with genomic
-coordinates that are not self-consistent, i.e. different coordinate ranges were provided when describing different alleles, which can occur due to a bug with mapping variants from one assembly sequence to another when there is an indel difference between the assembly sequences:
+coordinates that are not internally consistent, i.e. different coordinate ranges were provided
+when describing different alleles. This can occur due to a bug with mapping variants from one
+assembly sequence to another when there is an indel difference between the assembly sequences:
 <ul>
- <li><b>Map Err (153)</b>: around 120 thousand for hg19 and 149 thousand for hg38.
+ <li><b>Map Err (153)</b>: around 120 thousand mappings of 55 thousand distinct rs IDs for hg19
+ and 149 thousand mappings of 86 thousand distinct rs IDs for hg38.
 </ul>
 </p>
 <h2>Interpreting and Configuring the Graphical Display</h2>
 <p>
 SNVs and pure deletions are displayed as boxes covering the affected base(s).
 Pure insertions are drawn as single-pixel tickmarks between
 the base before and the base after the insertion.
 </p><p>
 Insertions and/or deletions in repetitive regions may be represented by a half-height box
 showing uncertainty in placement, followed by a full-height box showing the number of deleted
 bases, or a full-height tickmark to indicate an insertion.
 When an insertion or deletion falls in a repetitive region, the placement may be ambiguous.
 For example, if the reference genome contains "TAAAG" but some individuals have "TAAG" at the
 same location, then the variant is a deletion of a single
@@ -351,30 +354,41 @@
 contains an IUPAC ambiguous base.</td>
 </tr>
 <tr>
 <td>freqNotRefAlt</td>
 <td class="number">17684</td>
 <td class="number">32150</td>
 <td>At least one allele reported by at least one project that reports frequencies
 does not match any of the reference or alternate alleles listed by dbSNP.</td>
 </tr>
 <tr>
 <td>multiMap</td>
 <td class="number">562157</td>
 <td class="number">132051</td>
 <td>This variant has been mapped to more than one distinct genomic location.</td>
 </tr>
+ <tr>
+ <td>otherMapErr</td>
+ <td class="number">113416</td>
+ <td class="number">203580</td>
+ <td>At least one other mapping of this variant has erroneous coordinates.
+ The mapping(s) with erroneous coordinates are excluded from this track
+ and are included in the Map Err subtrack. Sometimes despite this mapping
+ having legal coordinates, there may still be an issue with this mapping's
+ coordinates and alleles; you may want to click through to dbSNP to compare
+ the initial submission's coordinates and alleles.
+ </tr>
 </table>
 <h2>Data Sources and Methods</h2>
 <p>
 dbSNP has collected genetic variant reports from researchers worldwide for
 <a href="https://ncbiinsights.ncbi.nlm.nih.gov/2019/10/07/dbsnp-celebrates-20-years/" target=_blank>over 20 years</a>.
 Since the advent of next-generation sequencing methods and the population sequencing efforts
 that they enable, dbSNP has grown exponentially, requiring a new data schema,
 computational pipeline, web infrastructure, and download files.
 (Holmes <em>et al.</em>)
 The same challenges of exponential growth affected UCSC's presentation of dbSNP variants,
 so we have taken the opportunity to change our internal representation and import pipeline.
 Most notably, flanking sequences are no longer provided by dbSNP,