6b0d68657267f1e02c47d4224ea62446bbbb2ba0 max Fri May 22 06:55:52 2026 -0700 small non-AI changes to the html docs pages of the long-read SV tracks diff --git src/hg/makeDb/trackDb/human/dbVarNr.html src/hg/makeDb/trackDb/human/dbVarNr.html new file mode 100644 index 00000000000..7e45efdd262 --- /dev/null +++ src/hg/makeDb/trackDb/human/dbVarNr.html @@ -0,0 +1,175 @@ +<h2>Description</h2> + +<p> +This track shows the full <b>non-redundant (NR) structural variant catalog</b> +curated by <a href="https://www.ncbi.nlm.nih.gov/dbvar/" target="_blank">NCBI +dbVar</a>: deletions, duplications, and insertions aggregated across more than +150 studies (e.g. 1000 Genomes Phase 3, Simons Genome Diversity Project, +ClinGen, ClinVar) into a single consolidated set per variant type. In the +source release, each type (DEL, DUP, INS) is distributed separately; for this +track all three are merged into one bigBed so they can be filtered and +browsed together. As of the current build there are ~4.6 million records +(2.3M deletions, 0.6M duplications, 1.7M insertions).</p> + +<p> +Each record represents a <i>unique genomic placement</i>. When multiple +submitted structural variants (ssv/nsv) have the same coordinates on the +reference, dbVar collapses them into one NR record and the record's +<i>variantCount</i> field counts how many were merged. 
Only exact +coordinate matches are collapsed; partially overlapping variants remain separate records.</p> + +<h3>What merges into each type</h3> + +<ul> + <li><b>Deletions</b>: alu_deletion, copy_number_loss, deletion, + herv_deletion, line1_deletion, sva_deletion (and a small number of + copy_number_variation entries annotated as loss).</li> + <li><b>Duplications</b>: copy_number_gain, copy_number_variation, + duplication, tandem_duplication.</li> + <li><b>Insertions</b>: alu_insertion, insertion, line1_insertion, + mobile_element_insertion, novel_sequence_insertion, sva_insertion.</li> +</ul> + +<h3>Subsets</h3> + +<p> +dbVar ships three overlapping, clinically oriented subsets of each NR +catalog, and each record here is tagged with its memberships via the +<i>subsets</i> field:</p> +<ul> + <li><b>common</b>: variant has population-frequency evidence in at + least one submitting study (i.e. it is observed in healthy + individuals at appreciable frequency).</li> + <li><b>pathogenic</b>: variant is backed by at least one ClinVar + submission asserting it to be disease-causing.</li> + <li><b>somatic</b>: variant is of somatic (typically tumor) origin, + not germline.</li> +</ul> +<p> +Most NR records are neither common nor curated as pathogenic/somatic; +their <i>subsets</i> field is empty. A record can belong to multiple +subsets simultaneously (e.g. both <i>common</i> and <i>pathogenic</i>) +when different studies contribute different calls at the same +placement.</p> + +<h3>Length fields and bin sizes</h3> + +<p>Each record carries two numeric length fields:</p> +<ul> + <li><b>svLen</b> – reference-span length in base pairs + (<tt>chromEnd - chromStart</tt>). For DEL/DUP this is the + deleted or duplicated region size; for INS it is typically 1 bp + (the breakpoint anchor).</li> + <li><b>insLen</b> – for INS only, the length of the inserted + sequence (dbVar's <tt>max_insertion_length</tt>; equal to + <tt>min_insertion_length</tt> when only one variant is merged). 
+ Always 0 for DEL/DUP.</li> +</ul> + +<p> +On top of <i>svLen</i>, dbVar also pre-bins each record into one of +three reference-span buckets stored in the <i>binSize</i> column:</p> +<ul> + <li><b>small</b>: &lt; 50 bp</li> + <li><b>medium</b>: 50 bp to &lt; 1 Mb</li> + <li><b>large</b>: ≥ 1 Mb</li> +</ul> +<p> +Use the numeric <i>svLen</i> filter for arbitrary length cutoffs and +the categorical <i>binSize</i> filter for the standard buckets. The +BED <i>score</i> is derived from <i>binSize</i> +(small = 100, medium = 500, large = 1000) so dense-mode shading +emphasises larger events.</p> + +<h2>Display conventions</h2> + +<p>Items are colored by SV type:</p> +<ul> + <li><span style="background-color:rgb(220,50,32);color:white;padding:1px 6px">DEL</span> deletion</li> + <li><span style="background-color:rgb(0,120,200);color:white;padding:1px 6px">DUP</span> duplication</li> + <li><span style="background-color:rgb(0,160,0);color:white;padding:1px 6px">INS</span> insertion (drawn at the reference insertion site)</li> +</ul> + +<p>The item label is the first dbVar variant ID for the record (an +<tt>nssv*</tt>, <tt>nsv*</tt>, or <tt>essv*</tt> accession). When a +placement merges multiple IDs, the full list is stored in the +<i>variants</i> field on the details page and linked to the dbVar +variant page. Similarly, when an NR record aggregates calls from +multiple studies/methods/platforms, those columns are +semicolon-separated lists.</p> + +<h2>Filters</h2> + +<p>The track configuration page exposes these filters:</p> +<ul> + <li><b>SV Type</b> – multi-select among DEL / DUP / INS. Hides + variants of unchecked types.</li> + <li><b>SV Length on reference</b> (<i>svLen</i>) – numeric + range filter on <tt>chromEnd - chromStart</tt> in bp. 
For INS the + reference span is 1 bp; use <i>insLen</i> for the actual + inserted-sequence length.</li> + <li><b>Insertion sequence length</b> (<i>insLen</i>) – numeric + range, INS only.</li> + <li><b>Bin size</b> – multi-select among small / medium / large + as defined above. Useful for hiding sub-50bp indels or + restricting to mega-base events.</li> + <li><b>Subset membership</b> – multi-select among common, + pathogenic, somatic. By default all three are selected, which + preserves every record (including records with empty subsets); + unchecking one still shows untagged records but hides variants + that carry <i>only</i> the unchecked flag. To see only the + pathogenic subset, uncheck the other two <b>and</b> also filter + out the majority untagged records by a per-record filter such as + <i>variantCount</i>.</li> + <li><b>Merged subvariants</b> (<i>variantCount</i>) – numeric + range filter on the number of submitted SVs that share this + exact placement. <tt>variantCount = 1</tt> means the NR record + has a single submitter match (the most common case); higher + values indicate the placement has been reported many times across + studies. Increase the minimum to focus on well-replicated calls.</li> +</ul> + +<h2>Data Access</h2> + +<p> +The data can be explored interactively in table format with the +<a href="../cgi-bin/hgTables">Table Browser</a> or the +<a href="../cgi-bin/hgIntegrator">Data Integrator</a>, and accessed +programmatically through our <a href="https://api.genome.ucsc.edu">API</a>, +track=<i>dbVarNr</i>.</p> + +<p>The bigBed is available from our download server at +<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/bbi/dbVar/nr.bb" target="_blank"> +hgdownload.soe.ucsc.edu/gbdb/hg38/bbi/dbVar/nr.bb</a>. 
The upstream source +TSV / BED / BEDPE files (released monthly) are available from the +<a href="https://github.com/ncbi/dbvar/tree/master/Structural_Variant_Sets/Nonredundant_Structural_Variants" target="_blank"> +NCBI dbVar GitHub repository</a> and the +<a href="https://ftp.ncbi.nlm.nih.gov/pub/dbVar/sandbox/sv_datasets/nonredundant/" target="_blank"> +dbVar FTP site</a>.</p> + +<h2>Credits</h2> + +<p> +Thanks to the NCBI dbVar team for curating, merging, and releasing +the non-redundant structural-variant datasets on a monthly cadence.</p> + +<h2>References</h2> + +<p> +Lappalainen I, Lopez J, Skipper L, Hefferon T, Spalding JD, Garner J, +Chen C, Maguire M, Corbett M, Zhou G, Paschall J, Ananiev V, Flicek P, +Church DM. +<a href="https://doi.org/10.1093/nar/gks1213" target="_blank"> +dbVar and DGVa: public archives for genomic structural variation</a>. +<em>Nucleic Acids Res</em>. 2013 Jan;41(Database issue):D936-D941. +PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/23193291" target="_blank">23193291</a></p> + +<p>NCBI dbVar: <i>Human Non-Redundant Reference Datasets to Help +Interpret Structural Variants</i>. NCBI Insights, 27 Sep 2018. +<a href="https://ncbiinsights.ncbi.nlm.nih.gov/2018/09/27/dbvar-human-nonredundant-reference-datasets-interpret-structural-variants/" target="_blank"> +ncbiinsights.ncbi.nlm.nih.gov</a>.</p> + +<p>Phan L, Jin Y, Zhang H, Qiang W, Shekhtman E, Shao D, <em>et al</em>. +<i>ALFA: Allele Frequency Aggregator</i>. In: +<a href="https://www.ncbi.nlm.nih.gov/books/NBK269031/" target="_blank"> +NCBI Handbook</a>.</p>