2e0addd016cfcbf61485b90d8980a8d75be622c2
lrnassar
  Sun Jun 14 00:10:06 2026 -0700
lrSv: sync description-page counts to the deduped data; drop Kim PD from the supertrack page. refs #36258

After the QA dedup, update the SV counts cited on the description pages to the
unique (post-dedup) totals for the tracks served, while leaving the upstream
release/paper counts in the Methods sections:
decodeSv     133,886 -> 119,453 displayed
gustafsonSv  113,696 -> 113,159 displayed
chirmade101  87,183  -> 87,068  displayed
aou1k        541,049 -> 540,155 displayed
hprc2v21Sv   596,063 -> 549,649 (hg38) and 608,435 -> 541,176 (hs1), updated
             throughout (no upstream publication), incl. recomputed nested-snarl counts

lrSv.html: update the Available Datasets table count cells to match, set the
lrSvAll merged cell to 2,317,508 (post Kim PD removal), and remove the Kim PD
Brain row, blurb and reference from the supertrack page (the track is staged on
dev/alpha only, kept out of the merge and the description, and is not released).
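
The decodeSv figures quoted in this change can be sanity-checked with a short
script. This is a minimal sketch, not the QA dedup itself (that code is not part
of this commit): the record layout (chrom, start, end, svType) and the helper
name collapse_exact_duplicates are assumptions for illustration, and the toy
records are made up. Only the per-type counts come from decodeSv.html.

```python
from collections import Counter


def collapse_exact_duplicates(records):
    """Keep the first occurrence of each identical record, preserving order.

    Hypothetical helper: the real dedup key used in QA is not shown in this commit.
    """
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique


# Toy records (chrom, start, end, svType); values are invented for illustration.
toy = [
    ("chr1", 100, 250, "DEL"),
    ("chr1", 100, 250, "DEL"),  # exact duplicate of the previous record
    ("chr1", 300, 301, "INS"),
    ("chr2", 500, 620, "INSDEL"),
]
unique = collapse_exact_duplicates(toy)
by_type = Counter(rec[3] for rec in unique)

# Per-type counts quoted in decodeSv.html before and after this change.
release = {"DEL": 55649, "INS": 75050, "INSDEL": 3187}
displayed = {"DEL": 41216, "INS": 75050, "INSDEL": 3187}

# Records removed per type by the dedup, derived from the two sets of counts.
removed = {sv_type: release[sv_type] - displayed[sv_type] for sv_type in release}

print(sum(release.values()), sum(displayed.values()), removed)
```

The per-type counts sum to the release and displayed totals (133,886 and
119,453), and the whole difference of 14,433 falls on deletions, which is why
only the deletion count changes in the description page.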

diff --git src/hg/makeDb/trackDb/human/decodeSv.html src/hg/makeDb/trackDb/human/decodeSv.html
index e5940527f8c..21dddbcebb1 100644
--- src/hg/makeDb/trackDb/human/decodeSv.html
+++ src/hg/makeDb/trackDb/human/decodeSv.html
@@ -1,112 +1,113 @@
 <h2>Description</h2>
 <p>
 This track shows high-confidence structural variants (SVs) identified by
 Oxford Nanopore long-read sequencing of 3,622 Icelanders recruited through
 the deCODE genetics population cohort. The release contains 133,886 SVs
 (55,649 deletions, 75,050 insertions and 3,187 combined insertion/deletion
 events). Variants are site-level (no per-sample genotypes) and have been
 filtered to a high-confidence subset validated in the accompanying
 population-scale analysis.
 </p>
 <p>
 Note that this release does not include allele counts or allele frequencies:
 each row represents a site that was called with high confidence in the
 cohort, but the number of carrier samples is not provided, so the track
 cannot be filtered by AF/AC.
 </p>
 
 <h2>Display Conventions and Configuration</h2>
 <p>
 Items are colored by SV type:
 <ul>
 <li><span style="color: rgb(200,0,0);">Deletions (DEL)</span> - red</li>
 <li><span style="color: rgb(0,0,200);">Insertions (INS)</span> - blue</li>
 <li><span style="color: rgb(140,0,200);">Combined insertion/deletion (INSDEL)</span> - purple</li>
 </ul>
 </p>
 <p>
 Insertions are placed at the insertion site with a width of 1 bp; deletions
 span the deleted interval; INSDEL events span the affected reference region
 and have SVLEN=0 because the reference and alternate alleles differ in both
 sequence and length. Filters are available for SV type and SV length.
 </p>
 <p>
 Where a variant falls inside an annotated tandem-repeat region, the detail
 page also shows the coordinates of that region (TRRBEGIN / TRREND from the
 source VCF), which can be useful context for repeat-mediated insertions and
 deletions.
 </p>
 
 <h2>Methods</h2>
 <p>
 Beyter et al. 2021 performed Oxford Nanopore long-read sequencing of 3,622
 Icelanders recruited through deCODE genetics and detected a median of
 22,636 SVs per individual (13,353 insertions and 9,474 deletions). Across
 the cohort they derived a set of 133,886 reliably genotyped SV alleles,
 imputed those alleles into 166,281 chip-typed Icelanders, and tested them
 for association with disease and quantitative traits (notably including a
 rare <i>PCSK9</i> deletion associated with lower LDL-cholesterol and a
 multi-allelic 57-bp VNTR in <i>ACAN</i> associated with adult height). The
-track shown here displays the 133,886 high-confidence SV sites: 55,649
-deletions, 75,050 insertions and 3,187 combined insertion/deletion events.
+track shown here displays 119,453 unique high-confidence SV sites (exact-duplicate
+records present in the release have been collapsed): 41,216 deletions, 75,050
+insertions and 3,187 combined insertion/deletion events.
 The release is site-only (no per-sample genotypes or allele frequencies),
 so the track cannot be filtered by AF/AC.
 </p>
 <p>
 The VCF <tt>ont_sv_high_confidence_SVs.sorted.vcf.gz</tt> was downloaded
 from the deCODE genetics
 <a href="https://github.com/DecodeGenetics/LRS_SV_sets" target="_blank">
 LRS_SV_sets</a> GitHub repository.
 </p>
 <p>
 The step-by-step build commands (download, format conversion, bigBed build)
 are recorded in the UCSC makeDoc for this track container:
 <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/lrSv.txt" target="_blank">
 doc/hg38/lrSv.txt</a>. The conversion scripts and autoSql schemas live in
 <a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank">
 makeDb/scripts/lrSv</a>.
 </p>
 
 <h2>Data Access</h2>
 <p>
 The data can be explored interactively in table format with the
 <a href="../cgi-bin/hgTables">Table Browser</a> or the
 <a href="../cgi-bin/hgIntegrator">Data Integrator</a> and exported from there
 to spreadsheet or tab-sep tables. From scripts, the data can be accessed
 through our <a href="https://api.genome.ucsc.edu">API</a>, track=<i>decodeSv</i>.
 </p>
 <p>
 The annotation is stored as a bigBed file that can be downloaded from
 <a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/" target="_blank">our
 download server</a> as <tt>decodeSv.bb</tt>. Individual regions or the whole
 annotation can be obtained with the <tt>bigBedToBed</tt> utility, available
 from our
 <a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">utilities
 page</a>. Example:
 <tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/decodeSv.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt>.
 </p>
 <p>
 The original VCF is available from the deCODE genetics
 <a href="https://github.com/DecodeGenetics/LRS_SV_sets" target="_blank">LRS_SV_sets</a>
 GitHub repository.
 </p>
 
 <h2>Credits</h2>
 <p>
 Thanks to the deCODE genetics team and the Icelandic study participants for
 making this dataset publicly available.
 </p>
 
 <h2>References</h2>
 
 
 <p>
 Beyter D, Ingimundardottir H, Oddsson A, Eggertsson HP, Bjornsson E, Jonsson H, Atlason BA,
 Kristmundsdottir S, Mehringer S, Hardarson MT <em>et al</em>.
 <a href="https://doi.org/10.1038/s41588-021-00865-4" target="_blank">
 Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in
 human diseases and other traits</a>.
 <em>Nat Genet</em>. 2021 Jun;53(6):779-786.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/33972781" target="_blank">33972781</a>
 </p>