2e0addd016cfcbf61485b90d8980a8d75be622c2 lrnassar Sun Jun 14 00:10:06 2026 -0700

lrSv: sync description-page counts to the deduped data; drop Kim PD from the supertrack page. refs #36258

After the QA dedup, update the SV counts cited on the description pages to the unique (post-dedup) totals for the tracks served, while leaving the upstream release/paper counts in the Methods sections:

  decodeSv     133,886 -> 119,453 displayed
  gustafsonSv  113,696 -> 113,159 displayed
  chirmade101  87,183 -> 87,068 displayed
  aou1k        541,049 -> 540,155 displayed
  hprc2v21Sv   596,063 -> 549,649 (hg38) and 608,435 -> 541,176 (hs1), throughout (no upstream publication), incl. recomputed nested-snarl counts

lrSv.html: update the Available Datasets table count cells to match, set the lrSvAll merged cell to 2,317,508 (post Kim PD removal), and remove the Kim PD Brain row, blurb and reference from the supertrack page (the track is staged on dev/alpha only, kept out of the merge and the description, and is not released).

diff --git src/hg/makeDb/trackDb/human/hprc2v21Sv.html src/hg/makeDb/trackDb/human/hprc2v21Sv.html
index 9d3955f33c8..196796a19a3 100644
--- src/hg/makeDb/trackDb/human/hprc2v21Sv.html
+++ src/hg/makeDb/trackDb/human/hprc2v21Sv.html
@@ -1,31 +1,31 @@

Description

A pangenome graph holds many human genomes at once. Sequence that the genomes share collapses onto common paths, and the places where they differ show up as bubbles in the graph. This track shows the structural variants found in version 2.1 of the Human Pangenome Reference Consortium (HPRC) minigraph-cactus graph, which was built from haplotype-resolved PacBio HiFi assemblies of 233 samples. Only larger events are shown here: insertions and deletions of at least 50 bp. HPRC produces one variant file per reference path, so the events are measured against GRCh38 on hg38 and against T2T-CHM13 on hs1, and each assembly shows its own native callset.

-On hg38 there are about 596,000 such alleles (roughly 448,000 insertions and
-148,000 deletions). On hs1 there are about 608,000 (roughly 363,000
-insertions and 245,000 deletions). The two sets are not lifted between
+On hg38 there are about 550,000 such alleles (roughly 422,000 insertions and
+128,000 deletions). On hs1 there are about 541,000 (roughly 348,000
+insertions and 193,000 deletions). The two sets are not lifted between
assemblies; the counts differ because an insertion against one reference can be a deletion against the other.

Display Conventions and Configuration

Items are colored by SV type:

  Insertion (INS)
  Deletion (DEL)

@@ -49,32 +49,33 @@ alignments_v2.0.csv.

We started from the per-reference files provided by the HPRC graph team, hprc-v2.1-mc-grch38.gref95.ro.vcf.gz for hg38 and hprc-v2.1-mc-chm13.gref95.ro.vcf.gz for hs1. These are the raw vg deconstruct output: each graph bubble is one multi-allelic record with its graph traversals attached, and there are no per-allele type or length fields. To turn a file into a track, we compared every alternate allele to the reference allele after trimming the sequence they share at each end. An allele was kept when the net length change was at least 50 bp, and labeled an insertion when the alternate is longer or a deletion when it is shorter. At this size no balanced, equal-length substitutions came up, and the files carry no inversion calls, so the track has only insertions and
-deletions. On hg38, 596,063 alleles were kept (43,580 at nested snarl
-levels); on hs1, 608,435 (75,809 nested). Because these files are not broken
+deletions. On hg38, 549,649 alleles were kept (40,678 at nested snarl
+levels); on hs1, 541,176 (70,200 nested), after removing byte-identical
+duplicate records. Because these files are not broken
down into atomic indels, one bubble can appear as a single large allele rather than several small ones, so the counts are not comparable to a wave-decomposed callset. Allele counts, frequencies and sample counts come straight from the VCF.
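
The trim-and-classify step described above can be sketched roughly as follows. This is an illustrative sketch only, not the conversion script in makeDb/scripts/lrSv: the function names (trim_shared_ends, classify_allele), the choice to trim the shared suffix before the shared prefix, and the return shapes are all assumptions made for the example. Only the 50 bp threshold and the INS/DEL labeling rule come from the text. It works on plain REF and ALT strings and does not read VCF files or remove duplicate records.

```python
MIN_SV_LEN = 50  # size threshold stated in the Methods text


def trim_shared_ends(ref, alt):
    """Strip the sequence ref and alt share at each end.

    Assumption for this sketch: the shared suffix is removed first,
    then the shared prefix. Returns (trimmed_ref, trimmed_alt, prefix_len).
    """
    # Count matching bases from the right-hand end.
    suffix = 0
    max_suffix = min(len(ref), len(alt))
    while suffix < max_suffix and ref[-1 - suffix] == alt[-1 - suffix]:
        suffix += 1
    if suffix:
        ref, alt = ref[:-suffix], alt[:-suffix]

    # Count matching bases from the left-hand end of what remains.
    prefix = 0
    max_prefix = min(len(ref), len(alt))
    while prefix < max_prefix and ref[prefix] == alt[prefix]:
        prefix += 1

    return ref[prefix:], alt[prefix:], prefix


def classify_allele(ref, alt, min_len=MIN_SV_LEN):
    """Return ("INS" | "DEL", length) for a kept allele, else None."""
    trimmed_ref, trimmed_alt, _ = trim_shared_ends(ref, alt)
    delta = len(trimmed_alt) - len(trimmed_ref)  # net length change
    if abs(delta) < min_len:
        return None  # too small, or a balanced equal-length substitution
    return ("INS" if delta > 0 else "DEL", abs(delta))


def classify_record(ref, alts):
    """Classify every alternate allele of one multi-allelic record."""
    return [classify_allele(ref, alt) for alt in alts]
```

For a multi-allelic record, classify_record("A", ["A" + "C" * 60, "AG"]) would keep the first alternate as a 60 bp insertion and drop the second as too small. Trimming does not change the net length difference; in this sketch it only exposes the differing sequence and the offset of the event within the record.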

The conversion script and autoSql schema are in makeDb/scripts/lrSv and the build steps are in the makeDoc at doc/hg38/lrSv.txt.

Data Access