f058c8fe4601b223ff47468eb3525c05ccd03850 max Wed Apr 22 09:17:17 2026 -0700
srSv: new short-read SV supertrack, split out of lrSv

Move the three short-read SV/CNV subtracks (abelSv, onekg3202Sr, tommoJpCnv) out of the Long-read SV supertrack into a new sibling supertrack srSv (Short-read SVs), so the lrSv collection contains only long-read callsets. Filter fields (svType, svLen, insLen, AC) are mirrored at the srSv supertrack level to keep the UX parallel to lrSv.

- trackDb: new human/srSv.ra with the three subtrack stanzas and updated /gbdb/$D/srSv/... bigDataUrls; corresponding stanzas removed from human/lrSv.ra. human/trackDb.ra now includes srSv.ra. Also a new human/srSv.html overview page; the SR rows and SR-specific paragraphs removed from human/lrSv.html.
- Scripts: abelSv/{abelSv.as,vcfToBed.py,build.sh} and lrSv/{lrSv1kg3202Sr*, lrSvTommoJpCnvVcfToBedGraph.py} moved to scripts/srSv/ with git mv (history preserved) and renamed to drop the "lrSv" prefix. Internal path references in abelSvBuild.sh and abelSvVcfToBed.py updated.
- makeDoc: doc/hg38/abelSv.txt renamed to doc/hg38/srSv.txt and extended with the onekg3202Sr and tommoJpCnv sections moved from lrSv.txt. lrSv.txt leaves a pointer.
- Data: /hive/data/genomes/hg38/bed/{abelSv,lrSv/onekg3202sr, lrSv/tommoJpCnv} moved to /hive/data/genomes/hg38/bed/srSv/*. /gbdb/hg38/lrSv/{onekg3202sr.bb,tommoJpCnv{Loss,Gain}.bw} and /gbdb/hg38/abelSv/ removed and re-linked under /gbdb/hg38/srSv/.

refs #36258

diff --git src/hg/makeDb/trackDb/human/lrSv.html src/hg/makeDb/trackDb/human/lrSv.html
index 11b1e7ea0b0..311baf88969 100644
--- src/hg/makeDb/trackDb/human/lrSv.html
+++ src/hg/makeDb/trackDb/human/lrSv.html
@@ -3,33 +3,33 @@
This track collection contains structural variant (SV) calls derived from long-read sequencing studies. Structural variants are genomic rearrangements larger than ~50 bp, including deletions, insertions, duplications, inversions, and translocations. 
Long-read sequencing technologies can span repetitive regions and resolve complex rearrangements that are difficult to detect with short-read methods.

Available Datasets

SV length statistics (min / median / max) are computed from the svLen field of each track, in base pairs. Some tracks include sites with svLen=0 (complex events where the reference and alternate alleles differ in sequence but not in length).

-All subtracks below are long-read callsets, except the last two rows
-(CCDG 17,795 and 1KG 3202, both Illumina short-read), which are
-included as short-read comparators.
+For short-read structural-variant comparators (CCDG 17,795, 1KG 3202,
+ToMMo 48K CNV) see the companion
+Short-read SVs supertrack.

@@ -68,37 +68,30 @@ - - - - - - - @@ -175,50 +168,30 @@ - - - - - - - - - - - - - - - - - - - -
Dataset N samples Cohort / disease Sequencing SVs Min Median Max
CoLoRSdb 1,427148,375 2 177 49,171
ToMMo Japanese 333 (111 trios) Japanese, general population ONT 74,201 51 162 99,980
ToMMo 48K CNV 48,874 Japanese, general population (short-read comparator for ToMMo long-read SVs) Illumina short-read (GATK CNV, 1 kb bins, shown as two bigWigs) ~2M bins with CNV carriers; not comparable to per-SV counts above
AoU 1K 1,027 All of Us, self-identified Black/African American PacBio HiFi 541,049 50 152 9,998
GA4K 502 Children's Mercy, pediatric rare disease probands + families PacBio HiFi74,552 50 160 190,088,222
SVatalog 101 101 Long-read WGS cohort for GWAS LD fine-mapping (SickKids) long-read 87,183 4 160 1,321,484
CCDG 17,795 (short-read) 17,795 NHGRI CCDG + PAGE + SGDP (short-read comparator) Illumina short-read 737,998-1-1217,985,413
1KG 3202 (short-read) 3,202 1000 Genomes expanded cohort (short-read comparator) Illumina short-read 173,366 1314154,807,729

CoLoRSdb SVs (colorsDbSv)

Structural variants from the Consortium of Long-Read Sequencing database (CoLoRSdb), from 1,427 PacBio HiFi long-read whole-genome sequences. 426,239 SVs (insertions, deletions, inversions) called with pbsv and merged with Jasmine, with allele frequencies, genotype counts and Hardy-Weinberg statistics across the cohort.

Han 945 SVs (han945Sv)

Structural variants from 945 Han Chinese individuals. 111,288 SVs (deletions, insertions, duplications, inversions, translocations) merged with SURVIVOR.
@@ -241,40 +214,30 @@
Structural variants from 1,019 individuals across 26 populations (1000 Genomes ONT). 161,332 SVs annotated with SVAN, classifying insertions and deletions by mechanism of origin (mobile elements, VNTRs, processed pseudogenes, etc.). Original coordinates are on T2T-CHM13 (hs1); the hg38 version was created via liftOver. This is a separate dataset from the 1KG ONT 100 (Gustafson et al.) track above; the 1,019 samples here do not overlap with the 100 samples in that release.

ToMMo Japanese SVs (tommoJpSv)

Structural variants from 333 Japanese individuals (111 trios) from the Tohoku Medical Megabank (ToMMo). 74,201 SVs (deletions and insertions) with trio-based Mendelian error rates and allele frequencies.

-

ToMMo 48K CNV SR (tommoJpCnv) - short-read comparator

-

-Short-read CNV comparator for the ToMMo long-read SV track above.
-Per-1 kb-bin copy-number carrier counts from short-read whole-genome
-sequencing of 48,874 Japanese individuals (jMorp 48KJPN-CNV Frequency
-Panel, release 20230828), called with GATK CNV germline workflows.
-Shown as a multiWig overlay: red = samples with copy-number loss
-(CN<2) per bin, green = samples with gain (CN>2) per bin.
-

-

AoU 1K SVs (aou1kSv)

Structural variants from 1,027 individuals from the All of Us (AoU) Research Program, sequenced with PacBio HiFi long reads. 541,049 SVs (insertions and deletions) with population-specific allele frequencies, gene annotations, and clinical trait associations.

GA4K SVs (ga4kSv)

Structural variants from 502 probands and family members enrolled in the Genomic Answers for Kids (GA4K) pediatric rare-disease program at Children's Mercy Research Institute, sequenced with PacBio HiFi long reads. 115,554 replicated SVs (deletions, insertions, duplications, inversions) called with pbsv and merged with JASMINE. The matched GA4K small-variant callset (SNVs
@@ -325,40 +288,30 @@
incidental Lewy body disease, and healthy controls) sequenced with PacBio HiFi long reads. 74,552 high-confidence SVs (deletions, insertions, duplications, inversions) with per-cohort allele frequencies and case-control carrier-rate differentials, from Kim et al. 2026.

SVatalog 101 SVs (chirmade101Sv)

Structural variants from 101 long-read whole-genome sequences released alongside the GWAS SVatalog tool (Chirmade et al. 2026). 87,183 SVs (deletions, insertions, duplications, inversions and complex events) annotated with gene overlaps, ClinGen / gnomAD constraint scores, OMIM / ClinVar / DGV / Decipher regional annotations.

-

1KG 3202 SVs (onekg3202Sr) - short-read comparator

-

-This is a short-read dataset, included for comparison only.
-Structural variants from the expanded 1000 Genomes cohort of 3,202
-Illumina NovaSeq short-read whole genomes (Byrska-Bishop et al. 2022),
-called with the GATK-SV / svtools pipeline. 173,366 SVs (DEL, INS, DUP,
-INV, CPX, CNV, CTX) with per-superpopulation allele frequencies. Useful
-for contrasting short-read vs. long-read SV breakpoints and for spotting
-variants unique to long-read data.
-

Data Access

Each subtrack has its own documentation page with details on how to download and intersect the underlying annotations.

References

Gong J, Sun H, Wang K, Zhao Y, Huang Y, Chen Q, Qiao H, Gao Y, Zhao J, Ling Y et al. Long-read sequencing of 945 Han individuals identifies structural variants associated with phenotypic diversity and disease susceptibility. Nat Commun. 2025 Feb 10;16(1):1494.