526213b2893134217a300ff913e11b4e98d67991 max Mon Apr 20 08:50:10 2026 -0700 lrSv: add cpc1Sv and aprSv pangenome SV subtracks (hg38, hs1) cpc1Sv: 97,205 SVs from the CPC + HPRC Phase 1 pangenome (Gao et al. 2023, Nature; PMID 37316654) built on T2T-CHM13v2, with 58 Chinese and 47 HPRC samples. Each graph snarl site is shown as one item with alt alleles classified by length delta (INS/DEL/CPX, 50 bp threshold) and collapsed. aprSv: 103,077 SVs from the Arabic Pangenome Reference (Nassir et al. 2025, Nat Commun; PMID 40707445) built on T2T-CHM13v2 from 53 UAE-resident Arab individuals. Same multi-allele classification as cpc1Sv, with alt alleles iterated within each multi-allelic row. Both tracks load natively on hs1 and are lifted to hg38 with hs1ToHg38.over.chain.gz. refs #36258 diff --git src/hg/makeDb/trackDb/human/cpc1Sv.html src/hg/makeDb/trackDb/human/cpc1Sv.html new file mode 100644 index 00000000000..e3f5e3f01a7 --- /dev/null +++ src/hg/makeDb/trackDb/human/cpc1Sv.html @@ -0,0 +1,120 @@ +
+This track displays structural variants (SVs) — deletions, insertions, and +complex substitutions of at least 50 bp — identified by the Chinese +Pangenome Consortium (CPC) from a pangenome graph built from 58 core samples +representing 36 Chinese minority ethnic groups, jointly with 47 samples from +Phase 1 of the Human Pangenome Reference Consortium (HPRC). After +decomposition of the graph bubbles, each distinct graph site (snarl) is +displayed as one variant record, with genotypes aggregated across 105 +samples.
+ ++A pangenome is a graph that represents many genomes simultaneously, so that +variants absent from a single linear reference can be captured and +genotyped directly. Because the CPC pangenome was built on the T2T-CHM13v2 +assembly, variants are shown natively on the hs1 browser and lifted to hg38 +using the UCSC hs1ToHg38.over.chain.gz chain. About 16% of the +97,205 hs1 sites did not lift over cleanly (usually because they fall in highly +repetitive regions added to T2T-CHM13).
+ +Items are colored by SV type:
++Each BED item spans from the start of the REF allele to its end on the +reference. Pure insertions (where REF is a single base) therefore appear +as narrow single-base marks, whereas DEL and CPX items span the affected +reference interval.
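The coordinate rule described above can be illustrated with a minimal sketch. It assumes the usual conversion from a 1-based VCF POS to a 0-based, half-open BED interval; the function name is hypothetical and the actual build script may handle the anchor base differently.

```python
def vcf_to_bed_interval(pos: int, ref: str) -> tuple[int, int]:
    """Return the 0-based, half-open BED interval covered by a REF allele.

    pos is the 1-based VCF POS of the first REF base.
    ASSUMPTION: standard VCF-to-BED conversion; not taken from the build script.
    """
    start = pos - 1          # BED starts are 0-based
    end = start + len(ref)   # half-open end covers every REF base
    return start, end


# A pure insertion has a single-base REF, so it is drawn one base wide.
insertion = vcf_to_bed_interval(1000, "A")
# A deletion whose REF is 61 bases long spans 61 reference bases.
deletion = vcf_to_bed_interval(1000, "A" + "C" * 60)
```

With these inputs the insertion occupies one reference base and the deletion occupies the full length of its REF allele.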
+ ++The name field is the graph snarl ID (two node identifiers separated +by strand arrows, e.g. >2541>2547). It is stable across the +graph but has no meaning outside the CPC pangenome graph file.
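For scripts that need the two node identifiers, a snarl ID of the form shown above can be split with a regular expression. This is a sketch only: it assumes each node is an integer preceded by a single > or < strand arrow, and the function name is hypothetical.

```python
import re

# ASSUMPTION: IDs look like ">2541>2547", i.e. arrow, integer, arrow, integer.
SNARL_ID_RE = re.compile(r"^([<>])(\d+)([<>])(\d+)$")


def parse_snarl_id(snarl_id: str) -> tuple[tuple[str, int], tuple[str, int]]:
    """Split a snarl ID into (strand, node) pairs for its two boundary nodes."""
    match = SNARL_ID_RE.match(snarl_id)
    if match is None:
        raise ValueError(f"not a snarl ID: {snarl_id!r}")
    strand_a, node_a, strand_b, node_b = match.groups()
    return (strand_a, int(node_a)), (strand_b, int(node_b))


boundary_nodes = parse_snarl_id(">2541>2547")
```

The node numbers refer to the CPC graph only, so they are useful for looking up a site in the graph file rather than for comparing across pangenomes.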
+ ++The source VCF was split with bcftools norm -m -any, so each +graph snarl appears as one VCF row per alternative allele (a single +bubble in the graph may have anywhere from 2 to more than 20 alt paths). For display, all alternative +alleles sharing the same snarl ID are collapsed into one track item:
+Available filters:
++The CPC assemblies were produced from PacBio HiFi long-read sequencing +(mean ~30× coverage) with hifiasm +in trio or Hi-C-phased mode, then combined with HPRC Phase 1 assemblies and +built into a variation graph with pggb/Minigraph-Cactus. +Bubbles in the graph were decomposed into variant records with +vcfwave, +producing the source VCF used here. For this UCSC track, the decomposed +VCF was parsed, filtered to variants with an allele-length delta of at +least 50 bp, and collapsed by graph snarl ID (see the build documentation +linked below for details).
+ +The data can be explored interactively with the +Table Browser or +Data Integrator, and accessed from +scripts via our API +(track=cpc1Sv).
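As a sketch of scripted access, the snippet below builds a query URL for the UCSC REST API getData/track endpoint. The track name comes from the text above; the region, the helper name, and the use of hs1 as the genome parameter are illustrative assumptions.

```python
from urllib.parse import urlencode

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_query_url(genome: str, chrom: str, start: int, end: int,
                    track: str = "cpc1Sv") -> str:
    """Build a getData/track URL for one region (hypothetical helper)."""
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    return f"{API_BASE}?{urlencode(params)}"


# Example region chosen for illustration only.
url = track_query_url("hs1", "chr21", 0, 100000)
```

The resulting URL can be fetched with urllib.request.urlopen and decoded as JSON; the layout of the response should be checked against the API documentation rather than assumed.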
+ +For automated download, the bigBed files are at + +http://hgdownload.soe.ucsc.edu/gbdb/hs1/lrSv/cpc1.bb (native) and + +http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/cpc1.bb (lifted). +Use bigBedToBed to extract the features in a region, for example: +bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hs1/lrSv/cpc1.bb -chrom=chr21 -start=0 -end=100000000 stdout
+ +The original pangenome VCF is distributed by the Chinese Pangenome +Consortium; see the + +CPC Phase I repository.
+ +Thanks to the Chinese Pangenome Consortium and the HPRC Phase 1 team +for producing and releasing the combined pangenome and its decomposed +variant calls.
+ ++Gao Y, Yang X, Chen H, Tan X, Yang Z, Deng L, Wang B, Kong S, Li S, Cui Y et al. + +A pangenome reference of 36 Chinese populations. +Nature. 2023 Jul;619(7968):112-121. +PMID: 37316654; PMC: PMC10322713 +
+