bac95a147f49cd331052e597006e04b3deee40fc max Wed Apr 22 10:43:20 2026 -0700

lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on both
the lrSv and srSv supertracks using the "CODE|CODE (Long name)"
filterValues syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)",
etc. Labels keep the short code up front so users can match what
hgTracks shows next to each feature.

Also sweep in the in-progress converter/as-file cleanups under
scripts/lrSv/ and scripts/srSv/ (introduction of lrSvCommon.py helpers,
consistent insLen / svLen / AC column naming, tightened field-description
text) that had been piling up as an unstaged working tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/cpc1Sv.html src/hg/makeDb/trackDb/human/cpc1Sv.html
index 470d401eb9d..167e890b81d 100644
--- src/hg/makeDb/trackDb/human/cpc1Sv.html
+++ src/hg/makeDb/trackDb/human/cpc1Sv.html
@@ -63,40 +63,66 @@
Available filters:
-The CPC assemblies were produced from PacBio HiFi long-read sequencing
-(mean ~30× coverage) with hifiasm
-in trio or Hi-C-phased mode, then combined with HPRC Phase 1 assemblies and
-built into a variation graph with pggb/Minigraph-Cactus.
-Bubbles in the graph were decomposed into variant records with
-vcfwave,
-producing the source VCF used here. For this UCSC track, the decomposed
-VCF was parsed, filtered to variants with an allele-length delta of at
-least 50 bp, and collapsed by graph snarl ID (see the build documentation
-linked below for details).
+Gao et al. 2023 generated PacBio HiFi long reads (mean ~30.65x,
+Sequel II/IIe platforms) for 58 QC-passed samples representing 36
+minority Chinese ethnic groups, complemented with Illumina short reads
+and Oxford Nanopore ultralong reads. Haplotype-phased de novo assemblies
+were produced with
+hifiasm
+v0.16.1 (116 high-quality haplotype assemblies retained after QC) and
+combined with 47 HPRC Phase 1 assemblies into a single variation graph
+built on T2T-CHM13v2 with the Minigraph-Cactus pipeline (Minigraph v0.19
+for the SV skeleton, Cactus v2.1.1 base alignment, hal2vg).
+Graph bubbles were decomposed into variant records with vcfwave
+and normalized with bcftools norm -m -any, yielding the source
+VCF (CPC.HPRC.Phase1.processed.SVs.normed.vcf.gz). The upstream
+Gao et al. release identified 78,072 SVs across the combined 105-sample
+graph. For this track we restrict to the 58 CPC samples (columns matching
+HIFI032* or RY*), recompute AC/AN/NS from those columns
+only, drop snarls with no CPC carrier (HPRC-specific sites), filter to
+alts with ≥50 bp REF/ALT length difference, and collapse by graph snarl
+ID. The final track contains 46,092 snarl sites on hs1; the hg38 version
+is lifted with the UCSC hs1ToHg38.over.chain.gz chain (36,030
+sites, 10,062 did not lift).
+
+The source VCF is distributed by the
+Chinese-Pangenome-Consortium-Phase-I GitHub repository.
+
+The step-by-step build commands (CPC-only recount, liftOver, snarl
+collapse, bigBed build) are recorded in the UCSC makeDoc for this track
+container:
+doc/hg38/lrSv.txt. The conversion scripts and autoSql schemas live in
+makeDb/scripts/lrSv.
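The CPC-only recount and filtering steps described in the added text can be sketched in Python. This is a minimal, hypothetical illustration and not the code in makeDb/scripts/lrSv. It makes four assumptions: the VCF has already been split into biallelic records (as bcftools norm -m -any does), GT is the first FORMAT key, the graph snarl ID is stored in the VCF ID column, and the ≥50 bp alts of a snarl with at least one CPC carrier are kept as a group rather than merged into one row. The function names and the MIN_SV_LEN constant are also illustrative.

```python
import fnmatch
import re
from collections import defaultdict

# CPC sample-column patterns, as described in the track text.
CPC_PATTERNS = ("HIFI032*", "RY*")
MIN_SV_LEN = 50  # hypothetical constant: minimum |len(REF) - len(ALT)| in bp


def cpc_columns(header_fields):
    """Return indices of sample columns whose names match a CPC pattern.

    VCF sample columns start at index 9, after CHROM..FORMAT."""
    return [i for i, name in enumerate(header_fields)
            if i >= 9 and any(fnmatch.fnmatchcase(name, p) for p in CPC_PATTERNS)]


def recount(fields, cols):
    """Recompute AC/AN/NS for one biallelic record from the given columns.

    Assumes GT is the first FORMAT key (the VCF spec requires this when
    GT is present). Missing alleles ('.') are not counted."""
    ac = an = ns = 0
    for i in cols:
        gt = fields[i].split(":", 1)[0]
        alleles = [a for a in re.split(r"[/|]", gt) if a != "."]
        if alleles:
            ns += 1          # sample has at least one called allele
        an += len(alleles)   # called alleles
        ac += sum(1 for a in alleles if a == "1")  # ALT calls (biallelic)
    return ac, an, ns


def filter_and_collapse(header_fields, records):
    """Group tab-split VCF data lines by snarl ID (assumed in column 3).

    Drops snarls with no CPC carrier, then keeps only alts with a
    REF/ALT length difference of at least MIN_SV_LEN.
    Returns {snarl_id: [(fields, ac, an, ns), ...]}."""
    cols = cpc_columns(header_fields)
    by_snarl = defaultdict(list)
    for fields in records:
        ac, an, ns = recount(fields, cols)
        by_snarl[fields[2]].append((fields, ac, an, ns))

    kept = {}
    for snarl, recs in by_snarl.items():
        if not any(ac > 0 for _, ac, _, _ in recs):
            continue  # no CPC carrier: HPRC-specific site
        sv = [r for r in recs
              if abs(len(r[0][3]) - len(r[0][4])) >= MIN_SV_LEN]
        if sv:
            kept[snarl] = sv
    return kept
```

Dropping whole snarls before applying the length filter follows the order given in the text. A record-level carrier filter would also discard alts that no CPC sample carries inside a snarl that is otherwise retained, and the real scripts may behave either way.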
The data can be explored interactively with the Table Browser or Data Integrator, and accessed from scripts via our API (track=cpc1Sv).
For automated download, the bigBed files are at http://hgdownload.soe.ucsc.edu/gbdb/hs1/lrSv/cpc1.bb (native) and http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/cpc1.bb (lifted). Use bigBedToBed to extract features: e.g.