6b0d68657267f1e02c47d4224ea62446bbbb2ba0 max Fri May 22 06:55:52 2026 -0700
small non-AI changes to the html docs pages of the long-read SV tracks
diff --git src/hg/makeDb/trackDb/human/cpc1Sv.html src/hg/makeDb/trackDb/human/cpc1Sv.html
index 167e890b81d..b5d40795e95 100644
--- src/hg/makeDb/trackDb/human/cpc1Sv.html
+++ src/hg/makeDb/trackDb/human/cpc1Sv.html
@@ -1,85 +1,85 @@

Description

-This track displays structural variants (SVs) — deletions, insertions,
-and complex substitutions of at least 50 bp — identified by the
+This track displays structural variants (SVs) at least 50 bp long
+(deletions, insertions, and complex substitutions) identified by the
Chinese Pangenome Consortium (CPC) in 58 samples representing 36 Chinese minority ethnic groups.

The upstream release combined the 58 CPC samples with 47 samples from Phase 1 of the Human Pangenome Reference Consortium (HPRC) into a single pangenome graph built on the T2T-CHM13v2 assembly with Minigraph-Cactus. For this track we recomputed allele counts (AC), allele numbers (AN) and sample counts (NS) using only the 58 CPC sample columns (those with HIFI032* or RY* prefixes in the source VCF) and dropped all snarls that no CPC sample carries (HPRC-specific SVs). To see the HPRC data on its own, use the HPRC SV tracks elsewhere in this collection.
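The CPC-only recount described above can be sketched in a few lines of Python. This is an illustration, not the script used to build the track: the function names and the example sample names are hypothetical, and it assumes each VCF row is biallelic after decomposition (allele 1 is the row's alt, "." is a missing call) and that NS counts samples with at least one called allele.

```python
import re

# Sample-name prefixes named in the track description.
CPC_PREFIXES = ("HIFI032", "RY")


def cpc_columns(sample_names):
    """Return the indices of sample columns whose name has a CPC prefix."""
    return [i for i, name in enumerate(sample_names)
            if name.startswith(CPC_PREFIXES)]


def recompute_counts(genotypes, cpc_idx):
    """Recompute (AC, AN, NS) for one biallelic row from CPC columns only.

    genotypes: per-sample VCF fields such as "0|1" or "1|.:extra".
    Assumption: NS counts samples with at least one called allele.
    """
    ac = an = ns = 0
    for i in cpc_idx:
        gt = genotypes[i].split(":", 1)[0]          # keep only the GT subfield
        alleles = [a for a in re.split(r"[|/]", gt) if a != "."]
        if alleles:
            ns += 1
        an += len(alleles)
        ac += sum(1 for a in alleles if a == "1")
    return ac, an, ns
```

A row whose recomputed AC is 0 is carried only by non-CPC samples and would be dropped under the rule above.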

A pangenome is a graph that represents many genomes simultaneously, letting variants that are missing from a single linear reference be captured and typed directly. Variants are shown natively on the hs1 browser and lifted to hg38 using the UCSC hs1ToHg38.over.chain.gz chain. The track contains 46,092 snarl sites on hs1 and 36,030 lifted to hg38 (10,062 did not lift, typically in T2T-added repetitive regions).

-Display conventions
+Display Conventions and Configuration

Items are colored by SV type:

Each bed item spans from the start of the REF allele to its end on the reference. Pure insertions (where REF is a single base) therefore appear as narrow single-base marks; DELs and CPX items span the affected reference interval.
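As a small worked example of the span rule above: VCF positions are 1-based, while BED intervals are 0-based and half-open, so the item covers exactly the bases of REF. The helper name below is hypothetical and the sketch is not taken from the track's build scripts.

```python
def vcf_to_bed_span(pos, ref):
    """Map a 1-based VCF POS and its REF allele to a 0-based half-open BED span."""
    chrom_start = pos - 1                 # BED starts are 0-based
    chrom_end = chrom_start + len(ref)    # half-open end covers all REF bases
    return chrom_start, chrom_end
```

With a single-base REF the span is one base wide, which is why pure insertions show up as narrow marks; a five-base REF yields a five-base item.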

The name field is the graph snarl ID (two node identifiers separated by strand arrows, e.g. >2541>2547). It is stable across the graph but has no meaning outside the CPC pangenome graph file.
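For readers who want to look a snarl up in the graph, the name can be split into its two oriented node IDs. The sketch below assumes the two-node form shown in the example (an arrow, digits, an arrow, digits); the function name is hypothetical.

```python
import re

# One strand arrow plus a numeric node ID, twice, e.g. ">2541>2547".
SNARL_RE = re.compile(r"([<>])(\d+)([<>])(\d+)")


def parse_snarl_id(name):
    """Split a snarl ID into ((arrow, node), (arrow, node)) tuples."""
    m = SNARL_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a two-node snarl ID: {name!r}")
    return (m.group(1), int(m.group(2))), (m.group(3), int(m.group(4)))
```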

-Collapsing of multi-allelic sites
+Collapsing of Multi-allelic Sites

The source VCF was decomposed with bcftools norm -m -any, so each graph snarl appears as one VCF row per alternative allele (a single bubble in the graph may have 2-20+ alt paths). For this track we first compute the CPC-only allele count per alt, drop any alt that no CPC sample carries, then collapse all remaining alts sharing the same snarl ID into one track item:
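A minimal sketch of the drop-and-group step only, assuming each decomposed row has already been reduced to a (snarl ID, alt allele, CPC-only allele count) tuple. The function name and tuple layout are hypothetical, and how the grouped alts are summarized into the final item's fields is not modeled here.

```python
from collections import defaultdict


def collapse_by_snarl(rows):
    """Group decomposed VCF rows by snarl ID, keeping only CPC-carried alts.

    rows: iterable of (snarl_id, alt, cpc_ac) tuples, one per alt allele.
    Returns {snarl_id: [(alt, cpc_ac), ...]}; snarls with no surviving
    alt are absent from the result.
    """
    grouped = defaultdict(list)
    for snarl_id, alt, cpc_ac in rows:
        if cpc_ac > 0:                      # drop alts no CPC sample carries
            grouped[snarl_id].append((alt, cpc_ac))
    return dict(grouped)
```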

Filters

Available filters:

Methods

Gao et al. 2023 generated PacBio HiFi long reads (mean ~30.65x, Sequel II/IIe platforms) for 58 QC-passed samples representing 36 minority Chinese ethnic groups, complemented with Illumina short reads and Oxford Nanopore ultralong reads. Haplotype-phased de novo assemblies were produced with hifiasm v0.16.1 (116 high-quality haplotype assemblies retained after QC) and combined with 47 HPRC Phase 1 assemblies into a single variation graph