5e4ca58df1b5bfe554fe5cc3309a39736ca256ee
max
  Tue Apr 21 08:08:52 2026 -0700
cpc1Sv: restrict to the 58 CPC samples, drop HPRC-specific SVs

Rewrite lrSvCpc1VcfToBed.py to identify the 58 CPC sample columns by
name prefix (HIFI032* or RY*), recompute AC/AN/NS from those GT
columns only, and skip any snarl that no CPC sample carries. The
HPRC portion is already represented elsewhere in lrSv, so this keeps
the track population-consistent with its label.
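
For reference, the recount described above can be sketched roughly as
follows. This is an illustration only, not the code in
lrSvCpc1VcfToBed.py: the function names cpc_columns and cpc_counts are
hypothetical, and the sketch assumes tab-separated VCF rows with one ALT
per row, where AC counts "1" alleles, AN counts non-missing alleles, and
NS counts samples with at least one called allele.

```python
import re

# Sample-name prefixes that identify CPC columns (from the commit message).
CPC_PREFIXES = ("HIFI032", "RY")


def cpc_columns(header_line):
    """Return indexes of the sample columns whose names start with a CPC prefix."""
    fields = header_line.rstrip("\n").split("\t")
    # Columns 0-8 are the fixed VCF fields (CHROM..FORMAT); samples start at 9.
    return [i for i, name in enumerate(fields)
            if i >= 9 and name.startswith(CPC_PREFIXES)]


def cpc_counts(record_line, sample_idx):
    """Recompute (AC, AN, NS) for one single-ALT VCF row from the given columns."""
    fields = record_line.rstrip("\n").split("\t")
    gt_pos = fields[8].split(":").index("GT")
    ac = an = ns = 0
    for i in sample_idx:
        gt = fields[i].split(":")[gt_pos]
        # Split phased (|) or unphased (/) genotypes and drop missing alleles.
        alleles = [a for a in re.split(r"[/|]", gt) if a != "."]
        if alleles:
            ns += 1
        an += len(alleles)
        ac += sum(1 for a in alleles if a == "1")
    return ac, an, ns


def keep_row(record_line, sample_idx):
    """A row is kept only if at least one CPC sample carries the ALT allele."""
    return cpc_counts(record_line, sample_idx)[0] > 0
```

With a made-up header holding one HPRC-style column (HG002) and two
CPC-style columns (HIFI032A, RY01), cpc_columns selects only the last
two, and a row carried only by HG002 is dropped because keep_row returns
False for it.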

Rebuild results: 46,092 snarl sites on hs1 (down from 97,205 when
combined with HPRC), 36,030 lifted to hg38 (down from 81,261;
10,062 unmapped). Updates cpc1Sv.html, lrSv.ra labels, and the
makeDoc.

refs #36258

diff --git src/hg/makeDb/trackDb/human/cpc1Sv.html src/hg/makeDb/trackDb/human/cpc1Sv.html
index e3f5e3f01a7..470d401eb9d 100644
--- src/hg/makeDb/trackDb/human/cpc1Sv.html
+++ src/hg/makeDb/trackDb/human/cpc1Sv.html
@@ -1,64 +1,71 @@
 <h2>Description</h2>
 
 <p>
-This track displays structural variants (SVs) — deletions, insertions, and
-complex substitutions of at least 50 bp — identified by the Chinese
-Pangenome Consortium (CPC) from a pangenome graph built from 58 core samples
-representing 36 Chinese minority ethnic groups, jointly with 47 samples from
-Phase 1 of the Human Pangenome Reference Consortium (HPRC). After
-decomposition of the graph bubbles, each distinct graph site (snarl) is
-displayed as one variant record, with genotypes aggregated across 105
-samples.</p>
+This track displays structural variants (SVs) &mdash; deletions, insertions,
+and complex substitutions of at least 50 bp &mdash; identified by the
+Chinese Pangenome Consortium (CPC) in 58 samples representing 36 Chinese
+minority ethnic groups.</p>
+
+<p>
+The upstream release combined the 58 CPC samples with 47 samples from
+Phase 1 of the Human Pangenome Reference Consortium (HPRC) into a single
+pangenome graph built on the T2T-CHM13v2 assembly with Minigraph-Cactus.
+For this track we recomputed allele counts (AC), allele numbers (AN) and
+sample counts (NS) using only the 58 CPC sample columns (those with
+<tt>HIFI032*</tt> or <tt>RY*</tt> prefixes in the source VCF) and dropped
+all snarls that no CPC sample carries (HPRC-specific SVs). To see the
+HPRC data on its own, use the HPRC SV tracks elsewhere in this collection.</p>
 
 <p>
 A pangenome is a graph that represents many genomes simultaneously, letting
 variants that are missing from a single linear reference be captured and
-typed directly. Because the CPC pangenome was built on the T2T-CHM13v2
-assembly, variants are shown natively on the hs1 browser and lifted to hg38
-using the UCSC <tt>hs1ToHg38.over.chain.gz</tt> chain. About 16% of the
-97,205 hs1 sites did not lift over cleanly (usually in highly repetitive
-regions added to T2T-CHM13).</p>
+typed directly. Variants are shown natively on the hs1 browser and lifted
+to hg38 using the UCSC <tt>hs1ToHg38.over.chain.gz</tt> chain. The track
+contains 46,092 snarl sites on hs1, of which 36,030 lift to hg38; the other
+10,062 do not lift, typically in repetitive regions new to T2T-CHM13).</p>
 
 <h2>Display conventions</h2>
 
 <p>Items are colored by SV type:</p>
 <ul>
   <li><span style="background-color:rgb(0,0,200);color:white;padding:1px 6px">INS</span> insertion (net ALT longer by &ge;50 bp)</li>
   <li><span style="background-color:rgb(200,0,0);color:white;padding:1px 6px">DEL</span> deletion (net REF longer by &ge;50 bp)</li>
   <li><span style="background-color:rgb(230,140,0);color:white;padding:1px 6px">CPX</span> complex substitution (similar-length REF and ALT but at least one &ge;50 bp)</li>
   <li><span style="background-color:rgb(120,120,120);color:white;padding:1px 6px">MIXED</span> snarl whose collapsed alt alleles belong to different classes</li>
 </ul>
 
 <p>
 Each bed item spans from the start of the REF allele to its end on the
 reference. Pure insertions (where REF is a single base) therefore appear
 as narrow single-base marks; DELs and CPX items span the affected reference
 interval.</p>
 
 <p>
 The <i>name</i> field is the graph snarl ID (two node identifiers separated
 by strand arrows, e.g. <tt>&gt;2541&gt;2547</tt>). It is stable across the
 graph but has no meaning outside the CPC pangenome graph file.</p>
 
 <h2>Collapsing of multi-allelic sites</h2>
 
 <p>
 The source VCF was decomposed with <tt>bcftools norm -m -any</tt>, so each
 graph snarl appears as one VCF row per alternative allele (a single
-bubble in the graph may have 2-20+ alt paths). For display, all alternative
-alleles sharing the same snarl ID are collapsed into one track item:</p>
+bubble in the graph may have 2-20+ alt paths). For this track we first
+compute the CPC-only allele count per alt, drop any alt that no CPC sample
+carries, then collapse all remaining alts sharing the same snarl ID into
+one track item:</p>
 <ul>
   <li><b>SV type</b> is the common class of all alts, or <tt>MIXED</tt> if
       they disagree (for example one alt is a DEL and another is an INS).</li>
   <li><b>SV length</b> is the maximum |len(ALT) − len(REF)| across alts.</li>
   <li><b>Allele count</b> is the sum of the per-alt allele counts.</li>
   <li><b>Number of alts</b> records how many alternative alleles were merged.</li>
 </ul>
 
 <h2>Filters</h2>
 
 <p>Available filters:</p>
 <ul>
   <li><b>SV type</b> — any combination of INS, DEL, CPX, MIXED.</li>
   <li><b>SV length</b> — maximum allele-length difference.</li>
   <li><b>Allele frequency</b> and <b>allele count</b> across the combined