bac95a147f49cd331052e597006e04b3deee40fc max Wed Apr 22 10:43:20 2026 -0700 lrSv/srSv: human-readable SV type filter labels, script cleanups Add human-readable labels to the supertrack-level svType filter on both the lrSv and srSv supertracks using the "CODE|CODE (Long name)" filterValues syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)", etc. Labels keep the short code up front so users can match what hgTracks shows next to each feature. Also sweep in the in-progress converter/as-file cleanups under scripts/lrSv/ and scripts/srSv/ (introduction of lrSvCommon.py helpers, consistent insLen / svLen / AC column naming, tightened field-description text) that had been piling up as an unstaged working tree. refs #36258 diff --git src/hg/makeDb/trackDb/human/hprc2Sv.html src/hg/makeDb/trackDb/human/hprc2Sv.html index ac1ba979642..11a002d914b 100644 --- src/hg/makeDb/trackDb/human/hprc2Sv.html +++ src/hg/makeDb/trackDb/human/hprc2Sv.html @@ -1,111 +1,118 @@ <h2>Description</h2> <p> This track shows structural variants (SVs) derived from the Human Pangenome Reference Consortium (HPRC) release-2 pangenome graph. The graph was built with minigraph-cactus from PacBio HiFi haplotype-resolved assemblies of 233 samples (including T2T-CHM13 and the diverse 1000 Genomes Project sample set). HPRC releases one VCF per reference path (GRCh38 and T2T-CHM13); we display both natively on the corresponding UCSC assembly (hg38 and hs1). Variants were extracted from the graph with <tt>vg deconstruct</tt> and decomposed into atomic alleles with <tt>vcfwave</tt> (WFA2-lib). </p> <p> The hg38 track contains 1,483,114 SV-sized alleles (length ≥ 50 bp) split by type: 1,106,190 insertions, 192,597 deletions, 178,178 complex alleles and 6,149 inversions. The hs1 track is built from the parallel T2T-CHM13 wave VCF. Each row carries the allele count, allele frequency, number of samples with data and the snarl-nesting level of the variant in the pangenome decomposition tree. 
 </p>
 <h2>Display Conventions and Configuration</h2>
 <p>
 Items are colored by SV type:
 <ul>
 <li><span style="color: rgb(0,0,200);">Insertions (INS)</span> - blue</li>
 <li><span style="color: rgb(200,0,0);">Deletions (DEL)</span> - red</li>
 <li><span style="color: rgb(140,0,200);">Complex alleles (COMPLEX)</span> - purple</li>
 <li><span style="color: rgb(230,140,0);">Inversions (INV)</span> - orange</li>
 </ul>
 </p>
 <p>
 Insertions are placed at the insertion site with a width of 1 bp; deletions, complex alleles and inversions span the affected reference interval. Filters are available for SV type, SV length, allele frequency and snarl level (0 = top-level bubble; higher values are nested within parent bubbles).
 </p>
 <h2>Methods</h2>
 <p>
-The HPRC v2.0 minigraph-cactus pangenome was downloaded from the HPRC S3
-release bucket:
-<ul>
-<li>hg38:
-<a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/release2/minigraph-cactus/hprc-v2.0-mc-grch38.sv.gfa.gz" target="_blank"><tt>hprc-v2.0-mc-grch38.sv.gfa.gz</tt></a> (graph) and
-<a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/release2/minigraph-cactus/hprc-v2.0-mc-grch38.wave.vcf.gz" target="_blank"><tt>hprc-v2.0-mc-grch38.wave.vcf.gz</tt></a> (wave-decomposed VCF)</li>
-<li>hs1:
-<a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/release2/minigraph-cactus/hprc-v2.0-mc-chm13.wave.vcf.gz" target="_blank"><tt>hprc-v2.0-mc-chm13.wave.vcf.gz</tt></a></li>
-</ul>
-The VCF is the result of running <tt>vg deconstruct</tt> on the graph with
-the corresponding reference path (GRCh38 or T2T-CHM13) and then
-<tt>vcfwave</tt> / WFA2-lib to split complex multi-allelic records into
-atomic alleles with per-allele TYPE and LEN fields.
+HPRC release-2 is an open data release (not yet accompanied by a formal
+peer-reviewed publication) built from PacBio HiFi haplotype-resolved
+assemblies of 233 samples, including T2T-CHM13 and a diverse 1000 Genomes
+Project panel. The pangenome graph was built with Minigraph-Cactus against
+both GRCh38 and T2T-CHM13 reference paths; variants were extracted from
+the graph with <tt>vg deconstruct</tt> and then decomposed into atomic
+alleles with <tt>vcfwave</tt> / WFA2-lib, yielding per-allele TYPE and LEN
+fields. For this track, each ALT in the wave VCF was emitted as its own
+BED row, retaining alleles with |LEN| ≥ 50 bp or the <tt>INV</tt> flag;
+allele counts, frequencies, sample counts and snarl levels are taken
+directly from the per-allele INFO fields. On hg38 this yields 1,483,114
+SV-sized alleles (1,106,190 insertions, 192,597 deletions, 178,178 complex
+alleles and 6,149 inversions); the hs1 track is built from the parallel
+T2T-CHM13 wave VCF. Sample-list and assembly provenance for the graph are
+maintained at HPRC in
+<a href="https://github.com/human-pangenomics/hprc_intermediate_assembly/blob/main/data_tables/pangenomes/alignments_v2.0.csv" target="_blank">
+hprc_intermediate_assembly/<tt>alignments_v2.0.csv</tt></a>.
 </p>
 <p>
-For display here, the wave VCF was streamed and each ALT was emitted as
-its own BED row. Alleles were retained if their absolute length was
-≥ 50 bp or if the record carried the <tt>INV</tt> flag (inversions may
-be shorter). Allele counts, frequencies, and sample counts are taken
-directly from the per-allele INFO fields.
+The HPRC v2.0 Minigraph-Cactus graph and wave-decomposed VCFs were
+downloaded from the HPRC S3 release bucket:
+<a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/release2/minigraph-cactus/hprc-v2.0-mc-grch38.wave.vcf.gz" target="_blank">
+hprc-v2.0-mc-grch38.wave.vcf.gz</a> (hg38) and
+<a href="https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/release2/minigraph-cactus/hprc-v2.0-mc-chm13.wave.vcf.gz" target="_blank">
+hprc-v2.0-mc-chm13.wave.vcf.gz</a> (hs1).
 </p>
 <p>
-A pointer to both the GRCh38 and CHM13 pangenome files (and the list of
-assemblies that went into the graph) is maintained by HPRC at
-<a href="https://github.com/human-pangenomics/hprc_intermediate_assembly/blob/main/data_tables/pangenomes/alignments_v2.0.csv"
- target="_blank">human-pangenomics/hprc_intermediate_assembly
-<tt>alignments_v2.0.csv</tt></a>, which links to both the hg38 and
-CHM13/hs1 VCFs (and the underlying graph files) used for this track.
+The step-by-step build commands (download, format conversion, bigBed build)
+are recorded in the UCSC makeDoc for this track container:
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/lrSv.txt" target="_blank">
+doc/hg38/lrSv.txt</a> and
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hs1/lrSv.txt" target="_blank">
+doc/hs1/lrSv.txt</a>. The conversion scripts and autoSql schemas live in
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank">
+makeDb/scripts/lrSv</a>.
 </p>
 <h2>Data Access</h2>
 <p>
 The data can be explored interactively in table format with the <a href="../cgi-bin/hgTables">Table Browser</a> or the <a href="../cgi-bin/hgIntegrator">Data Integrator</a>, and accessed programmatically through our <a href="https://api.genome.ucsc.edu">API</a>, track=<i>hprc2Sv</i>.
</p> <p> The bigBed is available from our download server for both assemblies: <ul> <li>GRCh38: <a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hprc2.bb" target="_blank"> hg38 hprc2.bb</a></li> <li>T2T-CHM13: <a href="http://hgdownload.soe.ucsc.edu/gbdb/hs1/lrSv/hprc2.bb" target="_blank"> hs1 hprc2.bb</a></li> </ul> Example: <tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hprc2.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt>. </p> <p> The original pangenome graph and the wave-decomposed VCF are available from the HPRC public S3 bucket, as linked from the <a href="https://humanpangenome.org/hprc-data-release-2/" target="_blank">HPRC release-2 announcement</a>. </p> <h2>Credits</h2> <p> Thanks to the Human Pangenome Reference Consortium for building and publicly releasing the release-2 minigraph-cactus pangenome. </p> <h2>References</h2> <p> HPRC release-2 data is not yet described in a formal peer-reviewed publication. See the Human Pangenome Project release announcement for background and data-access details: <a href="https://humanpangenome.org/hprc-data-release-2/" target="_blank"> HPRC data release 2</a>. </p>
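Reviewer note: the retention rule stated in the new Methods text (one BED row per ALT, kept when |LEN| ≥ 50 bp or when the record carries the INV flag) can be sketched in Python as below. This is only an illustration of the rule as described, not the converter under scripts/lrSv/. The function names are hypothetical, and the assumptions that LEN is a comma-separated per-ALT INFO value and that INV is a valueless INFO flag come from the description, not from inspecting the wave VCF.

```python
def keep_allele(length, is_inv, min_len=50):
    """Retention rule from the Methods text: |LEN| >= min_len bp, or INV set."""
    return is_inv or abs(length) >= min_len


def parse_info(info):
    """Parse a VCF INFO column into a dict; valueless flags map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def retained_alleles(vcf_line, min_len=50):
    """Yield (chrom, pos, alt, length) for each ALT that passes the rule.

    Assumption: LEN holds one integer per ALT, in ALT order. An ALT with
    no LEN value is treated as length 0, so it is kept only if INV is set.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    chrom, pos = cols[0], int(cols[1])
    alts = cols[4].split(",")
    info = parse_info(cols[7])
    raw_len = info.get("LEN")
    lens = [int(x) for x in raw_len.split(",")] if isinstance(raw_len, str) else []
    is_inv = "INV" in info  # record-level flag, per the Methods wording
    for i, alt in enumerate(alts):
        length = lens[i] if i < len(lens) else 0
        if keep_allele(length, is_inv, min_len):
            yield chrom, pos, alt, length
```

Treating INV as a record-level flag means every ALT on an INV-flagged record is kept regardless of length; whether the real converter applies the flag per record or per allele is not stated in the diff.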