bac95a147f49cd331052e597006e04b3deee40fc max Wed Apr 22 10:43:20 2026 -0700

lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on both the lrSv and srSv supertracks using the "CODE|CODE (Long name)" filterValues syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)", etc. Labels keep the short code up front so users can match what hgTracks shows next to each feature.

Also sweep in the in-progress converter/as-file cleanups under scripts/lrSv/ and scripts/srSv/ (introduction of lrSvCommon.py helpers, consistent insLen / svLen / AC column naming, tightened field-description text) that had been piling up as an unstaged working tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/aprSv.html src/hg/makeDb/trackDb/human/aprSv.html
index 255e1ae719d..ac90c471788 100644
--- src/hg/makeDb/trackDb/human/aprSv.html
+++ src/hg/makeDb/trackDb/human/aprSv.html
@@ -29,48 +29,73 @@

Each item spans from the start of REF to its end on the reference. The name field is the graph snarl ID (e.g. <951452<1012008), which identifies the variant site in the APR pangenome graph.

Per-site alt-allele aggregation

The source VCF is multi-allelic: a single graph snarl appears as one row with a comma-separated ALT list. For this track, each ALT is classified individually using the 50 bp threshold, and the row is emitted as a single bed item with:

svType — the common class, or MIXED if alts disagree;
-svLen — the maximum |len(ALT)-len(REF)| across alts that passed;
-alleleCount — sum of per-alt allele counts (AC) that passed;
+svLen — reference span (chromEnd - chromStart);
+insLen — maximum inserted-sequence length across passing INS alts (0 otherwise);
+AC — sum of per-alt allele counts (AC) that passed;
numAlts — number of alt alleles that passed the 50 bp filter.

Rows whose alts are all smaller than 50 bp are not shown.
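The per-site aggregation rules can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the converter in makeDb/scripts/lrSv. The names `SiteItem`, `classify_alt` and `aggregate_site` are hypothetical. Each passing alt is labelled INS or DEL purely by the sign of len(ALT) - len(REF), which is a simplification. The inserted length is also assumed to equal len(ALT) - len(REF). The output fields follow the svType, svLen (reference span), insLen, AC and numAlts definitions listed above.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_SV_LEN = 50  # threshold from the track text: |len(ALT) - len(REF)| >= 50 bp


@dataclass
class SiteItem:
    """One bed item per snarl site (hypothetical container)."""
    chrom: str
    chromStart: int
    chromEnd: int
    name: str  # graph snarl ID, e.g. "<951452<1012008"
    svType: str
    svLen: int
    insLen: int
    AC: int
    numAlts: int


def classify_alt(ref: str, alt: str) -> Optional[str]:
    """Hypothetical classifier: INS or DEL by the sign of the length
    difference, or None if the alt is below the 50 bp threshold."""
    diff = len(alt) - len(ref)
    if abs(diff) < MIN_SV_LEN:
        return None
    return "INS" if diff > 0 else "DEL"


def aggregate_site(chrom: str, pos: int, snarl_id: str, ref: str,
                   alts: List[str], acs: List[int]) -> Optional[SiteItem]:
    """Collapse one multi-allelic VCF row into a single item, or return
    None when no alt passes the length filter."""
    passing_types: List[str] = []
    ins_len = 0
    ac_total = 0
    for alt, ac in zip(alts, acs):
        sv_type = classify_alt(ref, alt)
        if sv_type is None:
            continue  # alt smaller than 50 bp: ignored
        passing_types.append(sv_type)
        ac_total += ac  # AC: sum over passing alts only
        if sv_type == "INS":
            # Assumption: inserted length = len(ALT) - len(REF).
            ins_len = max(ins_len, len(alt) - len(ref))
    if not passing_types:
        return None  # all alts < 50 bp: row is not shown
    chrom_start = pos - 1               # VCF POS is 1-based; BED is 0-based
    chrom_end = chrom_start + len(ref)  # item spans REF on the reference
    site_type = passing_types[0] if len(set(passing_types)) == 1 else "MIXED"
    return SiteItem(chrom, chrom_start, chrom_end, snarl_id, site_type,
                    svLen=chrom_end - chrom_start, insLen=ins_len,
                    AC=ac_total, numAlts=len(passing_types))
```

As an example, take a row with a 100 bp REF and three alts: a 99 bp deletion, a 100 bp insertion and a 20 bp length change. Under these assumptions it becomes one MIXED item with svLen 100, insLen 100, numAlts 2. Its AC counts only the first two alts.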

Methods

-The APR pangenome was assembled from 53 individuals sequenced with an
-average 35× PacBio HiFi, 54× ultralong ONT and 65× Hi-C
-coverage, producing haplotype-phased de novo assemblies with N50 > 124 Mb.
-The pangenome graph was built with Minigraph-Cactus v2.7.2 seeded on
-CHM13v2 (backbone) and GRCh38; variants were extracted and deconstructed
-from the graph. For this UCSC track, the decomposed VCF was parsed,
-filtered to alt alleles with ≥50 bp REF/ALT length difference, and
-merged per snarl site. See the build documentation in the kent source
-tree at src/hg/makeDb/doc/hg38/lrSv.txt for details.

+Nassir et al. 2025 built the Arabic Pangenome Reference (APR) from 53
+UAE-resident Arab individuals drawn from eight countries, sequenced with
+~35x PacBio HiFi on Sequel IIe/Revio (30-h movies), ~54x Oxford Nanopore
+ultralong reads on R10.4.1 PromethION flow cells (96-h runs), and ~65x
+Hi-C (Illumina NovaSeq 6000). Haplotype-phased de novo assemblies were
+produced with hifiasm v0.19.5 (primary) and Verkko v1.3.1 (for
+comparison), with a median N50 of 124 Mb. The pangenome graph was built
+with Minigraph-Cactus seeded on T2T-CHM13v2 and augmented with GRCh38,
+and SVs were extracted by graph deconstruction. The released decomposed
+VCF (apr_review_v1_2902_chm13.vcf.gz) contains ~21 million
+variants on CHM13v2 contigs; after filtering to alt alleles with ≥50 bp
+length difference and collapsing the alts of each snarl into a single
+site, the APR SV track is obtained. Variants are shown natively on hs1
+and lifted to hg38 with the UCSC hs1ToHg38.over.chain.gz chain
+(variants not lifting cleanly are omitted from the hg38 version).


+The source APR VCF was downloaded from the Mohammed Bin Rashid
+University SharePoint page,
+mbru.ac.ae/the-arab-pangenome-reference; the accompanying project
+source code is at
+github.com/muddinmbru/arab_pangenome_reference.


+The step-by-step build commands (download, graph-VCF conversion, liftOver,
+bigBed build) are recorded in the UCSC makeDoc for this track container:
+doc/hg38/lrSv.txt. The conversion scripts and autoSql schemas live in
+makeDb/scripts/lrSv.

Data Access

The data can be explored interactively with the Table Browser or Data Integrator, and accessed from scripts via our API (track=aprSv).

For automated download, the bigBed files are at http://hgdownload.soe.ucsc.edu/gbdb/hs1/lrSv/apr.bb (native) and http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/apr.bb (lifted).
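The API route mentioned above can be scripted in Python using only the standard library. This is a minimal sketch that assumes the UCSC REST endpoint https://api.genome.ucsc.edu/getData/track with its `genome`, `track`, `chrom`, `start`, `end` and optional `maxItemsOutput` parameters. It also assumes `start`/`end` are 0-based, half-open coordinates and that the same `aprSv` track name applies on both hs1 and hg38. The helper names `apr_sv_url` and `fetch_apr_sv` are hypothetical, and the query region is only an example.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import urlopen

# UCSC Genome Browser REST API endpoint for track data.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def apr_sv_url(genome: str, chrom: str, start: int, end: int,
               max_items: Optional[int] = None) -> str:
    """Build a getData/track URL for the aprSv track over chrom:start-end.

    start/end are assumed to be 0-based, half-open UCSC coordinates.
    """
    params = [("genome", genome), ("track", "aprSv"), ("chrom", chrom),
              ("start", start), ("end", end)]
    if max_items is not None:
        params.append(("maxItemsOutput", max_items))
    # UCSC's own API examples separate parameters with ';'.
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params)
    return f"{API_BASE}?{query}"


def fetch_apr_sv(genome: str, chrom: str, start: int, end: int) -> dict:
    """Fetch and parse the JSON response (requires network access)."""
    with urlopen(apr_sv_url(genome, chrom, start, end)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Print the URL only; calling fetch_apr_sv() would hit the network.
    print(apr_sv_url("hs1", "chr1", 1_000_000, 2_000_000))
```

Building the URL separately from the request keeps the query easy to check or paste into a browser. For bulk work, downloading the whole bigBed file listed above is likely more efficient than paging through the API.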