6b0d68657267f1e02c47d4224ea62446bbbb2ba0 max Fri May 22 06:55:52 2026 -0700
small non-AI changes to the html docs pages of the long-read SV tracks

diff --git src/hg/makeDb/trackDb/human/aprSv.html src/hg/makeDb/trackDb/human/aprSv.html
index ac90c471788..107e2e2ef62 100644
--- src/hg/makeDb/trackDb/human/aprSv.html
+++ src/hg/makeDb/trackDb/human/aprSv.html
@@ -1,62 +1,62 @@

Description

-This track displays structural variants (SVs) — deletions, insertions, and
-complex substitutions of at least 50 bp — from the Arabic Pangenome
+This track displays structural variants (SVs), at least 50 bp long
+(deletions, insertions, and complex substitutions), from the Arabic Pangenome
Reference (APR), a pangenome graph built from 53 UAE-resident Arab individuals drawn from eight countries (UAE, Saudi Arabia, Oman, Jordan, Egypt, Morocco, Syria, Yemen). Each bubble in the graph that contains an SV-sized alternative allele is shown as a single variant site, with allele counts aggregated across the 53 samples (the GRCh38 reference haplotype, present as an extra sample column in the source VCF, is excluded from the aggregation).
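The per-site allele counting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the track's actual build script: it assumes each sample's genotype is available as a VCF GT string (for example `1|0`, optionally followed by other FORMAT fields after `:`), and that the reference haplotype's column is named `GRCh38`, a hypothetical column name chosen only for this example.

```python
from typing import Dict, List

# Hypothetical name of the GRCh38 reference column; the real APR VCF header may differ.
REFERENCE_SAMPLE = "GRCh38"


def allele_counts(genotypes: Dict[str, str], n_alts: int,
                  exclude: str = REFERENCE_SAMPLE) -> List[int]:
    """Count how many haplotypes carry each ALT allele (ALT 1..n_alts).

    genotypes maps sample name -> genotype string such as "1|0" or "2/1:35".
    The excluded column (the reference haplotype) is skipped entirely.
    """
    counts = [0] * n_alts
    for sample, gt in genotypes.items():
        if sample == exclude:
            continue  # the reference haplotype is not part of the 53-sample cohort
        gt_field = gt.split(":")[0]  # keep only the GT part if other FORMAT fields follow
        for allele in gt_field.replace("|", "/").split("/"):
            if allele in (".", "0"):
                continue  # missing call or reference allele
            counts[int(allele) - 1] += 1  # ALT indices in GT are 1-based
    return counts
```

Counting alleles rather than samples matches how the VCF `AC` field is defined, so a homozygous carrier contributes 2.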

The APR pangenome was built on the T2T-CHM13v2 reference. Variants are shown natively on the hs1 browser and lifted to hg38 using the UCSC hs1ToHg38.over.chain.gz chain; variants that do not lift cleanly (often in T2T-added euchromatic sequence) are omitted from the hg38 version of the track.

-Display conventions
+Display Conventions and Configuration

Items are colored by SV type:

INS insertion (net ALT longer by ≥50 bp)
DEL deletion (net REF longer by ≥50 bp)
CPX complex substitution (similar-length REF and ALT but at least one ≥50 bp)
MIXED snarl whose alt alleles belong to different classes
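The per-allele rules above amount to a small classifier. The sketch below is one reading of them, not the track's own code: it assumes lengths are measured on the raw REF and ALT strings (whether the build counts a shared padding base is not stated), and `classify_alt` is a hypothetical helper name. MIXED is a site-level label and is therefore not produced here.

```python
from typing import Optional

MIN_SV_LEN = 50  # the track's 50 bp threshold


def classify_alt(ref: str, alt: str, min_len: int = MIN_SV_LEN) -> Optional[str]:
    """Return "INS", "DEL", "CPX", or None if the ALT is below the SV threshold."""
    net = len(alt) - len(ref)
    if net >= min_len:
        return "INS"  # net ALT longer by at least min_len
    if -net >= min_len:
        return "DEL"  # net REF longer by at least min_len
    if max(len(ref), len(alt)) >= min_len:
        return "CPX"  # similar lengths, but at least one allele is SV-sized
    return None       # too small: this ALT does not pass the filter
```

Checking the net length first means a large insertion is never reported as CPX just because its ALT string is also long.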

Each item spans from the start of REF to its end on the reference. The name field is the graph snarl ID (e.g. <951452<1012008), which identifies the variant site in the APR pangenome graph.
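The snarl ID can be split into its two boundary nodes. This sketch assumes the vg-deconstruct-style naming in which each node ID is prefixed by `>` (node traversed forward) or `<` (node traversed in reverse); `parse_snarl_id` and `SnarlId` are hypothetical names, and the exact convention used by the APR release should be checked against its VCF.

```python
import re
from typing import NamedTuple


class SnarlId(NamedTuple):
    start_node: int
    start_reverse: bool  # True if the start node carries a "<" prefix
    end_node: int
    end_reverse: bool    # True if the end node carries a "<" prefix


# Two orientation-prefixed node IDs, e.g. "<951452<1012008" or ">1>4".
_SNARL_RE = re.compile(r"^([<>])(\d+)([<>])(\d+)$")


def parse_snarl_id(name: str) -> SnarlId:
    """Split a snarl ID into its boundary node IDs and orientations."""
    m = _SNARL_RE.match(name)
    if m is None:
        raise ValueError(f"not a snarl ID: {name!r}")
    s_dir, s_node, e_dir, e_node = m.groups()
    return SnarlId(int(s_node), s_dir == "<", int(e_node), e_dir == "<")
```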

-Per-site alt-allele aggregation
+Per-site Alt-allele Aggregation

The source VCF is multi-allelic: a single graph snarl appears as one row with a comma-separated ALT list. For this track, each ALT is classified individually using the 50 bp threshold, and the row is emitted as a single bed item with:

-svType — the common class, or MIXED if alts disagree;
-svLen — reference span (chromEnd - chromStart);
-insLen — maximum inserted-sequence length across passing INS alts (0 otherwise);
-AC — sum of per-alt allele counts (AC) that passed;
-numAlts — number of alt alleles that passed the 50 bp filter.
+svType: the common class, or MIXED if alts disagree;
+svLen: reference span (chromEnd - chromStart);
+insLen: maximum inserted-sequence length across passing INS alts (0 otherwise);
+AC: sum of per-alt allele counts (AC) that passed;
+numAlts: number of alt alleles that passed the 50 bp filter.

Rows whose alts are all smaller than 50 bp are not shown.
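Putting the per-site rules together, one multi-allelic VCF row collapses into at most one bed item. This is a minimal sketch, not the track's build pipeline: it assumes each ALT has already been classified (its class, or `None` if it failed the 50 bp filter) and paired with its inserted-sequence length and AC; the tuple layout, `aggregate_site`, and `SiteItem` are hypothetical, while the output fields follow the list above.

```python
from typing import List, NamedTuple, Optional, Tuple

# Per-ALT input: (class or None, inserted-sequence length, allele count)
AltInfo = Tuple[Optional[str], int, int]


class SiteItem(NamedTuple):
    chrom: str
    chromStart: int
    chromEnd: int
    svType: str
    svLen: int
    insLen: int
    AC: int
    numAlts: int


def aggregate_site(chrom: str, chrom_start: int, chrom_end: int,
                   alts: List[AltInfo]) -> Optional[SiteItem]:
    """Collapse one multi-allelic row into a single item, or None if nothing passes."""
    passing = [a for a in alts if a[0] is not None]
    if not passing:
        return None  # all alts are smaller than 50 bp: the row is not shown
    classes = {cls for cls, _, _ in passing}
    sv_type = classes.pop() if len(classes) == 1 else "MIXED"
    ins_len = max((ins for cls, ins, _ in passing if cls == "INS"), default=0)
    return SiteItem(chrom, chrom_start, chrom_end, sv_type,
                    chrom_end - chrom_start,             # svLen: reference span
                    ins_len,                             # 0 when no INS alt passes
                    sum(ac for _, _, ac in passing),     # AC summed over passing alts
                    len(passing))                        # numAlts
```

Filtering once up front keeps every aggregate (class, insLen, AC, numAlts) restricted to the same set of passing alts.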

Methods

Nassir et al. 2025 built the Arabic Pangenome Reference (APR) from 53 UAE-resident Arab individuals drawn from eight countries, sequenced with ~35x PacBio HiFi on Sequel IIe/Revio (30-h movies), ~54x Oxford Nanopore ultralong reads on R10.4.1 PromethION flow cells (96-h runs), and ~65x Hi-C (Illumina NovaSeq 6000). Haplotype-phased de novo assemblies were produced with hifiasm v0.19.5 (primary) and Verkko v1.3.1 (for comparison), with a median N50 of 124 Mb. The pangenome graph was built with Minigraph-Cactus seeded on T2T-CHM13v2 and augmented with GRCh38, and SVs were extracted by graph deconstruction. The released decomposed