526213b2893134217a300ff913e11b4e98d67991 max Mon Apr 20 08:50:10 2026 -0700
lrSv: add cpc1Sv and aprSv pangenome SV subtracks (hg38, hs1)

cpc1Sv: 97,205 SVs from the CPC + HPRC Phase 1 pangenome (Gao et al 2023,
Nature; PMID 37316654) built on T2T-CHM13v2, with 53 Chinese and 47 HPRC
samples. Each graph snarl site is shown as one item with alt alleles
classified by length delta (INS/DEL/CPX, 50 bp threshold) and collapsed.

aprSv: 103,077 SVs from the Arabic Pangenome Reference (Nassir et al. 2025,
Nat Commun; PMID 40707445) built on T2T-CHM13v2 from 53 UAE-resident Arab
individuals. Same multi-allele classification as cpc1Sv, with alt alleles
iterated within each multi-allelic row.

Both tracks load natively on hs1 and are lifted to hg38 with
hs1ToHg38.over.chain.gz.

refs #36258

diff --git src/hg/makeDb/trackDb/human/aprSv.html src/hg/makeDb/trackDb/human/aprSv.html
new file mode 100644
index 00000000000..255e1ae719d
--- /dev/null
+++ src/hg/makeDb/trackDb/human/aprSv.html
@@ -0,0 +1,103 @@
+
+This track displays structural variants (SVs) — deletions, insertions, and +complex substitutions of at least 50 bp — from the Arabic Pangenome +Reference (APR), a pangenome graph built from 53 UAE-resident Arab +individuals drawn from eight countries (UAE, Saudi Arabia, Oman, Jordan, +Egypt, Morocco, Syria, Yemen). Each bubble in the graph that contains an +SV-sized alternative allele is shown as a single variant site, with allele +counts aggregated across the 53 samples (the GRCh38 reference haplotype, +present as an extra sample column in the source VCF, is excluded from the +aggregation).
+ ++The APR pangenome was built on the T2T-CHM13v2 reference. Variants are +shown natively on the hs1 browser and lifted to hg38 using +the UCSC hs1ToHg38.over.chain.gz chain; variants that do not lift +cleanly (often in T2T-added euchromatic sequence) are omitted from the +hg38 version of the track.
+ +Items are colored by SV type:
+Each item spans the REF allele on the reference, from its first base to its last. +The name field is the graph snarl ID (e.g. <951452<1012008), +which identifies the variant site in the APR pangenome graph.
+ ++The source VCF is multi-allelic: a single graph snarl appears as one row +with a comma-separated ALT list. For this track, each ALT is classified +individually using the 50 bp threshold, and the row is emitted as a single +bed item with:
+Rows whose alts are all smaller than 50 bp are not shown.
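The per-allele length test described above can be sketched roughly as follows. This is an illustration only, not the code used to build the track: the function names are hypothetical, and because the rule that separates CPX from INS/DEL is not spelled out here, the sketch labels alleles by the sign of the length difference alone and leaves CPX unmodelled.

```python
from typing import List, Optional

SV_MIN_DELTA = 50  # bp threshold stated in the track description


def classify_alt(ref: str, alt: str, min_delta: int = SV_MIN_DELTA) -> Optional[str]:
    """Label one ALT allele by its length difference from REF (hypothetical helper)."""
    delta = len(alt) - len(ref)
    if delta >= min_delta:
        return "INS"
    if delta <= -min_delta:
        return "DEL"
    # Below the SV threshold; CPX alleles are not modelled in this sketch.
    return None


def classify_row(ref: str, alt_field: str) -> List[Optional[str]]:
    """Classify every allele in a comma-separated VCF ALT field."""
    return [classify_alt(ref, alt) for alt in alt_field.split(",")]


def row_is_shown(ref: str, alt_field: str) -> bool:
    """Keep a row only if at least one ALT reaches the SV threshold."""
    return any(label is not None for label in classify_row(ref, alt_field))
```

A row with one 60 bp insertion and one 1 bp insertion would be kept, with only the first allele labelled; a row whose alleles all differ from REF by fewer than 50 bp would be dropped.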
+ ++The APR pangenome was assembled from 53 individuals sequenced with an +average 35× PacBio HiFi, 54× ultralong ONT and 65× Hi-C +coverage, producing haplotype-phased de novo assemblies with N50 > 124 Mb. +The pangenome graph was built with Minigraph-Cactus v2.7.2 seeded on +CHM13v2 (backbone) and GRCh38; variants were extracted and deconstructed +from the graph. For this UCSC track, the decomposed VCF was parsed, +filtered to alt alleles with ≥50 bp REF/ALT length difference, and +merged per snarl site. See the build documentation in the kent source +tree at src/hg/makeDb/doc/hg38/lrSv.txt for details.
+ +The data can be explored interactively with the +Table Browser or +Data Integrator, and accessed from +scripts via our API +(track=aprSv).
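For scripted access, a minimal sketch of a request against the UCSC REST API's getData/track endpoint might look like the following. The track name aprSv comes from the text above; the helper names and the example region are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def apr_sv_url(genome: str, chrom: str, start: int, end: int) -> str:
    """Build a getData/track URL for the aprSv track (hypothetical helper)."""
    query = urlencode({
        "genome": genome,   # "hs1" (native) or "hg38" (lifted)
        "track": "aprSv",
        "chrom": chrom,
        "start": start,     # 0-based start, as in BED
        "end": end,
    })
    return f"{API_BASE}?{query}"


def fetch_apr_sv(genome: str, chrom: str, start: int, end: int) -> dict:
    """Fetch the JSON response for one region; requires network access."""
    with urlopen(apr_sv_url(genome, chrom, start, end), timeout=30) as resp:
        return json.load(resp)
```

Calling `fetch_apr_sv("hs1", "chr1", 0, 1000000)` would request the items overlapping the first megabase of chr1 on hs1; the layout of the returned JSON is not shown here and should be checked against the API documentation.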
+ +For automated download, the bigBed files are at + +http://hgdownload.soe.ucsc.edu/gbdb/hs1/lrSv/apr.bb (native) and + +http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/apr.bb (lifted).
+ ++The original APR pangenome VCF and assemblies can be downloaded from + +https://www.mbru.ac.ae/the-arab-pangenome-reference/, +and the project source code is at + +https://github.com/muddinmbru/arab_pangenome_reference.
+ +Thanks to the Arabic Pangenome Reference team at Mohammed Bin Rashid +University (Dubai), led by Mohammed Uddin, for producing and releasing +the pangenome and its variant calls.
+ ++Nassir N, Almarri MA, Kumail M, Mohamed N, Balan B, Hanif S, AlObathani M, Jamalalail B, Elsokary H, +Kondaramage D et al. + +A draft UAE-based Arab pangenome reference. +Nat Commun. 2025 Jul 24;16(1):6747. +PMID: 40707445; PMC: PMC12290100 +
+