526213b2893134217a300ff913e11b4e98d67991 max Mon Apr 20 08:50:10 2026 -0700 lrSv: add cpc1Sv and aprSv pangenome SV subtracks (hg38, hs1) cpc1Sv: 97,205 SVs from the CPC + HPRC Phase 1 pangenome (Gao et al 2023, Nature; PMID 37316654) built on T2T-CHM13v2, with 53 Chinese and 47 HPRC samples. Each graph snarl site is shown as one item with alt alleles classified by length delta (INS/DEL/CPX, 50 bp threshold) and collapsed. aprSv: 103,077 SVs from the Arabic Pangenome Reference (Nassir et al. 2025, Nat Commun; PMID 40707445) built on T2T-CHM13v2 from 53 UAE-resident Arab individuals. Same multi-allele classification as cpc1Sv, with alt alleles iterated within each multi-allelic row. Both tracks load natively on hs1 and are lifted to hg38 with hs1ToHg38.over.chain.gz. refs #36258 diff --git src/hg/makeDb/trackDb/human/aprSv.html src/hg/makeDb/trackDb/human/aprSv.html new file mode 100644 index 00000000000..255e1ae719d --- /dev/null +++ src/hg/makeDb/trackDb/human/aprSv.html @@ -0,0 +1,103 @@ +<h2>Description</h2> + +<p> +This track displays structural variants (SVs) — deletions, insertions, and +complex substitutions of at least 50 bp — from the Arabic Pangenome +Reference (APR), a pangenome graph built from 53 UAE-resident Arab +individuals drawn from eight countries (UAE, Saudi Arabia, Oman, Jordan, +Egypt, Morocco, Syria, Yemen). Each bubble in the graph that contains an +SV-sized alternative allele is shown as a single variant site, with allele +counts aggregated across the 53 samples (the GRCh38 reference haplotype, +present as an extra sample column in the source VCF, is excluded from the +aggregation).</p> + +<p> +The APR pangenome was built on the T2T-CHM13v2 reference. 
Variants are +shown natively on the <b>hs1</b> browser and lifted to <b>hg38</b> using +the UCSC <tt>hs1ToHg38.over.chain.gz</tt> chain; variants that do not lift +cleanly (often in T2T-added euchromatic sequence) are omitted from the +hg38 version of the track.</p> + +<h2>Display conventions</h2> + +<p>Items are colored by SV type:</p> +<ul> + <li><span style="background-color:rgb(0,0,200);color:white;padding:1px 6px">INS</span> insertion (net ALT longer by ≥50 bp)</li> + <li><span style="background-color:rgb(200,0,0);color:white;padding:1px 6px">DEL</span> deletion (net REF longer by ≥50 bp)</li> + <li><span style="background-color:rgb(230,140,0);color:white;padding:1px 6px">CPX</span> complex substitution (similar-length REF and ALT but at least one ≥50 bp)</li> + <li><span style="background-color:rgb(120,120,120);color:white;padding:1px 6px">MIXED</span> snarl whose alt alleles belong to different classes</li> +</ul> + +<p>Each item spans from the start of REF to its end on the reference. +The name field is the graph snarl ID (e.g. <tt>&lt;951452&lt;1012008</tt>), +which identifies the variant site in the APR pangenome graph.</p> + +<h2>Per-site alt-allele aggregation</h2> + +<p> +The source VCF is multi-allelic: a single graph snarl appears as one row +with a comma-separated ALT list. 
For this track, each ALT is classified +individually using the 50 bp threshold, and the row is emitted as a single +BED item with:</p> +<ul> + <li><b>svType</b> — the class shared by all alts that passed, or <tt>MIXED</tt> if they disagree;</li> + <li><b>svLen</b> — the maximum |len(ALT)-len(REF)| across alts that passed;</li> + <li><b>alleleCount</b> — sum of the allele counts (AC) of the alts that passed;</li> + <li><b>numAlts</b> — number of alt alleles that passed the 50 bp filter.</li> +</ul> +<p>Rows in which no alt passes the 50 bp filter are not shown.</p> + +<h2>Methods</h2> + +<p> +The APR pangenome was assembled from 53 individuals sequenced to an +average coverage of 35× PacBio HiFi, 54× ultralong ONT and 65× Hi-C, +producing haplotype-phased de novo assemblies with N50 > 124 Mb. +The pangenome graph was built with Minigraph-Cactus v2.7.2 seeded on +CHM13v2 (backbone) and GRCh38; variants were extracted and deconstructed +from the graph. For this UCSC track, the decomposed VCF was parsed, +filtered to alt alleles with ≥50 bp REF/ALT length difference, and +merged per snarl site. 
See the build documentation in the kent source +tree at <tt>src/hg/makeDb/doc/hg38/lrSv.txt</tt> for details.</p> + +<h2>Data Access</h2> + +<p>The data can be explored interactively with the +<a href="../cgi-bin/hgTables">Table Browser</a> or +<a href="../cgi-bin/hgIntegrator">Data Integrator</a>, and accessed from +scripts via our <a href="https://api.genome.ucsc.edu">API</a> +(track=<i>aprSv</i>).</p> + +<p>For automated download, the bigBed files are at +<a href="http://hgdownload.soe.ucsc.edu/gbdb/hs1/lrSv/apr.bb" target="_blank"> +http://hgdownload.soe.ucsc.edu/gbdb/hs1/lrSv/apr.bb</a> (native) and +<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/apr.bb" target="_blank"> +http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/apr.bb</a> (lifted).</p> + +<p> +The original APR pangenome VCF and assemblies can be downloaded from +<a href="https://www.mbru.ac.ae/the-arab-pangenome-reference/" target="_blank"> +https://www.mbru.ac.ae/the-arab-pangenome-reference/</a>, +and the project source code is at +<a href="https://github.com/muddinmbru/arab_pangenome_reference" target="_blank"> +https://github.com/muddinmbru/arab_pangenome_reference</a>.</p> + +<h2>Credits</h2> + +<p>Thanks to the Arabic Pangenome Reference team at Mohammed Bin Rashid +University (Dubai), led by Mohammed Uddin, for producing and releasing +the pangenome and its variant calls.</p> + +<h2>References</h2> + + +<p> +Nassir N, Almarri MA, Kumail M, Mohamed N, Balan B, Hanif S, AlObathani M, Jamalalail B, Elsokary H, +Kondaramage D <em>et al</em>. +<a href="https://doi.org/10.1038/s41467-025-61645-w" target="_blank"> +A draft UAE-based Arab pangenome reference</a>. +<em>Nat Commun</em>. 2025 Jul 24;16(1):6747. +PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/40707445" target="_blank">40707445</a>; PMC: <a +href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12290100/" target="_blank">PMC12290100</a> +</p> +
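The per-site alt-allele aggregation described in aprSv.html can be sketched in Python as follows. This is a minimal illustration, not the code used to build the track; the actual build steps are recorded in src/hg/makeDb/doc/hg38/lrSv.txt. The names `classify_alt`, `summarize_site` and `SiteSummary` are hypothetical, and the exact CPX rule used here (length difference under 50 bp, but REF or ALT at least 50 bp long) is an assumption read off the "Display conventions" text.

```python
from typing import List, NamedTuple, Optional

SV_MIN = 50  # bp threshold stated in the track description


class SiteSummary(NamedTuple):
    """Hypothetical container mirroring the four fields listed in aprSv.html."""
    sv_type: str       # INS, DEL, CPX or MIXED
    sv_len: int        # max |len(ALT) - len(REF)| over passing alts
    allele_count: int  # sum of AC over passing alts
    num_alts: int      # number of passing alts


def classify_alt(ref: str, alt: str) -> Optional[str]:
    """Classify one ALT allele; return None if it is not SV-sized.

    Assumption: CPX means the net length change is below the threshold
    while REF or ALT itself is at least SV_MIN bp long.
    """
    delta = len(alt) - len(ref)
    if delta >= SV_MIN:
        return "INS"
    if delta <= -SV_MIN:
        return "DEL"
    if max(len(ref), len(alt)) >= SV_MIN:
        return "CPX"
    return None


def summarize_site(ref: str, alts: List[str], acs: List[int]) -> Optional[SiteSummary]:
    """Collapse one multi-allelic VCF row into a single per-site summary."""
    passed = []
    for alt, ac in zip(alts, acs):
        sv_class = classify_alt(ref, alt)
        if sv_class is not None:
            passed.append((sv_class, abs(len(alt) - len(ref)), ac))
    if not passed:
        return None  # row is dropped: no alt passed the filter
    classes = {sv_class for sv_class, _, _ in passed}
    sv_type = next(iter(classes)) if len(classes) == 1 else "MIXED"
    return SiteSummary(
        sv_type=sv_type,
        sv_len=max(length for _, length, _ in passed),
        allele_count=sum(ac for _, _, ac in passed),
        num_alts=len(passed),
    )
```

For example, a row with REF of 100 bp and two alts of 1 bp and 200 bp would be summarized as one `MIXED` item, because the first alt is a deletion and the second an insertion. A row whose only alt differs from REF by a single base would return `None` and be omitted.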