c81011d4a8f57db347e15aa1248c501b2c8a6fea lrnassar Mon Jun 1 13:16:15 2026 -0700
QA fixes for the lrSv long-read SV supertrack: labels and description cleanups. refs #36258

Trim six subtrack longLabels to the 85-char limit (ga4kSv, hprc2Sv, hgsvc2Sv, chirmade101Sv, cpc1Sv, and lrSvAll; the lrSvAll change is also made in the lrSvMergeAll.py generator so a re-run reproduces it).

Standardize the APR dataset name to "Arab Pangenome Reference (APR)" across lrSv.ra, lrSv.html, aprSv.html, and the makeDoc comment (was a mix of "Arabic" and "UAE UPR").

lrSv1kgOnt.html: state per-assembly SV counts (hg38 lifted 148,375 vs hs1 native 161,332, each with its own type breakdown) and encode non-ASCII author names as numeric entities.

hgsvc3Sv.html: correct the hg38 counts to match the served bigBed (176,231 DEL+INS, 176,531 total).

colorsDbSv.html: use $db in the hgdownload path so it resolves on hs1 as well as hg38.

cpc1Sv.html: encode a Unicode minus sign as a numeric entity.

diff --git src/hg/makeDb/trackDb/human/aprSv.html src/hg/makeDb/trackDb/human/aprSv.html
index 107e2e2ef62..283e82d44b8 100644
--- src/hg/makeDb/trackDb/human/aprSv.html
+++ src/hg/makeDb/trackDb/human/aprSv.html
@@ -1,20 +1,20 @@
<h2>Description</h2> <p> This track displays structural variants (SVs), at least 50 bp long
-(deletions, insertions, and complex substitutions), from the Arabic Pangenome
+(deletions, insertions, and complex substitutions), from the Arab Pangenome
Reference (APR), a pangenome graph built from 53 UAE-resident Arab individuals drawn from eight countries (UAE, Saudi Arabia, Oman, Jordan, Egypt, Morocco, Syria, Yemen). Each bubble in the graph that contains an SV-sized alternative allele is shown as a single variant site, with allele counts aggregated across the 53 samples (the GRCh38 reference haplotype, present as an extra sample column in the source VCF, is excluded from the aggregation).</p> <p> The APR pangenome was built on the T2T-CHM13v2 reference.
Variants are shown natively on the <b>hs1</b> browser and lifted to <b>hg38</b> using the UCSC <tt>hs1ToHg38.over.chain.gz</tt> chain; variants that do not lift cleanly (often in T2T-added euchromatic sequence) are omitted from the hg38 version of the track.</p>
@@ -39,31 +39,31 @@
with a comma-separated ALT list. For this track, each ALT is classified individually using the 50 bp threshold, and the row is emitted as a single bed item with:</p> <ul> <li><b>svType</b>: the common class, or <tt>MIXED</tt> if alts disagree;</li> <li><b>svLen</b>: reference span (chromEnd - chromStart);</li> <li><b>insLen</b>: maximum inserted-sequence length across passing INS alts (0 otherwise);</li> <li><b>AC</b>: sum of per-alt allele counts (AC) that passed;</li> <li><b>numAlts</b>: number of alt alleles that passed the 50 bp filter.</li> </ul> <p>Rows whose alts are all smaller than 50 bp are not shown.</p> <h2>Methods</h2> <p>
-Nassir et al. 2025 built the Arabic Pangenome Reference (APR) from 53
+Nassir et al. 2025 built the Arab Pangenome Reference (APR) from 53
UAE-resident Arab individuals drawn from eight countries, sequenced with ~35x PacBio HiFi on Sequel IIe/Revio (30-h movies), ~54x Oxford Nanopore ultralong reads on R10.4.1 PromethION flow cells (96-h runs), and ~65x Hi-C (Illumina NovaSeq 6000). Haplotype-phased de novo assemblies were produced with hifiasm v0.19.5 (primary) and Verkko v1.3.1 (for comparison), with a median N50 of 124 Mb. The pangenome graph was built with Minigraph-Cactus seeded on T2T-CHM13v2 and augmented with GRCh38, and SVs were extracted by graph deconstruction. The released decomposed VCF (<tt>apr_review_v1_2902_chm13.vcf.gz</tt>) contains ~21 million variants on CHM13v2 contigs; after filtering to alt alleles with ≥50 bp length difference and collapsing the alts of each snarl into a single site, the APR SV track is obtained.
Variants are shown natively on hs1 and lifted to hg38 with the UCSC <tt>hs1ToHg38.over.chain.gz</tt> chain (variants not lifting cleanly are omitted from the hg38 version).</p>
@@ -97,31 +97,31 @@
<a href="http://hgdownload.soe.ucsc.edu/gbdb/hs1/lrSv/apr.bb" target="_blank"> http://hgdownload.soe.ucsc.edu/gbdb/hs1/lrSv/apr.bb</a> (native) and <a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/apr.bb" target="_blank"> http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/apr.bb</a> (lifted).</p> <p> The original APR pangenome VCF and assemblies can be downloaded from <a href="https://www.mbru.ac.ae/the-arab-pangenome-reference/" target="_blank"> https://www.mbru.ac.ae/the-arab-pangenome-reference/</a>, and the project source code is at <a href="https://github.com/muddinmbru/arab_pangenome_reference" target="_blank"> https://github.com/muddinmbru/arab_pangenome_reference</a>.</p> <h2>Credits</h2>
-<p>Thanks to the Arabic Pangenome Reference team at Mohammed Bin Rashid
+<p>Thanks to the Arab Pangenome Reference team at Mohammed Bin Rashid
University (Dubai), led by Mohammed Uddin, for producing and releasing the pangenome and its variant calls.</p> <h2>References</h2> <p> Nassir N, Almarri MA, Kumail M, Mohamed N, Balan B, Hanif S, AlObathani M, Jamalalail B, Elsokary H, Kondaramage D <em>et al</em>. <a href="https://doi.org/10.1038/s41467-025-61645-w" target="_blank"> A draft UAE-based Arab pangenome reference</a>. <em>Nat Commun</em>. 2025 Jul 24;16(1):6747. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/40707445" target="_blank">40707445</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12290100/" target="_blank">PMC12290100</a> </p>
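The aprSv.html context above describes how a multi-allelic APR row is reduced to one bed item. Each ALT is tested against the 50 bp threshold, and the passing alts are then summarized as svType, svLen, insLen, AC, and numAlts. The Python below is a minimal sketch of that rule under stated assumptions. It is not the code of the lrSv generator.

The following names are hypothetical: `SvSite`, `classify_alt`, and `collapse_row`. The sketch assumes these rules:

* An ALT is called INS when it is longer than REF and DEL when it is shorter.
* The inserted length of an INS alt is its length difference from REF.
* The VCF 1-based POS becomes a 0-based, half-open BED span covering REF.

The track's complex-substitution class is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_SV_LEN = 50  # 50 bp threshold stated in the track description


@dataclass
class SvSite:
    chrom: str
    chromStart: int  # 0-based start (BED convention)
    chromEnd: int    # half-open end
    svType: str      # common class of passing alts, or "MIXED"
    svLen: int       # reference span: chromEnd - chromStart
    insLen: int      # max inserted length over passing INS alts, else 0
    AC: int          # sum of allele counts over passing alts
    numAlts: int     # number of alts that passed the 50 bp filter


def classify_alt(ref: str, alt: str) -> Optional[str]:
    """Hypothetical per-ALT rule: INS/DEL by length difference, None if < 50 bp."""
    diff = len(alt) - len(ref)
    if abs(diff) < MIN_SV_LEN:
        return None
    return "INS" if diff > 0 else "DEL"


def collapse_row(chrom: str, pos: int, ref: str,
                 alts: List[str], acs: List[int]) -> Optional[SvSite]:
    """Collapse one multi-allelic VCF row into a single bed item, or None."""
    passing = []
    for alt, ac in zip(alts, acs):
        sv_type = classify_alt(ref, alt)
        if sv_type is not None:
            passing.append((sv_type, alt, ac))
    if not passing:
        return None  # all alts smaller than 50 bp: the row is not shown
    types = {t for t, _, _ in passing}
    chrom_start = pos - 1               # assumed: VCF POS is 1-based
    chrom_end = chrom_start + len(ref)  # span covers the REF allele
    # Assumed: inserted length = ALT length minus REF length for INS alts.
    ins_lens = [len(alt) - len(ref) for t, alt, _ in passing if t == "INS"]
    return SvSite(
        chrom=chrom,
        chromStart=chrom_start,
        chromEnd=chrom_end,
        svType=next(iter(types)) if len(types) == 1 else "MIXED",
        svLen=chrom_end - chrom_start,
        insLen=max(ins_lens, default=0),
        AC=sum(ac for _, _, ac in passing),
        numAlts=len(passing),
    )
```

Consider `collapse_row("chr1", 1000, "A", ["A" + "T" * 60, "AC", "A" + "G" * 80], [3, 10, 2])`. It drops the 1 bp alt and keeps the two long insertions. The result has svType `INS`, insLen 80, AC 5, and numAlts 2.

Summing AC over the passing alts counts each SV-carrying haplotype once. This holds because a haplotype carries at most one allele at a given site.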