c81011d4a8f57db347e15aa1248c501b2c8a6fea lrnassar Mon Jun 1 13:16:15 2026 -0700

QA fixes for the lrSv long-read SV supertrack: labels and description cleanups. refs #36258

Trim six subtrack longLabels to the 85-char limit (ga4kSv, hprc2Sv, hgsvc2Sv, chirmade101Sv, cpc1Sv, and lrSvAll; the lrSvAll change is also made in the lrSvMergeAll.py generator so a re-run reproduces it).

Standardize the APR dataset name to "Arab Pangenome Reference (APR)" across lrSv.ra, lrSv.html, aprSv.html, and the makeDoc comment (was a mix of "Arabic" and "UAE UPR").

lrSv1kgOnt.html: state per-assembly SV counts (hg38 lifted 148,375 vs hs1 native 161,332, each with its own type breakdown) and encode non-ASCII author names as numeric entities.

hgsvc3Sv.html: correct the hg38 counts to match the served bigBed (176,231 DEL+INS, 176,531 total).

colorsDbSv.html: use $db in the hgdownload path so it resolves on hs1 as well as hg38.

cpc1Sv.html: encode a Unicode minus sign as a numeric entity.

diff --git src/hg/makeDb/doc/hg38/lrSv.txt src/hg/makeDb/doc/hg38/lrSv.txt
index 67d923f950d..99df493cc80 100644
--- src/hg/makeDb/doc/hg38/lrSv.txt
+++ src/hg/makeDb/doc/hg38/lrSv.txt
@@ -366,31 +366,31 @@
 # delta with a 50 bp threshold (INS, DEL, CPX, or dropped), collapses all
 # alts of one snarl ID into a single row (MIXED when types disagree),
 # and writes 16-column bed rows with AC/AN/AF and NS.
 bash ~/kent/src/hg/makeDb/scripts/lrSv/lrSvCpc1Build.sh
 # hs1 bigBed: 97,205 sites (4.7 MB)
 # hg38 lifted: 81,261 sites (4.1 MB), 15,944 unmapped
 # Symlinks for both assemblies
 mkdir -p /gbdb/hs1/lrSv /gbdb/hg38/lrSv
 ln -sf /hive/data/genomes/hg38/bed/lrSv/cpc1/cpc1.hs1.bb /gbdb/hs1/lrSv/cpc1.bb
 ln -sf /hive/data/genomes/hg38/bed/lrSv/cpc1/cpc1.hg38.bb /gbdb/hg38/lrSv/cpc1.bb
 ##########
 # 2026-04-20 Claude max
-# Arabic Pangenome Reference (APR) SVs
+# Arab Pangenome Reference (APR) SVs
 # Paper: Nassir et al. 2025, Nat Commun, PMID 40707445
 # Data : https://www.mbru.ac.ae/the-arab-pangenome-reference/
 #        (SharePoint download page under APR Nuclear/Pangenome)
 # Source: apr_review_v1_2902_chm13.vcf.gz (1.5 GB, 21M variants,
 #         contigs named chrN with CHM13v2 lengths, multi-allelic rows).
 mkdir -p /hive/data/genomes/hg38/bed/lrSv/apr
 cd /hive/data/genomes/hg38/bed/lrSv/apr
 # (VCF placed here by the user from the MBRU SharePoint download)
 # Run converter + liftOver + bigBed for both hs1 (native) and hg38 (lifted).
 # The script iterates the comma-separated ALT alleles of each row,
 # classifies each by length delta (>=50 bp -> INS, <=-50 bp -> DEL,
 # |d|<50 and max(len)>=50 -> CPX, else drop), then emits one row per