bac95a147f49cd331052e597006e04b3deee40fc max Wed Apr 22 10:43:20 2026 -0700

lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on both the
lrSv and srSv supertracks using the "CODE|CODE (Long name)" filterValues
syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)", etc. Labels keep
the short code up front so users can match what hgTracks shows next to each
feature.

Also sweep in the in-progress converter/as-file cleanups under scripts/lrSv/
and scripts/srSv/ (introduction of lrSvCommon.py helpers, consistent
insLen / svLen / AC column naming, tightened field-description text) that had
been piling up as an unstaged working tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/ga4kSv.html src/hg/makeDb/trackDb/human/ga4kSv.html
index 0a4b668a46e..1f6ef9d049e 100644
--- src/hg/makeDb/trackDb/human/ga4kSv.html
+++ src/hg/makeDb/trackDb/human/ga4kSv.html
@@ -23,47 +23,62 @@
  • Deletions (DEL) - red
  • Insertions (INS) - blue
  • Duplications (DUP) - green
  • Inversions (INV) - orange

  Insertions are placed at the insertion site with a width of 1 bp; deletions, duplications and inversions span the affected interval. Filters are available for SV type, SV length, carrier-sample count and allele frequency. The detail page also shows the total number of samples genotyped at each site.
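
The SV type filter mentioned above uses the label syntax described in the commit message, "CODE|CODE (Long name)". As a minimal sketch, the Python below builds such a setting line from a code-to-name mapping. The field name svType and the DEL/INS labels come from the commit message; the DUP and INV long names, the exact line layout (filterValues.svType followed by comma-separated entries) and the helper name are assumptions for illustration, not copied from the committed trackDb file.

```python
# Hypothetical helper: builds a trackDb-style filterValues line using the
# "CODE|CODE (Long name)" label syntax described in the commit message.
SV_TYPE_NAMES = {
    "DEL": "Deletion",
    "INS": "Insertion",
    "DUP": "Duplication",  # assumed long name
    "INV": "Inversion",    # assumed long name
}


def filter_values_line(field, names):
    """Join 'CODE|CODE (Long name)' entries with commas after the setting name."""
    entries = [f"{code}|{code} ({long_name})" for code, long_name in names.items()]
    return f"filterValues.{field} " + ",".join(entries)


print(filter_values_line("svType", SV_TYPE_NAMES))
```

Keeping the short code at the front of each label means the filter menu still matches the codes shown next to features in the browser display.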

    Methods

-Samples were sequenced on PacBio Revio and Sequel II instruments with HiFi
-chemistry. Single-sample SV callsets were produced with pbsv and then merged
-across the cohort with JASMINE v1.1.4 (jasmine --output-genotypes),
-which clusters equivalent SVs across samples and writes a site-level multi-sample
-VCF.
+The Genomic Answers for Kids (GA4K) program at Children's Mercy Research
+Institute is a longitudinal pediatric rare-disease initiative described in
+Cohen et al. 2022. GA4K probands and their families are sequenced with
+PacBio HiFi long reads (Revio and Sequel II), and the 502-sample GA4K
+PacBio SV release (pb_joint_merged.sv.vcf.gz) is produced by
+running
+pbsv per sample and merging with
+JASMINE
+v1.1.4 (--output-genotypes). The merged site-level VCF is
+filtered to SVs replicated in at least two independent observations
+(either matching a second unrelated CMH individual in the same Jasmine
+cluster, or matching an SV in the deCODE Icelandic or HPRC callsets via
+
+svpack match). The released catalog contains 115,554 replicated SVs
+(52,564 deletions, 58,219 insertions, 4,408 duplications and 363
+inversions) with recomputed carrier counts (SVC), total sample counts
+(SVN) and allele frequencies (SVF = SVC/SVN).

-To reduce false positives, the merged VCF was filtered to retain only SVs that
-were replicated in at least two independent observations: either (1) matching a
-second SV from another unrelated Children's Mercy (CMH) individual within the
-same Jasmine cluster, or (2) matching an SV from the Decode Icelandic or Human
-Pangenome Reference Consortium (HPRC) callsets using
-svpack match with default settings.
+The source VCF was cloned from the Children's Mercy Research Institute
+GA4K GitHub repository,
+
+github.com/ChildrensMercyResearchInstitute/GA4K
+(pacbio_sv_vcf/pb_joint_merged.sv.vcf.gz).

-Carrier counts (SVC), total sample counts (SVN) and allele frequencies
-(SVF = SVC/SVN) were recomputed on the replicated callset.
+The step-by-step build commands (download, format conversion, bigBed build)
+are recorded in the UCSC makeDoc for this track container:
+
+doc/hg38/lrSv.txt. The conversion scripts and autoSql schemas live in
+
+makeDb/scripts/lrSv.

    Data Access

    The data can be explored interactively in table format with the Table Browser or the Data Integrator and exported from there to spreadsheet or tab-sep tables. From scripts, the data can be accessed through our API, track=ga4kSv.
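
For script access, a request URL for the API can be assembled as in the sketch below. It assumes the public UCSC REST endpoint api.genome.ucsc.edu/getData/track and the hg38 assembly (suggested by the doc/hg38 makeDoc path in this commit). The helper name and the example region are hypothetical, and the snippet only builds the URL rather than performing the request.

```python
# Assumed public endpoint of the UCSC REST API; not stated in the page text.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def ga4k_sv_url(chrom, start, end, genome="hg38", track="ga4kSv"):
    """Return a getData/track URL for one region of the ga4kSv track."""
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", start),
        ("end", end),
    ]
    # Parameters are joined with semicolons, as in the UCSC API examples.
    return API_BASE + "?" + ";".join(f"{key}={value}" for key, value in params)


# Hypothetical example region, chosen only for illustration.
print(ga4k_sv_url("chr1", 1000000, 2000000))
```

The resulting URL can be passed to urllib.request.urlopen or any other HTTP client; the API responds with JSON.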

    For automated download and analysis, the annotation is stored in a bigBed file that can be downloaded from our download server. The file for this track is called ga4kSv.bb.