bac95a147f49cd331052e597006e04b3deee40fc max Wed Apr 22 10:43:20 2026 -0700
lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on both the
lrSv and srSv supertracks using the "CODE|CODE (Long name)" filterValues
syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)", etc. Labels keep
the short code up front so users can match what hgTracks shows next to each
feature.

Also sweep in the in-progress converter/as-file cleanups under scripts/lrSv/
and scripts/srSv/ (introduction of lrSvCommon.py helpers, consistent
insLen / svLen / AC column naming, tightened field-description text) that
had been piling up as an unstaged working tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/ga4kSv.html src/hg/makeDb/trackDb/human/ga4kSv.html
index 0a4b668a46e..1f6ef9d049e 100644
--- src/hg/makeDb/trackDb/human/ga4kSv.html
+++ src/hg/makeDb/trackDb/human/ga4kSv.html
@@ -23,47 +23,62 @@
 <li><span style="color: rgb(200,0,0);">Deletions (DEL)</span> - red</li>
 <li><span style="color: rgb(0,0,200);">Insertions (INS)</span> - blue</li>
 <li><span style="color: rgb(0,160,0);">Duplications (DUP)</span> - green</li>
 <li><span style="color: rgb(230,140,0);">Inversions (INV)</span> - orange</li>
 </ul>
 </p>
 <p>
 Insertions are placed at the insertion site with a width of 1 bp; deletions,
 duplications and inversions span the affected interval. Filters are available
 for SV type, SV length, carrier-sample count and allele frequency. The detail
 page also shows the total number of samples genotyped at each site.
 </p>
 <h2>Methods</h2>
 <p>
-Samples were sequenced on PacBio Revio and Sequel II instruments with HiFi
-chemistry. Single-sample SV callsets were produced with pbsv and then merged
-across the cohort with JASMINE v1.1.4 (<tt>jasmine --output-genotypes</tt>),
-which clusters equivalent SVs across samples and writes a site-level multi-sample
-VCF.
+The Genomic Answers for Kids (GA4K) program at Children's Mercy Research
+Institute is a longitudinal pediatric rare-disease initiative described in
+Cohen et al. 2022. GA4K probands and their families are sequenced with
+PacBio HiFi long reads (Revio and Sequel II), and the 502-sample GA4K
+PacBio SV release (<tt>pb_joint_merged.sv.vcf.gz</tt>) is produced by
+running <a href="https://github.com/PacificBiosciences/pbsv" target="_blank">
+pbsv</a> per sample and merging with
+<a href="https://github.com/mkirsche/Jasmine" target="_blank">JASMINE</a>
+v1.1.4 (<tt>--output-genotypes</tt>). The merged site-level VCF is
+filtered to SVs replicated in at least two independent observations
+(either matching a second unrelated CMH individual in the same Jasmine
+cluster, or matching an SV in the deCODE Icelandic or HPRC callsets via
+<a href="https://github.com/PacificBiosciences/svpack" target="_blank">
+svpack match</a>). The released catalog contains 115,554 replicated SVs
+(52,564 deletions, 58,219 insertions, 4,408 duplications and 363
+inversions) with recomputed carrier counts (SVC), total sample counts
+(SVN) and allele frequencies (SVF = SVC/SVN).
 </p>
 <p>
-To reduce false positives, the merged VCF was filtered to retain only SVs that
-were replicated in at least two independent observations: either (1) matching a
-second SV from another unrelated Children's Mercy (CMH) individual within the
-same Jasmine cluster, or (2) matching an SV from the Decode Icelandic or Human
-Pangenome Reference Consortium (HPRC) callsets using
-<tt>svpack match</tt> with default settings.
+The source VCF was cloned from the Children's Mercy Research Institute
+GA4K GitHub repository,
+<a href="https://github.com/ChildrensMercyResearchInstitute/GA4K" target="_blank">
+github.com/ChildrensMercyResearchInstitute/GA4K</a>
+(<tt>pacbio_sv_vcf/pb_joint_merged.sv.vcf.gz</tt>).
 </p>
 <p>
-Carrier counts (SVC), total sample counts (SVN) and allele frequencies
-(SVF = SVC/SVN) were recomputed on the replicated callset.
+The step-by-step build commands (download, format conversion, bigBed build)
+are recorded in the UCSC makeDoc for this track container:
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/lrSv.txt" target="_blank">
+doc/hg38/lrSv.txt</a>. The conversion scripts and autoSql schemas live in
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank">
+makeDb/scripts/lrSv</a>.
 </p>
 <h2>Data Access</h2>
 <p>
 The data can be explored interactively in table format with the
 <a href="../cgi-bin/hgTables">Table Browser</a> or the
 <a href="../cgi-bin/hgIntegrator">Data Integrator</a> and exported from there
 to spreadsheet or tab-sep tables. From scripts, the data can be accessed
 through our <a href="https://api.genome.ucsc.edu">API</a>, track=<i>ga4kSv</i>.
 </p>
 <p>
 For automated download and analysis, the annotation is stored in a bigBed file
 that can be downloaded from
 <a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/" target="_blank">our download server</a>.
 The file for this track is called <tt>ga4kSv.bb</tt>.