bac95a147f49cd331052e597006e04b3deee40fc max Wed Apr 22 10:43:20 2026 -0700
lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on both
the lrSv and srSv supertracks using the "CODE|CODE (Long name)"
filterValues syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)",
etc. Labels keep the short code up front so users can match what
hgTracks shows next to each feature.

Also sweep in the in-progress converter/as-file cleanups under
scripts/lrSv/ and scripts/srSv/ (introduction of lrSvCommon.py helpers,
consistent insLen / svLen / AC column naming, tightened
field-description text) that had been piling up as an unstaged working
tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/gustafsonSv.html src/hg/makeDb/trackDb/human/gustafsonSv.html
index 65923b24b32..e31e5a7bc66 100644
--- src/hg/makeDb/trackDb/human/gustafsonSv.html
+++ src/hg/makeDb/trackDb/human/gustafsonSv.html
@@ -30,40 +30,60 @@
 <li><span style="color: rgb(0,0,200);">Insertions (INS)</span> - blue</li>
 <li><span style="color: rgb(0,160,0);">Duplications (DUP)</span> - green</li>
 <li><span style="color: rgb(230,140,0);">Inversions (INV)</span> - orange</li>
 </ul>
 </p>
 <p>
 Insertions are placed at the insertion site with a width of 1 bp;
 deletions, duplications and inversions span the affected reference
 interval. Filters are available for SV type, SV length and
 carrier-sample count. The detail page also shows the number of
 per-caller calls supporting each site (VARCALLS) and whether the source
 caller marked the breakpoints as precise.
 </p>
 <h2>Methods</h2>
 <p>
-Long-read whole-genome sequencing was performed on 100 1000 Genomes
-samples with ONT R9.4.1 pores at a median coverage of ~37x and read N50
-of ~54 kb. Reads were aligned to GRCh38 with minimap2 and, for a subset,
-with the CARD pipeline. De novo assemblies were produced with Flye and
-with Shasta/Hapdup. Per-sample structural variant calls were generated
-with five independent methods (Sniffles2, cuteSV, SVIM on alignments;
-hapdiff on Flye and on Shasta/Hapdup assemblies) and merged across
-callers with Jasmine in two stages: first within each sample
-(intra-sample) to build per-sample consensus SVs, then across all 100
-samples to produce the shared site-level callset used here.
+Gustafson et al. 2024 performed Oxford Nanopore long-read sequencing on
+100 samples from the 1000 Genomes Project (all five superpopulations and
+19 subpopulations) using R9.4.1 flow cells, at a median per-sample
+coverage of ~37x and read N50 of ~54 kb. Per-sample SV calls were
+generated through the Napu pipeline with five independent methods: three
+alignment-based callers (Sniffles2, cuteSV and SVIM run on minimap2
+alignments to GRCh38) and two assembly-based callers (hapdiff run on Flye
+and on Shasta/Hapdup assemblies). The five per-sample VCFs were merged
+with <a href="https://github.com/mkirsche/Jasmine" target="_blank">Jasmine</a>
+in two stages (intra-sample consensus, then cross-sample merge). The
+released confident site-level callset is defined as variants supported by
+hapdiff and at least two unique alignment-based callers, yielding 113,696
+SVs (63,177 insertions, 49,704 deletions, 744 inversions, 71
+duplications). SV counts per sample and multicaller concordance were
+benchmarked against the HPRC Sniffles2 truth and the GIAB HG002 Tier1
+region with Truvari v4.1.0.
+</p>
+<p>
+The source Jasmine-merged VCF was downloaded from the 1000 Genomes ONT S3
+bucket:
+<a href="https://s3.amazonaws.com/1000g-ont/Gustafson_etal_2024_preprint_SUPPLEMENTAL/20240423_jasmine_intrasample_noBND_custom_suppvec_alphanumeric_header_JASMINE.vcf.gz" target="_blank">
+<tt>20240423_jasmine_intrasample_noBND_custom_suppvec_alphanumeric_header_JASMINE.vcf.gz</tt></a>.
+</p>
+<p>
+The step-by-step build commands (download, format conversion, bigBed build)
+are recorded in the UCSC makeDoc for this track container:
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/lrSv.txt" target="_blank">
+doc/hg38/lrSv.txt</a>. The conversion scripts and autoSql schemas live in
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank">
+makeDb/scripts/lrSv</a>.
 </p>
 <h2>Data Access</h2>
 <p>
 The data can be explored interactively in table format with the
 <a href="../cgi-bin/hgTables">Table Browser</a> or the
 <a href="../cgi-bin/hgIntegrator">Data Integrator</a>, and accessed
 programmatically through our <a href="https://api.genome.ucsc.edu">API</a>,
 track=<i>gustafsonSv</i>.
 </p>
 <p>
 The bigBed is available from
 <a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/" target="_blank">our download server</a>
 as <tt>gustafson.bb</tt>. Example:
 <tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/gustafson.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt>.