9a11061ca6b40fe16bdfd09b1af53192f6c7c85b max Tue Apr 21 08:13:02 2026 -0700
lrSv: add HTML doc pages and conversion scripts for recent subtracks, + hs1 HGSVC3

Subtrack stanzas for these SV callsets landed in earlier commits, but the
conversion scripts and per-track HTML description pages were never added,
so trackDb had no doc to serve. This commit catches up.

Docs (new):
- colorsDbSv.html    CoLoRSdb 1,427-sample long-read SVs
- gustafsonSv.html   1KG ONT 100 (Gustafson 2024, PMID 39358015)
- hgsvc2Sv.html      HGSVC2 (Ebert 2021, PMID 33632895)
- hprc2Sv.html       HPRC release-2 pangenome SVs (no PMID yet; see
                     humanpangenome.org/hprc-data-release-2/)
- onekg3202Sr.html   1KG 3202 Illumina SHORT-READ GATK-SV
                     (Byrska-Bishop 2022, PMID 36055201)

Scripts (new):
- lrSvGustafson.as / lrSvGustafsonVcfToBed.py
- lrSvHgsvc2.as / lrSvHgsvc2TsvToBed.py (merges insdel + inv tables)
- lrSvHprc2.as / lrSvHprc2VcfToBed.py (streams wave-decomposed VCF,
  explodes multi-allelic rows, filters to SV-sized or INV)
- lrSv1kg3202Sr.as / lrSv1kg3202SrVcfToBed.py

HGSVC3 also on hs1:
- hgsvc3Sv.html: note that the hs1 build is native (not lifted): HGSVC3
  aligned all assemblies to both GRCh38 and T2T-CHM13 and released
  separate annotation tables per reference. Added the T2T-CHM13 source
  URL to the Methods section and the hs1 hgsvc3.bb download link to
  Data Access.
- doc/hs1/lrSv.txt (new): hs1-specific wget + build steps; refers back
  to doc/hg38/lrSv.txt for the full process.
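The lrSvHprc2VcfToBed.py entry in the commit message packs three steps into one line: stream the VCF, split each multi-allelic row into one record per ALT allele, and keep only SV-sized alleles or inversions. The minimal Python sketch below illustrates that logic. It is not the script's actual code. The 50 bp cutoff, the `SVTYPE=INV` test, and the hypothetical `<ID>_<n>` per-allele naming are assumptions chosen for illustration.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple

# Assumed cutoff: 50 bp is the conventional SV-size threshold; the value
# used by lrSvHprc2VcfToBed.py is not stated in the commit.
MIN_SV_LEN = 50


class SvRow(NamedTuple):
    chrom: str
    start: int    # 0-based, BED-style
    end: int      # exclusive; spans the REF allele
    name: str     # hypothetical naming: VCF ID plus 1-based ALT index
    svlen: int    # len(ALT) - len(REF); 0 for a balanced inversion
    is_inv: bool


def is_inversion(info: str) -> bool:
    """Assumption: inversions carry an SVTYPE=INV field in the INFO column."""
    return "SVTYPE=INV" in info.split(";")


def explode_and_filter(lines: Iterable[str]) -> Iterator[SvRow]:
    """Yield one row per ALT allele, keeping SV-sized alleles and inversions."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, vid, ref, alts, info = (
            cols[0], int(cols[1]), cols[2], cols[3], cols[4], cols[7])
        inv = is_inversion(info)
        # Multi-allelic row: split ALT on commas and treat each allele on its own.
        for i, alt in enumerate(alts.split(",")):
            if alt in (".", "*"):
                continue  # missing or spanning-deletion allele
            svlen = len(alt) - len(ref)  # assumes sequence-resolved alleles
            if inv or abs(svlen) >= MIN_SV_LEN:
                yield SvRow(chrom, pos - 1, pos - 1 + len(ref),
                            f"{vid}_{i + 1}", svlen, inv)


def stream_vcf(path: str) -> Iterator[SvRow]:
    """Read a (b)gzipped or plain VCF line by line instead of loading it whole."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        yield from explode_and_filter(fh)
```

Exploding before filtering matters here. A single row can mix a 1 bp allele with a 60 bp allele, and only the large one should survive.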
refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/gustafsonSv.html src/hg/makeDb/trackDb/human/gustafsonSv.html
new file mode 100644
index 00000000000..65923b24b32
--- /dev/null
+++ src/hg/makeDb/trackDb/human/gustafsonSv.html
@@ -0,0 +1,96 @@
+<h2>Description</h2>
+<p>
+This track shows structural variants (SVs) from Oxford Nanopore long-read
+whole-genome sequencing of 100 individuals in the 1000 Genomes Project,
+as released by the 1000 Genomes Project ONT Sequencing Consortium and
+described in Gustafson et al. 2024. The cohort spans all five 1000
+Genomes superpopulations and 19 subpopulations. Samples were sequenced
+with ONT R9.4.1 pores to a median coverage of ~37x and a median read N50 of ~54 kb.
+</p>
+<p>
+The track contains 113,696 SVs (63,177 insertions, 49,704 deletions,
+744 inversions and 71 duplications). Each site may be supported by up to
+five independent methods (three alignment-based: Sniffles2, cuteSV and SVIM;
+two assembly-based: hapdiff on Flye and on Shasta/Hapdup assemblies), whose
+calls were merged across callers and samples with Jasmine to produce a
+cross-sample consensus catalog.
+</p>
+<p>
+This 100-sample Gustafson cohort is distinct from the Vienna
+1000-Genomes-ONT release (<a href="hgTrackUi?g=lrSv1kgOnt">1KG ONT SVs</a>),
+which uses different pore chemistry and callers; the two
+releases share neither samples nor calls.
+</p>
+
+<h2>Display Conventions and Configuration</h2>
+<p>
+Items are colored by SV type:
+<ul>
+<li><span style="color: rgb(200,0,0);">Deletions (DEL)</span> - red</li>
+<li><span style="color: rgb(0,0,200);">Insertions (INS)</span> - blue</li>
+<li><span style="color: rgb(0,160,0);">Duplications (DUP)</span> - green</li>
+<li><span style="color: rgb(230,140,0);">Inversions (INV)</span> - orange</li>
+</ul>
+</p>
+<p>
+Insertions are placed at the insertion site with a width of 1 bp; deletions,
+duplications and inversions span the affected reference interval. Filters
+are available for SV type, SV length and carrier-sample count. The detail
+page also shows the number of per-caller calls supporting each site
+(VARCALLS) and whether the source caller marked the breakpoints as precise.
+</p>
+
+<h2>Methods</h2>
+<p>
+Long-read whole-genome sequencing was performed on 100 samples from the 1000
+Genomes Project with ONT R9.4.1 pores at a median coverage of ~37x and a median read N50
+of ~54 kb. Reads were aligned to GRCh38 with minimap2 and, for a subset,
+with the CARD pipeline. De novo assemblies were produced with Flye and
+with Shasta/Hapdup. Per-sample structural variant calls were generated
+with five independent methods (Sniffles2, cuteSV and SVIM on alignments;
+hapdiff on Flye and on Shasta/Hapdup assemblies) and merged with Jasmine
+in two stages: first across callers within each sample
+(intra-sample) to build per-sample consensus SVs, then across all 100
+samples to produce the shared site-level callset used here.
+</p>
+
+<h2>Data Access</h2>
+<p>
+The data can be explored interactively in table format with the
+<a href="../cgi-bin/hgTables">Table Browser</a> or the
+<a href="../cgi-bin/hgIntegrator">Data Integrator</a>, and accessed
+programmatically through our <a href="https://api.genome.ucsc.edu">API</a>,
+track=<i>gustafsonSv</i>.
+</p>
+<p>
+The bigBed is available from
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/" target="_blank">our
+download server</a> as <tt>gustafson.bb</tt>. Example:
+<tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/gustafson.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt>.
+</p>
+<p>
+The original VCF is available from the 1000 Genomes ONT S3 bucket:
+<a href="https://s3.amazonaws.com/1000g-ont/Gustafson_etal_2024_preprint_SUPPLEMENTAL/20240423_jasmine_intrasample_noBND_custom_suppvec_alphanumeric_header_JASMINE.vcf.gz" target="_blank">
+20240423_jasmine_intrasample_noBND_custom_suppvec_alphanumeric_header_JASMINE.vcf.gz</a>.
+</p>
+
+<h2>Credits</h2>
+<p>
+Thanks to Gustafson and colleagues, and to the 1000 Genomes Project ONT
+Sequencing Consortium, for releasing this dataset.
+</p>
+
+<h2>References</h2>
+
+
+<p>
+Gustafson JA, Gibson SB, Damaraju N, Zalusky MPG, Hoekzema K, Twesigomwe D, Yang L, Snead AA,
+Richmond PA, De Coster W <em>et al</em>.
+<a href="http://genome.cshlp.org/lookup/pmidlookup?view=long&pmid=39358015" target="_blank">
+High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive
+catalog of human genetic variation</a>.
+<em>Genome Res</em>. 2024 Nov 20;34(11):2061-2073.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/39358015" target="_blank">39358015</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11610458/" target="_blank">PMC11610458</a>
+</p>
+
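The Display Conventions section of gustafsonSv.html says the following:

- Insertions are drawn 1 bp wide at the insertion site.
- DEL, DUP and INV span the affected reference interval.
- Each type gets a fixed color.

The minimal Python sketch below shows one way a VCF-to-BED step could apply those rules. It is not the code in lrSvGustafsonVcfToBed.py, which this diff does not include. It assumes the following:

- Records use the standard VCF INFO tags `SVTYPE`, `SVLEN` and `END`.
- `itemRgb` is written as an `"R,G,B"` string.
- An SV without `END` falls back to `|SVLEN|` for its length.

```python
from typing import Dict, NamedTuple, Optional

# Colors copied from the Display Conventions list in gustafsonSv.html.
SV_COLORS: Dict[str, str] = {
    "DEL": "200,0,0",
    "INS": "0,0,200",
    "DUP": "0,160,0",
    "INV": "230,140,0",
}


class BedItem(NamedTuple):
    chrom: str
    start: int     # 0-based
    end: int       # exclusive
    name: str
    item_rgb: str  # BED itemRgb field, "R,G,B"


def parse_info(info: str) -> Dict[str, str]:
    """Split 'SVTYPE=DEL;SVLEN=-300;END=1300' into a dict; flags map to ''."""
    out: Dict[str, str] = {}
    for field in info.split(";"):
        key, _, value = field.partition("=")
        out[key] = value
    return out


def sv_to_bed(chrom: str, pos: int, vid: str, info: str) -> Optional[BedItem]:
    """Hypothetical mapping of one VCF SV record to a colored BED item."""
    tags = parse_info(info)
    svtype = tags.get("SVTYPE")
    if svtype not in SV_COLORS:
        return None  # e.g. BND records (the source VCF name says noBND)
    start = pos - 1  # VCF POS is 1-based
    if svtype == "INS":
        end = start + 1  # insertions are drawn 1 bp wide at the insertion site
    elif "END" in tags:
        end = int(tags["END"])  # 1-based inclusive END equals 0-based exclusive end
    else:
        # Assumed fallback: use |SVLEN| as the reference span, at least 1 bp.
        end = start + max(1, abs(int(tags.get("SVLEN", "0"))))
    return BedItem(chrom, start, end, vid, SV_COLORS[svtype])
```

Keeping the color table next to the coordinate logic means the BED `itemRgb` values and the legend in the HTML page can be checked against one source.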