bac95a147f49cd331052e597006e04b3deee40fc max Wed Apr 22 10:43:20 2026 -0700
lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on both the lrSv and srSv supertracks using the "CODE|CODE (Long name)" filterValues syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)", etc. Labels keep the short code up front so users can match what hgTracks shows next to each feature.

Also sweep in the in-progress converter/as-file cleanups under scripts/lrSv/ and scripts/srSv/ (introduction of lrSvCommon.py helpers, consistent insLen / svLen / AC column naming, tightened field-description text) that had been piling up as an unstaged working tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/onekg3202Sr.html src/hg/makeDb/trackDb/human/onekg3202Sr.html
index b0bc17186ad..920bd02667a 100644
--- src/hg/makeDb/trackDb/human/onekg3202Sr.html
+++ src/hg/makeDb/trackDb/human/onekg3202Sr.html
@@ -38,47 +38,69 @@
Insertions are placed at the insertion site; deletions, duplications, inversions, complex and copy-number variants span the affected reference interval. Translocations show only the chr1-side breakpoint; the partner chromosome is reported on the detail page.

Filters are available for SV type, SV length, overall allele frequency, population-max allele frequency and per-population AFs (African and European). The detail page also shows heterozygous / homozygous-alternate carrier counts, the set of upstream SV callers, the upstream pipeline source and the VCF FILTER status.
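The SV type filter labels mentioned in the commit message follow a "CODE|CODE (Long name)" pattern. A minimal Python sketch of how such a setting line could be assembled is shown here. The helper name `filter_values_line`, the exact `filterValues.svType` key layout, and every long name other than Deletion and Insertion are assumptions for illustration and are not taken from the commit.

```python
# Hypothetical helper: builds a trackDb-style filterValues line using the
# "CODE|CODE (Long name)" label syntax quoted in the commit message.
# Only the DEL and INS long names appear in the commit; the others are assumed.
SV_TYPE_NAMES = {
    "DEL": "Deletion",
    "INS": "Insertion",
    "DUP": "Duplication",  # assumed long name
    "INV": "Inversion",    # assumed long name
}


def filter_values_line(field, names):
    """Return 'filterValues.<field> CODE|CODE (Long name),...' as one string."""
    entries = [f"{code}|{code} ({long_name})" for code, long_name in names.items()]
    return f"filterValues.{field} " + ",".join(entries)


line = filter_values_line("svType", SV_TYPE_NAMES)
print(line)
```

Keeping the short code at the front of each label matches the stated goal of letting users match the filter entry to what hgTracks shows next to each feature.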

Methods

-The 1000 Genomes expanded cohort was sequenced on Illumina NovaSeq 6000
-at ~30x coverage with 2x150 bp reads. Structural variants were called
-with the GATK-SV cohort pipeline and merged with svtools; novel insertions
-were re-genotyped to produce the integrated callset used here
-(1KGP_3202.gatksv_svtools_novelins.freeze_V3.wAF.vcf.gz).
-Allele frequencies were computed genome-wide and per-population
-(AFR, AMR, EAS/ASN, EUR, SAS/SAN).
+Byrska-Bishop et al. 2022 sequenced the 3,202-sample expanded 1000
+Genomes Project cohort (2,504 original unrelated samples plus 698 samples
+that complete 602 parent-child trios) on Illumina NovaSeq 6000 at ~30x
+coverage with 2x150 bp reads. SNVs and indels were called with GATK
+HaplotypeCaller. SVs were discovered and integrated from three analytic
+pipelines -
+GATK-SV,
+svtools and Absinthe - through a machine-learning integration model;
+novel insertions were re-genotyped to produce the freeze V3 callset with
+added allele frequencies (*.wAF.vcf.gz). The final ensemble
+callset contains 173,366 SVs across seven classes: 90,259 DELs, 49,693
+INSs, 28,242 DUPs, 920 INVs, 3,568 complex SVs (CPX), 673 multi-allelic
+CNVs and 11 inter-chromosomal translocations (CTX), with AC, AN, AF and
+per-superpopulation AFs (AFR, AMR, EAS/ASN, EUR, SAS/SAN).
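As a quick consistency check on the figures quoted in the new Methods text, the per-class counts and the sample counts can be summed in a few lines of Python. The class keys are the short codes used in the text, with `CNV` standing in for the multi-allelic CNV class; the variable names are illustrative only.

```python
# Per-class SV counts as quoted in the Methods text.
sv_counts = {
    "DEL": 90_259,
    "INS": 49_693,
    "DUP": 28_242,
    "INV": 920,
    "CPX": 3_568,
    "CNV": 673,   # multi-allelic CNVs
    "CTX": 11,    # inter-chromosomal translocations
}

total_svs = sum(sv_counts.values())

# Cohort size: original unrelated samples plus the samples completing trios.
n_samples = 2_504 + 698

print(total_svs, n_samples)
```

Both sums agree with the totals stated in the text (173,366 SVs and 3,202 samples).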

Why a short-read track in a long-read collection? Short-read SV
-callsets such as this one generally have high precision for deletions
-and duplications but miss many insertions, repeat expansions and
-variants in complex/low-mappability regions that long-read technologies
-can resolve. Displaying this callset alongside the long-read tracks in
-this collection makes it easier to spot variants that are unique to
-long-read data or that have substantially different breakpoints when
-called from short reads.
+callsets such as this one generally have high precision for deletions and
+duplications but miss many insertions, repeat expansions and variants in
+complex/low-mappability regions that long-read technologies can resolve.
+Displaying this callset alongside the long-read tracks in this collection
+makes it easier to spot variants that are unique to long-read data or
+that have substantially different breakpoints when called from short
+reads.
+

+The freeze V3 VCF
+1KGP_3202.gatksv_svtools_novelins.freeze_V3.wAF.vcf.gz was
+downloaded from the
+
+IGSR 1000 Genomes Illumina SV integration folder.
+

+The step-by-step build commands (download, format conversion, bigBed build)
+are recorded in the UCSC makeDoc for this track container:
+
+doc/hg38/lrSv.txt. The conversion scripts and autoSql schemas live in
+
+makeDb/scripts/lrSv.

Data Access

The data can be explored interactively in table format with the Table Browser or the Data Integrator, and accessed programmatically through our API, track=onekg3202Sr.

The bigBed is available from our download server as onekg3202sr.bb. Example: bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/onekg3202sr.bb -chrom=chr21 -start=0 -end=100000000 stdout.
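For the programmatic route mentioned above, a minimal Python sketch of a region query against the UCSC REST API might look like the following. It assumes the public `getData/track` endpoint with semicolon-separated parameters, that the track is served under the name `onekg3202Sr` on hg38, and that the response is JSON; the helper names `track_url` and `fetch_track` are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint of the public UCSC REST API.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_url(genome, track, chrom, start, end):
    """Build a getData/track URL for one region (parameters joined with ';')."""
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", start),
        ("end", end),
    ]
    return API_BASE + "?" + ";".join(f"{key}={value}" for key, value in params)


def fetch_track(genome, track, chrom, start, end):
    """Download and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(track_url(genome, track, chrom, start, end)) as resp:
        return json.load(resp)


# Same region as the bigBedToBed example: all of chr21 up to 100 Mb.
url = track_url("hg38", "onekg3202Sr", "chr21", 0, 100000000)
print(url)
```

Calling `fetch_track("hg38", "onekg3202Sr", "chr21", 0, 100000000)` would issue the request; the layout of the returned JSON is not described here and should be checked against the API documentation.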