bac95a147f49cd331052e597006e04b3deee40fc max Wed Apr 22 10:43:20 2026 -0700
lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on both the lrSv and srSv supertracks using the "CODE|CODE (Long name)" filterValues syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)", etc. Labels keep the short code up front so users can match what hgTracks shows next to each feature.

Also sweep in the in-progress converter/as-file cleanups under scripts/lrSv/ and scripts/srSv/ (introduction of lrSvCommon.py helpers, consistent insLen / svLen / AC column naming, tightened field-description text) that had been piling up as an unstaged working tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/hgsvc2Sv.html src/hg/makeDb/trackDb/human/hgsvc2Sv.html
index 500ba0a04f9..994ff8c8441 100644
--- src/hg/makeDb/trackDb/human/hgsvc2Sv.html
+++ src/hg/makeDb/trackDb/human/hgsvc2Sv.html
@@ -45,50 +45,63 @@
  • RefSeq Gene Overlaps: bases of overlap with CDS, 5'/3' UTRs, introns, non-coding RNAs, and +/- 5 kb windows around each gene.
  • Gene Constraint: maximum gnomAD pLI and minimum LOEUF upper bound for genes overlapping the SV.
  • Reference Context: cytoband, segmental-duplication overlap, whether the SV falls in a Tandem Repeat Finder region.
  • Carrier Haplotypes: full list of sample-haplotype IDs (e.g. HG00096-h1, HG00514-un) carrying the variant.
  • Inner Inversion Region (INV only): coordinates of the inner inverted sequence, distinct from the outer breakpoint interval.
    Methods

-HGSVC2 generated phased haplotype-resolved de novo assemblies for 32
-diploid samples across five 1000 Genomes superpopulations. Assemblies
-were built from PacBio continuous long reads and HiFi reads and phased
-with Strand-seq. Structural variants were discovered from each haplotype
-assembly using PAV and validated with multiple orthogonal callers
-(including PBSV, Bionano, DeepVariant, PAV-LRA, and others recorded in
-per-site validation columns). The final SV set was merged to produce the
-integrated callset used here.
+Ebert et al. 2021 produced phased haplotype-resolved de novo assemblies for
+32 diploid samples (64 unrelated haplotypes) across five 1000 Genomes
+superpopulations on the PacBio Sequel II platform, using continuous
+long-read sequencing (CLR, >40x) and high-fidelity sequencing (HiFi,
+>20x). Single-cell Strand-seq data from the same samples were used to
+phase the assemblies without parental trios, yielding N50 contigs >25 Mbp
+at QV > 40. SVs were discovered from the two haplotype assemblies of
+each sample with the Phased Assembly Variant (PAV) caller against GRCh38,
+and candidate SVs were orthogonally supported by at least one of seven
+other sources (read-based callers MELT, PBSV and PALMER; Bionano optical
+mapping; breakpoint k-mer analysis; PAV replication with LRA). This
+yielded the integrated nonredundant callset of 107,590 insertion/deletion
+SVs and 316 inversions. Population-scale allele frequencies (POP_*_AF) were
+obtained by graph-based re-genotyping of the HGSVC2 SVs into the
+3,202-sample 1000 Genomes short-read cohort with PanGenie (insertions and
+deletions only).

-Population-scale allele frequencies (POP_*_AF) were derived by imputing
-the HGSVC2 SVs back into the full 1000 Genomes short-read cohort. These
-fields are only available for insertions and deletions.
+For display, the HGSVC2 v2.0 freeze-4 annotation tables
+variants_freeze4_sv_insdel.tsv.gz (111,330 DEL+INS) and
+variants_freeze4_sv_inv.tsv.gz (416 INV) were downloaded from the
+
+IGSR HGSVC2 v2.0 integrated-callset directory and merged into a single
+bigBed; type-specific columns (POP_*_AF for insdel, RGN_REF_INNER for
+inversions) are empty on the detail page when they do not apply.

-Two tables were merged for display here:
-variants_freeze4_sv_insdel.tsv.gz (DEL + INS, 111,330 records) and
-variants_freeze4_sv_inv.tsv.gz (INV, 416 records). Type-specific
-columns (POP_*_AF for insdel, RGN_REF_INNER for inversions) are shown as
-empty on the detail page when they do not apply.
+The step-by-step build commands (download, format conversion, bigBed build)
+are recorded in the UCSC makeDoc for this track container:
+
+doc/hg38/lrSv.txt. The conversion scripts and autoSql schemas live in
+
+makeDb/scripts/lrSv.

    Data Access

    The data can be explored interactively in table format with the Table Browser or the Data Integrator, and accessed programmatically through our API, track=hgsvc2Sv.
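
    As a rough sketch of the programmatic route, the snippet below builds a request URL for this track. The getData/track endpoint and its genome, track, chrom, start and end parameters reflect the public UCSC REST API as we understand it and are not stated on this page; the helper name build_track_url and the example region are purely illustrative.

```python
from urllib.parse import urlencode

# Assumed endpoint of the UCSC REST API; not taken from this page.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(chrom, start, end, genome="hg38", track="hgsvc2Sv"):
    """Return a getData/track request URL for one genomic region."""
    if start < 0 or end <= start:
        raise ValueError("need 0 <= start < end")
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    # urlencode joins the parameters with '&' in insertion order.
    return API_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_track_url("chr21", 0, 100000000))
```

    The printed URL can be passed to any HTTP client; the response format is not sketched here.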

    The bigBed is available from our download server as hgsvc2.bb. Example: bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hgsvc2.bb -chrom=chr21 -start=0 -end=100000000 stdout.
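
    For scripted use, the bigBedToBed call shown above could be wrapped roughly as follows. This is a minimal sketch that assumes the bigBedToBed binary is installed and on PATH; the function names bigbed_to_bed_cmd and fetch_bed_rows are hypothetical, and the rows are returned as raw tab-separated fields without interpreting the track's extra columns.

```python
import shutil
import subprocess

# URL copied from the example command on this page.
BIGBED_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hgsvc2.bb"


def bigbed_to_bed_cmd(chrom, start, end, url=BIGBED_URL):
    """Build the argv list for the bigBedToBed example, writing BED to stdout."""
    return [
        "bigBedToBed",
        url,
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        "stdout",
    ]


def fetch_bed_rows(chrom, start, end):
    """Run bigBedToBed and return each output row as a list of string fields."""
    if shutil.which("bigBedToBed") is None:
        raise RuntimeError("bigBedToBed not found on PATH")
    result = subprocess.run(
        bigbed_to_bed_cmd(chrom, start, end),
        check=True,
        capture_output=True,
        text=True,
    )
    return [line.split("\t") for line in result.stdout.splitlines() if line]


if __name__ == "__main__":
    # Print the command only; call fetch_bed_rows(...) to actually run it.
    print(" ".join(bigbed_to_bed_cmd("chr21", 0, 100000000)))
```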