6b0d68657267f1e02c47d4224ea62446bbbb2ba0 max Fri May 22 06:55:52 2026 -0700 small non-AI changes to the html docs pages of the long-read SV tracks diff --git src/hg/makeDb/trackDb/human/hprc2JasmineSv.html src/hg/makeDb/trackDb/human/hprc2JasmineSv.html new file mode 100644 index 00000000000..bf95b3f88d7 --- /dev/null +++ src/hg/makeDb/trackDb/human/hprc2JasmineSv.html @@ -0,0 +1,106 @@ +

Description

+This track shows structural variants (SVs) called across the 231 HPRC v2 +haplotype-resolved assemblies and merged with +Jasmine +into a single non-redundant callset for each reference assembly (GRCh38 and T2T-CHM13). Each sample was +processed with 14 SV callers spanning read-mapping, assembly-based and +graph-based approaches; the per-sample VCFs were then merged across samples +with Jasmine using both positional and sequence-identity criteria. +

+The hg38 track contains 335,494 merged SVs (insertions and deletions +≥ 30 bp). The hs1 track is built the same way from the T2T-CHM13 +calls. +

+ +

Display Conventions and Configuration

+Items are colored by SV type: +

Insertion (INS)
Deletion (DEL)

+Coordinates follow these conventions: +

Insertions are drawn as a 1 bp anchor: chromStart is the 0-based position of +the reference base immediately before the inserted sequence (POS−1, where +POS is the 1-based VCF position), and chromEnd = chromStart + 1. +The inserted-sequence length is reported in the Insertion Length +(insLen) field; the on-screen feature width does not depend on it.
Deletions span the deleted reference interval: +chromStart = POS−1 and chromEnd = chromStart + |SVLEN|, so +the feature width in the browser equals the deletion length.
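The coordinate conventions above can be illustrated with a minimal Python sketch. It assumes that each record has already been reduced to CHROM, POS, SVTYPE and SVLEN, with deletions carrying a negative SVLEN as in VCF. The BedInterval class and the vcf_sv_to_bed function are hypothetical illustrations. They are not the track's actual lrSvHprc2Jasmine conversion scripts.

```python
from dataclasses import dataclass


@dataclass
class BedInterval:
    chrom: str
    chrom_start: int  # 0-based, inclusive
    chrom_end: int    # 0-based, exclusive
    sv_type: str
    ins_len: int      # inserted length for INS; 0 for DEL


def vcf_sv_to_bed(chrom: str, pos: int, sv_type: str, svlen: int) -> BedInterval:
    """Convert a 1-based VCF SV record to BED using the conventions described above
    (hypothetical helper, not the track's build script)."""
    chrom_start = pos - 1  # VCF POS is 1-based; BED chromStart is 0-based
    if sv_type == "INS":
        # 1 bp anchor on the reference base before the inserted sequence;
        # the inserted length is stored separately and does not set the width.
        return BedInterval(chrom, chrom_start, chrom_start + 1, "INS", abs(svlen))
    if sv_type == "DEL":
        # The feature spans |SVLEN| reference bases starting at POS-1.
        return BedInterval(chrom, chrom_start, chrom_start + abs(svlen), "DEL", 0)
    raise ValueError(f"unsupported SV type: {sv_type}")
```

For example, `vcf_sv_to_bed("chr21", 1000, "INS", 350)` gives chromStart 999 and chromEnd 1000 with an insLen of 350. A deletion at the same POS with SVLEN −120 spans 999 to 1119.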

+The bigBed stores type, length and merge metadata; the explicit +inserted/deleted sequences are not carried over from the Jasmine-merged +VCF. +

+Filters are available for SV type, SV length, carrier sample count, carrier +frequency, the number of supporting callers and the specific supporting callers +(e.g. requiring support from both PAV and dipcall). The Carrier Sample Count filter +operates on Jasmine's SUPP field, which gives the number of input samples in +which the SV was called. The Allele Number (AN) field is fixed at 231 +(the merged sample count), so the carrier frequency is SUPP/231. Because +Jasmine collapses input genotypes, per-haplotype AC/AF are not preserved. +
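The following Python sketch shows how the carrier frequency and the caller filters combine. It assumes that each merged record has already been parsed into a SUPP count and a set of caller names. The MergedSv class, the passes_filters function, its parameter names and the caller spellings are all hypothetical. They mirror the filters described above, not the browser's own implementation.

```python
from dataclasses import dataclass, field

MERGED_SAMPLE_COUNT = 231  # AN is fixed at the number of merged samples


@dataclass
class MergedSv:
    sv_type: str
    sv_len: int
    supp: int  # Jasmine SUPP: number of input samples in which the SV was called
    callers: frozenset[str] = field(default_factory=frozenset)

    @property
    def carrier_frequency(self) -> float:
        # Carrier frequency as defined for this track: SUPP / 231
        return self.supp / MERGED_SAMPLE_COUNT


def passes_filters(
    sv: MergedSv,
    min_carriers: int = 1,
    min_frequency: float = 0.0,
    min_callers: int = 1,
    required_callers: frozenset[str] = frozenset(),
) -> bool:
    """Apply carrier-count, carrier-frequency and caller filters together
    (hypothetical helper)."""
    return (
        sv.supp >= min_carriers
        and sv.carrier_frequency >= min_frequency
        and len(sv.callers) >= min_callers
        and required_callers <= sv.callers  # every required caller supports the SV
    )
```

For instance, an SV with SUPP = 58 has a carrier frequency of 58/231 ≈ 0.251. It passes a filter that requires both PAV and dipcall only if both names appear in its caller set.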

+ +

Methods

+Per-sample SV calls were produced from the 231 HPRC v2 haplotype-resolved +assemblies using 14 SV callers: DELLY, DeBreak, DeepVariant, PAV, SVDSS, +SVIM, SVIM-asm, Sniffles2, cuteSV, cuteSV-asm, dipcall, longcallD, pbsv and +sawfish. The per-sample, multi-caller calls were harmonized into three +per-sample VCFs (one per pipeline: dipcall, PAV, longcallD). The +SOURCES field of each record lists the contributing pipelines, +and CALLERS lists the underlying callers that agree on the call. For this +track, the harmonized per-sample VCFs were split by chromosome and +filtered to SV-sized records (|len(ALT) − len(REF)| ≥ 30 bp), +keeping the explicit REF/ALT sequences. The per-chromosome files were +merged across samples with Jasmine's default sequence-aware mode using +--ignore_merged_inputs --normalize_type, so insertions at the +same position are collapsed both by location/length and by sequence +similarity (Jaccard k-mer comparison). +
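The size filter and the idea behind k-mer sequence similarity can be sketched in Python as follows. This is an illustration under stated assumptions, not Jasmine's code. The is_sv_sized function applies the |len(ALT) − len(REF)| ≥ 30 bp cutoff. The kmer_jaccard function computes a plain Jaccard index over two sets of k-mers. The function names and the k = 9 default are hypothetical choices, and Jasmine's actual similarity computation and thresholds are not reproduced here.

```python
def is_sv_sized(ref: str, alt: str, min_len: int = 30) -> bool:
    """Keep records whose REF/ALT length difference is at least min_len bp."""
    return abs(len(alt) - len(ref)) >= min_len


def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct k-mers of seq (case-insensitive); empty if len(seq) < k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_jaccard(a: str, b: str, k: int = 9) -> float:
    """Jaccard index |A & B| / |A | B| of the k-mer sets of two sequences
    (illustrative; k = 9 is an assumed default, not Jasmine's setting)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        # Both sequences are shorter than k: fall back to exact comparison.
        return 1.0 if a.upper() == b.upper() else 0.0
    return len(ka & kb) / len(union)
```

Under this scheme, two insertions with identical inserted sequences score 1.0, and sequences that share no k-mers score 0.0. A merger can then require a minimum score before collapsing nearby calls.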

+The per-chromosome VCFs were then concatenated into one merged VCF per assembly +and converted to bigBed. Build commands are recorded in the UCSC makeDoc +for this track container: + +doc/hg38/lrSv.txt. The conversion scripts and autoSql schema are in + +makeDb/scripts/lrSv (files whose names start with lrSvHprc2Jasmine). +

+ +

Data Access

+The data can be explored interactively in table format with the +Table Browser or the +Data Integrator, and accessed +programmatically through our API with the parameter +track=hprc2JasmineSv. +

+The bigBed is available from our download server for both assemblies: +

GRCh38: + +hg38 hprc2Jasmine.bb
T2T-CHM13: + +hs1 hprc2Jasmine.bb

+Example: bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hprc2Jasmine.bb -chrom=chr21 -start=0 -end=100000000 stdout. +
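The same extraction can also be scripted. The Python sketch below calls bigBedToBed with the arguments from the example above. It then tallies records using only the first three BED columns, because of this track's coordinate conventions: features that are 1 bp wide are insertions, and features 30 bp or wider are deletions. The sketch assumes that bigBedToBed is installed and on PATH. The names fetch_bed_lines and tally_by_width are hypothetical. Reading the SV type column directly would require the track's autoSql schema, which is not reproduced here.

```python
import subprocess
from collections import Counter


def fetch_bed_lines(bb: str, chrom: str, start: int, end: int) -> list[str]:
    """Run bigBedToBed (assumed to be on PATH) on one region and return BED lines."""
    result = subprocess.run(
        ["bigBedToBed", bb, f"-chrom={chrom}", f"-start={start}", f"-end={end}", "stdout"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def tally_by_width(bed_lines: list[str]) -> Counter:
    """Classify records from chromStart/chromEnd alone: under this track's
    conventions, insertions are 1 bp anchors and deletions are >= 30 bp wide."""
    counts: Counter = Counter()
    for line in bed_lines:
        if not line.strip() or line.startswith("#"):
            continue
        _chrom, start, end = line.split("\t")[:3]
        width = int(end) - int(start)
        if width == 1:
            counts["INS"] += 1
        elif width >= 30:
            counts["DEL"] += 1
        else:
            counts["other"] += 1  # unexpected under the documented conventions
    return counts
```

A typical call would be `tally_by_width(fetch_bed_lines("http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hprc2Jasmine.bb", "chr21", 0, 100000000))`. This requires network access to the download server.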

+ +

Credits

+Thanks to Wen-Wei Liao, who ran all variant callers on the HPRC v2 +assemblies, and to the Ira Hall lab for the multi-caller HPRC v2 SV +callsets used as input here. This data set is not yet described in a +formal peer-reviewed publication; this track will be updated when the +manuscript becomes available. +