6b0d68657267f1e02c47d4224ea62446bbbb2ba0 max Fri May 22 06:55:52 2026 -0700 small non-AI changes to the html docs pages of the long-read SV tracks diff --git src/hg/makeDb/trackDb/human/hprc2JasmineSv.html src/hg/makeDb/trackDb/human/hprc2JasmineSv.html new file mode 100644 index 00000000000..bf95b3f88d7 --- /dev/null +++ src/hg/makeDb/trackDb/human/hprc2JasmineSv.html @@ -0,0 +1,106 @@ +
+This track shows structural variants (SVs) called across the 231 HPRC v2
haplotype-resolved assemblies and merged with Jasmine
into a single non-redundant callset per reference assembly. Each sample was
processed through 14 SV callers spanning read-mapping, assembly-based and
graph-based approaches. The per-sample VCFs were then merged across samples
with Jasmine using both positional and sequence-identity criteria.
++The hg38 track contains 335,494 merged SVs (insertions and deletions +≥ 30 bp). The hs1 track is built the same way from the T2T-CHM13 +calls. +
+ ++Items are colored by SV type: +
+Coordinates follow these conventions: +
+Filters are available for SV type, SV length, carrier sample count, carrier
frequency, number of supporting callers and specific callers
(e.g., requiring both PAV and dipcall). The Carrier Sample Count filter
operates on Jasmine's SUPP field, which is the number of input samples in
which the SV was called. The Allele Number (AN) field is fixed at 231
(the merged sample count), and the carrier frequency is SUPP/231. Because
Jasmine collapses input genotypes, per-haplotype allele counts and allele
frequencies (AC/AF) are not preserved.
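+As a rough illustration of the filter arithmetic above, the Python sketch below parses a VCF INFO column, computes the carrier frequency as SUPP/231 and checks whether a record is supported by a required set of callers. It is not the track's build or filter code. The comma-separated CALLERS format, the helper names and the example INFO string are assumptions made for illustration.

```python
# Hypothetical sketch: carrier frequency and caller filter from a VCF INFO
# column. ASSUMPTIONS: INFO is the usual semicolon-separated KEY=VALUE list,
# and CALLERS is comma-separated (check the VCF header to confirm).

N_SAMPLES = 231  # merged sample count; also the fixed AN value


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO string into a dict; flag keys map to ''."""
    fields: dict[str, str] = {}
    for item in info.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def carrier_frequency(info: str, n_samples: int = N_SAMPLES) -> float:
    """Carrier frequency = SUPP / number of merged samples."""
    supp = int(parse_info(info)["SUPP"])
    return supp / n_samples


def has_callers(info: str, required: set[str]) -> bool:
    """True if every required caller appears in the CALLERS list."""
    callers = set(parse_info(info).get("CALLERS", "").split(","))
    return required <= callers


# Made-up INFO column, not a real record from this track.
example = "SVTYPE=INS;SVLEN=312;SUPP=42;CALLERS=PAV,dipcall,sawfish"
print(f"{carrier_frequency(example):.4f}")
print(has_callers(example, {"PAV", "dipcall"}))
```

A record called in every merged sample (SUPP=231) gets a carrier frequency of 1.0. Because AN is fixed at the sample count, this value is a per-sample carrier rate, not a per-haplotype allele frequency.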
+Per-sample SV calls were produced from the 231 HPRC v2 haplotype-resolved
assemblies using 14 SV callers: DELLY, DeBreak, DeepVariant, PAV, SVDSS,
SVIM, SVIM-asm, Sniffles2, cuteSV, cuteSV-asm, dipcall, longcallD, pbsv and
sawfish. These multi-caller calls were harmonized into three VCFs per
sample (one per pipeline: dipcall, PAV and longcallD). On each record, the
SOURCES field lists the pipelines that contributed, and the CALLERS field
lists the underlying callers that agree on the call. For this
track, the harmonized per-sample VCFs were split by chromosome and
filtered to SV-sized records (|alt − ref| ≥ 30 bp),
keeping the explicit REF/ALT sequences. The per-chromosome files were then
merged across samples with Jasmine's default sequence-aware mode, using
--ignore_merged_inputs --normalize_type, so that insertions at the
same position are merged based on both location/length and sequence
similarity (Jaccard k-mer comparison).
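+Two of the steps described above can be sketched in a few lines of Python. The first is the SV-size test on explicit REF/ALT sequences. The second is a k-mer Jaccard similarity between two inserted sequences. This is a simplified illustration, not Jasmine's implementation. The k-mer size (k=9), the helper names and the toy sequences are assumptions.

```python
# Hypothetical sketch of two steps: the |alt - ref| >= 30 bp SV-size filter
# and a k-mer Jaccard similarity between insertion sequences.
# ASSUMPTION: k=9 is an illustrative value, not necessarily Jasmine's setting.

MIN_SV_LEN = 30  # |alt - ref| threshold in bp


def sv_length(ref: str, alt: str) -> int:
    """Signed length change: > 0 for insertions, < 0 for deletions."""
    return len(alt) - len(ref)


def is_sv_sized(ref: str, alt: str, min_len: int = MIN_SV_LEN) -> bool:
    """Keep records whose REF and ALT lengths differ by at least min_len bp."""
    return abs(sv_length(ref, alt)) >= min_len


def kmers(seq: str, k: int) -> set[str]:
    """All distinct length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_jaccard(a: str, b: str, k: int = 9) -> float:
    """|A & B| / |A | B| over the k-mer sets of a and b (0.0 if both empty)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return len(ka & kb) / len(union)


# Toy sequences for illustration only (anchor base "A" plus inserted bases).
ref = "A"
ins1 = "A" + "ACGTTGCAAC" * 4        # 40 bp inserted
ins2 = "A" + "ACGTTGCAAC" * 4 + "G"  # same insertion plus one extra base
print(sv_length(ref, ins1), is_sv_sized(ref, ins1))
print(round(kmer_jaccard(ins1[1:], ins2[1:]), 3))
```

Two insertions at the same position with similar lengths but unrelated content would score near 0 here. This is the case a purely positional merge would wrongly collapse.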
+The per-chromosome VCFs are concatenated into one merged VCF per assembly,
then converted to bigBed. Build commands are recorded in the UCSC makeDoc
for this track container, doc/hg38/lrSv.txt. The conversion scripts and
autoSql schema are in makeDb/scripts/lrSv (files whose names start with
lrSvHprc2Jasmine).
+The data can be explored interactively in table format with the
Table Browser or the Data Integrator, and accessed
programmatically through our API using the track name
track=hprc2JasmineSv.
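+This minimal sketch builds an API request URL for this track. It assumes the public getData/track endpoint of the UCSC REST API at api.genome.ucsc.edu, which takes genome, track, chrom, start and end parameters. The helper name and the example region are hypothetical. Using the same track name on hs1 is also an assumption.

```python
# Hypothetical sketch: build a UCSC REST API getData/track URL for this track.
# ASSUMPTION: the track is named hprc2JasmineSv on every assembly queried.
from urllib.parse import urlencode

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_url(genome: str, chrom: str, start: int, end: int,
              track: str = "hprc2JasmineSv") -> str:
    """Return a getData/track URL for items overlapping chrom:start-end."""
    if start < 0 or end <= start:
        raise ValueError("need 0 <= start < end")
    params = {"genome": genome, "track": track,
              "chrom": chrom, "start": start, "end": end}
    return f"{API_BASE}?{urlencode(params)}"


url = track_url("hg38", "chr1", 1_000_000, 1_100_000)
print(url)
# To fetch the JSON response (requires network access):
#   import json, urllib.request
#   with urllib.request.urlopen(url) as resp:
#       data = json.load(resp)
```

Restricting the query to a region keeps the response small. Requesting a whole chromosome of a track with hundreds of thousands of items can hit the API's output limits.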
++The bigBed is available from our download server for both assemblies: +
+Thanks to Wen-Wei Liao, who ran all variant callers on the HPRC v2 +assemblies, and to the Ira Hall lab for the multi-caller HPRC v2 SV +callsets used as input here. This data set is not yet described in a +formal peer-reviewed publication; this track will be updated when the +manuscript becomes available. +