src/hg/makeDb/doc/hg38/lrSv.txt f058c8fe4601b223ff47468eb3525c05ccd03850

f058c8fe4601b223ff47468eb3525c05ccd03850
max
  Wed Apr 22 09:17:17 2026 -0700
srSv: new short-read SV supertrack, split out of lrSv

Move the three short-read SV/CNV subtracks (abelSv, onekg3202Sr,
tommoJpCnv) out of the Long-read SV supertrack into a new sibling
supertrack srSv (Short-read SVs), so the lrSv collection contains
only long-read callsets. Filter fields (svType, svLen, insLen, AC)
are mirrored at the srSv supertrack level to keep the UX parallel
to lrSv.

- trackDb: new human/srSv.ra with the three subtrack stanzas and
updated /gbdb/$D/srSv/... bigDataUrls; corresponding stanzas
removed from human/lrSv.ra. human/trackDb.ra now includes
srSv.ra. Also adds a new human/srSv.html overview page; the
short-read rows and paragraphs are removed from human/lrSv.html.
- Scripts: abelSv/{abelSv.as,vcfToBed.py,build.sh} and lrSv/
{lrSv1kg3202Sr*, lrSvTommoJpCnvVcfToBedGraph.py} moved to
scripts/srSv/ with git mv (history preserved) and renamed to
drop the "lrSv" prefix. Internal path references in abelSvBuild.sh
and abelSvVcfToBed.py updated.
- makeDoc: doc/hg38/abelSv.txt renamed to doc/hg38/srSv.txt and
extended with the onekg3202Sr and tommoJpCnv sections moved from
lrSv.txt; lrSv.txt keeps a pointer to srSv.txt in place of each.
- Data: /hive/data/genomes/hg38/bed/{abelSv,lrSv/onekg3202sr,
lrSv/tommoJpCnv} moved to /hive/data/genomes/hg38/bed/srSv/*.
/gbdb/hg38/lrSv/{onekg3202sr.bb,tommoJpCnv{Loss,Gain}.bw} and
/gbdb/hg38/abelSv/ removed and re-linked under /gbdb/hg38/srSv/.
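
The Data step above could be reproduced roughly as follows. This is
only a sketch under stated assumptions: ROOT is a scratch directory
standing in for the real filesystem root, the .bb/.bw files are empty
placeholders, and the abelSv gbdb files are left out because their
names are not listed in this message. The commands actually run are
not recorded here.

```shell
#!/bin/sh
# Hypothetical rehearsal of the Data move, confined to a scratch root.
set -eu

ROOT=$(mktemp -d)                      # stand-in for "/" (assumption)
BED=$ROOT/hive/data/genomes/hg38/bed
GBDB=$ROOT/gbdb/hg38

# Fake the pre-move layout with empty placeholder files.
mkdir -p "$BED/abelSv" "$BED/lrSv/onekg3202sr" "$BED/lrSv/tommoJpCnv" "$GBDB/lrSv"
touch "$BED/lrSv/onekg3202sr/onekg3202sr.bb" \
      "$BED/lrSv/tommoJpCnv/tommoJpCnvLoss.bw" \
      "$BED/lrSv/tommoJpCnv/tommoJpCnvGain.bw"
ln -s "$BED/lrSv/onekg3202sr/onekg3202sr.bb" "$GBDB/lrSv/onekg3202sr.bb"
ln -s "$BED/lrSv/tommoJpCnv/tommoJpCnvLoss.bw" "$GBDB/lrSv/tommoJpCnvLoss.bw"
ln -s "$BED/lrSv/tommoJpCnv/tommoJpCnvGain.bw" "$GBDB/lrSv/tommoJpCnvGain.bw"

# Move the three build directories under bed/srSv/.
mkdir -p "$BED/srSv"
mv "$BED/abelSv" "$BED/lrSv/onekg3202sr" "$BED/lrSv/tommoJpCnv" "$BED/srSv/"

# Remove the now-dangling lrSv links and re-link under gbdb/srSv/.
mkdir -p "$GBDB/srSv"
for f in onekg3202sr/onekg3202sr.bb \
         tommoJpCnv/tommoJpCnvLoss.bw \
         tommoJpCnv/tommoJpCnvGain.bw
do
    name=$(basename "$f")
    rm -f "$GBDB/lrSv/$name"
    ln -s "$BED/srSv/$f" "$GBDB/srSv/$name"
done
```

Rehearsing against a scratch root first makes it easy to check that no
link under gbdb/lrSv/ is left dangling before touching the real paths.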

refs #36258

diff --git src/hg/makeDb/doc/hg38/lrSv.txt src/hg/makeDb/doc/hg38/lrSv.txt
index 914cb1d001b..f19e8c2e813 100644
--- src/hg/makeDb/doc/hg38/lrSv.txt
+++ src/hg/makeDb/doc/hg38/lrSv.txt
@@ -98,55 +98,32 @@
 # VCF downloaded from jMorp:
 # https://jmorp.megabank.tohoku.ac.jp/datasets/tommo-jsv1-20211208-af
 # File: tommo-JSV1-20211208-GRCh38-without-genotype-count.vcf.gz
 # 74,201 SVs: 37,981 DEL, 36,220 INS
 # Site-only VCF, merged with SURVIVOR v1.0.6
 # Native GRCh38 coordinates (confirmed via contig headers)
 # Trio-based: 111 families, includes Mendelian error rates
 
 # Convert VCF to BED and build bigBed
 python3 ~/kent/src/hg/makeDb/scripts/lrSv/lrSvTommoJpVcfToBed.py \
     tommo-JSV1-20211208-GRCh38-without-genotype-count.vcf.gz tommoJp.bed
 bedSort tommoJp.bed tommoJp.sorted.bed
 bedToBigBed -type=bed9+ -as=$HOME/kent/src/hg/makeDb/scripts/lrSv/lrSvTommoJp.as \
     -tab tommoJp.sorted.bed /hive/data/genomes/hg38/chrom.sizes tommoJp.bb
 
-##########
-# 2026-04-21 Claude max
-
-# ToMMo 48KJPN CNV Frequency Panel - short-read CNV comparator to the
-# long-read tommoJpSv track above. 48,874 Japanese individuals,
-# short-read WGS, GATK CNV germline workflow at 1 kb bin resolution.
-# Data page: https://jmorp.megabank.tohoku.ac.jp/downloads/tommo-jcnvv1-20230828
-
-mkdir -p /hive/data/genomes/hg38/bed/lrSv/tommoJpCnv
-cd /hive/data/genomes/hg38/bed/lrSv/tommoJpCnv
-wget https://jmorp.megabank.tohoku.ac.jp/datasets/tommo-jcnvv1-20230828/files/tommo-jcnvv1-20230828-GRCh38.vcf.gz
-
-# The VCF has one record per 1 kb non-N bin with per-ALT sample counts
-# (SC) for each observed CN state (CN0..CN5). The converter collapses
-# the per-CN counts into two per-bin values (samples with CN<2, samples
-# with CN>2) and writes two bedGraphs, skipping bins with no CNV
-# carrier. Displayed as a multiWig transparent overlay via trackDb
-# (loss red / gain green) so CNV carrier-count density is visible at
-# any zoom level. 2,006,905 bins kept.
-python3 ~/kent/src/hg/makeDb/scripts/lrSv/lrSvTommoJpCnvVcfToBedGraph.py \
-    tommo-jcnvv1-20230828-GRCh38.vcf.gz tommoJpCnvLoss.bg tommoJpCnvGain.bg
-sort -k1,1 -k2,2n tommoJpCnvLoss.bg > tommoJpCnvLoss.sorted.bg
-sort -k1,1 -k2,2n tommoJpCnvGain.bg > tommoJpCnvGain.sorted.bg
-bedGraphToBigWig tommoJpCnvLoss.sorted.bg /hive/data/genomes/hg38/chrom.sizes tommoJpCnvLoss.bw
-bedGraphToBigWig tommoJpCnvGain.sorted.bg /hive/data/genomes/hg38/chrom.sizes tommoJpCnvGain.bw
+# ToMMo 48K CNV short-read comparator moved to the srSv supertrack.
+# See doc/hg38/srSv.txt for that build.
 
 ##########
 # 2026-03-26 Claude max
 
 # Fourth subtrack: AoU 1K - SVs from 1,027 AoU individuals (PacBio HiFi)
 # Paper: Garimella et al. 2025, medRxiv, PMID 41256123
 # Data: Supplementary media-2 from preprint
 
 mkdir -p /hive/data/genomes/hg38/bed/lrSv/aou1k
 cd /hive/data/genomes/hg38/bed/lrSv/aou1k
 
 # Downloaded supplementary CSV from preprint (media-2.gz)
 # 541,049 SVs: 444,524 INS, 96,525 DEL (autosomes only)
 # Population-specific AFs (AFR, AMR, EAS, EUR, SAS)
 # Gene annotations (OMIM, disease, cancer, ACMG), regulatory elements
@@ -321,53 +298,32 @@
 wget https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC2/release/v2.0/integrated_callset/variants_freeze4_sv_inv.tsv.gz
 
 # Two annotation tables are complementary (same structure as HGSVC3): the
 # insdel table holds DEL + INS with POP_*_AF population allele frequencies
 # imputed back into the 1000 Genomes cohort; the inv table holds INV with
 # an RGN_REF_INNER column. The converter merges them into a single bigBed.
 
 python3 ~/kent/src/hg/makeDb/scripts/lrSv/lrSvHgsvc2TsvToBed.py \
     variants_freeze4_sv_insdel.tsv.gz \
     variants_freeze4_sv_inv.tsv.gz \
     hgsvc2.bed
 bedSort hgsvc2.bed hgsvc2.sorted.bed
 bedToBigBed -type=bed9+ -as=$HOME/kent/src/hg/makeDb/scripts/lrSv/lrSvHgsvc2.as \
     -tab hgsvc2.sorted.bed /hive/data/genomes/hg38/chrom.sizes hgsvc2.bb
 
-##########
-# 2026-04-20 Claude max
-
-# Twelfth subtrack: 1000 Genomes 3,202-sample Illumina SHORT-READ GATK-SV
-# release. Included in the lrSv collection solely as a short-read
-# comparator; it is NOT a long-read dataset.
-# Paper: Byrska-Bishop et al. 2022, Cell, PMID 36055201
-# Data: 1KGP_3202.gatksv_svtools_novelins.freeze_V3.wAF.vcf.gz (IGSR FTP)
-
-mkdir -p /hive/data/genomes/hg38/bed/lrSv/onekg3202sr
-cd /hive/data/genomes/hg38/bed/lrSv/onekg3202sr
-wget https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20210124.SV_Illumina_Integration/1KGP_3202.gatksv_svtools_novelins.freeze_V3.wAF.vcf.gz
-
-# 173,366 site-level SVs across 7 classes (DEL, INS, DUP, INV, CPX, CNV,
-# CTX) with AC/AN/AF and per-superpopulation AFs (AFR/AMR/ASN/EUR/SAN).
-# The converter extracts site-level INFO into bed9+, preserving the
-# FILTER column so users can see PASS vs LowQual / HWE / etc.
-
-python3 ~/kent/src/hg/makeDb/scripts/lrSv/lrSv1kg3202SrVcfToBed.py \
-    1KGP_3202.gatksv_svtools_novelins.freeze_V3.wAF.vcf.gz onekg3202sr.bed
-bedSort onekg3202sr.bed onekg3202sr.sorted.bed
-bedToBigBed -type=bed9+ -as=$HOME/kent/src/hg/makeDb/scripts/lrSv/lrSv1kg3202Sr.as \
-    -tab onekg3202sr.sorted.bed /hive/data/genomes/hg38/chrom.sizes onekg3202sr.bb
+# 1KG 3202 short-read comparator moved to the srSv supertrack.
+# See doc/hg38/srSv.txt for that build.
 
 ##########
 # 2026-04-20 Claude max
 
 # Thirteenth subtrack: HPRC release-2 pangenome SVs (233 samples).
 # No peer-reviewed publication yet; see HPRC release page:
 #   https://humanpangenome.org/hprc-data-release-2/
 # Sample list (alignments v2.0):
 #   https://github.com/human-pangenomics/hprc_intermediate_assembly/blob/main/data_tables/pangenomes/alignments_v2.0.csv
 
 mkdir -p /hive/data/genomes/hg38/bed/lrSv/hprc2
 cd /hive/data/genomes/hg38/bed/lrSv/hprc2
 
 # Pangenome graph (referenced in the doc html):
 wget https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/release2/minigraph-cactus/hprc-v2.0-mc-grch38.sv.gfa.gz