038bd0cd3f7c84ee984905608dfdd27d02cc61ec
max
  Tue Jun 2 05:19:51 2026 -0700
[Claude] lrSv1kLin: add 1000 Genomes linear long-read SV subtrack (1,218 samples, hg38+hs1)

Two native VCFs from the Eichler lab (GRCh38 and CHM13/T2T-CHM13v2), merged
with Truvari v5.2.0 and annotated with population-level AFs (EUR, AMR, EAS,
AFR, SAS). Track is alpha-only; not added to lrSvAll -- data not yet published.
hg38: 587,779 SVs; hs1: 614,522 SVs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refs #36258

diff --git src/hg/makeDb/doc/hg38/lrSv.txt src/hg/makeDb/doc/hg38/lrSv.txt
index 67d923f950d..b5366f791f4 100644
--- src/hg/makeDb/doc/hg38/lrSv.txt
+++ src/hg/makeDb/doc/hg38/lrSv.txt
@@ -468,15 +468,54 @@
 #      MR), bedSort + bedToBigBed.
 cd /hive/data/genomes/hg38/bed/lrSv/hprc2jasmine
 bash splitVcfsFilterSv.sh
 bash processJasmineSvSeq.sh hg38
 bash processJasmineSvSeq.sh hs1
 bash ~/kent/src/hg/makeDb/scripts/lrSv/lrSvHprc2JasmineBuild.sh
 # hg38: 335,494 SVs merged (full 22 autosomes; chrX/chrY absent from inputs)
 # hs1:  (built same way from CHM13v2 per-sample calls)
 #
 # Note: an earlier symbolic-ALT pipeline (splitVcfs.sh + symbolizeVcfs.sh
 # + processJasmine.sh, output/) was used as a workaround for a Jasmine
 # NPE in sequence comparison. Once the inputs are pre-filtered to
 # SV-sized records the NPE no longer fires, so the current pipeline runs
 # Jasmine with its normal sequence-aware merging. The symbolic-pipeline
 # scripts and output/ tree are retained for comparison.
+
+##########
+# 2026-06-01 Claude max
+#
+# lrSv1kLin: 1000 Genomes linear long-read SVs from 1,218 individuals.
+# Two native VCFs (GRCh38 and CHM13/T2T-CHM13v2), copied from the user's
+# Dropbox (rclone copy mhaeussldropbox:1KG_LR_SVs/).
+# SVs merged with Truvari v5.2.0; population-level allele frequencies
+# (EUR, AMR, EAS, AFR, SAS) annotated with bcftools fill-tags.
+# Only DEL and INS variant types are present.
+# GRCh38: 587,779 SVs (196,369 DEL, 391,410 INS)
+# CHM13:  614,522 SVs
+# NOTE: data was received from the Eichler lab via email and has not been
+# published. Do NOT release this track and do NOT add it to lrSvAll until
+# a preprint or paper is available. HTML page is a placeholder.
+
+mkdir -p /hive/data/genomes/hg38/bed/lrSv/1k-lin
+cd /hive/data/genomes/hg38/bed/lrSv/1k-lin
+
+# Input VCFs in /hive/data/genomes/hg38/bed/lrSv/1k-lin/input/:
+#   GRCh38_INSDEL_1218.vcf.gz  (hg38 native, 587,779 SVs)
+#   CHM13_INSDEL_1218.vcf.gz   (hs1/CHM13 native, 614,522 SVs)
+
+python3 ~/kent/src/hg/makeDb/scripts/lrSv/lrSv1kLin1218VcfToBed.py \
+    input/GRCh38_INSDEL_1218.vcf.gz lin1218.hg38.bed
+bedSort lin1218.hg38.bed lin1218.hg38.sorted.bed
+bedToBigBed -type=bed9+ -as=$HOME/kent/src/hg/makeDb/scripts/lrSv/lrSv1kLin1218.as \
+    -tab lin1218.hg38.sorted.bed /hive/data/genomes/hg38/chrom.sizes lin1218.hg38.bb
+
+python3 ~/kent/src/hg/makeDb/scripts/lrSv/lrSv1kLin1218VcfToBed.py \
+    input/CHM13_INSDEL_1218.vcf.gz lin1218.hs1.bed
+bedSort lin1218.hs1.bed lin1218.hs1.sorted.bed
+bedToBigBed -type=bed9+ -as=$HOME/kent/src/hg/makeDb/scripts/lrSv/lrSv1kLin1218.as \
+    -tab lin1218.hs1.sorted.bed /hive/data/genomes/hs1/chrom.sizes lin1218.hs1.bb
+
+# Symlinks for both assemblies (the hs1 bigBed is also built under the hg38 1k-lin directory)
+mkdir -p /gbdb/hg38/lrSv /gbdb/hs1/lrSv
+ln -sf /hive/data/genomes/hg38/bed/lrSv/1k-lin/lin1218.hg38.bb /gbdb/hg38/lrSv/lin1218.bb
+ln -sf /hive/data/genomes/hg38/bed/lrSv/1k-lin/lin1218.hs1.bb /gbdb/hs1/lrSv/lin1218.bb