src/hg/makeDb/trackDb/human/lrSv.html f058c8fe4601b223ff47468eb3525c05ccd03850

f058c8fe4601b223ff47468eb3525c05ccd03850
max
  Wed Apr 22 09:17:17 2026 -0700
srSv: new short-read SV supertrack, split out of lrSv

Move the three short-read SV/CNV subtracks (abelSv, onekg3202Sr,
tommoJpCnv) out of the Long-read SV supertrack into a new sibling
supertrack srSv (Short-read SVs), so the lrSv collection contains
only long-read callsets. Filter fields (svType, svLen, insLen, AC)
are mirrored at the srSv supertrack level to keep the UX parallel
to lrSv.

- trackDb: new human/srSv.ra with the three subtrack stanzas and
updated /gbdb/$D/srSv/... bigDataUrls; corresponding stanzas
removed from human/lrSv.ra. human/trackDb.ra now includes
srSv.ra. Also adds a new human/srSv.html overview page; the SR rows
and SR-specific paragraphs are removed from human/lrSv.html.
- Scripts: abelSv/{abelSv.as,vcfToBed.py,build.sh} and lrSv/
{lrSv1kg3202Sr*, lrSvTommoJpCnvVcfToBedGraph.py} moved to
scripts/srSv/ with git mv (history preserved) and renamed to
drop the "lrSv" prefix. Internal path references in abelSvBuild.sh
and abelSvVcfToBed.py updated.
- makeDoc: doc/hg38/abelSv.txt renamed to doc/hg38/srSv.txt and
extended with the onekg3202Sr and tommoJpCnv sections moved from
lrSv.txt. lrSv.txt now keeps only a pointer to srSv.txt.
- Data: /hive/data/genomes/hg38/bed/{abelSv,lrSv/onekg3202sr,
lrSv/tommoJpCnv} moved to /hive/data/genomes/hg38/bed/srSv/*.
/gbdb/hg38/lrSv/{onekg3202sr.bb,tommoJpCnv{Loss,Gain}.bw} and
/gbdb/hg38/abelSv/ removed and re-linked under /gbdb/hg38/srSv/.

refs #36258

diff --git src/hg/makeDb/trackDb/human/lrSv.html src/hg/makeDb/trackDb/human/lrSv.html
index 11b1e7ea0b0..311baf88969 100644
--- src/hg/makeDb/trackDb/human/lrSv.html
+++ src/hg/makeDb/trackDb/human/lrSv.html
@@ -3,33 +3,33 @@
 This track collection contains structural variant (SV) calls derived from long-read sequencing
 studies. Structural variants are genomic rearrangements larger than ~50 bp, including
 deletions, insertions, duplications, inversions, and translocations. Long-read sequencing
 technologies can span repetitive regions and resolve complex rearrangements
 that are difficult to detect with short-read methods.
 </p>
 
 <h3>Available Datasets</h3>
 <p>
 SV length statistics (min / median / max) are computed from the <tt>svLen</tt>
 field of each track, in base pairs. Some tracks include sites with
 <tt>svLen=0</tt> (complex events where the reference and alternate alleles
 differ in sequence but not in length).
 </p>
 <p>
-All subtracks below are long-read callsets, except the last two rows
-(CCDG 17,795 and 1KG 3202, both Illumina short-read), which are
-included as short-read comparators.
+For short-read structural-variant comparators (CCDG 17,795, 1KG 3202,
+ToMMo 48K CNV) see the companion
+<a href="hgTrackUi?g=srSv">Short-read SVs</a> supertrack.
 </p>
 <table class="stdTbl">
 <tr>
   <th>Dataset</th>
   <th>N samples</th>
   <th>Cohort / disease</th>
   <th>Sequencing</th>
   <th>SVs</th>
   <th>Min</th>
   <th>Median</th>
   <th>Max</th>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=colorsDbSv">CoLoRSdb</a></td>
   <td>1,427</td>
@@ -68,37 +68,30 @@
   <td>148,375</td>
   <td>2</td>
   <td>177</td>
   <td>49,171</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=tommoJpSv">ToMMo Japanese</a></td>
   <td>333 (111 trios)</td>
   <td>Japanese, general population</td>
   <td>ONT</td>
   <td>74,201</td>
   <td>51</td>
   <td>162</td>
   <td>99,980</td>
 </tr>
-<tr>
-  <td><a href="hgTrackUi?g=tommoJpCnv">ToMMo 48K CNV</a></td>
-  <td>48,874</td>
-  <td>Japanese, general population (<b>short-read comparator</b> for ToMMo long-read SVs)</td>
-  <td><b>Illumina short-read</b> (GATK CNV, 1 kb bins, shown as two bigWigs)</td>
-  <td colspan="4">~2M bins with CNV carriers; not comparable to per-SV counts above</td>
-</tr>
 <tr>
   <td><a href="hgTrackUi?g=aou1kSv">AoU 1K</a></td>
   <td>1,027</td>
   <td>All of Us, self-identified Black/African American</td>
   <td>PacBio HiFi</td>
   <td>541,049</td>
   <td>50</td>
   <td>152</td>
   <td>9,998</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=ga4kSv">GA4K</a></td>
   <td>502</td>
   <td>Children's Mercy, pediatric rare disease probands + families</td>
   <td>PacBio HiFi</td>
@@ -175,50 +168,30 @@
   <td>74,552</td>
   <td>50</td>
   <td>160</td>
   <td>190,088,222</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=chirmade101Sv">SVatalog 101</a></td>
   <td>101</td>
   <td>Long-read WGS cohort for GWAS LD fine-mapping (SickKids)</td>
   <td>long-read</td>
   <td>87,183</td>
   <td>4</td>
   <td>160</td>
   <td>1,321,484</td>
 </tr>
-<tr>
-  <td><a href="hgTrackUi?g=abelSv">CCDG 17,795 (short-read)</a></td>
-  <td>17,795</td>
-  <td>NHGRI CCDG + PAGE + SGDP (<b>short-read comparator</b>)</td>
-  <td><b>Illumina short-read</b></td>
-  <td>737,998</td>
-  <td>-1</td>
-  <td>-1</td>
-  <td>217,985,413</td>
-</tr>
-<tr>
-  <td><a href="hgTrackUi?g=onekg3202Sr">1KG 3202 (short-read)</a></td>
-  <td>3,202</td>
-  <td>1000 Genomes expanded cohort (<b>short-read comparator</b>)</td>
-  <td><b>Illumina short-read</b></td>
-  <td>173,366</td>
-  <td>1</td>
-  <td>314</td>
-  <td>154,807,729</td>
-</tr>
 </table>
 
 <h3>CoLoRSdb SVs (<a href="hgTrackUi?g=colorsDbSv">colorsDbSv</a>)</h3>
 <p>
 Structural variants from the Consortium of Long-Read Sequencing database
 (CoLoRSdb), from 1,427 PacBio HiFi long-read whole-genome sequences.
 426,239 SVs (insertions, deletions, inversions) called with pbsv and
 merged with Jasmine, with allele frequencies, genotype counts and
 Hardy-Weinberg statistics across the cohort.
 </p>
 
 <h3>Han 945 SVs (<a href="hgTrackUi?g=han945Sv">han945Sv</a>)</h3>
 <p>
 Structural variants from 945 Han Chinese individuals. 111,288 SVs
 (deletions, insertions, duplications, inversions, translocations) merged with SURVIVOR.
@@ -241,40 +214,30 @@
 Structural variants from 1,019 individuals across 26 populations (1000 Genomes ONT).
 161,332 SVs annotated with SVAN, classifying insertions and deletions by mechanism
 of origin (mobile elements, VNTRs, processed pseudogenes, etc.).
 Original coordinates are on T2T-CHM13 (hs1); the hg38 version was created via liftOver.
 This is a separate dataset from the 1KG ONT 100 (Gustafson et al.) track above;
 the 1,019 samples here do not overlap with the 100 samples in that release.
 </p>
 
 <h3>ToMMo Japanese SVs (<a href="hgTrackUi?g=tommoJpSv">tommoJpSv</a>)</h3>
 <p>
 Structural variants from 333 Japanese individuals (111 trios) from the Tohoku Medical
 Megabank (ToMMo). 74,201 SVs (deletions and insertions) with trio-based Mendelian
 error rates and allele frequencies.
 </p>
 
-<h3>ToMMo 48K CNV SR (<a href="hgTrackUi?g=tommoJpCnv">tommoJpCnv</a>) - short-read comparator</h3>
-<p>
-<b>Short-read CNV comparator for the ToMMo long-read SV track above.</b>
-Per-1 kb-bin copy-number carrier counts from short-read whole-genome
-sequencing of 48,874 Japanese individuals (jMorp 48KJPN-CNV Frequency
-Panel, release 20230828), called with GATK CNV germline workflows.
-Shown as a multiWig overlay: red = samples with copy-number loss
-(CN&lt;2) per bin, green = samples with gain (CN&gt;2) per bin.
-</p>
-
 <h3>AoU 1K SVs (<a href="hgTrackUi?g=aou1kSv">aou1kSv</a>)</h3>
 <p>
 Structural variants from 1,027 individuals from the All of Us (AoU) Research Program,
 sequenced with PacBio HiFi long reads. 541,049 SVs (insertions and deletions)
 with population-specific allele frequencies, gene annotations, and clinical
 trait associations.
 </p>
 
 <h3>GA4K SVs (<a href="hgTrackUi?g=ga4kSv">ga4kSv</a>)</h3>
 <p>
 Structural variants from 502 probands and family members enrolled in the
 Genomic Answers for Kids (GA4K) pediatric rare-disease program at Children's
 Mercy Research Institute, sequenced with PacBio HiFi long reads. 115,554
 replicated SVs (deletions, insertions, duplications, inversions) called with
 pbsv and merged with JASMINE. The matched GA4K small-variant callset (SNVs
@@ -325,40 +288,30 @@
 incidental Lewy body disease, and healthy controls) sequenced with PacBio
 HiFi long reads. 74,552 high-confidence SVs (deletions, insertions,
 duplications, inversions) with per-cohort allele frequencies and
 case-control carrier-rate differentials, from Kim et al. 2026.
 </p>
 
 <h3>SVatalog 101 SVs (<a href="hgTrackUi?g=chirmade101Sv">chirmade101Sv</a>)</h3>
 <p>
 Structural variants from 101 long-read whole-genome sequences released
 alongside the GWAS SVatalog tool (Chirmade et al. 2026). 87,183 SVs
 (deletions, insertions, duplications, inversions and complex events)
 annotated with gene overlaps, ClinGen / gnomAD constraint scores,
 OMIM / ClinVar / DGV / Decipher regional annotations.
 </p>
 
-<h3>1KG 3202 SVs (<a href="hgTrackUi?g=onekg3202Sr">onekg3202Sr</a>) - short-read comparator</h3>
-<p>
-<b>This is a short-read dataset, included for comparison only.</b>
-Structural variants from the expanded 1000 Genomes cohort of 3,202
-Illumina NovaSeq short-read whole genomes (Byrska-Bishop et al. 2022),
-called with the GATK-SV / svtools pipeline. 173,366 SVs (DEL, INS, DUP,
-INV, CPX, CNV, CTX) with per-superpopulation allele frequencies. Useful
-for contrasting short-read vs. long-read SV breakpoints and for spotting
-variants unique to long-read data.
-</p>
 
 <h2>Data Access</h2>
 <p>
 Each subtrack has its own documentation page with details on how to download
 and intersect the underlying annotations.
 </p>
 
 <h2>References</h2>
 
 <p>
 Gong J, Sun H, Wang K, Zhao Y, Huang Y, Chen Q, Qiao H, Gao Y, Zhao J, Ling Y <em>et al</em>.
 <a href="https://doi.org/10.1038/s41467-025-56661-9" target="_blank">
 Long-read sequencing of 945 Han individuals identifies structural variants associated with
 phenotypic diversity and disease susceptibility</a>.
 <em>Nat Commun</em>. 2025 Feb 10;16(1):1494.