7594507ca126d5242346787e42e13c52ea7709b1
max
  Fri Apr 17 08:40:31 2026 -0700
Add lrSv supertrack: long-read structural variants from 9 studies (hg38).

#Preview2 week - bugs introduced now will need a build patch to fix
Sub-tracks (all bigBed 9+):
han945Sv     - 945 Han Chinese, ONT (Gong 2025, PMID 39929826)
lrSv1kgOnt   - 1019 1000 Genomes, ONT, SVAN-annotated (Schloissnig 2025,
PMID 40702182; lifted from hs1)
tommoJpSv    - 333 Japanese (111 trios), ONT (Otsuki 2022, PMID 36127505)
aou1kSv      - 1027 All of Us, PacBio HiFi (Garimella 2025, PMID 41256123)
ga4kSv       - 502 GA4K pediatric rare disease, PacBio HiFi
(Cohen 2022, PMID 35305867)
decodeSv     - 3622 Icelanders, ONT (Beyter 2021, PMID 33972781)
hgsvc3Sv     - 65 HGSVC3 diverse haplotype-resolved assemblies, HiFi+ONT
(Logsdon 2025, PMID 40702183; merges insdel+inv tables)
kwanhoSv     - 100 post-mortem brains (PD/ILBD/HC), PacBio HiFi
(Kim 2026, PMID 41929179)
chirmade101Sv - 101 long-read WGS GWAS SVatalog cohort
(Chirmade 2026, PMID 41203876)

Includes per-track conversion scripts and autoSql under
scripts/lrSv/, the supertrack summary table in lrSv.html, and a
consolidated makeDoc at doc/hg38/lrSv.txt.

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/ga4kSv.html src/hg/makeDb/trackDb/human/ga4kSv.html
new file mode 100644
index 00000000000..0a4b668a46e
--- /dev/null
+++ src/hg/makeDb/trackDb/human/ga4kSv.html
@@ -0,0 +1,103 @@
+<h2>Description</h2>
+<p>
+This track shows structural variants (SVs) identified by PacBio HiFi long-read
+sequencing of probands and their families enrolled in the Genomic Answers for
+Kids (GA4K) program at Children's Mercy Research Institute. GA4K is a
+longitudinal pediatric genomics initiative that aims to enroll 30,000 children
+with suspected rare genetic disorders, together with their parents, to build
+a large-scale resource of clinical and genomic data.
+</p>
+<p>
+The callset contains 115,554 SVs (52,564 deletions, 58,219 insertions, 4,408
+duplications, 363 inversions) from 502 sequenced samples. Variants are
+site-level (no per-sample genotypes) and each SV has been replicated, meaning
+that it was either observed in two or more unrelated GA4K individuals, or
+matched an SV from an external long-read reference set (deCODE or the Human
+Pangenome Reference Consortium).
+</p>
+
+<h2>Display Conventions and Configuration</h2>
+<p>
+Items are colored by SV type:
+</p>
+<ul>
+<li><span style="color: rgb(200,0,0);">Deletions (DEL)</span> - red</li>
+<li><span style="color: rgb(0,0,200);">Insertions (INS)</span> - blue</li>
+<li><span style="color: rgb(0,160,0);">Duplications (DUP)</span> - green</li>
+<li><span style="color: rgb(230,140,0);">Inversions (INV)</span> - orange</li>
+</ul>
+<p>
+Insertions are placed at the insertion site with a width of 1 bp; deletions,
+duplications and inversions span the affected interval. Filters are available
+for SV type, SV length, carrier-sample count and allele frequency. The detail
+page also shows the total number of samples genotyped at each site.
+</p>
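+<p>
+The coordinate and color conventions above can be illustrated with a minimal
+Python sketch. It assumes a simplified, hypothetical SV record with a 1-based
+start (as in VCF <tt>POS</tt>) and an inclusive end. The record fields and the
+<tt>to_bed9</tt> helper are illustrative only. They are not the conversion
+scripts used to build this track.
+</p>

```python
from dataclasses import dataclass

# itemRgb values taken from the color legend of this track
SV_COLORS = {
    "DEL": "200,0,0",
    "INS": "0,0,200",
    "DUP": "0,160,0",
    "INV": "230,140,0",
}


@dataclass
class SvRecord:
    # Hypothetical simplified record; field names are illustrative only.
    chrom: str
    pos: int      # 1-based start, as in VCF POS (assumption)
    end: int      # 1-based inclusive end of the affected interval (assumption)
    svtype: str   # one of DEL, INS, DUP, INV
    sv_id: str


def to_bed9(rec: SvRecord) -> list:
    """Return the nine standard BED columns for one SV."""
    chrom_start = rec.pos - 1  # BED is 0-based, half-open
    if rec.svtype == "INS":
        chrom_end = chrom_start + 1  # insertions are drawn as a 1 bp feature
    else:
        chrom_end = rec.end  # DEL/DUP/INV span the affected interval
    return [rec.chrom, str(chrom_start), str(chrom_end), rec.sv_id, "0", ".",
            str(chrom_start), str(chrom_end), SV_COLORS[rec.svtype]]


print("\t".join(to_bed9(SvRecord("chr21", 10001, 10500, "DEL", "sv1"))))
print("\t".join(to_bed9(SvRecord("chr21", 20001, 20001, "INS", "sv2"))))
```

+<p>
+Setting <tt>thickStart</tt>/<tt>thickEnd</tt> equal to the item bounds draws
+every SV as a solid block in the color of its type.
+</p>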
+
+<h2>Methods</h2>
+<p>
+Samples were sequenced on PacBio Revio and Sequel II instruments with HiFi
+chemistry. Single-sample SV callsets were produced with pbsv and then merged
+across the cohort with Jasmine v1.1.4 (<tt>jasmine --output_genotypes</tt>),
+which clusters equivalent SVs across samples and writes a merged multi-sample
+VCF.
+</p>
+<p>
+To reduce false positives, the merged VCF was filtered to retain only SVs that
+were supported by at least two independent observations: either (1) a second
+matching SV from another, unrelated Children's Mercy (CMH) individual in the
+same Jasmine cluster, or (2) a matching SV in the deCODE Icelandic or Human
+Pangenome Reference Consortium (HPRC) callsets, as determined by
+<tt>svpack match</tt> with default settings.
+</p>
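+<p>
+The two replication routes can be sketched as a single predicate. This is a
+hypothetical illustration with three assumptions. First, each merged cluster
+is given as a list of contributing sample IDs. Second, relatedness is
+approximated by a family-ID lookup. Third, the <tt>svpack match</tt> result has
+already been reduced to a boolean. None of these names come from the GA4K
+pipeline.
+</p>

```python
def is_replicated(carriers, family_of, matches_external):
    """Keep an SV if two unrelated individuals carry it or it matches an
    external long-read callset.

    carriers: sample IDs contributing to the merged cluster (hypothetical).
    family_of: dict mapping sample ID -> family ID. Samples in the same
        family are treated as related (assumption).
    matches_external: True if svpack match found a deCODE or HPRC hit.
    """
    if matches_external:
        return True
    # Carriers from two or more distinct families count as unrelated.
    families = {family_of[s] for s in carriers}
    return len(families) >= 2


family_of = {"proband1": "F1", "mother1": "F1", "proband2": "F2"}
print(is_replicated(["proband1", "mother1"], family_of, False))
print(is_replicated(["proband1", "proband2"], family_of, False))
```

+<p>
+With this rule, an SV seen only within a single family is dropped unless it
+also matches an external callset.
+</p>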
+<p>
+Carrier counts (SVC), total sample counts (SVN) and allele frequencies
+(SVF = SVC/SVN) were recomputed on the replicated callset.
+</p>
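+<p>
+The recomputation can be sketched in a few lines of Python. The sketch follows
+the definition SVF = SVC/SVN given above and takes per-sample VCF <tt>GT</tt>
+strings as input. It rests on two assumptions about how these fields were
+derived. A sample counts toward SVN when its genotype is not fully missing. A
+sample counts toward SVC when it carries any non-reference allele.
+</p>

```python
def carrier_stats(genotypes):
    """Return (SVC, SVN, SVF) for one SV from per-sample GT strings."""
    svn = 0  # samples genotyped at this site
    svc = 0  # samples carrying at least one non-reference allele
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if all(a == "." for a in alleles):
            continue  # fully missing genotype: not counted in SVN
        svn += 1
        if any(a not in (".", "0") for a in alleles):
            svc += 1
    svf = svc / svn if svn else 0.0
    return svc, svn, svf


print(carrier_stats(["0/1", "1/1", "0/0", "./."]))  # (2, 3, 0.6666666666666666)
```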
+
+<h2>Data Access</h2>
+<p>
+The data can be explored interactively in table format with the
+<a href="../cgi-bin/hgTables">Table Browser</a> or the
+<a href="../cgi-bin/hgIntegrator">Data Integrator</a> and exported from there
+to spreadsheet or tab-separated files. From scripts, the data can be accessed
+through our <a href="https://api.genome.ucsc.edu">API</a> using track=<i>ga4kSv</i>.
+</p>
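+<p>
+The Python snippet below sketches scripted access. It builds a request to the
+API's <tt>getData/track</tt> endpoint for one region of this track and decodes
+the JSON reply. The helper names are hypothetical. Check the layout of the
+returned JSON against the API documentation before relying on specific keys.
+</p>

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_url(chrom, start, end, genome="hg38", track="ga4kSv"):
    """Build a getData/track request URL for one region of the track."""
    params = {"genome": genome, "track": track,
              "chrom": chrom, "start": start, "end": end}
    return API_BASE + "?" + urllib.parse.urlencode(params)


def fetch_region(chrom, start, end):
    """Download one region and return the decoded JSON (requires network)."""
    with urllib.request.urlopen(track_url(chrom, start, end)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(track_url("chr21", 0, 10_000_000))
```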
+<p>
+For automated download and analysis, the annotation is stored in a bigBed file
+that can be downloaded from
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/" target="_blank">our
+download server</a>. The file for this track is called <tt>ga4kSv.bb</tt>.
+Individual regions or the whole annotation can be obtained using the
+<tt>bigBedToBed</tt> utility, available as a precompiled binary or from source
+as described on our
+<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">utilities
+page</a>.
+Example:
+<tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/ga4kSv.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt>.
+</p>
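+<p>
+Items are colored by SV type, so the ninth (<tt>itemRgb</tt>) column of the
+<tt>bigBedToBed</tt> output is enough to tally SV types in a region. The sketch
+below assumes that <tt>bigBedToBed</tt> is on the <tt>PATH</tt>. It also
+assumes the colors are stored exactly as listed in the legend of this page.
+The helper names are hypothetical.
+</p>

```python
import collections
import subprocess

# Reverse of the documented color legend: itemRgb -> SV type
RGB_TO_TYPE = {
    "200,0,0": "DEL",
    "0,0,200": "INS",
    "0,160,0": "DUP",
    "230,140,0": "INV",
}


def count_sv_types(bed_lines):
    """Tally SV types from BED 9+ lines using the itemRgb (9th) column."""
    counts = collections.Counter()
    for line in bed_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        counts[RGB_TO_TYPE.get(fields[8], "other")] += 1
    return counts


def region_lines(url, chrom, start, end):
    """Run bigBedToBed on one region and return its output lines."""
    cmd = ["bigBedToBed", url, f"-chrom={chrom}", f"-start={start}",
           f"-end={end}", "stdout"]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return out.splitlines()
```

+<p>
+For example, <tt>count_sv_types(region_lines(url, "chr21", 0, 100000000))</tt>,
+with <tt>url</tt> set to the <tt>ga4kSv.bb</tt> address in the example command,
+returns a counter keyed by SV type.
+</p>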
+<p>
+The original VCF is available from the Children's Mercy Research Institute
+GA4K data release at
+<a href="https://github.com/ChildrensMercyResearchInstitute/GA4K" target="_blank">
+github.com/ChildrensMercyResearchInstitute/GA4K</a>.
+</p>
+
+<h2>Credits</h2>
+<p>
+Thanks to the Children's Mercy Research Institute and the Genomic Answers
+for Kids participants and their families for making this dataset publicly
+available.
+</p>
+
+<h2>References</h2>
+
+
+<p>
+Cohen ASA, Farrow EG, Abdelmoity AT, Alaimo JT, Amudhavalli SM, Anderson JT, Bansal L, Bartik L,
+Baybayan P, Belden B <em>et al</em>.
+<a href="https://linkinghub.elsevier.com/retrieve/pii/S1098-3600(22)00653-0" target="_blank">
+Genomic answers for children: Dynamic analyses of &gt;1000 pediatric rare disease genomes</a>.
+<em>Genet Med</em>. 2022 Jun;24(6):1336-1348.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/35305867" target="_blank">35305867</a>
+</p>
+