695f40f9d6139a4df393522c067f1702aff8d3bd
max
  Wed Apr 22 03:13:39 2026 -0700
varFreqs: add SVatalog 101 short-read SNV frequencies subtrack

SNV/indel allele frequencies from the 101-sample GWAS SVatalog cohort
(Chirmade et al. 2026, Heredity, PMID 41203876), called from 10X
Genomics linked short-read WGS with GATK HaplotypeCaller v4.0.0.0 and
phased with SHAPEIT v4.2.0. Sibling of the lrSv chirmade101Sv
structural-variant track, which is built from the same 101 samples.

8,814,835 autosomal + chrX sites. Source release ships only AF; AC and
AN are synthesized in the emitted VCF as AC=round(AF*202) and AN=202
(2*101 diploid), with the gnomAD v3.1 non-Finnish European AF and dbSNP
rsID passed through as GNOMAD_NFE_AF and RSID info fields. VCF is
bgzipped + tabix-indexed (172 MB + 1.6 MB .tbi).
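
A minimal sketch of the AC/AN synthesis described above, for illustration
only; it is not copied from svatalogFreqToVcf.py. The function names
(synth_ac_an, info_column) are hypothetical, and the choice to omit
GNOMAD_NFE_AF / RSID when they are absent is an assumption.

```python
from typing import Optional, Tuple

N_SAMPLES = 101
AN_FULL = 2 * N_SAMPLES  # 202 alleles, assuming every sample is diploid and called


def synth_ac_an(af: float, an: int = AN_FULL) -> Tuple[int, int]:
    """Approximate AC/AN from AF alone, assuming no missing genotypes."""
    if not 0.0 <= af <= 1.0:
        raise ValueError(f"AF out of range: {af}")
    # Python's round() rounds exact .5 ties to the nearest even integer.
    return round(af * an), an


def info_column(af: float,
                nfe_af: Optional[float] = None,
                rsid: Optional[str] = None) -> str:
    """Build a VCF INFO string using the field names from this commit."""
    ac, an = synth_ac_an(af)
    parts = [f"AF={af:g}", f"AC={ac}", f"AN={an}"]
    if nfe_af is not None:          # assumption: field is dropped when absent
        parts.append(f"GNOMAD_NFE_AF={nfe_af:g}")
    if rsid:                        # assumption: field is dropped when absent
        parts.append(f"RSID={rsid}")
    return ";".join(parts)
```

Because AN is fixed at 202, AC is only as exact as the AF it is derived
from; at sites with missing genotypes the true AN is smaller and AC is an
approximation.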

Files:
- scripts/varFreqs/svatalogFreqToVcf.py (new): per-chrom allele-freq
  TSV -> single VCF with hg38 ##contig header
- trackDb/human/varFreqs.ra: new svatalogSnv vcfTabix subtrack
- trackDb/human/svatalogSnv.html (new): doc page
- trackDb/human/varFreqs.html: new row in Available Datasets table
- doc/hg38/varFreqs.txt: wget-free build block (input files were
  downloaded manually from Zenodo 13367574)
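
For orientation, the five INFO fields carried by the emitted VCF could be
declared in its header roughly as sketched below. The Number/Type values
and the description strings are assumptions chosen to match common VCF
practice; they are not taken from svatalogFreqToVcf.py, and
info_header_lines is a hypothetical helper.

```python
# (ID, Number, Type, Description); Number/Type/Description are assumed here.
INFO_FIELDS = [
    ("AF", "A", "Float", "Alternate allele frequency in the 101 samples"),
    ("AC", "A", "Integer", "Approximate allele count, round(AF*202)"),
    ("AN", "1", "Integer", "Assumed allele number, 2*101"),
    ("GNOMAD_NFE_AF", "1", "Float", "gnomAD v3.1 non-Finnish European AF"),
    ("RSID", "1", "String", "dbSNP rsID"),
]


def info_header_lines(fields=INFO_FIELDS):
    """Return VCF ##INFO meta-information lines for the given fields."""
    return [
        f'##INFO=<ID={id_},Number={number},Type={type_},Description="{desc}">'
        for id_, number, type_, desc in fields
    ]


if __name__ == "__main__":
    print("\n".join(info_header_lines()))
```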

Note: the All Databases Combined varFreqs bigBed has NOT been rebuilt
to include this new source yet; a subsequent merge pass will add it.

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/svatalogSnv.html src/hg/makeDb/trackDb/human/svatalogSnv.html
new file mode 100644
index 00000000000..20340d84bef
--- /dev/null
+++ src/hg/makeDb/trackDb/human/svatalogSnv.html
@@ -0,0 +1,95 @@
+<h2>Description</h2>
+<p>
+This track shows small-variant (single-nucleotide variant and short-indel)
+allele frequencies from 101 samples released as part of the
+<a href="https://svatalog.research.sickkids.ca/" target="_blank">GWAS
+SVatalog</a> tool (Chirmade et al. 2026). The same 101-sample cohort
+underlies the structural-variant sibling track
+<a href="hgTrackUi?g=chirmade101Sv">SVatalog 101 SVs</a> in the Long-read
+SV collection; this track provides the companion small-variant allele
+frequencies that SVatalog uses to compute linkage disequilibrium between
+SNPs and SVs.
+</p>
+<p>
+The callset contains approximately 8.8 million sites across the autosomes
+and chromosome X. Each site reports the alternate allele frequency in the
+101 samples, the gnomAD v3.1 non-Finnish European allele frequency (when
+annotated in the source release), and a dbSNP rsID when one was available.
+</p>
+
+<h2>Display Conventions and Configuration</h2>
+<p>
+The track uses the standard VCF display. Variants appear as colored marks
+along the genome; clicking an item opens the detail page with per-site
+INFO fields: AF, AC, AN, the gnomAD v3.1 NFE allele frequency
+(<tt>GNOMAD_NFE_AF</tt>) and the dbSNP rsID (<tt>RSID</tt>).
+</p>
+<p>
+Note on AC/AN: the source allele-frequency release ships only AF. For this
+track, AC and AN are synthesized by assuming the full 2 &times; 101 = 202-allele
+denominator (AN=202, AC=round(AF &times; 202)); they are therefore
+approximations at sites where some samples had missing genotypes.
+</p>
+
+<h2>Methods</h2>
+<p>
+Small variants were called from 10X Genomics linked-read (paired-end
+short-read) whole-genome sequencing of the 101 SVatalog samples with
+<a href="https://gatk.broadinstitute.org/" target="_blank">GATK
+HaplotypeCaller v4.0.0.0</a> using default parameters. Calls were phased
+across the cohort with
+<a href="https://odelaneau.github.io/shapeit4/" target="_blank">SHAPEIT
+v4.2.0</a>, and per-site alternate allele frequencies were computed on
+the resulting joint callset. Structural variants, released as a separate
+lrSv subtrack, were called from long-read data and merged with these
+SNPs for the LD analyses reported by GWAS SVatalog.
+</p>
+<p>
+For display here, the per-chromosome allele-frequency text files
+(<tt>chr{1..22,X}_allele_freq.txt</tt>) were converted to a single
+sites-only VCF with approximate AC/AN fields and bgzipped / tabix
+indexed. The step-by-step build commands are recorded in the UCSC
+makeDoc
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt" target="_blank">
+doc/hg38/varFreqs.txt</a>; the converter script lives in
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/varFreqs" target="_blank">
+makeDb/scripts/varFreqs</a>.
+</p>
+
+<h2>Data Access</h2>
+<p>
+The VCF file for this track is available from
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/varFreqs/svatalog/" target="_blank">our
+download server</a> as <tt>svatalog.vcf.gz</tt> (with <tt>.tbi</tt> index).
+Regions can be extracted with <tt>tabix</tt>:
+<tt>tabix http://hgdownload.soe.ucsc.edu/gbdb/hg38/varFreqs/svatalog/svatalog.vcf.gz chr21:1-100000000</tt>.
+</p>
+<p>
+The original per-chromosome allele-frequency tables and the accompanying
+LD statistics used by the SVatalog tool are available from the
+companion Zenodo deposit:
+<a href="https://zenodo.org/records/13367574" target="_blank">zenodo.org/records/13367574</a>.
+The SVatalog web tool itself is at
+<a href="https://svatalog.research.sickkids.ca/" target="_blank">svatalog.research.sickkids.ca</a>.
+</p>
+
+<h2>Credits</h2>
+<p>
+Thanks to Chirmade, Strug and colleagues at The Hospital for Sick
+Children and the University of Toronto for releasing this annotated
+small-variant frequency callset alongside the GWAS SVatalog tool.
+</p>
+
+<h2>References</h2>
+
+
+<p>
+Chirmade S, Wang Z, Mastromatteo S, Sanders E, Thiruvahindrapuram B, Nalpathamkalam T, Pellecchia G,
+Lin F, Keenan K, Patel RV <em>et al</em>.
+<a href="https://doi.org/10.1038/s41437-025-00809-2" target="_blank">
+GWAS SVatalog: a visualization tool to aid fine-mapping of GWAS loci with structural variations</a>.
+<em>Heredity (Edinb)</em>. 2026 Mar;135(3):199-210.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/41203876" target="_blank">41203876</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13031531/" target="_blank">PMC13031531</a>
+</p>
+