src/hg/makeDb/trackDb/human/varFreqs.html 695f40f9d6139a4df393522c067f1702aff8d3bd

695f40f9d6139a4df393522c067f1702aff8d3bd
max
  Wed Apr 22 03:13:39 2026 -0700
varFreqs: add SVatalog 101 short-read SNV frequencies subtrack

SNV/indel allele frequencies from the 101-sample GWAS SVatalog cohort
(Chirmade et al. 2026, Heredity, PMID 41203876), called from 10X
Genomics linked short-read WGS with GATK HaplotypeCaller v4.0.0.0 and
phased with SHAPEIT v4.2.0. Sibling of the lrSv chirmade101Sv
structural-variant track, which is built from the same 101 samples.

8,814,835 autosomal + chrX sites. Source release ships only AF; AC and
AN are synthesized in the emitted VCF as AC=round(AF*202) and AN=202
(2*101 diploid), with the gnomAD v3.1 non-Finnish European AF and dbSNP
rsID passed through as GNOMAD_NFE_AF and RSID info fields. VCF is
bgzipped + tabix-indexed (172 MB + 1.6 MB .tbi).
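
A minimal sketch of the AC/AN synthesis described above. It assumes every site has AN=202 (101 diploid samples) and that the release-supplied AF is in [0, 1]. The names synth_ac_an and info_string are hypothetical and are not taken from svatalogFreqToVcf.py. The INFO field order and the use of "." for a missing gnomAD AF or rsID are also assumptions.

```python
from typing import Optional, Tuple

# Assumption: 101 diploid samples, so every site gets AN = 2 * 101 = 202.
N_SAMPLES = 101
AN = 2 * N_SAMPLES


def synth_ac_an(af: float) -> Tuple[int, int]:
    """Synthesize (AC, AN) from a release-supplied AF: AC = round(AF * AN)."""
    if not 0.0 <= af <= 1.0:
        raise ValueError(f"AF out of range: {af}")
    # Python's round() rounds exact halves to even; AF values derived from
    # integer counts over 202 land close to an integer, so this rarely matters.
    return round(af * AN), AN


def info_string(af: float, gnomad_nfe_af: Optional[float], rsid: Optional[str]) -> str:
    """Build a VCF INFO column; '.' marks a missing gnomAD AF or rsID (assumed)."""
    ac, an = synth_ac_an(af)
    nfe = "." if gnomad_nfe_af is None else str(gnomad_nfe_af)
    rs = rsid if rsid else "."
    return f"AF={af};AC={ac};AN={an};GNOMAD_NFE_AF={nfe};RSID={rs}"


if __name__ == "__main__":
    # Illustrative values only: a site carried on 3 of 202 chromosomes.
    print(info_string(0.0149, 0.0121, "rs12345"))
```

Because AC is reconstructed rather than counted, it is only as exact as the precision of the published AF. With 202 alleles, adjacent AC values differ by about 0.005 in AF, so rounding AF*202 recovers the original count when AF carries at least three decimals.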

Files:
- scripts/varFreqs/svatalogFreqToVcf.py (new): per-chrom allele-freq
TSV -> single VCF with hg38 ##contig header
- trackDb/human/varFreqs.ra: new svatalogSnv vcfTabix subtrack
- trackDb/human/svatalogSnv.html (new): doc page
- trackDb/human/varFreqs.html: new row in Available Datasets table
- doc/hg38/varFreqs.txt: build block with no wget step (input files were
downloaded manually from Zenodo record 13367574)

Note: the All Databases Combined varFreqs bigBed has NOT been rebuilt
to include this new source yet; a subsequent merge pass will add it.

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html
index 159a3373fda..0e9f236cd67 100644
--- src/hg/makeDb/trackDb/human/varFreqs.html
+++ src/hg/makeDb/trackDb/human/varFreqs.html
@@ -257,30 +257,39 @@
   <td>552</td>
   <td>PacBio HiFi long-read WGS</td>
   <td>Genomic Answers for Kids: pediatric rare-disease probands and families (Children's Mercy)</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=colorsDbSnv">CoLoRSdb v1.2.0</a></td>
   <td>Multi-national</td>
   <td>1,027</td>
   <td>PacBio HiFi long-read WGS</td>
   <td>Consortium of Long Read Sequencing: aggregated population-consented samples across multiple research cohorts</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
+<tr>
+  <td><a href="hgTrackUi?g=svatalogSnv">SVatalog 101</a></td>
+  <td>Canada (SickKids)</td>
+  <td>101</td>
+  <td>10X Genomics linked short-read WGS</td>
+  <td>GWAS SVatalog cohort: 101 samples with matched long-read SVs (see <a href="hgTrackUi?g=chirmade101Sv">chirmade101Sv</a>)</td>
+  <td>&mdash;</td>
+  <td>Yes</td>
+</tr>
 </table>
 
 <h2>Display Conventions</h2>
 
 <p>Most tracks only show the variant and allele frequencies on mouseover or clicks.
 When zoomed in, tracks display alleles with base-specific coloring. Homozygote
 data are shown as one letter, while heterozygotes will be displayed with both
 letters. All VCF files are normalized, with one single allele per annotation (no multi-allele
 lines).
 </p>
 
 <h2>Data Access</h2>
 <p>All the data is publicly available. The table above indicates if we are allowed to distribute it in VCF format. Most of the databases do not allow us to redistribute the data files directly from our website, but it can always be downloaded from the original websites in some form. Click the database link in the table above and see the "Data Access" section of the respective track for a description of where to download the data. When the data is freely available from our website, the Data Access section will also indicate the VCF file location on our download server. Because it contains some licensed data, the combined track is not available for download, but can be recreated using the conversion scripts in our Github repository and the accompanying documentation file.
 </p>