695f40f9d6139a4df393522c067f1702aff8d3bd max Wed Apr 22 03:13:39 2026 -0700
varFreqs: add SVatalog 101 short-read SNV frequencies subtrack

SNV/indel allele frequencies from the 101-sample GWAS SVatalog cohort (Chirmade et al. 2026, Heredity, PMID 41203876), called from 10X Genomics linked short-read WGS with GATK HaplotypeCaller v4.0.0.0 and phased with SHAPEIT v4.2.0. Sibling of the lrSv chirmade101Sv structural-variant track, which is built from the same 101 samples.

8,814,835 autosomal + chrX sites. Source release ships only AF; AC and AN are synthesized in the emitted VCF as AC=round(AF*202) and AN=202 (2*101 diploid), with the gnomAD v3.1 non-Finnish European AF and dbSNP rsID passed through as GNOMAD_NFE_AF and RSID info fields. VCF is bgzipped + tabix-indexed (172 MB + 1.6 MB .tbi).

Files:
- scripts/varFreqs/svatalogFreqToVcf.py (new): per-chrom allele-freq TSV -> single VCF with hg38 ##contig header
- trackDb/human/varFreqs.ra: new svatalogSnv vcfTabix subtrack
- trackDb/human/svatalogSnv.html (new): doc page
- trackDb/human/varFreqs.html: new row in Available Datasets table
- doc/hg38/varFreqs.txt: wget-free build block (input files were downloaded manually from Zenodo 13367574)

Note: the All Databases Combined varFreqs bigBed has NOT been rebuilt to include this new source yet; a subsequent merge pass will add it.
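The AC/AN synthesis described above can be sketched as follows. This is not svatalogFreqToVcf.py. The input column order (chrom, pos, ref, alt, af, gnomad_nfe_af, rsid), the helper names `vcf_header` and `freq_row_to_vcf`, the VCFv4.2 version line, and the INFO descriptions are all hypothetical. Only two hg38 contigs are listed for brevity.

```python
"""Sketch: turn one allele-frequency row into a VCF data line with synthesized AC/AN.

Hypothetical input columns: chrom, pos, ref, alt, af, gnomad_nfe_af, rsid.
"""
from typing import List

N_SAMPLES = 101
AN = 2 * N_SAMPLES  # diploid: 202 allele copies per site

# Only two hg38 contigs shown; a real header would list every emitted chromosome.
HG38_CONTIGS = {"chr1": 248956422, "chrX": 156040895}


def vcf_header() -> List[str]:
    """Return minimal sites-only VCF header lines, ending with the #CHROM line."""
    lines = ["##fileformat=VCFv4.2", "##reference=hg38"]
    lines += [f"##contig=<ID={c},length={n}>" for c, n in HG38_CONTIGS.items()]
    lines += [
        '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">',
        '##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count, synthesized as round(AF*AN)">',
        '##INFO=<ID=AN,Number=1,Type=Integer,Description="Total allele number (2*101)">',
        '##INFO=<ID=GNOMAD_NFE_AF,Number=A,Type=Float,Description="gnomAD v3.1 non-Finnish European AF">',
        '##INFO=<ID=RSID,Number=1,Type=String,Description="dbSNP rsID">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    return lines


def freq_row_to_vcf(fields: List[str]) -> str:
    """Convert one TSV row to a VCF data line; AC and AN are derived from AF."""
    chrom, pos, ref, alt, af, nfe_af, rsid = fields
    # Python's round() rounds exact halves to the even integer.
    ac = round(float(af) * AN)
    info = f"AF={af};AC={ac};AN={AN};GNOMAD_NFE_AF={nfe_af};RSID={rsid}"
    # rsID is passed through in INFO (as the commit describes), so the ID column is ".".
    return "\t".join([chrom, pos, ".", ref, alt, ".", ".", info])
```

With illustrative values (not taken from the SVatalog release), `freq_row_to_vcf(["chr1", "100000", "G", "T", "0.1", "0.12", "."])` gives `AC=20`, because 0.1 * 202 = 20.2. One design detail to check against the real script: an AF of exactly 0.25 gives AF*202 = 50.5. Python's `round()` maps that to 50, whereas round-half-up would give 51. The commit message does not say which convention is used.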
refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context)

diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html
index 159a3373fda..0e9f236cd67 100644
--- src/hg/makeDb/trackDb/human/varFreqs.html
+++ src/hg/makeDb/trackDb/human/varFreqs.html
@@ -257,30 +257,39 @@
 552 PacBio HiFi long-read WGS Genomic Answers for Kids: pediatric rare-disease probands and families (Children's Mercy) — Yes
 CoLoRSdb v1.2.0 Multi-national 1,027 PacBio HiFi long-read WGS Consortium of Long Read Sequencing: aggregated population-consented samples across multiple research cohorts — Yes
+
+ SVatalog 101
+ Canada (SickKids)
+ 101
+ 10X Genomics linked short-read WGS
+ GWAS SVatalog cohort: 101 samples with matched long-read SVs (see chirmade101Sv)
+ —
+ Yes
+

Display Conventions

Most tracks show the variant and its allele frequencies only on mouseover or click. When zoomed in, tracks display alleles with base-specific coloring. Homozygous calls are shown as a single letter, and heterozygous calls as both letters. All VCF files are normalized to one allele per record (no multi-allelic lines).
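To illustrate what one allele per record means, the sketch below splits a multi-allelic VCF data line into one line per ALT allele. It keeps only the matching value of INFO fields that carry one value per ALT allele (Number=A fields such as AF). The helper `split_multiallelic` is hypothetical and is not the pipeline used to build these tracks. The caller supplies the per-allele keys rather than having them parsed from the header. Full normalization, as done by tools such as `bcftools norm`, also left-aligns and trims alleles, which this sketch does not attempt.

```python
from typing import Iterable, List


def split_multiallelic(line: str, per_allele_keys: Iterable[str]) -> List[str]:
    """Split one VCF data line into one line per ALT allele.

    INFO keys listed in per_allele_keys (Number=A fields such as AF or AC)
    hold comma-separated values, one per ALT allele; each output line keeps
    only its own value. Other INFO entries, including flags, are copied
    unchanged. FORMAT/sample columns, if any, are copied as-is (genotypes are
    not re-coded), and alleles are not left-aligned or trimmed.
    """
    per_allele = set(per_allele_keys)
    cols = line.rstrip("\n").split("\t")
    alts = cols[4].split(",")
    info_items = cols[7].split(";") if cols[7] != "." else []

    out = []
    for i, alt in enumerate(alts):
        new_info = []
        for item in info_items:
            key, sep, value = item.partition("=")
            if sep and key in per_allele:
                value = value.split(",")[i]  # keep this allele's value only
            new_info.append(f"{key}{sep}{value}")
        new_cols = cols[:4] + [alt] + cols[5:7] + [";".join(new_info) or "."] + cols[8:]
        out.append("\t".join(new_cols))
    return out
```

For example, a line with `ALT=C,G` and `AF=0.1,0.05;AN=202` becomes two records, one with `ALT=C` and `AF=0.1;AN=202` and one with `ALT=G` and `AF=0.05;AN=202`.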

Data Access

All of the data is publicly available. The table above indicates whether we are permitted to distribute each dataset in VCF format. Most sources do not allow us to redistribute their data files from our website, but the data can always be downloaded in some form from the original websites. Click the database link in the table above and see the "Data Access" section of that track's description page for where to download the data. When a dataset is freely available from our website, its Data Access section also gives the VCF file location on our download server. Because it contains some licensed data, the combined track is not available for download, but it can be recreated using the conversion scripts in our GitHub repository and the accompanying documentation file.