695f40f9d6139a4df393522c067f1702aff8d3bd max Wed Apr 22 03:13:39 2026 -0700

varFreqs: add SVatalog 101 short-read SNV frequencies subtrack

SNV/indel allele frequencies from the 101-sample GWAS SVatalog cohort (Chirmade et al. 2026, Heredity, PMID 41203876), called from 10X Genomics linked short-read WGS with GATK HaplotypeCaller v4.0.0.0 and phased with SHAPEIT v4.2.0. Sibling of the lrSv chirmade101Sv structural-variant track, which is built from the same 101 samples.

8,814,835 autosomal + chrX sites. Source release ships only AF; AC and AN are synthesized in the emitted VCF as AC=round(AF*202) and AN=202 (2*101 diploid), with the gnomAD v3.1 non-Finnish European AF and dbSNP rsID passed through as GNOMAD_NFE_AF and RSID info fields. VCF is bgzipped + tabix-indexed (172 MB + 1.6 MB .tbi).

Files:
- scripts/varFreqs/svatalogFreqToVcf.py (new): per-chrom allele-freq TSV -> single VCF with hg38 ##contig header
- trackDb/human/varFreqs.ra: new svatalogSnv vcfTabix subtrack
- trackDb/human/svatalogSnv.html (new): doc page
- trackDb/human/varFreqs.html: new row in Available Datasets table
- doc/hg38/varFreqs.txt: wget-free build block (input files were downloaded manually from Zenodo 13367574)

Note: the All Databases Combined varFreqs bigBed has NOT been rebuilt to include this new source yet; a subsequent merge pass will add it.

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context)

diff --git src/hg/makeDb/trackDb/human/svatalogSnv.html src/hg/makeDb/trackDb/human/svatalogSnv.html
new file mode 100644
index 00000000000..20340d84bef
--- /dev/null
+++ src/hg/makeDb/trackDb/human/svatalogSnv.html
@@ -0,0 +1,95 @@
+

Description

+

+This track shows small-variant (single-nucleotide variant and short-indel) +allele frequencies from 101 samples released as part of the +GWAS +SVatalog tool (Chirmade et al. 2026). The same 101-sample cohort +underlies the structural-variant sibling track +SVatalog 101 SVs in the Long-read +SV collection; this track provides the companion small-variant allele +frequencies that SVatalog uses to compute linkage disequilibrium between +SNPs and SVs. +

+

+The callset contains approximately 8.8 million sites across the autosomes +and chromosome X. Each site reports the alternate allele frequency in the +101 samples, the gnomAD v3.1 non-Finnish European allele frequency (when +annotated in the source release), and a dbSNP rsID when one was available. +

+ +

Display Conventions and Configuration

+

+The track uses the standard VCF display. Variants appear as colored marks +along the genome; clicking an item opens the detail page with per-site +INFO fields: AF, AC, AN, the gnomAD v3.1 NFE allele frequency +(GNOMAD_NFE_AF) and the dbSNP rsID (RSID). +

+

+Note on AC/AN: the source allele-frequency release ships only AF. For this +track, AC and AN are synthesized by assuming the full 2 x 101 = 202-allele +denominator (AN=202, AC=round(AF x 202)); they are therefore +approximations at sites where some samples had missing genotypes. +
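The AC/AN arithmetic described in the note can be sketched in a few lines of Python. This is an illustration only, not the converter script itself: the function name `synth_ac_an` is hypothetical, and the sketch assumes Python's built-in `round`, which rounds exact ties to the nearest even integer. The tie rule used by the actual build is not stated here.

```python
N_SAMPLES = 101          # cohort size stated on this page
AN = 2 * N_SAMPLES       # 202 alleles, assuming every sample is diploid and called


def synth_ac_an(af: float, an: int = AN) -> tuple[int, int]:
    """Approximate (AC, AN) from AF alone, assuming no missing genotypes.

    Hypothetical helper for illustration; not taken from svatalogFreqToVcf.py.
    """
    if not 0.0 <= af <= 1.0:
        raise ValueError(f"allele frequency out of range: {af}")
    # Python's round() sends exact .5 ties to the even neighbor.
    return round(af * an), an


if __name__ == "__main__":
    for af in (0.0, 0.00495, 0.5, 1.0):
        ac, an = synth_ac_an(af)
        print(f"AF={af:g} -> AC={ac}, AN={an}")
```

As an example of why these values are approximate: a site with a true AC of 48 among 192 called alleles has AF = 0.25, but the synthesized record would carry AN=202 and an AC of 50 or 51 (0.25 x 202 = 50.5, so the result depends on the tie rule).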

+ +

Methods

+

+Small variants were called from 10X Genomics linked-read (paired-end +short-read) whole-genome sequencing of the 101 SVatalog samples with +GATK +HaplotypeCaller v4.0.0.0 using default parameters. Calls were phased +across the cohort with +SHAPEIT +v4.2.0, and per-site alternate allele frequencies were computed on +the resulting joint callset. Structural variants, released as a separate +lrSv subtrack, were called from long-read data and merged with these +SNPs for the LD analyses reported by GWAS SVatalog. +

+

+For display here, the per-chromosome allele-frequency text files +(chr{1..22,X}_allele_freq.txt) were converted to a single +sites-only VCF with approximate AC/AN fields, then bgzipped and tabix +indexed. The step-by-step build commands are recorded in the UCSC +makeDoc + +doc/hg38/varFreqs.txt; the converter script lives in + +makeDb/scripts/varFreqs. +
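The conversion step above might look roughly like the following Python sketch. It is not the actual svatalogFreqToVcf.py: the input column names (`chrom`, `pos`, `ref`, `alt`, `af`, `gnomad_nfe_af`, `rsid`), the missing-value markers, the INFO header wording, and the use of `.` in the ID column are all assumptions made for illustration. Only two hg38 contigs are listed to keep the example short; a real header would cover chr1 through chr22 and chrX.

```python
import csv
import io

AN = 202  # 2 * 101 diploid samples, as described on this page

# Two hg38 contig lengths for illustration only.
CONTIGS = {"chr21": 46709983, "chrX": 156040895}

HEADER = [
    "##fileformat=VCFv4.2",
    *[f"##contig=<ID={name},length={length}>" for name, length in CONTIGS.items()],
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate allele frequency in the 101 samples">',
    '##INFO=<ID=AC,Number=A,Type=Integer,Description="Approximate allele count, round(AF*AN)">',
    '##INFO=<ID=AN,Number=1,Type=Integer,Description="Assumed allele number (2*101)">',
    '##INFO=<ID=GNOMAD_NFE_AF,Number=1,Type=Float,Description="gnomAD v3.1 non-Finnish European AF">',
    '##INFO=<ID=RSID,Number=1,Type=String,Description="dbSNP rsID">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]

MISSING = {None, "", ".", "NA"}  # assumed markers for absent annotations


def row_to_vcf(row: dict) -> str:
    """Turn one allele-frequency row (hypothetical column names) into a VCF line."""
    af = float(row["af"])
    info = [f"AF={af:g}", f"AC={round(af * AN)}", f"AN={AN}"]
    if row.get("gnomad_nfe_af") not in MISSING:
        info.append(f"GNOMAD_NFE_AF={row['gnomad_nfe_af']}")
    if row.get("rsid") not in MISSING:
        info.append(f"RSID={row['rsid']}")
    # Sites-only record: no QUAL, FILTER, or genotype columns are available.
    return "\t".join([row["chrom"], row["pos"], ".", row["ref"], row["alt"],
                      ".", ".", ";".join(info)])


def tsv_to_vcf(tsv_text: str) -> str:
    """Convert tab-separated allele-frequency text into sites-only VCF text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return "\n".join(HEADER + [row_to_vcf(r) for r in reader]) + "\n"


if __name__ == "__main__":
    # Made-up demonstration row; not real data from the release.
    demo = ("chrom\tpos\tref\talt\taf\tgnomad_nfe_af\trsid\n"
            "chr21\t5030000\tA\tG\t0.5\t0.4\tNA\n")
    print(tsv_to_vcf(demo), end="")
```

The resulting text would still need to be sorted by position, compressed with bgzip, and indexed with tabix before it can serve as a vcfTabix track.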

+ +

Data Access

+

+The VCF file for this track is available from +our +download server as svatalog.vcf.gz (with .tbi index). +Regions can be extracted with tabix: +tabix http://hgdownload.soe.ucsc.edu/gbdb/hg38/varFreqs/svatalog/svatalog.vcf.gz chr21:1-100000000. +
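For scripted access, the tabix call above can be wrapped from Python. The sketch below is an assumption-laden illustration: the helper names (`tabix_cmd`, `parse_site`, `fetch_region`) are hypothetical, it expects a `tabix` binary on the PATH, and it assumes one alternate allele per record so that AF and AC parse as single numbers.

```python
import subprocess

VCF_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/varFreqs/svatalog/svatalog.vcf.gz"


def tabix_cmd(region: str, url: str = VCF_URL) -> list[str]:
    """Build the same command line shown above: tabix <url> <region>."""
    return ["tabix", url, region]


def parse_site(line: str) -> dict:
    """Parse one sites-only VCF record into a dict of the documented INFO fields."""
    chrom, pos, _id, ref, alt, _qual, _filter, info = line.rstrip("\n").split("\t")[:8]
    fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "AF": float(fields["AF"]),   # assumes a single ALT allele per line
        "AC": int(fields["AC"]),
        "AN": int(fields["AN"]),
        "GNOMAD_NFE_AF": fields.get("GNOMAD_NFE_AF"),  # None when not annotated
        "RSID": fields.get("RSID"),                    # None when not annotated
    }


def fetch_region(region: str) -> list[dict]:
    """Run tabix on the remote file and parse every returned record."""
    result = subprocess.run(tabix_cmd(region), check=True,
                            capture_output=True, text=True)
    return [parse_site(l) for l in result.stdout.splitlines()
            if l and not l.startswith("#")]
```

A call such as `fetch_region("chr21:1-100000000")` would then return one dict per site on chr21, provided the network request succeeds.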

+

+The original per-chromosome allele-frequency tables and the accompanying +LD statistics used by the SVatalog tool are available from the +companion Zenodo deposit: +zenodo.org/records/13367574. +The SVatalog web tool itself is at +svatalog.research.sickkids.ca. +

+ +

Credits

+

+Thanks to Chirmade, Strug and colleagues at The Hospital for Sick +Children and the University of Toronto for releasing this annotated +SNP frequency callset alongside the GWAS SVatalog tool. +

+ +

References

+ + +

+Chirmade S, Wang Z, Mastromatteo S, Sanders E, Thiruvahindrapuram B, Nalpathamkalam T, Pellecchia G, +Lin F, Keenan K, Patel RV et al. + +GWAS SVatalog: a visualization tool to aid fine-mapping of GWAS loci with structural variations. +Heredity (Edinb). 2026 Mar;135(3):199-210. +PMID: 41203876; PMC: PMC13031531 +

+