86744c40b7e7f18792d287aedf9cf5da543e2d5a max Fri Apr 17 07:22:27 2026 -0700
Add GA4K (Genomic Answers for Kids) small-variant subtrack to the Variant Frequencies supertrack for hg38. #Preview2 week - bugs introduced now will need a build patch to fix

Children's Mercy pediatric rare-disease cohort: ~36.2M SNVs and short indels from 552 PacBio HiFi long-read samples (DeepVariant/GLnexus), filtered to variants replicated in >=2 unrelated GA4K individuals or an HPRC variant. Ref: Cohen et al. 2022, Genet Med, PMID 35305867.

refs #36642

Co-Authored-By: Claude Opus 4.7 (1M context)

diff --git src/hg/makeDb/doc/hg38/varFreqs.txt src/hg/makeDb/doc/hg38/varFreqs.txt
index 7de0e6a41e4..3051e5f2e12 100644
--- src/hg/makeDb/doc/hg38/varFreqs.txt
+++ src/hg/makeDb/doc/hg38/varFreqs.txt
@@ -1,15 +1,29 @@
+# Genomic Answers for Kids (GA4K), Children's Mercy - 2026-04-16 Claude max
+# GA4K is a pediatric rare-disease PacBio HiFi long-read cohort (Cohen et al.
+# 2022, Genet Med, PMID 35305867). The release ships 24 per-chromosome VCFs of
+# site-only small variants (SNVs and short indels), filtered to variants
+# replicated in >=2 unrelated GA4K individuals or matched to an HPRC variant.
+# Upstream data lives under /hive/data/genomes/hg38/bed/lrSv/GA4K (co-located
+# with the matched GA4K structural-variant release; see the lrSv makedoc).
+cd /hive/data/genomes/hg38/bed/lrSv/GA4K
+bcftools concat -Oz -o ga4kSnv.vcf.gz \
+ pacbio_snv_vcf/pb_joint_merged.snv.chr{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y}.vcf.gz
+tabix -p vcf ga4kSnv.vcf.gz
+# Symlinks placed under /gbdb/hg38/varFreqs/ga4k/ for the ga4kSnv stanza in
+# trackDb/human/varFreqs.ra.
+
 # Mexico Biobank, Max, Nov 8 2025
 CrossMap.py vcf /gbdb/hg19/liftOver/hg19ToHg38.over.chain.gz /hive /data/genomes/hg19/bed/varFreqs/mexbb/MXBv2.vcf.gz /hive/data/genomes/hg38/p14Clean/hg38.p14.fa MXBv2.lift.hg19ToHg38.vcf && bgzip MXBv2.lift.hg19ToHg38.vcf && bcftools sort MXBv2.lift.hg19ToHg38.vcf -Oz -m 200G -T /data/tmp/ -o MXBv2.lift.hg19ToHg38.vcf.gz && tabix -p vcf MXBv2.lift.hg19ToHg38.vcf.gz
 
 # Mexico City Prospective study, Max Oct 28 2025
 cd /hive/data/genomes/hg38/bed/varFreqs/mcps/
 for i in `seq 1 22` X; do wget https://rgc-mcps.regeneron.com/downloads/20230130/chr$i.freq.vcf.gz; done
 for i in `seq 1 22` X; do wget https://rgc-mcps.regeneron.com/downloads/20230130/chr$i.freq.vcf.gz.tbi; done
 mv *vcf* vcf/
 bcftools concat --threads 16 -Oz -o mcps.freq.vcf.gz vcf/chr{1..22}.freq.vcf.gz vcf/chrX.freq.vcf.gz
 # make normal AC and AF and AN fields for mouseovers
 zcat mcps.freq.vcf.gz | sed -e 's/_RAW//g' > mcps.fix.freq.vcf
 mv -f mcps.fix.freq.vcf mcps.freq.vcf
 bgzip mcps.freq.vcf
 tabix -p vcf mcps.freq.vcf.gz
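
The GA4K block in this diff concatenates 24 per-chromosome VCFs by brace expansion, so a missing input only shows up when bcftools fails. A possible pre-flight check is sketched below. It is not part of the commit: the function names ga4k_inputs and check_ga4k_inputs are hypothetical, and the sketch assumes the file naming shown in the makedoc and paths without whitespace.

```shell
# Sketch only, not part of the commit. Function names are hypothetical;
# the file naming pattern is taken from the bcftools concat line in the makedoc.

# Print the 24 expected per-chromosome VCF paths in concat order (1-22, X, Y).
# $1 is the directory holding the per-chromosome files.
ga4k_inputs() {
    for c in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y; do
        printf '%s/pb_joint_merged.snv.chr%s.vcf.gz\n' "$1" "$c"
    done
}

# Return non-zero, naming the file on stderr, if any expected input is
# missing or empty. Assumes the paths contain no whitespace.
check_ga4k_inputs() {
    for f in $(ga4k_inputs "$1"); do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
}
```

Under these assumptions, running `check_ga4k_inputs pacbio_snv_vcf` from /hive/data/genomes/hg38/bed/lrSv/GA4K before the concat step would stop early and name the first absent or zero-length file.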