86744c40b7e7f18792d287aedf9cf5da543e2d5a max Fri Apr 17 07:22:27 2026 -0700

Add GA4K (Genomic Answers for Kids) small-variant subtrack to the Variant Frequencies supertrack for hg38.

#Preview2 week - bugs introduced now will need a build patch to fix

Children's Mercy pediatric rare-disease cohort: ~36.2M SNVs and short indels
from 552 PacBio HiFi long-read samples (DeepVariant/GLnexus), filtered to
variants replicated in >=2 unrelated GA4K individuals or an HPRC variant.
Ref: Cohen et al. 2022, Genet Med, PMID 35305867.

refs #36642

Co-Authored-By: Claude Opus 4.7 (1M context)

diff --git src/hg/makeDb/trackDb/human/ga4kSnv.html src/hg/makeDb/trackDb/human/ga4kSnv.html
new file mode 100644
index 00000000000..2f00876c279
--- /dev/null
+++ src/hg/makeDb/trackDb/human/ga4kSnv.html
@@ -0,0 +1,81 @@
+

Description

+

+This track shows small variants (single-nucleotide variants and short
+insertion/deletion variants) identified by PacBio HiFi long-read sequencing
+of probands and their families enrolled in the Genomic Answers for Kids
+(GA4K) program at Children's Mercy Research Institute. GA4K is a longitudinal
+pediatric genomics initiative that aims to enroll 30,000 children with
+suspected rare genetic disorders, together with their parents, to build a
+large-scale resource of clinical and genomic data.
+

+

+The callset contains approximately 36.2 million variants genotyped across
+up to 552 samples (maximum allele number 1104 on the autosomes). Each
+variant is annotated with allele count (AC), total called alleles (AN),
+cohort allele frequency (AF), variant type (substitution, insertion or
+deletion) and the corresponding allele frequency in gnomAD v3.0 where
+available.
+
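The relationship between the per-site annotations can be illustrated with a minimal sketch. It assumes AF is the conventional ratio AC/AN (the page does not state how AF was computed) and that autosomal genotypes are diploid, which is what the stated maximum allele number of 1104 for 552 samples implies. The function name and constants are illustrative only.

```python
# Assumption: AF = AC / AN, the usual VCF convention; not confirmed by the track page.
N_SAMPLES = 552                    # sample count stated in the description
MAX_AUTOSOMAL_AN = 2 * N_SAMPLES   # two alleles per diploid sample on autosomes


def allele_frequency(ac: int, an: int) -> float:
    """Return the cohort allele frequency for one ALT allele.

    ac: number of called alleles carrying the ALT allele (AC)
    an: total number of called alleles at the site (AN)
    """
    if an <= 0:
        raise ValueError("AN must be positive")
    if not 0 <= ac <= an:
        raise ValueError("AC must lie between 0 and AN")
    return ac / an
```

Because AN counts only called alleles, a site with missing genotypes has AN below the maximum, so the same AC can correspond to different AF values at different sites.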

+ +

Display Conventions and Configuration

+

+The track uses the standard VCF display. By default, variants are shown as
+colored marks along the genome; clicking an item opens the detail page
+with per-site INFO fields including AC, AN, AF and the gnomAD v3 allele
+frequency.
+

+ +

Methods

+

+Samples were sequenced on PacBio Revio and Sequel II instruments with HiFi
+chemistry. Per-sample variant calls were generated with DeepVariant as gVCFs,
+then merged across the cohort with GLnexus v1.2.7 using the
+DeepVariant_unfiltered configuration. The resulting BCF was converted
+to VCF with bcftools view v1.10.
+

+

+To reduce false positives, the merged callset was filtered to variants
+replicated by independent evidence: (1) observed in at least one additional
+unrelated Children's Mercy individual, or (2) matching a variant observed in
+a sample from the Human Pangenome Reference Consortium (HPRC).
+
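The replication rule above can be sketched as a simple predicate. This is not the GA4K filtering code: the `VariantEvidence` class, its field names, and the use of family identifiers as a proxy for "unrelated" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class VariantEvidence:
    """Hypothetical per-variant summary; not a structure from the GA4K release."""
    # IDs of families with at least one carrier. Carriers from different
    # families are treated as unrelated (an assumption of this sketch).
    carrier_families: Set[str] = field(default_factory=set)
    # True if the same variant is observed in at least one HPRC sample.
    in_hprc: bool = False


def is_replicated(ev: VariantEvidence) -> bool:
    """Keep a variant if it has independent support from either source."""
    seen_in_unrelated = len(ev.carrier_families) >= 2  # rule (1)
    return seen_in_unrelated or ev.in_hprc              # rule (2)
```

Under this reading, a variant private to a single family is retained only when an HPRC sample also carries it.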

+

+The GA4K release is provided as 24 per-chromosome VCF files (chr1-22, chrX,
+chrY). For display on the Genome Browser, these were concatenated with
+bcftools concat into a single bgzip-compressed, tabix-indexed file.
+
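A concatenation step like the one described could be assembled as follows. This is a sketch under stated assumptions: the input file naming (`ga4k.<chrom>.vcf.gz`) is hypothetical, and the exact options used for the track are not given in the page. The flags shown (`bcftools concat -O z -o`, `tabix -p vcf`) are standard options of those tools, chosen here to produce bgzip-compressed, tabix-indexed output.

```python
def chromosome_names():
    """Return the 24 chromosomes in the order chr1-22, chrX, chrY."""
    return [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def build_commands(prefix="ga4k.", suffix=".vcf.gz", out="ga4kSnv.vcf.gz"):
    """Build argument lists for concatenating and indexing the VCFs.

    prefix/suffix describe a hypothetical input naming scheme; only the
    output name ga4kSnv.vcf.gz comes from the track description.
    """
    inputs = [f"{prefix}{chrom}{suffix}" for chrom in chromosome_names()]
    # -O z writes bgzip-compressed VCF; -o names the output file.
    concat_cmd = ["bcftools", "concat", "-O", "z", "-o", out] + inputs
    # -p vcf tells tabix to index using VCF column conventions.
    index_cmd = ["tabix", "-p", "vcf", out]
    return concat_cmd, index_cmd


concat_cmd, index_cmd = build_commands()
```

The two lists can be passed to `subprocess.run(..., check=True)` in sequence. Listing the inputs in chromosome order matters because `bcftools concat` preserves input order by default.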

+ +

Data Access

+

+The VCF file for this track is available from
+our
+download server as ga4kSnv.vcf.gz (with .tbi index).
+Regions can be extracted with tabix, for example:
+tabix http://hgdownload.soe.ucsc.edu/gbdb/hg38/varFreqs/ga4k/ga4kSnv.vcf.gz chr21:1-100000000.
+
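Records returned by tabix are tab-separated VCF lines, and the AC, AN and AF annotations sit in the INFO column. The sketch below parses one such line. The example record is synthetic and made up purely for illustration; it is not taken from the GA4K file. Only the AC, AN and AF keys are named in the track description, so any other INFO keys in the real file would pass through the same generic parser.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_vcf_line(line: str) -> dict:
    """Parse the first eight fixed VCF columns of one data line."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = cols[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "info": parse_info(info),
    }


# Synthetic record for illustration only (not real GA4K data).
example = "chr21\t5030000\t.\tA\tG\t.\t.\tAC=2;AN=1104;AF=0.00181"
record = parse_vcf_line(example)
```

INFO values are kept as strings here; convert with `int()` or `float()` as needed, bearing in mind that multi-allelic sites can carry comma-separated values.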

+

+The original per-chromosome VCFs and full release documentation are
+available from the Children's Mercy Research Institute GA4K data release at
+
+github.com/ChildrensMercyResearchInstitute/GA4K.
+

+ +

Credits

+

+Thanks to the Children's Mercy Research Institute and the Genomic Answers
+for Kids participants and their families for making this dataset publicly
+available.
+

+ +

References

+ + +

+Cohen ASA, Farrow EG, Abdelmoity AT, Alaimo JT, Amudhavalli SM, Anderson JT, Bansal L, Bartik L,
+Baybayan P, Belden B et al.
+
+Genomic answers for children: Dynamic analyses of >1000 pediatric rare disease genomes.
+Genet Med. 2022 Jun;24(6):1336-1348.
+PMID: 35305867
+

+