src/hg/makeDb/trackDb/human/varFreqs.html 86744c40b7e7f18792d287aedf9cf5da543e2d5a

86744c40b7e7f18792d287aedf9cf5da543e2d5a
max
  Fri Apr 17 07:22:27 2026 -0700
Add GA4K (Genomic Answers for Kids) small-variant subtrack to the
Variant Frequencies supertrack for hg38.
#Preview2 week - bugs introduced now will need a build patch to fix

Children's Mercy pediatric rare-disease cohort: ~36.2M SNVs and short
indels from 552 PacBio HiFi long-read samples (DeepVariant/GLnexus),
filtered to variants replicated in >=2 unrelated GA4K individuals or
an HPRC variant. Ref: Cohen et al. 2022, Genet Med, PMID 35305867.

refs #36642

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html
index 7a69894d318..cecfeb0e22b 100644
--- src/hg/makeDb/trackDb/human/varFreqs.html
+++ src/hg/makeDb/trackDb/human/varFreqs.html
@@ -1,271 +1,280 @@
 <h2>Description</h2>
 <p>
 This supertrack collects variant allele frequencies from population-scale sequencing and
 genotyping projects worldwide, from a total of ~1.7 million genomes/exomes/arrays.
 The data were not reprocessed in a harmonized way; instead, the variant VCFs were collected directly from the projects.
 The goal is to provide a single place to compare how common
 a variant is across different populations, ancestries, and cohorts, for
 projects that cannot be recomputed by gnomAD soon. The main
 <a href="hgTrackUi?g=varFreqsAll">combined track</a> merges all databases into a single summary track,
 with filters, summed population frequencies and recalculated protein-effect annotations.
 In addition, there is one subtrack per project with the original VCF data and all the annotations that the project provides.
 The projects use different pipelines and sequencing technologies. Click any of the projects
 above or below for a summary of its sample selection, sequencing assay and software pipeline.
 Many projects do not allow us to distribute the data but we document how the
 data can be requested and provide all converters.</p>
 
 <p>
 Data from projects that provide haplotype-phased genotypes are also available
 elsewhere: 1000 Genomes has its own separate track, and the phased genotypes of HGDP, SGDP,
 HGDP+1000 Genomes and Mexico Biobank are in the &quot;Phased Variants&quot; track.
 Their VCF versions below show only the isolate frequency per variant.
 </p>
 
 <p>Please contact us (genome@soe.ucsc.edu) if you know of a project that we should add. So far,
 we have requested data from: UK Biobank (pending for a year),
 Regeneron&apos;s Million Exomes and Mexico City Studies (request rejected), and Taiwan Biobank (pending).
 </p>
 
 <h2>Combined Track (All Databases)</h2>
 <p>
 The &quot;All Databases Combined&quot; track merges variants from all individual databases into a single
 bigBed file with consequence annotations: more than 1.2 billion variants in total from 1.7 million individuals.
 The track supports filtering by variant type
 (SNV, insertion, deletion, MNV), predicted consequence (missense, synonymous, stop gained,
 frameshift, splice, intron, intergenic), source database, allele frequency (overall maximum
 and per-database), and allele count (total or per-database). This track is useful either in dense mode,
 for a quick overview of variant density across all projects, or with filters, to find
 variants present in specific databases or within certain frequency ranges. With the "clone track"
 feature you can create multiple copies of this track, each with different filters activated.
 You can also use the "Density mode" checkbox on the track configuration page to show a plot of the
 density of variants passing a filter, one plot per track clone.
 </p>
 
 <h3>Available Datasets</h3>
 
 <table class="stdTbl">
 <tr>
   <th>Database</th>
   <th>Region</th>
   <th>N</th>
   <th>Data Type</th>
   <th>Cohort</th>
   <th>Sub-populations</th>
   <th>Downloadable from UCSC</th>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=varFreqsAll">All Databases combined</a></td>
   <td>All below</td>
   <td>1.7M</td>
   <td>WGS/WES/imputed</td>
   <td></td>
   <td></td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=allofus">AllOfUs v7</a></td>
   <td>USA</td>
   <td>245k</td>
   <td>WGS</td>
   <td>General population, diverse</td>
   <td>European, East Asian, African, Indigenous American, Oceanian, South Asian</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=topmed">TOPMed Freeze 10</a></td>
   <td>USA</td>
   <td>151k</td>
   <td>WGS</td>
   <td>Heart, lung, blood, sleep disorder cohorts</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=sfariSparkExomes">SFARI SPARK WES</a></td>
   <td>USA</td>
   <td>140k</td>
   <td>WES</td>
   <td>Autism families (parents + affected children)</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=sfariSparkWgs">SFARI SPARK WGS</a></td>
   <td>USA</td>
   <td>12.5k</td>
   <td>WGS</td>
   <td>Autism families (parents + affected children)</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=alfaVcf">NCBI ALFA R4</a></td>
   <td>USA</td>
   <td>408k</td>
   <td>WGS/WES/array mix</td>
   <td>Aggregated dbGaP studies, mixed phenotypes</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=finngen">FinnGen R12</a></td>
   <td>Finland</td>
   <td>500k</td>
   <td>Imputed (8.5k WGS ref panel)</td>
   <td>National biobank, ~10% of population</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=swefreq">SweGen</a></td>
   <td>Sweden</td>
   <td>1k</td>
   <td>WGS</td>
   <td>Cross-section of Swedish population</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=schema">SCHEMA</a></td>
   <td>Multi-national</td>
   <td>121k</td>
   <td>WES</td>
   <td>Schizophrenia: 24k cases, 97k controls</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=tommo60kjpn">Japan ToMMo 61k</a></td>
   <td>Japan</td>
   <td>61k</td>
   <td>WGS</td>
   <td>General population</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=mgrb">Australia MGRB</a></td>
   <td>Australia</td>
   <td>4k</td>
   <td>WGS</td>
   <td>Healthy elderly (age &ge;70)</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=gasp">GenomeAsia Pilot</a></td>
   <td>Asia (219 groups)</td>
   <td>1.7k</td>
   <td>WGS</td>
   <td>Diverse populations across Asia</td>
   <td>Northeast Asian, Southeast Asian, South Asian</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=abraom">ABraOM Brazil</a></td>
   <td>Brazil</td>
   <td>1.2k</td>
   <td>WGS</td>
   <td>Elderly admixed individuals (S&atilde;o Paulo)</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=indigenomes">IndiGenomes</a></td>
   <td>India</td>
   <td>1k</td>
   <td>WGS</td>
   <td>Healthy individuals</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=kova">KOVA Korea</a></td>
   <td>Korea</td>
   <td>5.3k</td>
   <td>1.9k WGS + 3.4k WES</td>
   <td>Normal tissue from cancer patients, healthy parents, volunteers</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=npm">NPM Singapore</a></td>
   <td>Singapore</td>
   <td>9.8k</td>
   <td>WGS</td>
   <td>Chinese, Indian, Malay ancestry</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=saudi">Saudi Genome</a></td>
   <td>Saudi Arabia</td>
   <td>302</td>
   <td>WGS (30x)</td>
   <td>Saudi population</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=hrc">HRC</a></td>
   <td>Multi-national</td>
   <td>~30k</td>
   <td>Low-coverage WGS (7x)</td>
   <td>Imputation reference panel (excl. 1000 Genomes)</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=mxbFreq">MXB Mexico Biobank</a></td>
   <td>Mexico</td>
   <td>6k</td>
   <td>Genotyping array</td>
   <td>Diverse Mexican ancestries, 898 recruitment sites</td>
   <td>By state, by ancestry</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=sgdpFreq">SGDP</a></td>
   <td>Global</td>
   <td>279</td>
   <td>WGS</td>
   <td>142 diverse populations worldwide</td>
   <td>By population</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=gregor">GREGoR R4</a></td>
   <td>USA</td>
   <td>3.6k</td>
   <td>WGS</td>
   <td>Rare disease families (10.7k participants, 4.4k families)</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=hgdp1kFreq">gnomAD HGDP+1kG</a></td>
   <td>Global</td>
   <td>4k</td>
   <td>WGS</td>
   <td>80 populations (HGDP + 1000 Genomes reprocessed)</td>
   <td>80 populations, continental groups</td>
   <td>Yes</td>
 </tr>
+<tr>
+  <td><a href="hgTrackUi?g=ga4kSnv">GA4K</a></td>
+  <td>USA</td>
+  <td>552</td>
+  <td>PacBio HiFi long-read WGS</td>
+  <td>Genomic Answers for Kids: pediatric rare-disease probands and families (Children's Mercy)</td>
+  <td>&mdash;</td>
+  <td>Yes</td>
+</tr>
 </table>
 
 <h2>Display Conventions</h2>
 
 <p>Most tracks show the variant and allele frequencies only on mouseover or click.
 When zoomed in, tracks display alleles with base-specific coloring. Homozygous
 genotypes are shown as one letter, while heterozygous genotypes are displayed with both
 letters. All VCF files are normalized, with a single allele per annotation (no multi-allelic
 lines).
 </p>
 
 <h2>Data Access</h2>
 <p>All the data is publicly available. The table above indicates whether we are allowed to distribute it in VCF format. Most of the databases do not allow us to redistribute the data files directly from our website, but the data can always be obtained from the original websites in some form. Click the database link in the table above and see the "Data Access" section of the respective track for a description of where to download the data. When the data is freely available from our website, the Data Access section also indicates the VCF file location on our download server. Because it contains some licensed data, the combined track is not available for download, but it can be recreated using the conversion scripts in our GitHub repository and the accompanying documentation file.
 </p>
 
 <h2>Credits</h2>
 
 <p>This track is only possible thanks to the data from millions of volunteers around the world, who donated blood, signed consent forms and provided health information about themselves and sometimes their families. Click on any of the tracks in the list above to see the specific credits for each project. Thanks to Alex Ioannidis, UCSC, for the motivation for this track and to Andreas Lahner, MGZ, for feedback.</p>