68c5b3b5dfc4053ff78a6b1d236bd1ac90251cfa lrnassar Mon Jun 1 14:40:45 2026 -0700 varFreqs: description pages for the three combined tracks and "SNV" rename sweep. Add varFreqsDisease.html and varFreqsArray.html so the two new combined tracks have full Description/Display/Methods/Data Access/References. Add a Caveats section on varFreqsArray about chip-data quality vs sequencing. Update varFreqsAll.html and the supertrack varFreqs.html to reflect the three-combined-track family (cross-links between siblings, new "Combined Tracks" section, new table rows, and updated source/variant counts). Add a GoNL row to the supertrack table. Sweep 37 subtrack longLabels and four cross-referencing description pages (colorsDbSnv.html, mei.html, meiSwegen.html, phasedVars.html) from "Variant Frequencies:" to "SNV Frequencies:" to match the supertrack shortLabel. refs #36642 diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html index fa9d6dbb231..bb8288f2744 100644 --- src/hg/makeDb/trackDb/human/varFreqs.html +++ src/hg/makeDb/trackDb/human/varFreqs.html @@ -1,76 +1,99 @@ <h2>Description</h2> <p> This supertrack collects variant allele frequencies from population-scale sequencing and genotyping projects worldwide, from a total of ~1.7 million genomes/exomes/arrays. -The data was not reprocessed in a harmonized way; the variant VCFs were collected from the projects as-is. -The goal is a single place to compare how common -a variant is across different populations, ancestries, and cohorts, for -projects that cannot be recomputed by gnomAD soon. The main -<a href="hgTrackUi?g=varFreqsAll">combined track</a> merges all databases into one summary track, -with filters, summed population frequencies and recalculated protein-effect annotations. -There is also one subtrack per project with the original VCF data and all the annotations that the project provides. -The different projects use different pipelines and sequencing technologies. Click any of the projects -above or below for a summary of their sample selection, sequencing assay and software pipeline. -Many projects do not allow us to distribute the data, but we document how to request it -and provide all converters.</p> +The data was not reprocessed in a harmonized way; the variant VCFs were collected from the +projects as-is. The goal is a single place to compare how common a variant is across +different populations, ancestries, and cohorts, for projects that cannot be recomputed by +gnomAD soon. Three combined tracks aggregate the source data along different lines, and +there is also one subtrack per project with the original VCF data and all the annotations +that the project provides. The different projects use different pipelines and sequencing +technologies. Click any of the projects above or below for a summary of their sample +selection, sequencing assay and software pipeline. Many projects do not allow us to +distribute the data, but we document how to request it and provide all converters. +</p> <p> Data from projects that provide haplotype-phased genotypes can also be found elsewhere: 1000 Genomes is also a separate track, and the phased genotypes HGDP, SGDP, HGDP+1000 Genomes and Mexico Biobank can also be found in the "Phased Variants" track. Their VCF versions below show only the isolate frequency per variant. </p> <p>Please contact us (<A HREF="mailto:genome@soe.ucsc.edu">genome@soe.ucsc.edu</A><!-- above address is genome at soe.ucsc.edu -->) if you know of a project that we should add. So far, Regeneron's Million Exomes and Mexico City Studies (request rejected) and Taiwan Biobank (pending). </p> -<h2>Combined Track (All Databases)</h2> +<h2>Combined Tracks</h2> <p> -The "All Databases Combined" track merges variants from all individual databases into a single -bigBed file with consequence annotations, totaling 1.17 billion variants from ~1.7 million individuals. -The track supports filtering by variant type -(SNV, insertion, deletion, MNV), predicted consequence (missense, synonymous, stop gained, -frameshift, splice, intron, intergenic), source database, allele frequency (overall maximum -and per-database), and allele count (total or per-database). The track is useful in dense mode -to get a quick overview of variant density across all projects, or with filters to find -variants present in specific databases or within certain frequency ranges. With the "clone track" -feature you can clone this track and keep multiple versions, each with different filters activated. -The "Density mode" checkbox on the track configuration page shows a plot of the -density of variants passing a filter, one per track clone. +Three combined tracks merge variants from the individual subtracks into single bigBed files +with predicted protein consequences and cross-database filtering. All three use the same +filter conventions (variant type, consequence, source database, allele frequency, allele +count, and per-database AF/AC). </p> +<ul> + <li><a href="hgTrackUi?g=varFreqsAll"><b>All Databases Combined</b></a> — 1.34 + billion variants from 28 sequencing-based cohorts (WGS, WES, long-read). The default + summary view of the supertrack. Excludes the genotyping-array cohorts.</li> + <li><a href="hgTrackUi?g=varFreqsDisease"><b>Disease-related Databases Combined</b></a> + — 932 million variants from six disease-focused cohorts (SPARK, SFARI WGS, + TOPMed, SCHEMA, GREGoR, GA4K), with phenotype-stratified AC/AF where the source + provides it.</li> + <li><a href="hgTrackUi?g=varFreqsArray"><b>Genotyping Array Databases Combined</b></a> + — 14.7 million variants from three array cohorts (TPMI Taiwan, Mexico Biobank, + UK Biobank imputed). Kept separate because chip data has different per-variant + confidence than sequencing.</li> +</ul> <h3>Available Datasets</h3> <table class="stdTbl"> <tr> <th>Database</th> <th>Region</th> <th>N</th> <th>Data Type</th> <th>Cohort</th> <th>Sub-populations</th> <th>Downloadable from UCSC</th> </tr> <tr> - <td><a href="hgTrackUi?g=varFreqsAll">All Databases combined</a></td> - <td>All below</td> - <td>1.7mil</td> - <td>WGS/WES/imputed</td> - <td></td> - <td></td> + <td><a href="hgTrackUi?g=varFreqsAll">All Databases Combined</a></td> + <td>Sequencing-based, all below</td> + <td>~1.7mil</td> + <td>WGS/WES/long-read</td> + <td>1.34B variants</td> + <td>Phenotype splits for SPARK, SFARI WGS, GREGoR</td> + <td>No</td> +</tr> +<tr> + <td><a href="hgTrackUi?g=varFreqsDisease">Disease-related Databases Combined</a></td> + <td>SPARK, SFARI WGS, TOPMed, SCHEMA, GREGoR, GA4K</td> + <td>~300k</td> + <td>WGS/WES/long-read</td> + <td>932M variants</td> + <td>SPARK ASD/Non-ASD, SFARI WGS ASD/Non-ASD, SCHEMA case/control, GREGoR aff/unaff/unknown</td> + <td>No</td> +</tr> +<tr> + <td><a href="hgTrackUi?g=varFreqsArray">Genotyping Array Databases Combined</a></td> + <td>TPMI, MexBB, UKBB</td> + <td>~530k</td> + <td>Array / imputed</td> + <td>14.7M variants</td> + <td>—</td> <td>No</td> </tr> <tr> <td><a href="hgTrackUi?g=allofus">AllOfUs v7</a></td> <td>USA</td> <td>245k</td> <td>WGS</td> <td>General population, diverse</td> <td>African, Indigenous American, East Asian, European, Oceanian, South Asian (<b>local ancestry</b>; see Notes below)</td> <td>Yes</td> </tr> <tr> <td><a href="hgTrackUi?g=topmed">TOPMED Freeze 10</a></td> <td>USA</td> @@ -122,30 +145,39 @@ <td>361k</td> <td>Imputed array (HRC+UK10K+1KGp3 ref panel)</td> <td>White British subset of UK Biobank, Neale Lab Round 2 GWAS</td> <td>—</td> <td>Yes</td> </tr> <tr> <td><a href="hgTrackUi?g=swefreq">SweGen</a></td> <td>Sweden</td> <td>1k</td> <td>WGS</td> <td>Cross-section of Swedish population</td> <td>—</td> <td>No</td> </tr> +<tr> + <td><a href="hgTrackUi?g=gonl">GoNL</a></td> + <td>Netherlands</td> + <td>498</td> + <td>WGS (~13x)</td> + <td>250 unrelated Dutch trios (parents only)</td> + <td>—</td> + <td>Yes</td> +</tr> <tr> <td><a href="hgTrackUi?g=schema">SCHEMA</a></td> <td>Multi-national</td> <td>121k</td> <td>WES</td> <td>Schizophrenia: 24k cases, 97k controls</td> <td>—</td> <td>Yes</td> </tr> <tr> <td><a href="hgTrackUi?g=tommo60kjpn">Japan ToMMO 61k</a></td> <td>Japan</td> <td>61k</td> <td>WGS</td> <td>General population</td>