68c5b3b5dfc4053ff78a6b1d236bd1ac90251cfa lrnassar Mon Jun 1 14:40:45 2026 -0700
varFreqs: description pages for the three combined tracks and "SNV" rename sweep.

Add varFreqsDisease.html and varFreqsArray.html so the two new combined tracks have full Description/Display/Methods/Data Access/References. Add a Caveats section on varFreqsArray about chip-data quality vs sequencing. Update varFreqsAll.html and the supertrack varFreqs.html to reflect the three-combined-track family (cross-links between siblings, new "Combined Tracks" section, new table rows, and updated source/variant counts). Add a GoNL row to the supertrack table. Sweep 37 subtrack longLabels and four cross-referencing description pages (colorsDbSnv.html, mei.html, meiSwegen.html, phasedVars.html) from "Variant Frequencies:" to "SNV Frequencies:" to match the supertrack shortLabel.
refs #36642

diff --git src/hg/makeDb/trackDb/human/varFreqsArray.html src/hg/makeDb/trackDb/human/varFreqsArray.html
new file mode 100644
index 00000000000..44a7f29e44e
--- /dev/null
+++ src/hg/makeDb/trackDb/human/varFreqsArray.html
@@ -0,0 +1,181 @@
+

Description

+This track merges variants from three genotyping-array cohorts into a single bigBed file +with predicted protein consequences and cross-database filtering. It contains 14.7 million +variants from the Taiwan Precision Medicine Initiative (TPMI Axiom TPM1 chip, +~1 million Han Chinese), the Mexico Biobank (MexBB, 6,011 individuals), and UK Biobank +(361k unrelated white British, imputed from the Neale Lab Round 2 release). +

+ +

+The array track is kept separate from the +All Databases Combined WGS/WES summary so that +sequencing-based and array-based frequencies can be inspected independently. For a summary +of all available variant frequency databases, see the +SNV Frequencies supertrack page. +

+ +

Display Conventions

+ +

Color by Consequence

Variants are colored by their most severe predicted consequence:


Color	Consequence class	Examples
Red	Protein-truncating / Loss-of-function	stop_gained, frameshift, splice_donor, splice_acceptor, stop_lost, start_lost
Blue	Missense / In-frame	missense, inframe_insertion, inframe_deletion, protein_altering
Green	Synonymous	synonymous, stop_retained
Grey	Non-coding / Intergenic	intron, non_coding, intergenic, UTR
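
The coloring rule in the table above can be sketched as a lookup plus a severity ranking. This is a minimal illustration, not the track's build code: the names `COLOR_OF`, `SEVERITY`, and `color_for` are hypothetical, the ranking (red over blue over green over grey) is an assumption taken from the row order of the table, and treating unknown tokens as grey is also an assumption.

```python
# Hypothetical sketch: choose a display color from the most severe consequence.
# Tokens and colors come from the table above; the severity order is assumed.
COLOR_OF = {}
for color, tokens in [
    ("red", ["stop_gained", "frameshift", "splice_donor",
             "splice_acceptor", "stop_lost", "start_lost"]),
    ("blue", ["missense", "inframe_insertion", "inframe_deletion",
              "protein_altering"]),
    ("green", ["synonymous", "stop_retained"]),
    ("grey", ["intron", "non_coding", "intergenic", "UTR"]),
]:
    for token in tokens:
        COLOR_OF[token] = color

SEVERITY = ["red", "blue", "green", "grey"]  # most to least severe (assumed)


def color_for(consequences):
    """Return the color of the most severe token in a comma-separated string."""
    # Tokens not listed in the table fall back to grey (an assumption).
    colors = [COLOR_OF.get(tok.strip(), "grey") for tok in consequences.split(",")]
    return min(colors, key=SEVERITY.index)
```

For example, a variant annotated as both `intron` and `missense` would be drawn in blue under this sketch, because the missense class outranks the non-coding class.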

+ +

Amino Acid Change Notation

+The "AA change" field uses bcftools csq notation: 23I>23V means position +23 changed from Isoleucine (I) to Valine (V) (missense). 23I alone (no arrow) +means position 23 is Isoleucine and unchanged (synonymous). A "*" indicates a +stop codon (e.g. 45R>45* is a stop_gained). +
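
The notation above can be read mechanically. The sketch below is a hedged illustration that handles only the simple single-position forms shown in this section; `parse_aa_change` and its return keys are hypothetical names, and longer strings that bcftools csq may emit for indels or frameshifts are rejected with a `ValueError` rather than interpreted.

```python
import re

# Hypothetical helper for the simple "AA change" forms described above:
#   "23I>23V" (changed), "23I" (unchanged), "45R>45*" (stop introduced).
_AA_CHANGE = re.compile(r"^(\d+)([A-Z*]+)(?:>(\d+)([A-Z*]+))?$")


def parse_aa_change(text):
    """Split an AA-change string into position, reference and alternate residues."""
    match = _AA_CHANGE.match(text)
    if match is None:
        raise ValueError(f"unsupported AA change notation: {text!r}")
    pos, ref, _alt_pos, alt = match.groups()
    if alt is None:
        alt = ref  # no arrow: the residue is unchanged (synonymous)
    return {
        "position": int(pos),
        "ref": ref,
        "alt": alt,
        "changed": ref != alt,
        # A "*" in the alternate but not the reference marks a new stop codon.
        "introduces_stop": "*" in alt and "*" not in ref,
    }
```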

+ +

Caveats

+Allele frequencies from genotyping arrays are not directly comparable to those from +whole-genome or whole-exome sequencing. Two limitations to keep in mind: +

Probe coverage is sparse and curated. Array variants are only those the + manufacturer designed probes for. Absence from this track does not mean a + variant is absent in that population, only that it was not on the chip.
Per-variant call confidence varies and is sometimes unreported. TPMI publishes + a per-probe NGS_concordance value (chip-vs-sequencing concordance from + its own validation) in the source VCF; variants reported at high AF but with low + concordance are common. MexBB ships only AN/AF/AC, with no FILTER column and no + per-site QC at all. For both of these sources, rare-disease candidates that show a high + AF should be cross-checked against the + sequencing-based All Databases Combined track + before drawing conclusions.

+ +

Filters

+This track supports filtering via the track settings page. Click the track title or use the +"Configure" button to access filters. +

+ +

Variant Type and Consequence

Variant Type: SNV, Insertion, Deletion, or MNV.
Consequence: Missense, Synonymous, Stop Gained, Frameshift, Splice Donor, + Splice Acceptor, Intron, 3' UTR, 5' UTR, Non-coding, Intergenic, or Other. The filter + uses OR logic across the comma-separated consequence tokens on each variant. See the + All Databases Combined description page for a + complete description of the "Other" bucket.
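
The OR logic of the Consequence filter can be sketched as a set intersection. This is an illustration under assumptions: `matches_consequence` is a hypothetical name, and the token spellings in the example are taken from the color table rather than from the track's actual field values.

```python
def matches_consequence(variant_consequences, selected):
    """Return True if any comma-separated token on the variant is selected."""
    tokens = {t.strip() for t in variant_consequences.split(",") if t.strip()}
    # OR logic: one shared token is enough for the variant to pass.
    return bool(tokens & set(selected))
```

Under this sketch a variant carrying `intron,missense` passes a filter set to `missense` alone, because only one of its tokens has to match.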

+ +

Frequency and Count Filters

Max Allele Frequency: Filter by the maximum AF observed across the three + array sources.
Total Allele Count: Filter by the sum of allele counts across all three + databases.
Per-database AF and AC: Filter by allele frequency or count in any specific + source (TPMI Taiwan, Mexico Biobank, UK Biobank imputed).
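
The two summary values used by these filters can be sketched as follows. The function name `summarize` and the input layout are hypothetical, and skipping sources in which a variant is absent is an assumption about how missing values are handled.

```python
def summarize(per_source):
    """Compute (max AF, total AC) from a mapping of source name -> (AC, AF).

    A value of None means the variant is absent from that source; such
    sources are skipped (an assumption about missing-value handling).
    """
    present = [value for value in per_source.values() if value is not None]
    max_af = max((af for _ac, af in present), default=0.0)
    total_ac = sum(ac for ac, _af in present)
    return max_af, total_ac
```

For instance, `summarize({"TPMI": (120, 0.00006), "MexBB": None, "UKB": (30, 0.00004)})` yields a maximum AF of 0.00006 and a total AC of 150.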

+ +

Source Database

+The Source Database filter restricts the display to variants present in specific +databases. It uses OR logic: a variant is shown if it is present in any of the selected +databases. +

+ +

Length Filters

Reference/Alternate Length: Filter by the length of the reference or alternate allele.
Length Change: Filter by the size difference between alternate and reference + (positive = insertion, negative = deletion, zero = SNV or MNV).
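
The length values and their sign convention can be sketched from the allele strings. The function names are hypothetical, and the mapping from allele lengths to the four variant types (in particular, treating every equal-length change longer than one base as an MNV) is an assumption for illustration.

```python
def length_change(ref, alt):
    """Alternate length minus reference length (positive = insertion)."""
    return len(alt) - len(ref)


def variant_type(ref, alt):
    """Classify a biallelic variant by allele lengths (assumed rules)."""
    delta = length_change(ref, alt)
    if delta > 0:
        return "Insertion"
    if delta < 0:
        return "Deletion"
    # Equal lengths: a single base is an SNV, anything longer an MNV.
    return "SNV" if len(ref) == 1 else "MNV"
```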

+ +

Methods

+The same merge-and-annotate pipeline used for the +All Databases Combined track was run on the +array-cohort subset of source VCFs. Each VCF was stripped of its INFO fields, normalized +with bcftools norm (splitting multi-allelic sites), and merged with +bcftools merge. The merged VCF was then annotated with predicted protein +consequences using bcftools csq with the +Ensembl +GRCh38 release 115 gene annotation (GFF3). +
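
The sequence of steps described above can be sketched as a list of bcftools commands. This is a hedged outline rather than the commands actually run for the track: every file name is a placeholder, `build_pipeline` is a hypothetical helper, and the specific options shown (`annotate -x INFO`, `norm -m -any`, `csq -f/-g`) are one plausible way to express "strip INFO, split multi-allelic sites, merge, annotate". The makeDoc remains the authoritative record.

```python
# Hypothetical sketch of the strip / normalize / merge / annotate steps.
# File names are placeholders; the options actually used are in the makeDoc.

def build_pipeline(source_vcfs, fasta, gff3,
                   merged="merged.vcf.gz", out="annotated.vcf.gz"):
    """Return the bcftools commands as argv lists, in execution order."""
    commands, normalized = [], []
    for i, vcf in enumerate(source_vcfs):
        stripped = f"stripped{i}.vcf.gz"
        norm = f"norm{i}.vcf.gz"
        # Remove all INFO fields from the source VCF.
        commands.append(["bcftools", "annotate", "-x", "INFO",
                         "-Oz", "-o", stripped, vcf])
        # Split multi-allelic sites into biallelic records.
        commands.append(["bcftools", "norm", "-m", "-any",
                         "-Oz", "-o", norm, stripped])
        # bcftools merge expects indexed, compressed inputs.
        commands.append(["bcftools", "index", norm])
        normalized.append(norm)
    commands.append(["bcftools", "merge", "-Oz", "-o", merged, *normalized])
    # Annotate predicted consequences from a reference FASTA and a GFF3.
    commands.append(["bcftools", "csq", "-f", fasta, "-g", gff3,
                     "-Oz", "-o", out, merged])
    return commands
```

Each returned list could be passed to `subprocess.run(cmd, check=True)`. Building the commands separately from running them keeps the sketch inspectable without requiring bcftools to be installed.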

+ +

+The track's +makeDoc file documents how each source VCF was converted. Scripts are +available from +GitHub. +

+ +

Data Access

+The data can be explored interactively with the +Table Browser or the +Data Integrator. For programmatic access, our +REST API can be used; the track +name is varFreqsArray. +
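
A REST API request for this track can be assembled as shown below. The sketch only builds the URL: `track_url` is a hypothetical helper, the genome name `hg38` in the example is an assumption based on the GRCh38 annotation mentioned in Methods, and the coordinates are arbitrary.

```python
from urllib.parse import urlencode

API_BASE = "https://api.genome.ucsc.edu"


def track_url(genome, chrom, start, end, track="varFreqsArray"):
    """Build a getData/track request URL for a region of the track."""
    query = urlencode({
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    })
    return f"{API_BASE}/getData/track?{query}"
```

The resulting URL, for example `track_url("hg38", "chr1", 1000, 2000)`, can be opened with `urllib.request.urlopen` and the JSON response decoded with `json.load`.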

+Because the merged callset includes data from multiple sources whose redistribution +licenses differ, the combined bigBed is not available for download from our download +server. The combined track can be reconstructed from the individual source VCFs using the +conversion scripts on GitHub together with the +build documentation. +

+ +

Credits

+This track is only possible thanks to the participants in TPMI, the Mexico Biobank, and UK +Biobank, who donated samples and provided health information. Click on the individual +TPMI, MexBB, or UK Biobank subtracks in the +SNV Frequencies supertrack for full project credits. +Thanks to Alex Ioannidis, UCSC, for the motivation for this track family and to Andreas +Lahner, MGZ, for feedback. +

+ +

References

+For primary citations of each source dataset, see the References section on the +SNV Frequencies supertrack page. The merged-track +build itself uses the following tools: +

+Danecek P, McCarthy SA. + +BCFtools/csq: haplotype-aware variant consequences. +Bioinformatics. 2017 Jul 1;33(13):2037-2039. +PMID: 28205675; PMC: PMC5870570 +

+McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, Flicek P, Cunningham F. + +The Ensembl Variant Effect Predictor. +Genome Biol. 2016 Jun 6;17(1):122. +PMID: 27268795; PMC: PMC4893825 +