ef70dfff0e8710e8aa4bc369a939f838c75947fb
lrnassar
Fri Jun 5 14:59:06 2026 -0700
varFreqs: Phase-7 audit cleanup on the supertrack and combined-track
description pages.
Supertrack varFreqs.html:
- Restore the consequence-filter "Other" bucket explanation that was lost
when varFreqsAll.html was replaced by the Affected+Background pair (now
documented once on the supertrack page, since all three combined tracks
share the filter).
- Add 6 primary citations that were already in standalone subtrack pages
but not carried up: Bycroft (UK Biobank), Cao (ChinaMAP), Cong (WBBC),
Genome of the Netherlands Consortium (GoNL), Malomane (Saudi), Yang
(TPMI).
- Reorder Ameur, Singh into correct alphabetical position.
- Lowercase
Description
This track merges variants from three genotyping-array cohorts into a single bigBed file
with predicted protein consequences and cross-database filtering. It contains 14.7 million
variants from the Taiwan Precision Medicine Initiative (TPMI Axiom TPM1 chip,
~1 million Han Chinese individuals), the Mexico Biobank (MexBB, 6,011 individuals), and UK Biobank
(361k unrelated white British individuals, imputed from the Neale Lab Round 2 release).
-The array track is kept separate from the
-All Databases Combined WGS/WES summary so that
+The array track is kept separate from the sequencing-based combined tracks
+(Affected/Case Individuals and
+Population + Unaffected) so that
sequencing-based and array-based frequencies can be inspected independently. For a summary
of all available variant frequency databases, see the
SNV Frequencies supertrack page.
Display Conventions
Color by Consequence
Variants are colored by their most severe predicted consequence:
| Color | Consequence class | Examples |
|---|---|---|
-| Red | Protein-truncating / Loss-of-function | stop_gained, frameshift, splice_donor, splice_acceptor, stop_lost, start_lost |
+| Red | Protein-truncating / loss-of-function | stop_gained, frameshift, splice_donor, splice_acceptor, stop_lost, start_lost |
-| Blue | Missense / In-frame | missense, inframe_insertion, inframe_deletion, protein_altering |
+| Blue | Missense / in-frame | missense, inframe_insertion, inframe_deletion, protein_altering |
| Green | Synonymous | synonymous, stop_retained |
-| Grey | Non-coding / Intergenic | intron, non_coding, intergenic, UTR |
+| Grey | Non-coding / intergenic | intron, non_coding, intergenic, UTR |
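A minimal Python sketch of the color rule in the table above. It assumes the table's row order (red, blue, green, grey) is the severity ranking and that any term not listed in the table falls back to grey. The function name `color_for` is hypothetical, and the term spellings are copied from the table; actual `bcftools csq` consequence strings may be spelled differently.

```python
# Hypothetical mapping from consequence term to color class, taken from the table.
CLASS_BY_TERM = {
    "stop_gained": "Red", "frameshift": "Red", "splice_donor": "Red",
    "splice_acceptor": "Red", "stop_lost": "Red", "start_lost": "Red",
    "missense": "Blue", "inframe_insertion": "Blue",
    "inframe_deletion": "Blue", "protein_altering": "Blue",
    "synonymous": "Green", "stop_retained": "Green",
    "intron": "Grey", "non_coding": "Grey", "intergenic": "Grey", "UTR": "Grey",
}

# Assumed severity order: index 0 is the most severe class.
SEVERITY = ["Red", "Blue", "Green", "Grey"]


def color_for(consequences):
    """Return the color of the most severe consequence in the list.

    Unknown terms and an empty list fall back to Grey (an assumption).
    """
    colors = [CLASS_BY_TERM.get(term, "Grey") for term in consequences]
    if not colors:
        return "Grey"
    # The smallest index in SEVERITY is the most severe color.
    return min(colors, key=SEVERITY.index)
```

For a variant annotated as both `intron` and `missense` on different transcripts, `color_for(["intron", "missense"])` returns `"Blue"`.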
The "AA change" field uses bcftools csq notation: 23I>23V means the residue at position 23 changed from isoleucine (I) to valine (V) (missense). 23I alone (no arrow) means position 23 is isoleucine and unchanged (synonymous). A "*" indicates a stop codon (e.g. 45R>45* is a stop_gained variant).
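To make the notation concrete, here is a hypothetical parser for the single-residue forms described above (`23I>23V`, `23I`, `45R>45*`). By analogy it also treats a stop codon as the reference residue (`45*` as stop_retained, `45*>45R` as stop_lost); that extension is an assumption. Multi-residue changes, such as those from in-frame indels or frameshifts, are not handled and raise an error.

```python
import re

# "<pos><residue>" optionally followed by ">" "<pos><residue>"; "*" is a stop codon.
AA_CHANGE = re.compile(r"(\d+)([A-Z*])(?:>(\d+)([A-Z*]))?")


def classify_aa_change(field):
    """Return (kind, position, ref_residue, alt_residue) for a single-residue change."""
    m = AA_CHANGE.fullmatch(field)
    if m is None:
        raise ValueError(f"unrecognized or multi-residue AA change: {field!r}")
    pos, ref, _alt_pos, alt = m.groups()
    if alt is None:
        # No arrow: the residue is unchanged.
        kind = "stop_retained" if ref == "*" else "synonymous"
        alt = ref
    elif alt == "*":
        kind = "stop_gained"
    elif ref == "*":
        kind = "stop_lost"
    else:
        kind = "missense"
    return kind, int(pos), ref, alt
```

For example, `classify_aa_change("45R>45*")` returns `("stop_gained", 45, "R", "*")`.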
Allele frequencies from genotyping arrays are not directly comparable to those from whole-genome or whole-exome sequencing. Two limitations to keep in mind:
NGS_concordance value (chip-vs-sequencing concordance from
its own validation) in the source VCF; high-AF claims with low concordance are
common. MexBB ships only AN/AF/AC with no FILTER column and no per-site QC at all.
For both arrays, high-AF rare-disease candidates should be cross-checked against the
- sequencing-based All Databases Combined track
- before drawing conclusions.
This track supports filtering via the track settings page. Click the track title or use the "Configure" button to access filters.
The Source Database filter restricts the display to variants present in specific databases. It uses OR logic: a variant is shown if it is present in at least one of the selected databases.
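The OR semantics can be sketched in a few lines of Python. The function name and the database labels (`TPMI`, `MexBB`, `UKBB`) are illustrative, and treating an empty selection as "no filter" is an assumption rather than documented track behavior.

```python
def passes_source_filter(variant_sources, selected):
    """OR logic: keep the variant if any of its sources is among the selected databases.

    An empty selection is assumed to mean "no filter" (show every variant).
    """
    if not selected:
        return True
    # isdisjoint is True when the two collections share no element.
    return not set(variant_sources).isdisjoint(selected)
```

With `MexBB` and `UKBB` selected, a variant seen in `TPMI` and `UKBB` is shown, while a `TPMI`-only variant is hidden. An AND filter would instead require every selected database to contain the variant.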
-The same merge-and-annotate pipeline used for the
-All Databases Combined track was run on the
+The same merge-and-annotate pipeline used for the sequencing-based combined tracks
+(Affected/Case Individuals and
+Population + Unaffected) was run on the
array-cohort subset of source VCFs. Each VCF was stripped of its INFO fields, normalized
with bcftools norm (splitting multi-allelic sites), and merged with
bcftools merge. The merged VCF was then annotated with predicted protein
consequences using bcftools csq with the
Ensembl
GRCh38 release 115 gene annotation (GFF3).
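The steps above can be sketched as a sequence of bcftools invocations built in Python. This is not the track's actual build script, which is described in the makeDoc. The file names, `-Oz` compressed output, the `bcftools index` step (bcftools merge expects indexed inputs), and `-m none` on merge (assumed here so the split records stay biallelic) are all assumptions. Left-alignment with `bcftools norm -f` is omitted because the text only mentions splitting.

```python
from pathlib import Path


def build_pipeline(vcfs, ref_fasta, gff3, outdir="work"):
    """Return bcftools argv lists for: strip INFO -> split multi-allelics -> merge -> csq.

    All paths and output names are hypothetical.
    """
    out = Path(outdir)
    cmds, normed = [], []
    for vcf in vcfs:
        stem = Path(vcf).name.removesuffix(".vcf.gz")
        stripped = out / f"{stem}.noinfo.vcf.gz"
        norm = out / f"{stem}.norm.vcf.gz"
        # Remove all INFO fields from the source VCF.
        cmds.append(["bcftools", "annotate", "-x", "INFO",
                     "-Oz", "-o", str(stripped), vcf])
        # Split multi-allelic sites into biallelic records.
        cmds.append(["bcftools", "norm", "-m", "-any",
                     "-Oz", "-o", str(norm), str(stripped)])
        # Index so that bcftools merge can read the file.
        cmds.append(["bcftools", "index", str(norm)])
        normed.append(str(norm))
    merged = out / "merged.vcf.gz"
    # Merge all cohorts; "-m none" avoids re-creating multi-allelic records (assumed).
    cmds.append(["bcftools", "merge", "-m", "none",
                 "-Oz", "-o", str(merged), *normed])
    # Annotate predicted protein consequences from the reference and GFF3.
    cmds.append(["bcftools", "csq", "-f", ref_fasta, "-g", gff3,
                 "-Oz", "-o", str(out / "merged.csq.vcf.gz"), str(merged)])
    return cmds
```

Each argv list can be run in order with `subprocess.run(cmd, check=True)`. The `gff3` argument would be an Ensembl GRCh38 release 115 GFF3, for example one named `Homo_sapiens.GRCh38.115.gff3.gz` (assumed file name).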
The track's makeDoc file documents how each source VCF was converted. Scripts are available from GitHub.