165a15d6a94d53f8162a01e69f3912a7a23a3b50 max Mon Mar 23 06:47:55 2026 -0700 mostly done with the variant frequencies track, refs#36642
diff --git src/hg/makeDb/trackDb/human/varFreqsAll.html src/hg/makeDb/trackDb/human/varFreqsAll.html
index 46809b5844e..d1203549bd7 100644
--- src/hg/makeDb/trackDb/human/varFreqsAll.html
+++ src/hg/makeDb/trackDb/human/varFreqsAll.html
@@ -1,6 +1,155 @@

Description

-This track merges variants from all individual variant frequency databases into a single file
-with consequence annotations and cross-database filtering. For full documentation, see the
+This track merges variants from all individual variant frequency databases into a single
+bigBed file with predicted protein consequences and cross-database filtering. It contains
+over 1.1 billion variants from 20 population databases worldwide. For a summary of
+all available databases, see the Variant Frequencies supertrack page.

+ +

+Each variant is annotated with its predicted consequence on protein-coding genes
+(using bcftools csq with
+Ensembl
+gene models), and colored by severity.
+Allele counts and frequencies are shown for each source database and, where available,
+broken down by ancestry or population group.
+

+ +

Display Conventions

+ +

Color by Consequence

+

Variants are colored by their most severe predicted consequence:

+ + + + + + + + + + + + + + + + + + + + + + +
Color | Consequence class | Examples
Red | Protein-truncating / Loss-of-function | stop_gained, frameshift, splice_donor, splice_acceptor, stop_lost, start_lost
Blue | Missense / In-frame | missense, inframe_insertion, inframe_deletion, protein_altering
Green | Synonymous | synonymous, stop_retained
Grey | Non-coding / Intergenic | intron, non_coding, intergenic, UTR
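
The color classes in the table above can be sketched as a lookup that picks the most severe class among a variant's consequences. This is only an illustration: the dictionary, the function name `color_for`, and the severity ordering (red, blue, green, grey) are assumptions drawn from the table, not the track's actual code.

```python
# Hypothetical lookup following the table above; not the track's own code.
SEVERITY = ["red", "blue", "green", "grey"]  # assumed order, most severe first

COLOR_BY_CONSEQUENCE = {
    "stop_gained": "red", "frameshift": "red", "splice_donor": "red",
    "splice_acceptor": "red", "stop_lost": "red", "start_lost": "red",
    "missense": "blue", "inframe_insertion": "blue",
    "inframe_deletion": "blue", "protein_altering": "blue",
    "synonymous": "green", "stop_retained": "green",
}


def color_for(consequences):
    """Return the color class of the most severe consequence in the list."""
    # Anything not listed (intron, non_coding, intergenic, UTR, ...) is grey.
    colors = [COLOR_BY_CONSEQUENCE.get(c, "grey") for c in consequences]
    if not colors:
        return "grey"
    return min(colors, key=SEVERITY.index)
```

For example, a variant that is intronic in one transcript and missense in another would be classed as blue under this sketch.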
+ +

Amino Acid Change Notation

+

+The "AA change" field uses bcftools csq notation: 23I>23V means position
+23 changed from Isoleucine (I) to Valine (V) (missense). 23I alone (no arrow)
+means position 23 is Isoleucine and unchanged (synonymous). A "*" indicates a
+stop codon (e.g. 45R>45* is a stop_gained).
+
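
The notation described above can be read with a small parser. This is a minimal sketch that covers only the simple forms shown in the text (`23I>23V`, `23I`, `45R>45*`); the function name and the returned dictionary layout are made up for illustration, and more complex strings produced by bcftools csq may need extra handling.

```python
import re

# position + amino acid letters, optionally followed by ">" and the new state
_AA_CHANGE = re.compile(r"^(\d+)([A-Z*]+)(?:>(\d+)([A-Z*]+))?$")


def parse_aa_change(text):
    """Parse a simple bcftools csq amino acid change string."""
    m = _AA_CHANGE.match(text)
    if m is None:
        raise ValueError(f"unrecognized AA change: {text!r}")
    pos, ref_aa, _alt_pos, alt_aa = m.groups()
    if alt_aa is None:
        # No arrow: the residue is unchanged (synonymous).
        return {"pos": int(pos), "ref": ref_aa, "alt": ref_aa, "changed": False}
    return {"pos": int(pos), "ref": ref_aa, "alt": alt_aa, "changed": True}
```

In this sketch a stop codon is simply the `*` character in the `ref` or `alt` value, so `45R>45*` yields `alt == "*"`.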

+ +

Filters

+

+This track supports extensive filtering via the track settings page. Click on the track
+title or use the "Configure" button to access filters:
+

+ +

Variant Type and Consequence

+ + +

How to find protein-truncating variants: Set the Consequence filter to include
+only "Stop Gained", "Frameshift", "Splice Donor", and
+"Splice Acceptor". These will appear as red items in the track display.

+ +

Frequency and Count Filters

+ + +

Source Database

+

+The Source Database filter lets you restrict to variants present in specific databases.
+For example, select only "GREGoR" to see variants found in the rare disease cohort.
+This filter uses OR logic: selecting multiple databases shows variants found in
+any of the selected databases.
+
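
The OR logic described above amounts to a set intersection test. The sketch below is illustrative only: the function name is hypothetical, "OtherDb" is a placeholder database name, and treating an empty selection as "no filtering" is an assumption.

```python
def matches_source_filter(variant_sources, selected):
    """Return True if the variant is found in ANY of the selected databases."""
    if not selected:
        # Assumption: with nothing selected, the filter is inactive.
        return True
    return bool(set(variant_sources) & set(selected))


# A variant seen in GREGoR and a placeholder database "OtherDb".
sources = ["GREGoR", "OtherDb"]
print(matches_source_filter(sources, ["GREGoR"]))
print(matches_source_filter(sources, ["GREGoR", "ThirdDb"]))
print(matches_source_filter(sources, ["ThirdDb"]))
```

With OR logic, adding more databases to the selection can only show more variants, never fewer.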

+ +

Population-Specific Filters

+

+Several databases provide ancestry-specific allele frequencies: +

+ + +

Length Filters

+ + +

Methods

+

+Variant frequency VCF files from 20 databases were stripped of their INFO fields
+(to reduce size), normalized with bcftools norm (splitting multi-allelic sites),
+and merged with bcftools merge. The merged VCF was then annotated with predicted
+protein consequences using bcftools csq with the
+Ensembl
+GRCh38 release 115 gene annotation (GFF3).
+
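
The steps in the paragraph above could be strung together roughly as follows. This sketch only builds the command lines and does not run them. The intermediate file names, the use of `bcftools annotate -x INFO` for stripping INFO fields, the indexing step, and the output options are all assumptions; the commands actually used are recorded in the track's makeDoc file and may differ.

```python
def build_commands(source_vcfs, ref_fasta, gff3, out_bcf):
    """Build an assumed bcftools command sequence: strip, norm, merge, csq."""
    commands = []
    normalized = []
    for i, vcf in enumerate(source_vcfs):
        stripped = f"db{i}.noinfo.bcf"  # hypothetical intermediate names
        normed = f"db{i}.norm.bcf"
        # Drop all INFO fields to reduce size.
        commands.append(["bcftools", "annotate", "-x", "INFO",
                         "-Ob", "-o", stripped, vcf])
        # Split multi-allelic sites into one record per alternate allele.
        commands.append(["bcftools", "norm", "-m", "-any",
                         "-Ob", "-o", normed, stripped])
        # bcftools merge expects indexed inputs.
        commands.append(["bcftools", "index", normed])
        normalized.append(normed)
    commands.append(["bcftools", "merge", "-Ob", "-o", "merged.bcf", *normalized])
    # Predict protein consequences from the reference and a GFF3 gene annotation.
    commands.append(["bcftools", "csq", "-f", ref_fasta, "-g", gff3,
                     "-Ob", "-o", out_bcf, "merged.bcf"])
    return commands
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` in order; building the commands first makes it easy to print and review them before anything is executed.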

+ +

+The annotated VCF was converted to bigBed format using a custom Python script
+(vcfToBigBed.py) that reads frequency data from each source VCF in parallel,
+matches variants by position/ref/alt, and writes a BED file with consequence coloring,
+per-database allele counts and frequencies, and population breakdowns.
+The database configuration (which VCFs to include, field mappings, and population definitions)
+is stored in two TSV files
+(databases.tsv and
+populations.tsv)
+to make future updates easy.
+
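
Matching variants by position/ref/alt, as described above, can be illustrated with a dictionary keyed on (chrom, pos, ref, alt). This is a simplified in-memory sketch and not the logic of vcfToBigBed.py, which reads the source VCFs in parallel; the record layout and the field name `af` are assumptions.

```python
def variant_key(rec):
    """Identify a variant by chromosome, position, ref and alt allele."""
    return (rec["chrom"], rec["pos"], rec["ref"], rec["alt"])


def collect_frequencies(merged_variants, sources):
    """Attach per-database allele frequencies to each merged variant.

    `sources` maps a database name to a list of records, each a dict with
    chrom, pos, ref, alt and af (hypothetical field names).
    """
    indexes = {
        name: {variant_key(r): r for r in records}
        for name, records in sources.items()
    }
    annotated = []
    for variant in merged_variants:
        key = variant_key(variant)
        # Only databases that contain this exact allele contribute a value.
        freqs = {name: idx[key]["af"] for name, idx in indexes.items() if key in idx}
        annotated.append({**variant, "freqs": freqs})
    return annotated
```

Keying on the alleles as well as the position matters because, after multi-allelic sites are split, several records can share the same coordinate.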

+ +

+The makeDoc file of the track documents how all source files of the varFreqs track were
+converted.
+Scripts are available from
+GitHub.
+

+ +

Credits

+

+This track is only possible thanks to the data from millions of volunteers around the world,
+who donated blood, signed consent forms and provided health information about themselves and
+sometimes their families. Click on any of the individual tracks in the
+Variant Frequencies supertrack to see the specific
+credits for each project. Thanks to Alex Ioannidis, UCSC, for the motivation for this track
+and to Andreas Lahner, MGZ, for feedback.
+