165a15d6a94d53f8162a01e69f3912a7a23a3b50 max Mon Mar 23 06:47:55 2026 -0700 mostly done with the variant frequencies track, refs#36642 diff --git src/hg/makeDb/trackDb/human/varFreqsAll.html src/hg/makeDb/trackDb/human/varFreqsAll.html index 46809b5844e..d1203549bd7 100644 --- src/hg/makeDb/trackDb/human/varFreqsAll.html +++ src/hg/makeDb/trackDb/human/varFreqsAll.html @@ -1,6 +1,155 @@
-This track merges variants from all individual variant frequency databases into a single file -with consequence annotations and cross-database filtering. For full documentation, see the +This track merges variants from all individual variant frequency databases into a single +bigBed file with predicted protein consequences and cross-database filtering. It contains +over 1.1 billion variants from 20 population databases worldwide. For a summary of +all available databases, see the Variant Frequencies supertrack page.
+ ++Each variant is annotated with its predicted consequence on protein-coding genes +(using bcftools csq with +Ensembl +gene models), and colored by severity. +Allele counts and frequencies are shown for each source database and, where available, +broken down by ancestry or population group. +
+ +Variants are colored by their most severe predicted consequence:
+| Color | Consequence class | Examples |
|---|---|---|
| Red | Protein-truncating / Loss-of-function | stop_gained, frameshift, splice_donor, splice_acceptor, stop_lost, start_lost |
| Blue | Missense / In-frame | missense, inframe_insertion, inframe_deletion, protein_altering |
| Green | Synonymous | synonymous, stop_retained |
| Grey | Non-coding / Intergenic | intron, non_coding, intergenic, UTR |
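The coloring rule in the table can be sketched in a few lines of Python. This is an illustration only, not the code used to build the track: the function name `most_severe_color` is hypothetical, the severity order (red, then blue, green, grey) is assumed from the row order of the table, and falling back to grey for unknown terms is also an assumption.

```python
# Consequence terms per color class, copied from the table above.
# Assumption: the list order (red > blue > green > grey) is the severity order.
COLOR_CLASSES = [
    ("red", {"stop_gained", "frameshift", "splice_donor",
             "splice_acceptor", "stop_lost", "start_lost"}),
    ("blue", {"missense", "inframe_insertion", "inframe_deletion",
              "protein_altering"}),
    ("green", {"synonymous", "stop_retained"}),
    ("grey", {"intron", "non_coding", "intergenic", "UTR"}),
]


def most_severe_color(consequences):
    """Return the color of the most severe class among the given terms."""
    found = set(consequences)
    for color, terms in COLOR_CLASSES:
        if found & terms:
            return color
    # Assumption: terms not listed in the table are drawn in grey.
    return "grey"
```

A variant that is synonymous in one transcript and missense in another would therefore be drawn in blue under these assumptions.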
+The "AA change" field uses bcftools csq notation: 23I>23V means that position +23 changed from isoleucine (I) to valine (V), a missense change. 23I alone (no arrow) +means that position 23 is isoleucine and remains unchanged (synonymous). A "*" indicates a +stop codon (e.g. 45R>45* is a stop_gained). +
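To make the notation concrete, here is a minimal parsing sketch. It handles only the three forms described above; the function name `describe_aa_change` and the labels `"unchanged"`, `"changed"` and `"stop_gained"` are illustrative choices, not part of bcftools or of the track.

```python
import re

# position + amino acid(s), optionally followed by ">" and the new position + amino acid(s)
_AA_CHANGE = re.compile(r"^(\d+)([A-Z*]+)(?:>(\d+)([A-Z*]+))?$")


def describe_aa_change(aa_change):
    """Split an 'AA change' string into (position, ref_aa, alt_aa, kind)."""
    match = _AA_CHANGE.match(aa_change)
    if match is None:
        raise ValueError(f"unrecognized AA change: {aa_change!r}")
    pos, ref_aa, _alt_pos, alt_aa = match.groups()
    if alt_aa is None:
        # No arrow: the amino acid is unchanged (synonymous).
        return int(pos), ref_aa, ref_aa, "unchanged"
    if "*" in alt_aa and "*" not in ref_aa:
        # A stop codon appears where there was none before.
        return int(pos), ref_aa, alt_aa, "stop_gained"
    return int(pos), ref_aa, alt_aa, "changed"
```

For example, `describe_aa_change("23I>23V")` yields position 23 with I replaced by V, while `describe_aa_change("23I")` reports position 23 as unchanged.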
+ ++This track supports extensive filtering via the track settings page. Click on the track +title or use the "Configure" button to access filters: +
+ +How to find protein-truncating variants: Set the Consequence filter to include +only "Stop Gained", "Frameshift", "Splice Donor", and +"Splice Acceptor". These will appear as red items in the track display.
+ ++The Source Database filter lets you restrict to variants present in specific databases. +For example, select only "GREGoR" to see variants found in the rare disease cohort. +This filter uses OR logic: selecting multiple databases shows variants found in +any of the selected databases. +
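The OR logic of the Source Database filter amounts to a set-intersection test. The sketch below is an illustration, not the browser's implementation: `passes_source_filter` is a hypothetical name, `"DatabaseB"` is a placeholder database name, and treating an empty selection as "no filtering" is an assumption.

```python
def passes_source_filter(variant_sources, selected):
    """True if the variant is present in at least one selected database."""
    if not selected:
        # Assumption: nothing selected means the filter is inactive.
        return True
    return not set(selected).isdisjoint(variant_sources)


# A variant seen in GREGoR and one placeholder database.
variant = {"GREGoR", "DatabaseB"}
print(passes_source_filter(variant, ["GREGoR"]))               # shown
print(passes_source_filter(variant, ["DatabaseC"]))            # hidden
print(passes_source_filter(variant, ["DatabaseC", "GREGoR"]))  # shown: OR logic
```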
+ ++Several databases provide ancestry-specific allele frequencies: +
+
+Variant frequency VCF files from 20 databases were stripped of their INFO fields
+(to reduce size), normalized with bcftools norm (splitting multi-allelic sites),
+and merged with bcftools merge. The merged VCF was then annotated with predicted
+protein consequences using bcftools csq with the
+Ensembl
+GRCh38 release 115 gene annotation (GFF3).
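The steps above could be assembled roughly as follows. This sketch only builds the command lines and does not run them. The exact options used for the track are not stated here, so the flags, the file names and the helper name `build_pipeline` are all assumptions; the makeDoc file remains the authoritative record.

```python
def build_pipeline(vcfs, fasta, gff3,
                   merged="merged.vcf.gz", annotated="annotated.vcf.gz"):
    """Return a list of bcftools command lines (argv lists) for the steps above."""
    commands = []
    normalized = []
    for vcf in vcfs:
        stripped = vcf.replace(".vcf.gz", ".noinfo.vcf.gz")
        norm = vcf.replace(".vcf.gz", ".norm.vcf.gz")
        # Drop all INFO fields to reduce file size.
        commands.append(["bcftools", "annotate", "-x", "INFO",
                         "-Oz", "-o", stripped, vcf])
        # Split multi-allelic sites into one record per alternate allele.
        commands.append(["bcftools", "norm", "-m", "-any",
                         "-Oz", "-o", norm, stripped])
        # bcftools merge expects indexed input files.
        commands.append(["bcftools", "index", "-t", norm])
        normalized.append(norm)
    # Merge all normalized files into a single VCF.
    commands.append(["bcftools", "merge", "-Oz", "-o", merged] + normalized)
    # Annotate predicted protein consequences from a GFF3 gene annotation.
    commands.append(["bcftools", "csq", "-f", fasta, "-g", gff3,
                     "-Oz", "-o", annotated, merged])
    return commands


# Placeholder file names, for illustration only.
for argv in build_pipeline(["dbA.vcf.gz", "dbB.vcf.gz"],
                           "genome.fa", "genes.gff3"):
    print(" ".join(argv))
```

Each returned list could be passed to `subprocess.run(argv, check=True)` once the paths point at real files.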
+
+The annotated VCF was converted to bigBed format using a custom Python script
+(vcfToBigBed.py) that reads frequency data from each source VCF in parallel,
+matches variants by position/ref/alt, and writes a BED file with consequence coloring,
+per-database allele counts and frequencies, and population breakdowns.
+The database configuration (which VCFs to include, field mappings, and population definitions)
+is stored in two TSV files
+(databases.tsv and
+populations.tsv)
+to make future updates easy.
+
+The conversion of every source file of the varFreqs track is documented in the +makeDoc file of the track. +Scripts are available from +GitHub. +
+ ++This track is only possible thanks to the data from millions of volunteers around the world, +who donated blood, signed consent forms and provided health information about themselves and +sometimes their families. Click on any of the individual tracks in the +Variant Frequencies supertrack to see the specific +credits for each project. Thanks to Alex Ioannidis, UCSC, for the motivation for this track +and to Andreas Lahner, MGZ, for feedback. +