5ea5925226bd7ccbfcdda2fe987f7fff6122dabf
jnavarr5
Mon Nov 10 16:42:14 2025 -0800
Moving the Simons Genome Diversity track next to the Mexico Biobank track since those are the only two available for hg19. refs #36642
diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html
index 3fd8aee3897..857e558cf49 100644
--- src/hg/makeDb/trackDb/human/varFreqs.html
+++ src/hg/makeDb/trackDb/human/varFreqs.html
@@ -5,71 +5,72 @@
projects. Only the projects 1000 Genomes (its own track), HGDP, SGDP, HGDP+1k and MXB provide individual-level genotypes.
All others provide only allele frequencies; their genotypes require signing a data access agreement.
-
Mexico Biobank (MXB): This track displays
phased alleles from the Mexico Biobank Project
(MXB), based on array genotyping of 6,011 individuals sampled across all 32 states of Mexico during
the 2000 National Health Survey (ENSA 2000) conducted by the National Institute of Public Health (INSP).
Frequencies can be plotted onto a map on MexVar.
The hg38 track was lifted from hg19.
(Publication?)
+ - Simons Genome Diversity Project (SGDP):
+ Funded by the Simons Foundation, the Simons Genome Diversity Project
+ is a large-scale effort that sequenced high-coverage genomes from 300
+ individuals (279 in this track) representing 142 diverse and often
+ indigenous populations worldwide.
+ Its goal was to capture the full range of human genetic
+ diversity to better understand population history, migration, and
+ adaptation. It sampled populations in a way that represented as much
+ anthropological, linguistic and cultural diversity as possible, and
+ thus includes many deeply divergent human populations that are not well
+ represented in other datasets. SGDP emphasizes breadth of global representation and
+ population history, whereas HGDP emphasizes continuity and
+ comparability across major population groups. Not all of its data is
+ public, so this track contains only 279 genomes. For details, see
+ (Mallick et al, Nature 2016). The hg38 track was lifted from hg19.
+
+
- Human Genome Diversity Project (HGDP):
929 high-coverage genome sequences from 54 diverse human populations,
26 of which are physically phased using linked-read sequencing. The
Human Genome Diversity Project (HGDP) was launched in the early 1990s
to study the genetic variation and evolutionary history of modern
humans across global populations. Its goal was to document the full
spectrum of human genetic diversity, particularly in indigenous and
geographically isolated groups, to better understand population
structure, migration, adaptation, and disease susceptibility. The
project collected samples from ~1,000 individuals representing over 50
populations worldwide, including groups from Africa, Europe, Asia,
Oceania, and the Americas. These data have become a foundational
reference for population genetics and human evolution studies.
Data can be downloaded from the Sanger Website. For details, see (Bergström et al, Science 2020).
- gnomAD HGDP and 1000 Genomes callset:
A reprocessed version by the gnomAD project for the 1000 Genomes and
Human Genome Diversity Project (HGDP) data, with 4094 genomes from 80
populations. We already have separate, older tracks for 1000 Genomes on the main hg38
browser and for HGDP, just above. This
track combines both datasets, with harmonized data quality. For details, see (Koenig et al, 2024).
- - Simons Genome Diversity Project (SGDP):
- Funded by the Simons Foundation, the Simons Genome Diversity Project
- is a large-scale effort that sequenced high-coverage genomes from 300
- individuals (279 in this track) representing 142 diverse and often
- indigenous populations worldwide.
- Its goal was to capture the full range of human genetic
- diversity to better understand population history, migration, and
- adaptation. It sampled populations in a way that represented as much
- anthropological, linguistic and cultural diversity as possible, and
- thus includes many deeply divergent human populations that are not well
- represented in other datasets. SGDP emphasizes breadth of global representation and
- population history, whereas HGDP emphasizes continuity and
- comparability across major population groups. Not all of its data is
- public, so this track contains only 279 genomes. For details, see
- (Mallick et al, Nature 2016). The hg38 track was lifted from hg19.
-
-
-
Mexico City Prospective Study (MCPS):
9,950 whole genome sequenced individuals
and 141,046 exome sequenced and genotyped individuals from the Mexico
City Prospective Study (MCPS), a collaboration between the Regeneron Genetics
Center, University of Oxford, Universidad Nacional Autónoma de México (UNAM),
National Institute of Genomic Medicine in Mexico, Abbvie Inc. and AstraZeneca
UK. For details, see (Ziyatdinov et al, Nature 2023) and the References section.
-
Mexico City Prospective Study (MCPS):
9,950 whole genome sequenced individuals
and 141,046 exome sequenced and genotyped individuals from the Mexico
City Prospective Study (MCPS), a collaboration between the Regeneron Genetics
@@ -154,30 +155,31 @@
Data can be downloaded from the IndiGen Website. For details see (Jain et al, NAR 2020). Only
the allele frequency is available from this project. The website also provides SV call
and Alu insertion VCFs.
- Korean Variant Archive (KOVA):
1,896 whole genome sequencing and 3,409 whole exome sequencing data from healthy individuals of Korean ethnicity.
The samples originated from normal tissue of cancer
patients (40.16 %), healthy parents of rare disease patients (28.4 %),
or healthy volunteers (31.44 %). Japanese ancestry is broken down
in the INFO field.
TSV data can be requested on the KOVA Downloads website.
Coverage 100x for WES, 30x for WGS.
For details see (Lee et al, Exp Mol Med 2022).
+
Display Conventions
Most tracks show the variant and allele frequencies only on mouseover or click.
When zoomed in, tracks display alleles with base-specific coloring. Homozygotes
are shown as a single letter, while heterozygotes are displayed with both
letters.
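A minimal sketch of the one-letter/two-letter rule for zoomed-in genotypes, assuming each diploid call arrives as a pair of single-base alleles in haplotype order. `genotype_label` is a hypothetical helper written for illustration only; it is not the Genome Browser's own code, and base-specific coloring is left out.

```python
def genotype_label(allele1: str, allele2: str) -> str:
    """Return the text label for one diploid genotype.

    Hypothetical helper illustrating the display rule described above:
    homozygotes collapse to a single letter, heterozygotes keep both
    letters in the order given (haplotype order for phased data).
    """
    if allele1 == allele2:
        return allele1          # homozygote: one letter
    return allele1 + allele2    # heterozygote: both letters


# Example: three samples at one SNP, given as (allele1, allele2) pairs.
samples = [("A", "A"), ("A", "G"), ("G", "G")]
labels = [genotype_label(a, b) for a, b in samples]
print(labels)  # ['A', 'AG', 'G']
```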
Full haplotype display - only for the MXB and HGDP tracks: In "pack" mode, these tracks sort the haplotypes. This can be
useful for determining the similarity between the samples and inferring
inheritance at a particular locus.
For a full description of how the display works, please see our
Haplotype Display help page.