aa61ebc800429515f9ced7e28f669c6042219f43 max Wed Mar 18 09:09:13 2026 -0700
varFreqs supertrack: add GREGoR track, update all HTML docs, move scripts to varFreqs/, refs #36642

Add GREGoR R04 WGS track to varFreqs superTrack. Update Data Access and Methods sections for all 20+ subtrack HTML files with consistent formatting, sequencing methods from source papers, and links to makeDoc and Github scripts. Move all varFreqs conversion scripts into scripts/varFreqs/ subdirectory and update makeDoc paths accordingly.

Co-Authored-By: Claude Opus 4.6

diff --git src/hg/makeDb/trackDb/human/hgdp1kFreq.html src/hg/makeDb/trackDb/human/hgdp1kFreq.html
new file mode 100644
index 00000000000..7897a903467
--- /dev/null
+++ src/hg/makeDb/trackDb/human/hgdp1kFreq.html
@@ -0,0 +1,79 @@
+

Description

+

+A reprocessed callset by the gnomAD project combining the 1000 Genomes and Human Genome Diversity Project +(HGDP) data, with 4,094 whole genomes from 80 populations. The dataset includes per-population +allele frequencies for all 80 populations as well as broad continental groupings from gnomAD +(African, Admixed American, East Asian, European, Middle Eastern, South Asian, and others). +

+ +

+This track shows allele frequencies only. The full phased genotype data with haplotype +clustering display is available in the +gnomAD HGDP+1000G track under Phased Variants. +This track does not include variant frequencies for every subpopulation. Instead, +it aggregates frequencies into the main groups: AFR, AMI, AMR, ASJ, EAS, FIN, MID, NFE, OTH, and SAS. +To access the full frequency information, use the track under "Phased Variants". +
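As a rough illustration of what a group-level frequency means, the sketch below pools allele counts (AC) and allele numbers (AN) from several subpopulations into one frequency. This is an assumption for illustration only: the text above does not say how the group values are derived, and they may simply be taken from group-level fields already present in the gnomAD VCF. The function name and the example counts are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def pooled_allele_frequency(counts: Iterable[Tuple[int, int]]) -> Optional[float]:
    """Pool (AC, AN) pairs from several subpopulations into one frequency.

    AC is the number of observed alternate alleles and AN is the number of
    called alleles. Pooling sums both before dividing, so larger
    subpopulations carry more weight than in a plain average of frequencies.
    This is an illustrative assumption, not the documented track pipeline.
    """
    pairs = list(counts)  # materialize so the input can be iterated twice
    total_ac = sum(ac for ac, _ in pairs)
    total_an = sum(an for _, an in pairs)
    if total_an == 0:
        return None  # no called alleles, so the frequency is undefined
    return total_ac / total_an


if __name__ == "__main__":
    # Hypothetical counts for two subpopulations of one continental group.
    print(pooled_allele_frequency([(3, 100), (1, 300)]))
```

Note that pooling differs from averaging the per-subpopulation frequencies: in the hypothetical example the plain average of 0.03 and 0.0033 would be about 0.017, while the pooled value is 4/400.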

+ +

Data Access

+

+The data can be explored interactively with the +Table Browser or the +Data Integrator. +For programmatic access, our REST API can be used; the +track name is hgdp1kFreq. +For bulk download, the VCF file can be obtained from +our download server. +

+

+The original VCFs with full genotypes can also be downloaded from +gnomAD Downloads. +
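For the programmatic access mentioned above, a minimal Python sketch of a REST API request could look like the following. It assumes the public UCSC endpoint `https://api.genome.ucsc.edu/getData/track` and the assembly name `hg38` (inferred from the GRCh38 alignment described in Methods, not stated for this track); the region coordinates are arbitrary examples. Only the track name `hgdp1kFreq` comes from the text above, and the shape of the returned JSON for this track type has not been verified here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the public UCSC REST API.
API_BASE = "https://api.genome.ucsc.edu"


def build_track_url(genome, track, chrom, start, end):
    """Build a getData/track URL for one genomic region.

    The UCSC API examples separate parameters with semicolons, so the
    query string is assembled by hand instead of with urlencode().
    """
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params.items())
    return f"{API_BASE}/getData/track?{query}"


def fetch_track(genome, track, chrom, start, end, timeout=30):
    """Request the region and return the parsed JSON response as a dict."""
    url = build_track_url(genome, track, chrom, start, end)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Printing the URL only; call fetch_track() to perform the request.
    print(build_track_url("hg38", "hgdp1kFreq", "chr1", 1000000, 1001000))
```

Keeping the URL construction separate from the network call makes it easy to inspect or log the request before sending it. For whole-genome analyses, the bulk VCF download is likely the better route than many small API requests.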

+ +

Methods

+

+The gnomAD project reprocessed 4,094 whole genomes from the 1000 Genomes Project and the Human +Genome Diversity Project (HGDP) through a unified pipeline. Sequencing was performed on Illumina +platforms at a mean coverage of 32–34x. Reads were aligned to GRCh38 (hs38DH reference with +decoy and HLA sequences) using BWA-MEM 0.7.15. Variant calling followed GATK best practices: +per-sample calling with GATK 3.5 HaplotypeCaller followed by joint genotyping with GATK4 using +the Hail VCF combiner for scalable merging. Allele-specific variant quality score recalibration +(AS-VQSR) was applied for both SNPs and indels. Sample QC included contamination estimation +(verifyBamID), sex concordance, relatedness filtering (PC-Relate), and population assignment +using PCA against gnomAD reference panels. Per-population allele frequencies were computed for +80 fine-grained populations as well as broad continental groupings. +

+

+The makeDoc file of the varFreqs track documents how all source files were converted. +For some tracks, Python scripts were necessary; these are also available from GitHub. +

+ +

Credits

+

+Thanks to the gnomAD team at the Broad Institute for harmonizing and making this dataset +publicly available, and to all participants of the 1000 Genomes Project and the Human Genome +Diversity Project. +

+ +

References

+

+Koenig Z, Yohannes MT, Nkambule LL, Zhao X, Goodrich JK, Kim HA, Wilson MW, Tiao G, Hao SP, Sahakian +N et al. + +A harmonized public resource of deeply sequenced diverse human genomes. +Genome Res. 2024 Jun 25;34(5):796-809. +PMID: 38749656; PMC: PMC11216312 +

+ +

+Bergström A, McCarthy SA, Hui R, Almarri MA, Ayub Q, Danecek P, Chen Y, Felkel S, Hallast P, Kamm J +et al. + +Insights into human genetic variation and population history from 929 diverse genomes. +Science. 2020 Mar 20;367(6484). +PMID: 32193295; PMC: PMC7115999 +