383da828477aad2b3c6053880a64fdbfc2a00cd9 max Thu Mar 19 02:30:41 2026 -0700
Fix varFreqs HTML issues and trexplorer citation, from AI code review 2026-03-19, refs #36642

Fix broken $db download URLs to hg38 in 14 HTML files, correct "Japanese" to "Korean" in kova.html, fix "area" typo in schema.html, fix "Finnland" to "Finland" in varFreqs.ra, normalize GREGoR capitalization, fix grammar, quote all target=_blank attributes, capitalize GitHub consistently, and fix bioRxiv citation formatting in trexplorer.html.

Co-Authored-By: Claude Opus 4.6

diff --git src/hg/makeDb/trackDb/human/hgdp1kFreq.html src/hg/makeDb/trackDb/human/hgdp1kFreq.html
index 7897a903467..9a200a62463 100644
--- src/hg/makeDb/trackDb/human/hgdp1kFreq.html
+++ src/hg/makeDb/trackDb/human/hgdp1kFreq.html
@@ -12,54 +12,54 @@
clustering display is available in the gnomAD HGDP+1000G track under Phased Variants. The track here does not include the full variant frequencies for all subpopulations, instead, it aggregates frequencies to the main groups, AFR, AMI, AMR, ASJ, EAS, FIN, MID, NFE, OTH, SAS. To access the full frequency information, use the track under "Phased Variants".

Data Access

The data can be explored interactively with the Table Browser or the Data Integrator. For programmatic access, our REST API can be used; the track name is hgdp1kFreq. For bulk download, the VCF file can be obtained from
-our download server.
+our download server.
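As a minimal sketch of the REST API access mentioned above, the snippet below builds a request for the hgdp1kFreq track using the UCSC `getData/track` endpoint. It assumes the hg38 assembly, an example region chosen purely for illustration, and UCSC's usual convention of a 0-based start and exclusive end; the exact JSON layout returned for this VCF-based track is not described here, so `fetch_track` only decodes the reply without interpreting it.

```python
import json
import urllib.parse
import urllib.request

# UCSC REST API endpoint for fetching track items over a genomic region.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(track: str, chrom: str, start: int, end: int,
                    genome: str = "hg38") -> str:
    """Build a getData/track URL for one region (start 0-based, end exclusive)."""
    if start < 0 or end <= start:
        raise ValueError("need 0 <= start < end")
    params = {
        "genome": genome,  # assumption: hg38 assembly
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    return API_BASE + "?" + urllib.parse.urlencode(params)


def fetch_track(track: str, chrom: str, start: int, end: int,
                genome: str = "hg38") -> dict:
    """Download and decode the JSON reply; requires network access."""
    url = build_track_url(track, chrom, start, end, genome)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical example region; substitute any small window of interest.
    print(build_track_url("hgdp1kFreq", "chr1", 1_000_000, 1_010_000))
```

Keeping URL construction separate from the network call makes the request easy to inspect, or to paste into a browser, before downloading anything. Keep query regions small, since large windows can return very large responses.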

The original VCFs with full genotypes can also be downloaded from gnomAD Downloads.

Methods

The gnomAD project reprocessed 4,094 whole genomes from the 1000 Genomes Project and the Human Genome Diversity Project (HGDP) through a unified pipeline. Sequencing was performed on Illumina platforms at a mean coverage of 32–34x. Reads were aligned to GRCh38 (hs38DH reference with decoy and HLA sequences) using BWA-MEM 0.7.15. Variant calling followed GATK best practices: per-sample calling with GATK 3.5 HaplotypeCaller followed by joint genotyping with GATK4 using the Hail VCF combiner for scalable merging. Allele-specific variant quality score recalibration (AS-VQSR) was applied for both SNPs and indels. Sample QC included contamination estimation (verifyBamID), sex concordance, relatedness filtering (PC-Relate), and population assignment using PCA against gnomAD reference panels. Per-population allele frequencies were computed for 80 fine-grained populations as well as broad continental groupings.
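The track aggregates fine-grained population frequencies into broad groups. One standard way to pool allele frequencies, sketched below, is to sum allele counts (AC) and allele numbers (AN) within each group and then divide, rather than averaging the subpopulation frequencies. This is an illustration, not necessarily how this track was built (the makeDoc file documents the actual conversion). The subpopulation names, their group assignments, and all counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Mapping, Tuple

# (allele count AC, allele number AN) for one variant in one subpopulation.
Counts = Tuple[int, int]


def pool_frequencies(sub_counts: Mapping[str, Counts],
                     group_of: Mapping[str, str]) -> Dict[str, float]:
    """Pool per-subpopulation counts into group-level allele frequencies.

    AC and AN are summed within each group before dividing, so each group
    frequency is AC_total / AN_total, not a mean of subpopulation AFs.
    Groups whose total AN is zero (no called alleles) are omitted.
    """
    ac: Dict[str, int] = defaultdict(int)
    an: Dict[str, int] = defaultdict(int)
    for pop, (pop_ac, pop_an) in sub_counts.items():
        group = group_of[pop]
        ac[group] += pop_ac
        an[group] += pop_an
    return {g: ac[g] / an[g] for g in an if an[g] > 0}


if __name__ == "__main__":
    # Hypothetical subpopulations and made-up counts for one variant.
    counts = {"popA": (10, 100), "popB": (1, 300), "popC": (0, 0)}
    groups = {"popA": "AFR", "popB": "AFR", "popC": "EAS"}
    print(pool_frequencies(counts, groups))
```

With these made-up numbers, the pooled AFR frequency is 11/400 = 0.0275. A plain mean of the two subpopulation frequencies (0.1 and about 0.0033) would instead give about 0.052 and overweight the smaller sample.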

-We provide documentation that indicates how all source files of the varFreqs track were converted in the makeDoc file of the track.
-For some tracks, python scripts were necessary and are also available from Github.
+We provide documentation that indicates how all source files of the varFreqs track were converted in the makeDoc file of the track.
+For some tracks, python scripts were necessary and are also available from GitHub.

Credits

Thanks to the gnomAD team at the Broad Institute for harmonizing and making this dataset publicly available, and to all participants of the 1000 Genomes Project and the Human Genome Diversity Project.

References

Koenig Z, Yohannes MT, Nkambule LL, Zhao X, Goodrich JK, Kim HA, Wilson MW, Tiao G, Hao SP, Sahakian N et al. A harmonized public resource of deeply sequenced diverse human genomes.