85a3ec13e80a0e61f16e691afb878956e0483892 max Fri Nov 28 08:53:18 2025 -0800
adding Finnland to var freqs track, refs #36642

diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html
index 053f7ef112c..8dc145fd00a 100644
--- src/hg/makeDb/trackDb/human/varFreqs.html
+++ src/hg/makeDb/trackDb/human/varFreqs.html
@@ -57,33 +57,38 @@
Most tracks only show the variant and allele frequencies on mouseover or clicks. When zoomed in, tracks display alleles with base-specific coloring. Homozygote data are shown as one letter, while heterozygotes will be displayed with both letters.
For NCBI ALFA: This track has no single VCF with INFO fields, but uses multiple subtracks instead, one per ancestry.
@@ -165,53 +167,71 @@
Regeneron one million exomes: VCFs with summarized allele frequencies are available from the RGC ME website.
TOPMED: VCFs with summarized allele frequencies are available from the TOPMED BRAVO website. They require a login.
GenomeAsia Pilot: VCFs are available from UCSC and also from the GenomeAsia 100K website. Neither a license nor a login is required.
+KOVA:
+ TSV data can be requested on the KOVA Downloads website.
+
+
+Finngen: TSV data can be requested via the form at https://finngen.gitbook.io/documentation/data-download which triggers an email with the download link.
+
+NPM:
+ VCF access can be requested on the
+ Chorus Browser website, which requires an
+ account and data access request.
+
+MXB: Genotyping was performed with the Illumina Multi-Ethnic Global Array (MEGA, ~1.8M SNPs), optimized for admixed populations and enriched for ancestry-informative and medically relevant variants. Only autosomal, biallelic SNPs passing quality control are included. Samples were selected from 898 recruitment sites, with prioritization of indigenous language speakers. Data processing included GenomeStudio → PLINK conversion, strand alignment, removal of duplicates, update of map positions using dbSNP Build 151, removal of low-quality variants/individuals, and relatedness filtering.
SGDP: The version used was https://sharehost.hms.harvard.edu/genetics/reich_lab/sgdp/vcf_variants/, merged with bcftools and lifted to hg38 with CrossMap.
KOVA: V7 of the TSV.gz was obtained from the KOVA staff and converted to VCF. It is not available for download from our site but can be requested from the KOVA website.
-Finngen: R12 was downloaded from https://finngen.gitbook.io/documentation/data-download and converted to VCF with a Python script.
+Finngen: R12 annotated variants were downloaded from the Google Cloud
+bucket link received through an email after filling out the form linked from
+https://finngen.gitbook.io/documentation/data-download and converted to VCF
+with a custom Python script.
NPM Singapore: Whole Genome Sequencing (WGS) data processing followed GATK4 best practices. The GATK4 germline variant analysis workflow, written in WDL, was adapted to use Nextflow and deployed at the National Supercomputing Centre, Singapore (NSCC). In short, WGS reads were aligned against GRCh38 using the BWA-MEM algorithm and used as input to GATK HaplotypeCaller to produce single-sample gVCFs. The gVCF files were joint-called and then loaded into Hail, an open-source Python-based data analysis library suited to working with population-scale genomic data collections. Low-quality WGS libraries and low-quality variants were removed. QC-ed variants were functionally annotated using Ensembl Variant Effect Predictor (VEP) (version 95). Functional annotations for variants impacting protein-coding regions were also complemented with information on the potential alteration to their cognate protein's 3D structure and drug binding ability.
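The KOVA and Finngen entries above both mention converting a TSV of summarized allele frequencies to VCF. The conversion scripts themselves are not shown here, so the following is only a minimal sketch of what such a step might look like. The column names (chrom, pos, ref, alt, af) and the function name tsv_to_vcf_lines are hypothetical and do not reflect the actual KOVA or Finngen file layouts.

```python
import csv
import io

# Minimal VCF header carrying a single allele frequency INFO field.
VCF_HEADER = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def tsv_to_vcf_lines(tsv_handle):
    """Yield VCF lines from a tab-separated file.

    Assumes hypothetical columns chrom, pos, ref, alt and af;
    real input files will need their own column mapping.
    """
    yield from VCF_HEADER
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    for row in reader:
        chrom = row["chrom"]
        # UCSC-style chromosome names carry a "chr" prefix.
        if not chrom.startswith("chr"):
            chrom = "chr" + chrom
        info = "AF=" + row["af"]
        # ID, QUAL and FILTER are unknown here, so they are set to ".".
        yield "\t".join(
            [chrom, row["pos"], ".", row["ref"], row["alt"], ".", ".", info]
        )


# Tiny in-memory example with made-up values.
example_tsv = "chrom\tpos\tref\talt\taf\n1\t12345\tA\tG\t0.0125\n"
vcf_lines = list(tsv_to_vcf_lines(io.StringIO(example_tsv)))
print("\n".join(vcf_lines))
```

The sketch writes records in input order. A real conversion would also need the output sorted by chromosome and position before it can be compressed and indexed.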