aa61ebc800429515f9ced7e28f669c6042219f43
max
Wed Mar 18 09:09:13 2026 -0700
varFreqs supertrack: add GREGoR track, update all HTML docs, move scripts to varFreqs/, refs #36642

Add GREGoR R04 WGS track to varFreqs superTrack. Update Data Access and Methods sections for all 20+ subtrack HTML files with consistent formatting, sequencing methods from source papers, and links to makeDoc and Github scripts. Move all varFreqs conversion scripts into scripts/varFreqs/ subdirectory and update makeDoc paths accordingly.

Co-Authored-By: Claude Opus 4.6

diff --git src/hg/makeDb/trackDb/human/kova.html src/hg/makeDb/trackDb/human/kova.html
new file mode 100644
index 00000000000..bfa7fb2b4da
--- /dev/null
+++ src/hg/makeDb/trackDb/human/kova.html
@@ -0,0 +1,59 @@
+

Description

+

+The Korean Variant Archive (KOVA) +contains whole genome sequencing data from 1,896 and whole exome sequencing data from 3,409 healthy +individuals of Korean ethnicity. The samples originated from normal tissue of cancer +patients (40.16%), healthy parents of rare disease patients (28.4%), or healthy volunteers +(31.44%). Japanese ancestry is broken down in the INFO field. Coverage is 100x for WES and 30x for WGS. +SVs called with Manta are also available. +

+ +

Data Access

+

+Due to license restrictions, the data for this track cannot be downloaded from the UCSC +Genome Browser. The Table Browser, Data Integrator, and download server are not available +for this track. +

+

+TSV data can be requested on the KOVA Downloads website. Our GitHub repo contains a script that +converts this format to VCF. +

+ +

Methods

+

+Raw reads were aligned to the GRCh38+decoy reference using BWA-MEM v0.7.17 with default +parameters, followed by duplicate marking and coordinate sorting with MarkDuplicatesSpark and base +quality score recalibration with BQSRPipelineSpark in GATK v4.1.3.0. Mapping quality control +metrics were generated with Qualimap v2.2.1. Single-nucleotide variants and small +insertions/deletions were called per sample using GATK HaplotypeCaller in GVCF mode (-ERC GVCF). +Joint genotyping was performed by creating a GenomicsDB with GenomicsDBImport and following GATK +Best Practices, including variant quality score recalibration (VQSR) retaining 99.7% of true SNVs +and 99.0% of true indels based on training sets (workflow detailed in Supplementary Fig. 1). +Downstream analyses followed a modified version of the gnomAD quality-control framework and were +primarily conducted using Hail. After merging WES and WGS data in Hail, multiallelic variants and +variants with genotype quality <20, read depth <10, allelic balance <0.2, or overlapping +low-complexity regions were excluded. +

+

+At UCSC, V7 of the TSV.gz file was obtained from the KOVA staff by email and converted to VCF. It is not +available for download from our site but can be requested from the KOVA website. +The makeDoc file of the varFreqs track documents how all source files of the track were converted. +For some tracks, Python scripts were necessary; these are also available from GitHub. +

+ +

Credits

+

+Thanks to Insu Jang and the KOVA director for providing variant frequencies in TSV format. +

+ +

References

+

+Lee J, Lee J, Jeon S, Lee J, Jang I, Yang JO, Park S, Lee B, Choi J, Choi BO et al. + +A database of 5305 healthy Korean individuals reveals genetic and clinical implications for an East +Asian population. +Exp Mol Med. 2022 Nov;54(11):1862-1871. +PMID: 36323850; PMC: PMC9628380 +