383da828477aad2b3c6053880a64fdbfc2a00cd9 max Thu Mar 19 02:30:41 2026 -0700
Fix varFreqs HTML issues and trexplorer citation, from AI code review 2026-03-19, refs #36642

Fix broken $db download URLs to hg38 in 14 HTML files, correct "Japanese" to "Korean" in kova.html, fix "area" typo in schema.html, fix "Finnland" to "Finland" in varFreqs.ra, normalize GREGoR capitalization, fix grammar, quote all target=_blank attributes, capitalize GitHub consistently, and fix bioRxiv citation formatting in trexplorer.html.

Co-Authored-By: Claude Opus 4.6

diff --git src/hg/makeDb/trackDb/human/kova.html src/hg/makeDb/trackDb/human/kova.html
index bfa7fb2b4da..a31dbab8906 100644
--- src/hg/makeDb/trackDb/human/kova.html
+++ src/hg/makeDb/trackDb/human/kova.html
@@ -1,57 +1,57 @@

Description

The Korean Variant Archive (KOVA) contains 1,896 whole genome sequencing and 3,409 whole exome sequencing data from healthy individuals of Korean ethnicity. Most of the samples originated from normal tissue of cancer patients (40.16%), healthy parents of rare disease patients (28.4%), or healthy volunteers
-(31.44%). Japanese ancestry is broken down in the INFO field. Coverage 100x for WES, 30x for WGS.
+(31.44%). Korean ancestry is not broken down further in the INFO field. Coverage 100x for WES, 30x for WGS.
SVs called with Manta are also available.

Data Access

Due to license restrictions, the data for this track cannot be downloaded from the UCSC Genome Browser. The Table Browser, Data Integrator, and download server are not available for this track.

TSV data can be requested on the
-KOVA Downloads website. Our Github repo contains a script that
+KOVA Downloads website. Our GitHub repo contains a script that
converts this format to VCF.

Methods

Raw reads were aligned to the GRCh38+decoy reference using BWA-MEM v0.7.17 with default parameters, followed by duplicate marking and coordinate sorting with MarkDuplicatesSpark, and base quality score recalibration using BQSRPipelineSpark in GATK v4.1.3.0; mapping quality control metrics were generated with Qualimap v2.2.1. Single-nucleotide variants and small insertions/deletions were called per sample using GATK HaplotypeCaller in GVCF mode (-ERC GVCF), and joint genotyping was performed by creating a GenomicsDB with GenomicsDBImport and following GATK Best Practices, including variant quality score recalibration (VQSR) retaining 99.7% of true SNVs and 99.0% of true indels based on training sets (workflow detailed in Supplementary Fig. 1). Downstream analyses followed a modified version of the gnomAD quality-control framework and were primarily conducted using Hail; after merging WES and WGS data in Hail, multiallelic variants and variants with genotype quality <20, read depth <10, allelic balance <0.2, or overlapping low-complexity regions were excluded.
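The genotype-level thresholds in the Methods paragraph above can be illustrated with a minimal Python sketch. It is not the KOVA pipeline, which the text says was run in Hail. The `Genotype` class, its field names, and the function names are hypothetical, and applying the allelic balance check only to heterozygous calls is an assumption borrowed from common gnomAD-style practice, since the text does not say which calls it applies to.

```python
from dataclasses import dataclass

# Thresholds quoted in the Methods text.
MIN_GQ = 20   # genotype quality
MIN_DP = 10   # read depth
MIN_AB = 0.2  # allelic balance


@dataclass
class Genotype:
    """Hypothetical container for one genotype call."""
    gq: int
    dp: int
    ref_reads: int
    alt_reads: int
    is_het: bool


def allelic_balance(gt: Genotype) -> float:
    """Fraction of reads supporting the alternate allele (0.0 if no reads)."""
    total = gt.ref_reads + gt.alt_reads
    return gt.alt_reads / total if total else 0.0


def passes_filters(gt: Genotype) -> bool:
    """Return True if the call survives the GQ, DP, and AB thresholds."""
    if gt.gq < MIN_GQ or gt.dp < MIN_DP:
        return False
    # Assumption: allelic balance is only checked for heterozygous calls.
    if gt.is_het and allelic_balance(gt) < MIN_AB:
        return False
    return True
```

The low-complexity-region and multiallelic exclusions mentioned in the same sentence are not covered here, because they depend on interval and site-level data rather than per-genotype values.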

At UCSC, V7 of the TSV.gz was obtained from the KOVA staff by email and converted to VCF. It is not available for download from our site but can be requested from the KOVA website.
-We provide documentation that indicates how all source files of the varFreqs track were converted in the makeDoc file of the track.
-For some tracks, python scripts were necessary and are also available from Github.
+We provide documentation that indicates how all source files of the varFreqs track were converted in the makeDoc file of the track.
+For some tracks, python scripts were necessary and are also available from GitHub.
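To give a rough idea of what a TSV-to-VCF conversion of variant frequencies involves, here is a minimal Python sketch. It is not the UCSC script referenced above. The column names (`chrom`, `pos`, `ref`, `alt`, `ac`, `an`) are hypothetical, since the actual KOVA TSV layout is not described here, and the function name `tsv_to_vcf_lines` is made up for illustration.

```python
import csv
import io


def tsv_to_vcf_lines(tsv_text: str) -> list[str]:
    """Convert a frequency TSV (hypothetical columns) into VCF lines."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lines = [
        "##fileformat=VCFv4.2",
        '##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count">',
        '##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles">',
        '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for row in reader:
        ac = int(row["ac"])
        an = int(row["an"])
        # Guard against division by zero when no alleles were genotyped.
        af = ac / an if an else 0.0
        info = f"AC={ac};AN={an};AF={af:.6g}"
        # ID, QUAL and FILTER are unknown for a frequency-only source, so use ".".
        lines.append("\t".join(
            [row["chrom"], row["pos"], ".", row["ref"], row["alt"], ".", ".", info]
        ))
    return lines
```

A real conversion would also need contig header lines and coordinate-sorted output before the file could be compressed and indexed.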

Credits

Thanks to Insu Jang and the KOVA director for providing variant frequencies in TSV format.

References

Lee J, Lee J, Jeon S, Lee J, Jang I, Yang JO, Park S, Lee B, Choi J, Choi BO et al. A database of 5305 healthy Korean individuals reveals genetic and clinical implications for an East Asian population. Exp Mol Med. 2022 Nov;54(11):1862-1871. PMID: 36323850; PMC: