aa61ebc800429515f9ced7e28f669c6042219f43 max Wed Mar 18 09:09:13 2026 -0700
varFreqs supertrack: add GREGoR track, update all HTML docs, move scripts to varFreqs/, refs #36642

Add GREGoR R04 WGS track to varFreqs superTrack. Update Data Access and Methods sections for all 20+ subtrack HTML files with consistent formatting, sequencing methods from source papers, and links to makeDoc and Github scripts. Move all varFreqs conversion scripts into scripts/varFreqs/ subdirectory and update makeDoc paths accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/kova.html src/hg/makeDb/trackDb/human/kova.html
new file mode 100644
index 00000000000..bfa7fb2b4da
--- /dev/null
+++ src/hg/makeDb/trackDb/human/kova.html
@@ -0,0 +1,59 @@
+<h2>Description</h2>
+<p>
+The <a href="https://www.kobic.re.kr/kova/" target="_blank">Korean Variant Archive (KOVA)</a>
+contains whole genome sequencing data from 1,896 and whole exome sequencing data from 3,409 healthy
+individuals of Korean ethnicity. The samples originated from normal tissue of cancer
+patients (40.16%), healthy parents of rare disease patients (28.4%), or healthy volunteers
+(31.44%). Japanese ancestry is broken down in the INFO field. Coverage is 100x for WES and 30x for WGS.
+SVs called with Manta are also available.
+</p>
+
+<h2>Data Access</h2>
+<p>
+Due to license restrictions, the data for this track cannot be downloaded from the UCSC
+Genome Browser. The Table Browser, Data Integrator, and download server are not available
+for this track.
+</p>
+<p>
+TSV data can be requested on the <a href="https://www.kobic.re.kr/kova/downloads"
+target="_blank">KOVA Downloads</a> website. Our GitHub repository contains a script that
+converts this format to VCF.
+</p>
+
+<h2>Methods</h2>
+<p>
+Raw reads were aligned to the GRCh38+decoy reference using BWA-MEM v0.7.17 with default
+parameters, followed by duplicate marking and coordinate sorting with MarkDuplicatesSpark, and base
+quality score recalibration using BQSRPipelineSpark in GATK v4.1.3.0; mapping quality control
+metrics were generated with Qualimap v2.2.1. Single-nucleotide variants and small
+insertions/deletions were called per sample using GATK HaplotypeCaller in GVCF mode (-ERC GVCF), and
+joint genotyping was performed by creating a GenomicsDB with GenomicsDBImport and following GATK
+Best Practices, including variant quality score recalibration (VQSR) retaining 99.7% of true SNVs
+and 99.0% of true indels based on training sets (workflow detailed in Supplementary Fig. 1 of Lee et al. 2022).
+Downstream analyses followed a modified version of the gnomAD quality-control framework and were
+primarily conducted using Hail; after merging WES and WGS data in Hail, multiallelic variants and
+variants with genotype quality &lt;20, read depth &lt;10, allelic balance &lt;0.2, or overlapping
+low-complexity regions were excluded.
+</p>
+<p>
+At UCSC, V7 of the TSV.gz file was obtained from the KOVA staff by email and converted to VCF. It is not
+available for download from our site but can be requested from the KOVA website.
+The <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt" target="_blank">makeDoc file</a> of the varFreqs track documents how all of its source files were converted.
+For some tracks, Python scripts were necessary; these are also available from <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/scripts/varFreqs" target="_blank">GitHub</a>.
+</p>
+
+<h2>Credits</h2>
+<p>
+Thanks to Insu Jang and the KOVA director for providing variant frequencies in TSV format.
+</p>
+
+<h2>References</h2>
+<p>
+Lee J, Lee J, Jeon S, Lee J, Jang I, Yang JO, Park S, Lee B, Choi J, Choi BO <em>et al</em>.
+<a href="https://doi.org/10.1038/s12276-022-00871-4" target="_blank">
+A database of 5305 healthy Korean individuals reveals genetic and clinical implications for an East
+Asian population</a>.
+<em>Exp Mol Med</em>. 2022 Nov;54(11):1862-1871.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/36323850" target="_blank">36323850</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9628380/" target="_blank">PMC9628380</a>
+</p>