986c4ede954e44904eb314772fb2cf83a48d307c max Wed May 6 06:24:47 2026 -0700

varFreqs: lift GenomeAsia (gasp + gaspIndel) GRCh37 -> hg38

Both subtracks were served at /gbdb/hg38/ but the upstream callset is
GRCh37 (caught in QA, see #36642 note 2026-05-04). Lifted with CrossMap
using hg19ToHg38.over.chain.gz; recipe matches tishkoff180 / mxbFreq.

  gasp (SNVs): 66,236,516 -> 66,222,771 (99.98%; 6,240 unmapped + 7,505 alt/random)
  gaspIndel:   4,415,156 -> 4,410,871 (99.90%; 3,332 unmapped + 953 alt/random)

New driver script: scripts/varFreqs/gaspLift.sh.

gaspIndel bigDataUrl renamed from All.indels.annot.cont_withmaf.vcf.gz to
ga100k.indels.vcf.gz (old name was a verbatim copy of the upstream
download name).

varFreqsAll combined bigBed regenerated to fold in the corrected
coordinates (36.5 GB, 1,166,451,644 items, 125 fields).

refs #36642

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/gasp.html src/hg/makeDb/trackDb/human/gasp.html
index 8efa4f255bb..09e43208d44 100644
--- src/hg/makeDb/trackDb/human/gasp.html
+++ src/hg/makeDb/trackDb/human/gasp.html
@@ -23,28 +23,34 @@ website</a>. No license nor login is required.
 </p>
 <h2>Methods</h2>
 <p>
 Samples were sequenced on Illumina HiSeq 2500, HiSeq 4000, and HiSeq X Ten instruments with
 2×100 bp or 2×150 bp paired-end reads at an average depth of 36x. Reads were aligned to GRCh37
 using BWA-MEM. Duplicate reads were marked with SAMBLASTER and sorted with Sambamba. Per-sample
 variant calling was performed with GATK HaplotypeCaller in GVCF mode, followed by joint genotyping
 with GenotypeGVCFs. Variant quality score recalibration (VQSR) was applied at a 99% sensitivity
 tranche for both SNPs and indels. Sample-level QC included contamination checks with verifyBamID
 and sex concordance verification. The final callset contains ∼65 million variants across
 1,739 individuals from 219 populations.
 </p>
 <p>
+The upstream callset is on GRCh37. We lifted it to hg38 using
+<a href="https://crossmap.sourceforge.net/" target="_blank">CrossMap</a> and the UCSC
+<tt>hg19ToHg38</tt> chain file. After lifting, variants that landed on alt, random, fix, or
+unplaced contigs were dropped, and the result was sorted and indexed with tabix.
+</p>
+<p>
 We provide documentation that indicates how all source files of the varFreqs track were converted in the
 <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt"
 target="_blank">makeDoc file</a> of the track. For some tracks, python scripts were necessary and
 are also available from <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/scripts/varFreqs"
 target="_blank">GitHub</a>.
 </p>
 <h2>References</h2>
 <p>
 GenomeAsia100K Consortium.
 <a href="https://doi.org/10.1038/s41586-019-1793-z" target="_blank">
 The GenomeAsia 100K Project enables genetic discoveries across Asia</a>.
 <em>Nature</em>. 2019 Dec;576(7785):106-111.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/31802016" target="_blank">31802016</a>;
 PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7054211/" target="_blank">PMC7054211</a>
 </p>
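
Note on the contig filter: the Methods text added in this diff says that variants landing on alt, random, fix, or unplaced contigs were dropped after the lift. The Python sketch below illustrates one way such a filter could work. It is not the contents of scripts/varFreqs/gaspLift.sh, which is not shown here. The function name `keep_primary` is hypothetical, and the rule it applies (a contig name containing an underscore is not a primary chromosome) is an assumption based on UCSC hg38 naming, where such contigs look like chr1_KI270706v1_random or chrUn_KI270302v1.

```python
from typing import Iterable, Iterator


def keep_primary(vcf_lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF header lines and records on primary hg38 chromosomes.

    Assumption: UCSC-style hg38 names, where alt, random, fix, and unplaced
    contigs all contain an underscore (e.g. chr1_KI270706v1_random).
    """
    for line in vcf_lines:
        if line.startswith("#"):
            # Header lines (## meta lines and the #CHROM line) pass through unchanged.
            yield line
            continue
        # CHROM is the first tab-separated column of a VCF record.
        chrom = line.split("\t", 1)[0]
        if "_" not in chrom:
            yield line


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t10177\t.\tA\tAC\t.\tPASS\t.\n",
        "chr1_KI270706v1_random\t500\t.\tG\tT\t.\tPASS\t.\n",
        "chrUn_KI270302v1\t42\t.\tC\tT\t.\tPASS\t.\n",
        "chrX\t2781514\t.\tC\tT\t.\tPASS\t.\n",
    ]
    for out in keep_primary(demo):
        print(out, end="")
```

In a real pipeline the filtered records would still need to be coordinate-sorted, bgzip-compressed, and indexed with tabix, as the Methods text describes. This sketch leaves `##contig` header lines untouched, and under the underscore rule it keeps chrM; whether the actual recipe does either is not stated in the commit.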