src/hg/makeDb/trackDb/human/gasp.html 986c4ede954e44904eb314772fb2cf83a48d307c

986c4ede954e44904eb314772fb2cf83a48d307c
max
  Wed May 6 06:24:47 2026 -0700
varFreqs: lift GenomeAsia (gasp + gaspIndel) GRCh37 -> hg38

Both subtracks were served at /gbdb/hg38/ but the upstream callset is
GRCh37 (caught in QA, see #36642 note 2026-05-04). Lifted with CrossMap
using hg19ToHg38.over.chain.gz; recipe matches tishkoff180 / mxbFreq.
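
The post-lift step that drops alt, random, fix, and unplaced contigs can
be sketched as below. This is an illustration only, not the contents of
gaspLift.sh: the names is_primary_contig and filter_vcf_lines are
hypothetical, and treating chr1-chr22, chrX, chrY, and chrM as the set
to keep is an assumption about UCSC-style contig names.

```python
import re

# Assumption: "primary" means chr1-chr22, chrX, chrY, chrM (UCSC naming).
# Anything else (e.g. *_alt, *_random, *_fix, chrUn_*) is treated as dropped.
PRIMARY_CONTIG = re.compile(r"chr(?:[1-9]|1[0-9]|2[0-2]|X|Y|M)")


def is_primary_contig(name):
    """Return True if the contig name is a primary chromosome."""
    return PRIMARY_CONTIG.fullmatch(name) is not None


def filter_vcf_lines(lines):
    """Yield header lines plus records whose CHROM is a primary contig.

    Header lines (starting with '#') are passed through unchanged, so
    ##contig entries for dropped contigs are not pruned here.
    """
    for line in lines:
        if line.startswith("#") or is_primary_contig(line.split("\t", 1)[0]):
            yield line
```

The filter only decides which records to keep; sorting, compression, and
indexing would still have to follow as separate steps.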

gasp (SNVs):   66,236,516 -> 66,222,771 (99.98%; 6,240 unmapped + 7,505 alt/random)
gaspIndel:      4,415,156 ->  4,410,871 (99.90%; 3,332 unmapped +   953 alt/random)
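
The counts above can be cross-checked with a few lines of arithmetic.
This is a minimal sketch using only the numbers quoted in this message;
the names LIFT_COUNTS and check_lift are made up for the illustration.

```python
# Counts copied from the summary lines above:
# (records before lift, records after lift, unmapped, alt/random).
LIFT_COUNTS = {
    "gasp": (66_236_516, 66_222_771, 6_240, 7_505),
    "gaspIndel": (4_415_156, 4_410_871, 3_332, 953),
}


def check_lift(before, after, unmapped, alt_random):
    """Return (dropped, percent_kept); raise if the drop categories do not add up."""
    dropped = before - after
    if dropped != unmapped + alt_random:
        raise ValueError(
            f"dropped {dropped} != unmapped {unmapped} + alt/random {alt_random}"
        )
    return dropped, 100.0 * after / before


for name, counts in LIFT_COUNTS.items():
    dropped, pct = check_lift(*counts)
    print(f"{name}: dropped {dropped:,} records, kept {pct:.2f}%")
```

For both subtracks the two drop categories sum exactly to the difference
between the before and after counts, and the retained fractions round to
the percentages given above.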

New driver script: scripts/varFreqs/gaspLift.sh. gaspIndel bigDataUrl
renamed from All.indels.annot.cont_withmaf.vcf.gz to ga100k.indels.vcf.gz
(old name was a verbatim copy of the upstream download name).
varFreqsAll combined bigBed regenerated to fold in the corrected
coordinates (36.5 GB, 1,166,451,644 items, 125 fields).

refs #36642

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/gasp.html src/hg/makeDb/trackDb/human/gasp.html
index 8efa4f255bb..09e43208d44 100644
--- src/hg/makeDb/trackDb/human/gasp.html
+++ src/hg/makeDb/trackDb/human/gasp.html
@@ -23,28 +23,34 @@
 website</a>. No license nor login is required.
 </p>
 
 <h2>Methods</h2>
 <p>
 Samples were sequenced on Illumina HiSeq 2500, HiSeq 4000, and HiSeq X Ten instruments with
 2&times;100 bp or 2&times;150 bp paired-end reads at an average depth of 36x. Reads were aligned to
 GRCh37 using BWA-MEM. Duplicate reads were marked with SAMBLASTER and sorted with Sambamba.
 Per-sample variant calling was performed with GATK HaplotypeCaller in GVCF mode, followed by
 joint genotyping with GenotypeGVCFs. Variant quality score recalibration (VQSR) was applied at
 a 99% sensitivity tranche for both SNPs and indels. Sample-level QC included contamination
 checks with verifyBamID and sex concordance verification. The final callset contains
 &sim;65 million variants across 1,739 individuals from 219 populations.
 </p>
 <p>
+The upstream callset is on GRCh37. We lifted it to hg38 using
+<a href="https://crossmap.sourceforge.net/" target="_blank">CrossMap</a> and the UCSC
+<tt>hg19ToHg38</tt> chain file. After lifting, variants that landed on alt, random, fix, or
+unplaced contigs were dropped, the remaining variants were sorted, and the file was indexed with tabix.
+</p>
+<p>
 We provide documentation that indicates how all source files of the varFreqs track were converted in the <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt" target="_blank">makeDoc file</a> of the track.
 For some tracks, python scripts were necessary and are also available from <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/scripts/varFreqs" target="_blank">GitHub</a>.
 </p>
 
 <h2>References</h2>
 <p>
 GenomeAsia100K Consortium.
 <a href="https://doi.org/10.1038/s41586-019-1793-z" target="_blank">
 The GenomeAsia 100K Project enables genetic discoveries across Asia</a>.
 <em>Nature</em>. 2019 Dec;576(7785):106-111.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/31802016" target="_blank">31802016</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7054211/" target="_blank">PMC7054211</a>
 </p>