986c4ede954e44904eb314772fb2cf83a48d307c
max
Wed May 6 06:24:47 2026 -0700

varFreqs: lift GenomeAsia (gasp + gaspIndel) GRCh37 -> hg38

Both subtracks were served at /gbdb/hg38/ but the upstream callset is GRCh37 (caught in QA, see #36642 note 2026-05-04). Lifted with CrossMap using hg19ToHg38.over.chain.gz; recipe matches tishkoff180 / mxbFreq.

gasp (SNVs): 66,236,516 -> 66,222,771 (99.98%; 6,240 unmapped + 7,505 alt/random)
gaspIndel: 4,415,156 -> 4,410,871 (99.90%; 3,332 unmapped + 953 alt/random)

New driver script: scripts/varFreqs/gaspLift.sh. gaspIndel bigDataUrl renamed from All.indels.annot.cont_withmaf.vcf.gz to ga100k.indels.vcf.gz (old name was a verbatim copy of the upstream download name).

varFreqsAll combined bigBed regenerated to fold in the corrected coordinates (36.5 GB, 1,166,451,644 items, 125 fields).

refs #36642

Co-Authored-By: Claude Opus 4.7 (1M context)

diff --git src/hg/makeDb/trackDb/human/gasp.html src/hg/makeDb/trackDb/human/gasp.html
index 8efa4f255bb..09e43208d44 100644
--- src/hg/makeDb/trackDb/human/gasp.html
+++ src/hg/makeDb/trackDb/human/gasp.html
@@ -23,28 +23,34 @@
website. No license nor login is required.

Methods

Samples were sequenced on Illumina HiSeq 2500, HiSeq 4000, and HiSeq X Ten instruments with 2×100 bp or 2×150 bp paired-end reads at an average depth of 36x. Reads were aligned to GRCh37 using BWA-MEM. Duplicate reads were marked with SAMBLASTER and sorted with Sambamba. Per-sample variant calling was performed with GATK HaplotypeCaller in GVCF mode, followed by joint genotyping with GenotypeGVCFs. Variant quality score recalibration (VQSR) was applied at a 99% sensitivity tranche for both SNPs and indels. Sample-level QC included contamination checks with verifyBamID and sex concordance verification. The final callset contains ∼65 million variants across 1,739 individuals from 219 populations.

+The upstream callset is on GRCh37. We lifted it to hg38 using
+CrossMap and the UCSC
+hg19ToHg38 chain file. After lifting, variants that landed on alt, random, fix, or
+unplaced contigs were dropped, and the result was sorted and indexed with tabix.
+
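The post-lift cleanup described above (drop records on alt, random, fix, or unplaced contigs, then sort for indexing) can be sketched roughly as follows. This is an illustrative Python sketch, not the contents of scripts/varFreqs/gaspLift.sh; the function names and the exact set of contigs treated as primary (chr1-22, chrX, chrY, chrM) are assumptions.

```python
import re

# Assumption: "primary" contigs are chr1-22, chrX, chrY, chrM in UCSC naming.
# Anything else (e.g. *_alt, *_random, *_fix, chrUn_*) is dropped.
PRIMARY = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$")


def chrom_rank(chrom):
    """Hypothetical helper: numeric sort rank for a primary contig name."""
    name = chrom[3:]
    if name.isdigit():
        return int(name)
    return {"X": 23, "Y": 24, "M": 25}[name]


def filter_and_sort(vcf_lines):
    """Keep header lines, drop non-primary records, sort by contig and position."""
    header, records = [], []
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.split("\t", 2)
        chrom, pos = fields[0], fields[1]
        if PRIMARY.match(chrom):
            records.append((chrom_rank(chrom), int(pos), line))
    # tabix needs records grouped by contig and ordered by position.
    records.sort(key=lambda r: (r[0], r[1]))
    return header + [r[2] for r in records]
```

In practice the sorted output would still need to be bgzip-compressed before tabix can index it; a real pipeline would more likely stream the file through existing VCF tools than hold all records in memory as this sketch does.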

We provide documentation that indicates how all source files of the varFreqs track were converted in the makeDoc file of the track. For some tracks, python scripts were necessary and are also available from GitHub.

References

GenomeAsia100K Consortium. The GenomeAsia 100K Project enables genetic discoveries across Asia. Nature. 2019 Dec;576(7785):106-111. PMID: 31802016; PMC: PMC7054211