aa61ebc800429515f9ced7e28f669c6042219f43 max Wed Mar 18 09:09:13 2026 -0700
varFreqs supertrack: add GREGoR track, update all HTML docs, move scripts to varFreqs/, refs #36642

Add GREGoR R04 WGS track to varFreqs superTrack. Update Data Access and Methods sections for all 20+ subtrack HTML files with consistent formatting, sequencing methods from source papers, and links to makeDoc and Github scripts. Move all varFreqs conversion scripts into scripts/varFreqs/ subdirectory and update makeDoc paths accordingly.

Co-Authored-By: Claude Opus 4.6

diff --git src/hg/makeDb/trackDb/human/gasp.html src/hg/makeDb/trackDb/human/gasp.html
new file mode 100644
index 00000000000..ba30bc28f05
--- /dev/null
+++ src/hg/makeDb/trackDb/human/gasp.html
@@ -0,0 +1,50 @@
+

Description

+

+The GenomeAsia 100K project aims +to sequence 100,000 Asian individuals. This pilot release (GAsP) contains whole-genome sequencing +data of 1,739 individuals from 219 population groups across Asia. Frequencies are broken down by +Northeast Asian, Southeast Asian, and South Asian ancestry groups. The data is split into two +subtracks: substitutions and indels. +

+ +

Data Access

+

+The data can be explored interactively with the +Table Browser or the +Data Integrator. +For programmatic access, our REST API can be used; the +track name is gasp. +For bulk download, the VCF file can be obtained from +our download server. +

+

+The original VCFs are also available from the +GenomeAsia 100K +website. Neither a license nor a login is required. +
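The REST access mentioned above could look roughly like the following minimal sketch. It assumes the public UCSC API host `api.genome.ucsc.edu` and its `/getData/track` endpoint, and it assumes the track is queried on `hg19` because the reads were aligned to GRCh37; the helper names `gasp_track_url` and `fetch_gasp` are hypothetical and only serve as illustration. Whether variant records are returned for a given region depends on how the API serves this track type.

```python
import json
from urllib.request import urlopen

# Assumption: public UCSC REST API host.
API_BASE = "https://api.genome.ucsc.edu"


def gasp_track_url(chrom, start, end, genome="hg19", track="gasp"):
    """Build a /getData/track request URL for a genomic region.

    genome="hg19" is an assumption (reads were aligned to GRCh37);
    track="gasp" is the track name given in the Data Access section.
    """
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", start),
        ("end", end),
    ]
    # The UCSC API examples separate parameters with semicolons.
    query = ";".join(f"{name}={value}" for name, value in params)
    return f"{API_BASE}/getData/track?{query}"


def fetch_gasp(chrom, start, end, genome="hg19"):
    """Fetch the JSON response for a region (requires network access)."""
    url = gasp_track_url(chrom, start, end, genome=genome)
    with urlopen(url, timeout=30) as response:
        return json.load(response)
```

A call such as `fetch_gasp("chr1", 1000000, 1001000)` would return the parsed JSON response as a Python dictionary; the region shown is arbitrary.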

+ +

Methods

+

+Samples were sequenced on Illumina HiSeq 2500, HiSeq 4000, and HiSeq X Ten instruments with +2×100 bp or 2×150 bp paired-end reads at an average depth of 36×. Reads were aligned to +GRCh37 using BWA-MEM. Duplicate reads were marked with SAMBLASTER, and the alignments were sorted with Sambamba. +Per-sample variant calling was performed with GATK HaplotypeCaller in GVCF mode, followed by +joint genotyping with GenotypeGVCFs. Variant quality score recalibration (VQSR) was applied at +a 99% sensitivity tranche for both SNPs and indels. Sample-level QC included contamination +checks with verifyBamID and sex concordance verification. The final callset contains +∼65 million variants across 1,739 individuals from 219 populations. +

+

+The makeDoc file of the varFreqs track documents how all source files of the track were converted. +For some tracks, Python scripts were necessary; these are also available from GitHub. +

+ +

References

+

+GenomeAsia100K Consortium. + +The GenomeAsia 100K Project enables genetic discoveries across Asia. +Nature. 2019 Dec;576(7785):106-111. +PMID: 31802016; PMC: PMC7054211 +