aa61ebc800429515f9ced7e28f669c6042219f43
max
  Wed Mar 18 09:09:13 2026 -0700
varFreqs supertrack: add GREGoR track, update all HTML docs, move scripts to varFreqs/, refs #36642

Add GREGoR R04 WGS track to varFreqs superTrack. Update Data Access and
Methods sections for all 20+ subtrack HTML files with consistent formatting,
sequencing methods from source papers, and links to makeDoc and Github scripts.
Move all varFreqs conversion scripts into scripts/varFreqs/ subdirectory and
update makeDoc paths accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/gasp.html src/hg/makeDb/trackDb/human/gasp.html
new file mode 100644
index 00000000000..ba30bc28f05
--- /dev/null
+++ src/hg/makeDb/trackDb/human/gasp.html
@@ -0,0 +1,50 @@
+<h2>Description</h2>
+<p>
+The <a href="https://www.genomeasia100k.org/" target="_blank">GenomeAsia 100K</a> project aims
+to sequence 100,000 Asian individuals. This pilot release (GAsP) contains whole-genome sequencing
+data of 1,739 individuals from 219 population groups across Asia. Frequencies are broken down by
+Northeast Asian, Southeast Asian, and South Asian ancestry groups. The data is split into two
+subtracks: substitutions and indels.
+</p>
+
+<h2>Data Access</h2>
+<p>
+The data can be explored interactively with the
+<a href="../cgi-bin/hgTables">Table Browser</a> or the
+<a href="../cgi-bin/hgIntegrator">Data Integrator</a>.
+For programmatic access, our <a href="https://api.genome.ucsc.edu">REST API</a> can be used; the
+track name is <em>gasp</em>.
+For bulk download, the VCF file can be obtained from
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/$db/varFreqs/" target="_blank">our download server</a>.
+</p>
+<p>
+The original VCFs are also available from the
+<a href="https://browser.genomeasia100k.org/#tid=download" target="_blank">GenomeAsia 100K
+website</a>. Neither a license nor a login is required.
+</p>
+
+<h2>Methods</h2>
+<p>
+Samples were sequenced on Illumina HiSeq 2500, HiSeq 4000, and HiSeq X Ten instruments with
+2&times;100 bp or 2&times;150 bp paired-end reads at an average depth of 36&times;. Reads were aligned to
+GRCh37 using BWA-MEM. Duplicates were marked with SAMBLASTER, and reads were sorted with Sambamba.
+Per-sample variant calling was performed with GATK HaplotypeCaller in GVCF mode, followed by
+joint genotyping with GenotypeGVCFs. Variant quality score recalibration (VQSR) was applied at
+a 99% sensitivity tranche for both SNPs and indels. Sample-level QC included contamination
+checks with verifyBamID and sex concordance verification. The final callset contains
+&sim;65 million variants across 1,739 individuals from 219 populations.
+</p>
+<p>
+The conversion of all source files of the varFreqs track is documented in the <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt" target="_blank">makeDoc file</a> of the track.
+For some tracks, Python scripts were necessary; these are also available from <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/scripts/varFreqs" target="_blank">GitHub</a>.
+</p>
+
+<h2>References</h2>
+<p>
+GenomeAsia100K Consortium.
+<a href="https://doi.org/10.1038/s41586-019-1793-z" target="_blank">
+The GenomeAsia 100K Project enables genetic discoveries across Asia</a>.
+<em>Nature</em>. 2019 Dec;576(7785):106-111.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/31802016" target="_blank">31802016</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7054211/" target="_blank">PMC7054211</a>
+</p>