37a5b97198453bd06cb03d2092cae239f368e84c
max
  Wed Apr 29 05:49:14 2026 -0700
varFreqs: add tishkoff180 subtrack (Fan et al. 2023, 180 indigenous African WGS, hg19 lift)

Sites-only SNP VCF with aggregate AC/AF/AN from 180 individuals (15 each
from 12 populations across Ethiopia, Tanzania, Cameroon, Botswana),
sequenced at >30x on HiSeq X Ten. hg19 calls supplied by the Tishkoff
lab (UPenn) and lifted to hg38 with CrossMap. Redistribution is not
permitted, so tableBrowser is disabled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>, refs #36642

diff --git src/hg/makeDb/trackDb/human/tishkoff180.html src/hg/makeDb/trackDb/human/tishkoff180.html
new file mode 100644
index 00000000000..661aa78a1f4
--- /dev/null
+++ src/hg/makeDb/trackDb/human/tishkoff180.html
@@ -0,0 +1,93 @@
+<h2>Description</h2>
+<p>
+This track shows allele frequencies from high-coverage whole-genome sequencing of
+180 individuals (15 per population) from 12 indigenous African populations representing
+all four major African language phyla (Khoesan, Niger-Congo, Nilo-Saharan, Afroasiatic).
+The cohort, generated by the Tishkoff lab and collaborators (Fan et al., <em>Cell</em> 2023),
+spans the Amhara, Dizi, Chabu and Mursi from Ethiopia; the Hadza and Sandawe from Tanzania;
+the Central African rainforest hunter-gatherers (Baka and Bagyeli, merged), Fulani and Tikari
+from Cameroon; and the Herero, Ju|&apos;hoansi and !Xoo (the latter two collectively called the &quot;San&quot;)
+from Botswana. The dataset was generated to capture demographic history and signatures of
+local adaptation in African populations that are poorly represented in other reference panels.
+</p>
+
+<p>
+Only aggregate allele frequencies (AC, AF, AN summed over all 180 individuals) are
+shown for each variant; per-population frequencies are not provided in the released
+sites VCF. The original variant calls were on the GRCh37/hs37d5 reference and were
+lifted to hg38 at UCSC.
+</p>
+
+<h2>Display Conventions</h2>
+<p>
+Variants are displayed as a standard VCF allele-frequency track. On mouseover and click,
+the allele count (AC), total allele number (AN) and allele frequency (AF) are shown.
+When zoomed in, alleles are colored by base. Multi-allelic records were split into
+biallelic records upstream, during normalization.
+</p>
+
+<h2>Methods</h2>
+<p>
+Whole genome sequencing of 180 individuals (15 unrelated samples per population)
+was performed at &gt;30&times; average coverage on the Illumina HiSeq X Ten platform
+using PCR-free library preparation with paired-end 150&nbsp;bp reads and a 350&nbsp;bp
+insert size. Adapters were trimmed with trimadap, optical duplicates were marked with
+SAMBLASTER (v0.1.22), and reads were aligned to the hs37d5 decoy version of GRCh37
+with BWA-MEM (v0.7.10). Reads with mapping quality &lt;&nbsp;20 were filtered out. Per-sample
+short variants were called with GATK HaplotypeCaller (nightly-2016-09-26-gfade77f) in
+gVCF mode using a custom genotype prior (0.4995, 0.001, 0.4995) to reduce reference
+bias, following the SGDP recommendation. Joint genotyping was performed with GATK
+GenotypeGVCFs. Variants were filtered with GATK VQSR using 1000 Genomes Phase 3,
+Illumina Omni 5M and HapMap as SNP truth sets and Mills indels as the indel truth set.
+Variants overlapping potential duplications detected by Delly (v0.7.6) and low-complexity
+regions were excluded. After QC the cohort yielded 32.4&nbsp;M SNPs and 2.8&nbsp;M small
+indels. The publicly released SNP-only sites VCF used here contains 33.6&nbsp;M
+biallelic SNPs with aggregate AC/AF/AN summaries. See Fan et al. (2023) for full
+methods.
+</p>
+
+<p>
+The hg19 SNP sites VCF was provided directly by Matthew Hansen at the Tishkoff lab
+(University of Pennsylvania) via a Box link
+(<tt>180wgs.SNPs.sites.AF.vcf.gz</tt>). Bare chromosome names (1-22) were converted
+to UCSC-style names with <tt>bcftools annotate --rename-chrs</tt>, the VCF was lifted
+from hg19 to hg38 with <tt>CrossMap.py vcf</tt> using the UCSC
+<tt>hg19ToHg38.over.chain.gz</tt> chain, then sorted, bgzip-compressed and tabix-indexed
+with <tt>bcftools sort</tt> and <tt>tabix</tt>. Step-by-step processing instructions are in
+the
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt"
+target="_blank">makeDoc file</a>; the supporting scripts live under
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/varFreqs"
+target="_blank">kent/src/hg/makeDb/scripts/varFreqs</a>.
+</p>
+
+<h2>Data Access</h2>
+<p>
+The original (hg19) variant calls and supplementary data accompany the publication;
+see the &quot;Data and code availability&quot; section of Fan et al. (2023). The dataset is
+not available for redistribution from our website, so the Table Browser, Data
+Integrator and download server are disabled for this track. The hg19 sites VCF can
+be requested from the Tishkoff lab at the University of Pennsylvania.
+</p>
+
+<h2>Credits</h2>
+<p>
+Thanks to Matthew Hansen and Sarah Tishkoff (University of Pennsylvania) for sharing
+the sites-only allele-frequency VCF, and to all participating individuals and field
+collaborators in Ethiopia, Tanzania, Cameroon and Botswana whose contributions made
+this dataset possible.
+</p>
+
+<h2>References</h2>
+
+<p>
+Fan S, Spence JP, Feng Y, Hansen MEB, Terhorst J, Beltrame MH, Ranciaro A, Hirbo J, Beggs W, Thomas
+N <em>et al</em>.
+<a href="https://linkinghub.elsevier.com/retrieve/pii/S0092-8674(23)00101-0" target="_blank">
+Whole-genome sequencing reveals a complex African population demographic history and signatures of
+local adaptation</a>.
+<em>Cell</em>. 2023 Mar 2;186(5):923-939.e14.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/36868214" target="_blank">36868214</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10568978/" target="_blank">PMC10568978</a>
+</p>
+