9bfd58221b1539193cb7f0a317b4e959c1c7e49a
max
Thu May 21 01:00:45 2026 -0700
varFreqs: AI generated text sounds bad, hard to read, so remove typical AI language. "humanizer" pass on all 31 varFreqs description pages — cut em dashes, copula avoidance ("serves as", "stands as"), "-ing" puffery, and boilerplate filler ("We provide documentation that indicates how..."). Title-case headings and meaningful <b> emphasis preserved. No facts/URLs/counts/versions changed. tpmi.html added as a new file (was previously uncommitted). refs #36642
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
diff --git src/hg/makeDb/trackDb/human/wbbc.html src/hg/makeDb/trackDb/human/wbbc.html
index 520409f384c..77bd5670856 100644
--- src/hg/makeDb/trackDb/human/wbbc.html
+++ src/hg/makeDb/trackDb/human/wbbc.html
@@ -1,104 +1,103 @@
<h2>Description</h2>
<p>
This track shows allele frequencies for 78.6 million variants from
4,480 whole-genome-sequenced Chinese individuals released by the
<a href="https://cpbb.cn/" target="_blank">Westlake BioBank for Chinese
-(WBBC)</a> pilot project. The WBBC is a population study of around 35,000
-Chinese volunteers spanning 31 provinces; about 15,000 of them have
-been deeply phenotyped and a subset have been whole-genome sequenced.
+(WBBC)</a> pilot project. The WBBC is a population study of about 35,000
+Chinese volunteers across 31 provinces; about 15,000 have been deeply
+phenotyped and a subset have been whole-genome sequenced.
The frequencies are also broken down into four Han Chinese regional
groups (North, Central, South, Lingnan) defined by recruitment province
in the WBBC paper.
</p>
<p>
The pilot project has been folded into the larger
<a href="https://cpbb.cn/" target="_blank">China Precision BioBank
(CPBB)</a> initiative, which is collecting up to 100,000 samples
nationwide. The variant frequencies on this track are from the original
WBBC Phase I release (v20210103) and are unchanged by the rebranding.
</p>
<h2>Display</h2>
<p>
The track uses the standard UCSC VCF display. Hovering over a variant shows
the cohort allele frequency, the four regional frequencies, sequencing
depth, GATK VQSR log-odds score, and the per-genotype hom-ref / het /
hom-alt sample counts as reported by WBBC.
</p>
<h2>Methods</h2>
<p>
The WBBC pilot whole-genome-sequenced 4,535 individuals at a mean depth
-of around 13.9x on Illumina HiSeq X10 platforms, after dropping samples
-that failed standard QC. Reads were aligned to GRCh38 with BWA-MEM,
-variants were jointly called with GATK 4.0 HaplotypeCaller, and the
-callset was hard-filtered with VQSR. The 4,480 unrelated samples
-released for download were stratified into four Han Chinese regional
-groups (North, Central, South and Lingnan, which together cover roughly
-the 27 administrative divisions reached by the pilot). Allele counts
-and frequencies are reported overall and per region. See Cong <em>et al.</em>
-2022 (in References below) for full sample-selection and pipeline
-details.
+of 13.9x on Illumina HiSeq X10 platforms, after dropping samples that
+failed standard QC. Reads were aligned to GRCh38 with BWA-MEM, variants
+were jointly called with GATK 4.0 HaplotypeCaller, and the callset was
+filtered with VQSR. The 4,480 unrelated samples released for download
+were stratified into four Han Chinese regional groups (North, Central,
+South and Lingnan, which together cover roughly the 27 administrative
+divisions the pilot reached). Allele counts and frequencies are reported overall
+and per region. See Cong <em>et al.</em> 2022 (in References below) for
+full sample-selection and pipeline details.
</p>
<p>
The per-chromosome WGS sites VCFs (chr1-22) were downloaded from
<a href="https://wbbc.westlake.edu.cn/" target="_blank">https://wbbc.westlake.edu.cn/</a>
(URL pattern: <tt>WBBC.chr&lt;N&gt;.GRCh38.vcf.gz</tt>). We concatenated
the 22 files with <tt>bcftools concat</tt>, re-headered the result to
add the standard hg38 contig lines and proper INFO definitions, then
dropped variants with cohort allele count zero (multi-allelic splits
-that are not observed in the WBBC samples; ~1.9% of rows), and sorted,
-bgzipped and tabix-indexed the result. No coordinate liftover was
+that no WBBC sample carries; ~1.9% of rows), and sorted, bgzipped and
+tabix-indexed the result. No coordinate liftover was
needed: the upstream files are already on GRCh38 with chr-prefixed
chromosomes. The pipeline is recorded in the
<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt" target="_blank">makeDoc
file</a> of the track.
</p>
<h2>Caveats</h2>
<p>
Only autosomes (chr1-22) are present; chrX/Y/M are not in the WBBC
download. Variants reported as AC=0 in the WBBC release (about 1.9%
of rows, mostly multi-allelic split sites that no WBBC individual
carries) have been removed from this track.
</p>
<h2>Data Access</h2>
<p>
The variant frequencies can be explored interactively using the
<a href="../cgi-bin/hgTables">Table Browser</a> or the
<a href="../cgi-bin/hgIntegrator">Data Integrator</a>, and exported to
spreadsheet or tab-separated tables. From scripts, the data can be
accessed via our <a href="https://api.genome.ucsc.edu" target="_blank">REST
API</a> with <tt>track=wbbc</tt>.
</p>
<p>
The VCF file is also available from
<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/varFreqs/wbbc/" target="_blank">our
download server</a> as <tt>wbbc.vcf.gz</tt>. Individual regions can be
extracted with <tt>tabix</tt>, for example
<tt>tabix http://hgdownload.soe.ucsc.edu/gbdb/hg38/varFreqs/wbbc/wbbc.vcf.gz chr21:1-100000000</tt>.
The original per-chromosome WBBC release is distributed at
<a href="https://wbbc.westlake.edu.cn/" target="_blank">https://wbbc.westlake.edu.cn/</a>.
</p>
<h2>Credits</h2>
<p>
Thanks to the WBBC participants and to the Westlake University team
(Pei-Kuan Cong, Hou-Feng Zheng and colleagues) for making the pilot
sites-only VCFs publicly available.
</p>
<h2>References</h2>
<p>
Cong PK, Bai WY, Li JC, Yang MY, Khederzadeh S, Gai SR, Li N, Liu YH, Yu SH, Zhao WW <em>et al</em>.
<a href="https://doi.org/10.1038/s41467-022-30526-x" target="_blank">
Genomic analyses of 10,376 individuals in the Westlake BioBank for Chinese (WBBC) pilot project</a>.
<em>Nat Commun</em>. 2022 May 26;13(1):2939.
PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/35618720" target="_blank">35618720</a>; PMC: <a
href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9135724/" target="_blank">PMC9135724</a>
</p>