9bfd58221b1539193cb7f0a317b4e959c1c7e49a max Thu May 21 01:00:45 2026 -0700
varFreqs: AI generated text sounds bad, hard to read, so remove typical AI language.

"humanizer" pass on all 31 varFreqs description pages: cut em dashes, copula avoidance ("serves as", "stands as"), "-ing" puffery, and boilerplate filler ("We provide documentation that indicates how..."). Title-case headings and meaningful emphasis preserved. No facts/URLs/counts/versions changed. tpmi.html added as a new file (was previously uncommitted).

refs #36642

Co-Authored-By: Claude Sonnet 4.6

diff --git src/hg/makeDb/trackDb/human/wbbc.html src/hg/makeDb/trackDb/human/wbbc.html
index 520409f384c..77bd5670856 100644
--- src/hg/makeDb/trackDb/human/wbbc.html
+++ src/hg/makeDb/trackDb/human/wbbc.html
@@ -1,67 +1,66 @@

Description

This track shows allele frequencies for 78.6 million variants from 4,480 whole-genome-sequenced Chinese individuals released by the Westlake BioBank for Chinese
-(WBBC) pilot project. The WBBC is a population study of around 35,000
-Chinese volunteers spanning 31 provinces; about 15,000 of them have
-been deeply phenotyped and a subset have been whole-genome sequenced.
+(WBBC) pilot project. The WBBC is a population study of about 35,000
+Chinese volunteers across 31 provinces; about 15,000 have been deeply
+phenotyped and a subset have been whole-genome sequenced.
The frequencies are also broken down into four Han Chinese regional groups (North, Central, South, Lingnan) defined by recruitment province in the WBBC paper.

The pilot project has been folded into the larger China Precision BioBank (CPBB) initiative, which is collecting up to 100,000 samples nationwide. The variant frequencies on this track are from the original WBBC Phase I release (v20210103) and are unchanged by the rebranding.

Display

The track uses the standard UCSC VCF display. Hovering a variant shows the cohort allele frequency, the four regional frequencies, sequencing depth, GATK VQSR log-odds score, and the per-genotype hom-ref / het / hom-alt sample counts as reported by WBBC.
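The per-genotype sample counts shown on hover are enough to recompute the allele frequency at a site. The Python sketch below is illustrative only. It assumes diploid genotypes and a biallelic (split) record, and the helper names `parse_info` and `af_from_genotype_counts` are hypothetical, not part of any UCSC or WBBC tool. The exact INFO key names that WBBC uses for the genotype counts are not given here, so the sketch takes the counts as plain integers.

```python
def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column ("KEY=VAL;FLAG;...") into a dict; flags map to ""."""
    fields: dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def af_from_genotype_counts(hom_ref: int, het: int, hom_alt: int) -> float:
    """Alt-allele frequency implied by diploid genotype counts at a biallelic site."""
    n_samples = hom_ref + het + hom_alt
    if n_samples == 0:
        raise ValueError("no called genotypes")
    # Each het sample carries one alt allele, each hom-alt sample two;
    # every diploid sample contributes two alleles to the denominator.
    return (het + 2 * hom_alt) / (2 * n_samples)


# Hypothetical site: 4,468 hom-ref, 12 het, 0 hom-alt among 4,480 samples.
info = parse_info("AC=12;AN=8960")
print(info["AC"], info["AN"], af_from_genotype_counts(4468, 12, 0))
```

For this made-up site the genotype-derived frequency equals AC/AN (12/8,960), which is a quick consistency check between the hover fields.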

Methods

The WBBC pilot whole-genome-sequenced 4,535 individuals at a mean depth
-of around 13.9x on Illumina HiSeq X10 platforms, after dropping samples
-that failed standard QC. Reads were aligned to GRCh38 with BWA-MEM,
-variants were jointly called with GATK 4.0 HaplotypeCaller, and the
-callset was hard-filtered with VQSR. The 4,480 unrelated samples
-released for download were stratified into four Han Chinese regional
-groups (North, Central, South and Lingnan, which together cover roughly
-the 27 administrative divisions reached by the pilot). Allele counts
-and frequencies are reported overall and per region. See Cong et al.
-2022 (in References below) for full sample-selection and pipeline
-details.
+of 13.9x on Illumina HiSeq X10 platforms, after dropping samples that
+failed standard QC. Reads were aligned to GRCh38 with BWA-MEM, variants
+were jointly called with GATK 4.0 HaplotypeCaller, and the callset was
+hard-filtered with VQSR. The 4,480 unrelated samples released for download
+were stratified into four Han Chinese regional groups (North, Central,
+South and Lingnan, which together cover 27 of the administrative divisions
+the pilot reached). Allele counts and frequencies are reported overall
+and per region. See Cong et al. 2022 (in References below) for
+full sample-selection and pipeline details.
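Counts are reported both overall and per region, so it helps to see how the two relate. The sketch below assumes that the four regional groups partition the released samples, that each group reports an alternate-allele count (AC) and a called-allele count (AN), and that AN can fall below 2 × samples where genotypes are missing. Under those assumptions the overall frequency is the AN-weighted mean of the regional frequencies, not their plain average. The class and function names and all counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlleleCounts:
    ac: int  # alternate-allele count
    an: int  # called alleles; at most 2 x samples for diploid autosomes

    @property
    def af(self) -> float:
        # Allele frequency AC / AN; NaN when no alleles were called.
        return self.ac / self.an if self.an else float("nan")


def pool(groups: dict[str, AlleleCounts]) -> AlleleCounts:
    """Combine per-group counts, assuming the groups share no samples."""
    return AlleleCounts(
        ac=sum(g.ac for g in groups.values()),
        an=sum(g.an for g in groups.values()),
    )


# Hypothetical counts at one site; the AN values sum to 2 x 4,480 = 8,960.
regions = {
    "North": AlleleCounts(ac=10, an=2000),
    "Central": AlleleCounts(ac=5, an=1000),
    "South": AlleleCounts(ac=30, an=3000),
    "Lingnan": AlleleCounts(ac=55, an=2960),
}
overall = pool(regions)
plain_mean = sum(g.af for g in regions.values()) / len(regions)
print(overall.ac, overall.an, overall.af, plain_mean)
```

With these made-up numbers the pooled frequency (100/8,960, about 0.0112) is higher than the plain mean of the four regional frequencies (about 0.0096), because the larger, higher-frequency groups carry more weight.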

The per-chromosome WGS sites VCFs (chr1-22) were downloaded from https://wbbc.westlake.edu.cn/ (URL pattern: WBBC.chr<N>.GRCh38.vcf.gz). We concatenated the 22 files with bcftools concat, re-headered the result to add the standard hg38 contig lines and proper INFO definitions, then dropped variants with cohort allele count zero (multi-allelic splits
-that are not observed in the WBBC samples; ~1.9% of rows), and sorted,
-bgzipped and tabix-indexed the result. No coordinate liftover was
+that no WBBC sample carries; ~1.9% of rows), and sorted, bgzipped and
+tabix-indexed the result. No coordinate liftover was
needed: the upstream files are already on GRCh38 with chr-prefixed chromosomes. The pipeline is recorded in the makeDoc file of the track.
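The exact commands are in the track's makeDoc. The Python sketch below is not that pipeline. It only illustrates the logic of two of the steps on uncompressed VCF text: dropping records whose cohort AC is zero, and ordering the autosomal records chr1 to chr22 by position. The function names are hypothetical, and compression and indexing are left to bgzip and tabix.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def drop_ac_zero(lines: Iterable[str]) -> Iterator[str]:
    """Pass header lines through; drop records whose INFO/AC values are all 0."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
        ac_values: List[int] = []
        for item in info.split(";"):
            key, _, value = item.partition("=")
            if key == "AC":
                ac_values = [int(v) for v in value.split(",") if v != "."]
                break
        # Records with no usable AC value are kept rather than guessed at.
        if ac_values and all(v == 0 for v in ac_values):
            continue
        yield line


def autosome_sort_key(record: str) -> Tuple[int, int]:
    """Order records chr1..chr22 numerically, then by POS."""
    chrom, pos = record.split("\t", 2)[:2]
    match = re.fullmatch(r"chr(\d+)", chrom)
    if match is None:
        # Only chr1-22 are expected in the WBBC download.
        raise ValueError(f"unexpected contig {chrom!r}")
    return int(match.group(1)), int(pos)


def filter_and_sort(lines: Iterable[str]) -> List[str]:
    """Drop AC=0 records, keep the header first, then sort the records."""
    kept = list(drop_ac_zero(lines))
    header = [ln for ln in kept if ln.startswith("#")]
    records = sorted(
        (ln for ln in kept if not ln.startswith("#")), key=autosome_sort_key
    )
    return header + records
```

This version holds every record in memory, which is fine for a toy input but not for 78.6 million rows. At that scale a streaming filter followed by an external, disk-backed sort is the practical design.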

Caveats

Only autosomes (chr1-22) are present; chrX/Y/M are not in the WBBC download. Variants reported as AC=0 in the WBBC release (about 1.9 % of rows, mostly multi-allelic split sites that no WBBC individual carries) have been removed from this track.

Data Access