9bfd58221b1539193cb7f0a317b4e959c1c7e49a max Thu May 21 01:00:45 2026 -0700

varFreqs: AI generated text sounds bad, hard to read, so remove typical AI language.

"humanizer" pass on all 31 varFreqs description pages: cut em dashes, copula avoidance ("serves as", "stands as"), "-ing" puffery, and boilerplate filler ("We provide documentation that indicates how..."). Title-case headings and meaningful emphasis preserved. No facts/URLs/counts/versions changed. tpmi.html added as a new file (was previously uncommitted). refs #36642

Co-Authored-By: Claude Sonnet 4.6

diff --git src/hg/makeDb/trackDb/human/chinamap.html src/hg/makeDb/trackDb/human/chinamap.html
index 4909748f4ef..4a3a4ed98a9 100644
--- src/hg/makeDb/trackDb/human/chinamap.html
+++ src/hg/makeDb/trackDb/human/chinamap.html
@@ -5,68 +5,68 @@
10,588 Chinese individuals deep-whole-genome-sequenced at a mean depth of about 40x by the China Metabolic Analytics Project (ChinaMAP). Participants come from three large Chinese cohort studies (the China Noncommunicable Disease Surveillance, the REACTION study and the Community-based Cardiovascular Risk During Urbanization in Shanghai study) and span 27 provinces of China and eight ethnic populations (Han, Hui, Manchu, Miao, Mongolian, Yi, Tibetan and Zhuang). For each variant the track records the cohort allele count, allele number and allele frequency. The original release also ships the matched 1000 Genomes Project (1KGP) allele frequencies (global, EAS, AMR, AFR, EUR and SAS) as INFO fields, which are kept verbatim in the VCF.

Display

-The track uses the standard UCSC VCF display. Hovering a variant
-shows the cohort allele frequency and count, the total number of
-called alleles, and the 1KGP frequencies that the ChinaMAP release
-ships alongside each site.
+The track uses the standard UCSC VCF display. When you hover over a
+variant, the popup shows the cohort allele frequency and count, the
+total number of called alleles, and the 1KGP frequencies that the
+ChinaMAP release ships alongside each site.

Methods

DNA from each participant was prepared with the QIAGEN DNeasy Blood & Tissue Kit, sheared by Covaris, ligated to BGISEQ-500 adapters and rolling-circle amplified into DNA nanoballs for 100 bp paired-end sequencing on the BGISEQ-500 platform at BGI Genomics. Reads were quality-filtered with SOAPnuke v1.5.6, aligned to GRCh38 (GENCODE release) with BWA-MEM v0.7.16a, coordinate-sorted with Picard SortSam v2.13.2, and duplicate-marked and base-quality recalibrated with GATK v4.beta.4. Samples were required to pass six QC criteria (base quality Q30 > 80%, mean depth > 30x, mapping rate ≥ 95%, mismatch rate < 1%, duplicate rate < 10% and 20x coverage > 80%) and a 21-SNP mass spectrometric fingerprint check; 10,588 WGS samples passed. Germline variants were called per-sample as GVCFs with GATK HaplotypeCaller v4.0.4.0, combined with GATK CombineGVCFs and joint-called with GATK GenotypeGVCFs (v4.0.4.0), ignoring low-complexity regions. Variants were filtered
-with GATK VariantFiltration, restricted to length ≤ 10 bp and a
-maximum of 10 alt alleles, multi-allelic sites were split, and the
+with GATK VariantFiltration and restricted to length ≤ 10 bp and a
+maximum of 10 alt alleles. Multi-allelic sites were split, and the
final callset was annotated with SnpEff v4.3. See Cao et al. 2020 (in References below) for the full pipeline.

The bgzipped sites-only VCF (mbiobank_ChinaMAP.phase1.vcf.gz) was downloaded from the ChinaMAP / mBiobank distribution site (http://chinamapwgs.mbiobank.com/download/),
-renamed locally to chinamap.vcf.gz and tabix-indexed. No
-coordinate liftover or reformatting was needed: the upstream file is
-already on GRCh38 with chr-prefixed chromosome names, autosomes only,
-and ships standard AC, AF and AN INFO
-fields. The pipeline is recorded in the
+renamed locally to chinamap.vcf.gz and tabix-indexed. We did
+not need to lift over coordinates or reformat the file: the upstream
+file is already on GRCh38 with chr-prefixed chromosome names,
+autosomes only, and ships standard AC, AF and
+AN INFO fields. The pipeline is recorded in the
makeDoc file of the track.

Caveats

Only autosomes (chr1-22) are present; chrX, chrY and chrM are not in the ChinaMAP phase 1 release. The 1KGP frequency fields (1KGP_AF, 1KGP_EAS_AF, 1KGP_AMR_AF, 1KGP_AFR_AF, 1KGP_EUR_AF, 1KGP_SAS_AF) are carried over verbatim from the ChinaMAP VCF and only populate the small fraction of ChinaMAP sites that are also catalogued in the matched 1KGP release.

@@ -75,31 +75,31 @@
The ChinaMAP Limitations on Use (see the ChinaMAP download page) prohibit redistribution of the data, so the ChinaMAP VCF is not available from the UCSC Table Browser, Data Integrator, REST API or the public download server. The track can be browsed interactively in the Genome Browser; for bulk access please register with the ChinaMAP project at http://chinamapwgs.mbiobank.com/ and download the original VCF directly from them.

Credits

Thanks to the ChinaMAP participants and to the National Clinical Research Center for Metabolic Diseases (Shanghai Jiao Tong
-University School of Medicine, Ruijin Hospital) and BGI Genomics for
-producing and releasing the ChinaMAP phase 1 sites VCF.
+University School of Medicine, Ruijin Hospital) and BGI Genomics, who
+produced and released the ChinaMAP phase 1 sites VCF.

References

Cao Y, Li L, Xu M, Feng Z, Sun X, Lu J, Xu Y, Du P, Wang T, Hu R et al. The ChinaMAP analytics of deep whole genome sequences in 10,588 individuals. Cell Res. 2020 Sep;30(9):717-731. PMID: 32355288; PMC: PMC7609296