9bfd58221b1539193cb7f0a317b4e959c1c7e49a
max
Thu May 21 01:00:45 2026 -0700
varFreqs: AI generated text sounds bad, hard to read, so remove typical AI language. "humanizer" pass on all 31 varFreqs description pages — cut em dashes, copula avoidance ("serves as", "stands as"), "-ing" puffery, and boilerplate filler ("We provide documentation that indicates how..."). Title-case headings and meaningful emphasis preserved. No facts/URLs/counts/versions changed. tpmi.html added as a new file (was previously uncommitted). refs #36642
Co-Authored-By: Claude Sonnet 4.6
-The track uses the standard UCSC VCF display. Hovering a variant
-shows the cohort allele frequency and count, the total number of
-called alleles, and the 1KGP frequencies that the ChinaMAP release
-ships alongside each site.
+The track uses the standard UCSC VCF display. When you hover over a
+variant, the popup shows the cohort allele frequency and count, the
+total number of called alleles, and the 1KGP frequencies that the
+ChinaMAP release ships alongside each site.
DNA from each participant was prepared with the QIAGEN DNeasy Blood & Tissue Kit, sheared by Covaris, ligated to BGISEQ-500 adapters and rolling-circle amplified into DNA nanoballs for 100 bp paired-end sequencing on the BGISEQ-500 platform at BGI Genomics. Reads were quality-filtered with SOAPnuke v1.5.6, aligned to GRCh38 (GENCODE release) with BWA-MEM v0.7.16a, coordinate-sorted with Picard SortSam v2.13.2, and duplicate-marked and base-quality recalibrated with GATK v4.beta.4. Samples were required to pass six QC criteria (base quality Q30 > 80%, mean depth > 30x, mapping rate ≥ 95%, mismatch rate < 1%, duplicate rate < 10% and 20x coverage > 80%) and a 21-SNP mass spectrometric fingerprint check; 10,588 WGS samples passed. Germline variants were called per-sample as GVCFs with GATK HaplotypeCaller v4.0.4.0, combined with GATK CombineGVCFs and joint-called with GATK GenotypeGVCFs (v4.0.4.0), ignoring low-complexity regions. Variants were filtered
-with GATK VariantFiltration, restricted to length ≤ 10 bp and a
-maximum of 10 alt alleles, multi-allelic sites were split, and the
+with GATK VariantFiltration and restricted to length ≤ 10 bp and a
+maximum of 10 alt alleles. Multi-allelic sites were split, and the
final callset was annotated with SnpEff v4.3. See Cao et al. 2020 (in References below) for the full pipeline.
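The six sample-level QC thresholds listed above can be expressed as a single predicate. This is only a minimal sketch for illustration: the `SampleQC` class, its field names and the `passes_qc` function are hypothetical and are not part of the ChinaMAP pipeline, and the 21-SNP fingerprint check is not modeled.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    """Per-sample QC metrics (hypothetical field names; percentages are 0-100)."""
    q30_pct: float             # bases with quality >= Q30
    mean_depth: float          # mean sequencing depth, in x
    mapping_rate_pct: float    # reads mapped to the reference
    mismatch_rate_pct: float   # mismatches against the reference
    duplicate_rate_pct: float  # reads marked as duplicates
    cov20x_pct: float          # genome covered at >= 20x


def passes_qc(s: SampleQC) -> bool:
    """Apply the six thresholds as stated in the description text."""
    return (
        s.q30_pct > 80
        and s.mean_depth > 30
        and s.mapping_rate_pct >= 95
        and s.mismatch_rate_pct < 1
        and s.duplicate_rate_pct < 10
        and s.cov20x_pct > 80
    )
```

Note that only the mapping rate uses an inclusive bound (≥ 95%); the other five comparisons are strict, so a sample with a mean depth of exactly 30x would fail.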
The bgzipped sites-only VCF (mbiobank_ChinaMAP.phase1.vcf.gz) was downloaded from the ChinaMAP / mBiobank distribution site (http://chinamapwgs.mbiobank.com/download/),
-renamed locally to chinamap.vcf.gz and tabix-indexed. No
-coordinate liftover or reformatting was needed: the upstream file is
-already on GRCh38 with chr-prefixed chromosome names, autosomes only,
-and ships standard AC, AF and AN INFO
-fields. The pipeline is recorded in the
+renamed locally to chinamap.vcf.gz and tabix-indexed. We did
+not need to lift over coordinates or reformat the file: the upstream
+file is already on GRCh38 with chr-prefixed chromosome names,
+autosomes only, and ships standard AC, AF and
+AN INFO fields. The pipeline is recorded in the
makeDoc file of the track.
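For readers unfamiliar with the AC, AF and AN INFO fields mentioned above, the sketch below parses a VCF INFO string and recomputes the allele frequency as AC / AN. The helper names `parse_info` and `allele_frequency` are hypothetical, and the INFO string in the usage note is an invented example, not a record from the ChinaMAP file. It assumes one alternate allele per record, which matches a callset whose multi-allelic sites were split.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def allele_frequency(info: str) -> float:
    """Recompute the frequency as AC / AN (assumes a single alt allele)."""
    fields = parse_info(info)
    ac = int(fields["AC"])  # alternate allele count
    an = int(fields["AN"])  # total number of called alleles
    return ac / an if an else 0.0
```

For an invented record such as `AC=3;AF=0.00014;AN=21176`, `allele_frequency` returns 3 / 21176, which should agree with the shipped AF value up to rounding.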
Only autosomes (chr1-22) are present; chrX, chrY and chrM are not in the ChinaMAP phase 1 release. The 1KGP frequency fields (1KGP_AF, 1KGP_EAS_AF, 1KGP_AMR_AF, 1KGP_AFR_AF, 1KGP_EUR_AF, 1KGP_SAS_AF) are carried over verbatim from the ChinaMAP VCF and only populate the small fraction of ChinaMAP sites that are also catalogued in the matched 1KGP release.
@@ -75,31 +75,31 @@
The ChinaMAP Limitations on Use (see the ChinaMAP download page) prohibit redistribution of the data, so the ChinaMAP VCF is not available from the UCSC Table Browser, Data Integrator, REST API or the public download server. The track can be browsed interactively in the Genome Browser; for bulk access please register with the ChinaMAP project at http://chinamapwgs.mbiobank.com/ and download the original VCF directly from them.
Thanks to the ChinaMAP participants and to the National Clinical Research Center for Metabolic Diseases (Shanghai Jiao Tong
-University School of Medicine, Ruijin Hospital) and BGI Genomics for
-producing and releasing the ChinaMAP phase 1 sites VCF.
+University School of Medicine, Ruijin Hospital) and BGI Genomics, who
+produced and released the ChinaMAP phase 1 sites VCF.
Cao Y, Li L, Xu M, Feng Z, Sun X, Lu J, Xu Y, Du P, Wang T, Hu R et al. The ChinaMAP analytics of deep whole genome sequences in 10,588 individuals. Cell Res. 2020 Sep;30(9):717-731. PMID: 32355288; PMC: PMC7609296