89491842e0ec6b2250aa6f6dc2c83c294930e6d6 max Sun May 17 14:38:40 2026 -0700
Add ChinaMAP phase 1 variant frequencies subtrack on hg38

ChinaMAP (Cao et al. 2020, Cell Res, PMID 32355288) is a deep-WGS cohort of 10,588 Chinese individuals across 27 provinces and 8 ethnic groups, with 147.4 M autosomal variants (136.7 M SNPs + 10.7 M short indels). The released VCF is already on GRCh38 with chr-prefixed chromosomes and ships AC/AF/AN plus matched 1KGP_* INFO fields, so it is served directly via vcfTabix. The ChinaMAP Limitations on Use prohibit redistribution, so the gbdb directory is _chinamap (hidden from hgdownload) and the trackDb stanza has tableBrowser off. Registered in scripts/varFreqs/databases.tsv so the next varFreqsAll combined rebuild picks it up; filter UI is deliberately not added yet (WBBC/TPMI precedent), refs #36642

diff --git src/hg/makeDb/trackDb/human/chinamap.html src/hg/makeDb/trackDb/human/chinamap.html
new file mode 100644
index 00000000000..4909748f4ef
--- /dev/null
+++ src/hg/makeDb/trackDb/human/chinamap.html
@@ -0,0 +1,105 @@
+
+This track shows allele frequencies for 147.4 million variants (136.7
million SNPs and 10.7 million short indels, autosomes only) from
10,588 Chinese individuals whose whole genomes were sequenced to a
mean depth of about 40x by the China Metabolic Analytics Project (ChinaMAP).
Participants come from three large Chinese cohort studies (the China
Noncommunicable Disease Surveillance, the REACTION study and the
Community-based Cardiovascular Risk During Urbanization in Shanghai
study) and span 27 provinces of China and eight ethnic populations
(Han, Hui, Manchu, Miao, Mongolian, Yi, Tibetan and Zhuang). For
each variant the track records the cohort allele count (AC), allele
number (AN) and allele frequency (AF). The original release also ships
the matched 1000 Genomes Project (1KGP) allele frequencies (global, EAS,
AMR, AFR, EUR and SAS) as INFO fields, which are kept verbatim in the VCF.
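The three per-variant fields are related: under the usual VCF convention, AF is the allele count divided by the number of called alleles (AC / AN). The Python sketch below parses an INFO string and recomputes the frequency from the count and number. It assumes that convention and one ALT allele per record (multi-allelic sites were split upstream). The helper names and the example INFO value are hypothetical and are not taken from the ChinaMAP file.

```python
from typing import Dict


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column (KEY=VALUE;FLAG;...) into a dict.

    Flags without a value are stored with an empty string.
    """
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def recomputed_af(info: str) -> float:
    """Recompute AF as AC / AN (assumes one ALT allele per record)."""
    fields = parse_info(info)
    ac = int(fields["AC"])
    an = int(fields["AN"])
    if an == 0:
        raise ValueError("AN is zero; frequency is undefined")
    return ac / an


def af_is_consistent(info: str, tol: float = 1e-4) -> bool:
    """Check that the stored AF agrees with AC / AN within a rounding tolerance."""
    stored = float(parse_info(info)["AF"])
    return abs(stored - recomputed_af(info)) <= tol


if __name__ == "__main__":
    # Hypothetical INFO value for one split, biallelic site (not real ChinaMAP data).
    example = "AC=2118;AN=21176;AF=0.1000"
    print(recomputed_af(example), af_is_consistent(example))
```

The tolerance allows for AF being written with fewer decimal places than the exact ratio. If AN varies from site to site because of missing calls, the ratio still gives the per-site frequency.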
+
+The track uses the standard UCSC VCF display. Hovering over a variant
shows the cohort allele frequency and count, the total number of
called alleles and, where present, the 1KGP frequencies that the
ChinaMAP release ships alongside the site.
+
+DNA from each participant was prepared with the QIAGEN DNeasy
Blood & Tissue Kit, sheared by Covaris, ligated to BGISEQ-500
adapters and rolling-circle amplified into DNA nanoballs for
100 bp paired-end sequencing on the BGISEQ-500 platform at BGI
Genomics. Reads were quality-filtered with SOAPnuke v1.5.6, aligned
to GRCh38 (GENCODE release) with BWA-MEM v0.7.16a and coordinate-sorted
with Picard SortSam v2.13.2; duplicates were then marked and base
qualities recalibrated with GATK v4.beta.4. Samples were required to
pass six QC criteria (base quality Q30 > 80%, mean depth > 30x, mapping
rate ≥ 95%, mismatch rate < 1%, duplicate rate < 10% and
20x coverage > 80%) and a 21-SNP mass spectrometric fingerprint
check; 10,588 WGS samples passed. Germline variants were called
per sample as GVCFs with GATK HaplotypeCaller v4.0.4.0, combined
with GATK CombineGVCFs and joint-called with GATK GenotypeGVCFs
(v4.0.4.0), ignoring low-complexity regions. Variants were filtered
with GATK VariantFiltration and restricted to a length of at most 10 bp
and at most 10 alternate alleles; multi-allelic sites were then split,
and the final callset was annotated with SnpEff v4.3. See Cao et al.
2020 (in References below) for the full pipeline.
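The six sample-level QC thresholds above can be read as a single pass/fail predicate. The Python sketch below encodes them that way. It makes several assumptions:

- The `SampleMetrics` record and its field names are hypothetical.
- Percentages are written as fractions.
- "20x coverage" is interpreted as the fraction of the genome covered at 20x or more.
- The 21-SNP fingerprint check is reduced to a boolean, because its concordance rule is not restated here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampleMetrics:
    """Hypothetical per-sample QC summary; rates are fractions in [0, 1]."""
    sample_id: str
    q30_fraction: float     # fraction of bases with base quality >= Q30
    mean_depth: float       # mean sequencing depth (x)
    mapping_rate: float     # fraction of reads mapped to GRCh38
    mismatch_rate: float    # fraction of mismatched aligned bases
    duplicate_rate: float   # fraction of reads marked as duplicates
    cov20x_fraction: float  # assumed: fraction of the genome covered at >= 20x
    fingerprint_ok: bool    # outcome of the 21-SNP fingerprint check


def qc_failures(m: SampleMetrics) -> List[str]:
    """Return the criteria a sample fails; an empty list means it passes."""
    checks = {
        "Q30 > 80%": m.q30_fraction > 0.80,
        "mean depth > 30x": m.mean_depth > 30.0,
        "mapping rate >= 95%": m.mapping_rate >= 0.95,
        "mismatch rate < 1%": m.mismatch_rate < 0.01,
        "duplicate rate < 10%": m.duplicate_rate < 0.10,
        "20x coverage > 80%": m.cov20x_fraction > 0.80,
        "21-SNP fingerprint": m.fingerprint_ok,
    }
    return [name for name, passed in checks.items() if not passed]


def passes_qc(m: SampleMetrics) -> bool:
    """True only if every criterion is met."""
    return not qc_failures(m)


if __name__ == "__main__":
    # Hypothetical samples, not real ChinaMAP metrics.
    good = SampleMetrics("S1", 0.91, 41.2, 0.99, 0.004, 0.05, 0.93, True)
    shallow = SampleMetrics("S2", 0.91, 27.5, 0.99, 0.004, 0.05, 0.71, True)
    for sample in (good, shallow):
        print(sample.sample_id, passes_qc(sample), qc_failures(sample))
```

Returning the list of failed criteria, rather than a single boolean, makes it easy to see why a sample was dropped.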
++The bgzipped sites-only VCF +(mbiobank_ChinaMAP.phase1.vcf.gz) was downloaded from the +ChinaMAP / mBiobank distribution site +(http://chinamapwgs.mbiobank.com/download/), +renamed locally to chinamap.vcf.gz and tabix-indexed. No +coordinate liftover or reformatting was needed: the upstream file is +already on GRCh38 with chr-prefixed chromosome names, autosomes only, +and ships standard AC, AF and AN INFO +fields. The pipeline is recorded in the +makeDoc +file of the track. +
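A bgzipped file is a series of standard gzip members, so a local copy of the sites-only VCF can also be read without tabix. The Python sketch below streams the file with the standard `gzip` module and yields the records in one region. Unlike a tabix query, which uses the .tbi index to seek directly to the region, it is a linear scan. It assumes the file is coordinate-sorted, which tabix indexing already requires. The function name and the tuple layout are hypothetical.

```python
import gzip
from typing import Iterator, Tuple

# Hypothetical record layout: (CHROM, POS, REF, ALT, INFO) for one data line.
Record = Tuple[str, int, str, str, str]


def iter_region(path: str, chrom: str, start: int, end: int) -> Iterator[Record]:
    """Yield records on `chrom` with start <= POS <= end (1-based, inclusive).

    gzip.open reads multi-member files, so a bgzipped VCF can be streamed
    directly. This scans the file and does not use the tabix index.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            cols = line.rstrip("\n").split("\t")
            if cols[0] != chrom:
                continue
            pos = int(cols[1])
            if pos > end:
                break  # sorted input: no later record on this chromosome can match
            if pos >= start:
                yield cols[0], pos, cols[3], cols[4], cols[7]
```

For example, `list(iter_region("chinamap.vcf.gz", "chr1", 1000000, 1001000))` would collect the sites in that window. For repeated lookups, the tabix index is the faster route.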
+
+Only autosomes (chr1-22) are present; chrX, chrY and chrM are not
in the ChinaMAP phase 1 release. The 1KGP frequency fields
(1KGP_AF, 1KGP_EAS_AF, 1KGP_AMR_AF,
1KGP_AFR_AF, 1KGP_EUR_AF, 1KGP_SAS_AF) are
carried over verbatim from the ChinaMAP VCF and populate only the
small fraction of ChinaMAP sites that are also catalogued in the
matched 1KGP release.
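The 1KGP_* fields appear only at sites shared with the matched 1KGP release. Code that reads them should therefore treat a missing key as "not catalogued" rather than as a frequency of zero. The Python sketch below returns None for absent fields and compares the ChinaMAP AF with 1KGP_EAS_AF where both exist. The helper names are hypothetical, and treating the VCF missing value "." the same as an absent key is an assumption.

```python
from typing import Dict, Optional

# The six 1KGP frequency fields carried over from the ChinaMAP VCF.
KGP_FIELDS = ("1KGP_AF", "1KGP_EAS_AF", "1KGP_AMR_AF",
              "1KGP_AFR_AF", "1KGP_EUR_AF", "1KGP_SAS_AF")


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a KEY -> VALUE dict (flags map to '')."""
    out: Dict[str, str] = {}
    for item in info.split(";"):
        if item and item != ".":
            key, _, value = item.partition("=")
            out[key] = value
    return out


def _as_float(raw: Optional[str]) -> Optional[float]:
    """Convert a raw INFO value, mapping absent, empty or '.' to None."""
    return None if raw in (None, "", ".") else float(raw)


def kgp_frequencies(info: str) -> Dict[str, Optional[float]]:
    """Return every 1KGP_* frequency, with None where the site has no value."""
    fields = parse_info(info)
    return {name: _as_float(fields.get(name)) for name in KGP_FIELDS}


def eas_difference(info: str) -> Optional[float]:
    """ChinaMAP AF minus 1KGP EAS AF, or None if either value is missing."""
    af = _as_float(parse_info(info).get("AF"))
    eas = kgp_frequencies(info)["1KGP_EAS_AF"]
    if af is None or eas is None:
        return None
    return af - eas
```

Keeping None distinct from 0.0 avoids treating a variant seen only in ChinaMAP as one that 1KGP observed and found absent in East Asians.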
+ ++The ChinaMAP Limitations on Use (see the +ChinaMAP +download page) prohibit redistribution of the data, so the +ChinaMAP VCF is not available from the UCSC Table Browser, Data +Integrator, REST API or the public download server. The track can be +browsed interactively in the Genome Browser; for bulk access please +register with the ChinaMAP project at +http://chinamapwgs.mbiobank.com/ +and download the original VCF directly from them. +
+ ++Thanks to the ChinaMAP participants and to the National Clinical +Research Center for Metabolic Diseases (Shanghai Jiao Tong +University School of Medicine, Ruijin Hospital) and BGI Genomics for +producing and releasing the ChinaMAP phase 1 sites VCF. +
+ ++Cao Y, Li L, Xu M, Feng Z, Sun X, Lu J, Xu Y, Du P, Wang T, Hu R et al. + +The ChinaMAP analytics of deep whole genome sequences in 10,588 individuals. +Cell Res. 2020 Sep;30(9):717-731. +PMID: 32355288; PMC: PMC7609296 +
+