9bfd58221b1539193cb7f0a317b4e959c1c7e49a max Thu May 21 01:00:45 2026 -0700 varFreqs: AI generated text sounds bad, hard to read, so remove typical AI language. "humanizer" pass on all 31 varFreqs description pages — cut em dashes, copula avoidance ("serves as", "stands as"), "-ing" puffery, and boilerplate filler ("We provide documentation that indicates how..."). Title-case headings and meaningful emphasis preserved. No facts/URLs/counts/versions changed. tpmi.html added as a new file (was previously uncommitted). refs #36642 Co-Authored-By: Claude Sonnet 4.6 diff --git src/hg/makeDb/trackDb/human/tpmi.html src/hg/makeDb/trackDb/human/tpmi.html new file mode 100644 index 00000000000..65fe574df79 --- /dev/null +++ src/hg/makeDb/trackDb/human/tpmi.html @@ -0,0 +1,135 @@ +

Description

+

+This track shows allele frequencies for 672,843 variants from the Taiwan Precision Medicine Initiative (TPMI), a large cohort recruited in Taiwan whose participants are mostly of Han Chinese ancestry. The frequencies come from the publicly released annotation of the Axiom TPM1 SNP array, the population-optimized chip that TPMI used to genotype 165,596 of its participants. Variants are positioned on hg38 (GRCh38). About 80% of the sites are biallelic SNVs; the rest are short insertions or deletions plus a small number of multi-nucleotide variants.

+ +

+TPMI is one of the largest non-European cohorts in genetic research, with 565,390 enrolled participants as of the v37 data freeze. Han Chinese people make up nearly 20% of the world's population but are under-represented in genetic studies. A cohort of this size is useful as a population-specific allele frequency reference, for GWAS replication, and for clinical variant interpretation in East Asian populations.

+ +

Display

+

+The track uses the standard UCSC VCF display. Hovering over a variant shows the cohort allele frequency (AF), the allele count computed from it (AC), the assumed total allele number (AN), the TPMI NGS concordance score from the chip annotation, and the Affymetrix probe set ID.

+ +

Methods

+

+TPMI participants were recruited from 16 partner medical centers (33 affiliated hospitals) across Taiwan, which together serve about 40% of the Taiwanese population. Each participant donated a blood sample and consented to access to their electronic medical records (EMR). Genomic DNA was extracted with the QIAsymphony DSP DNA Mini Kit and genotyped on two custom Axiom arrays (TPMv1 and TPMv2; Thermo Fisher Scientific) designed to optimally tag Han Chinese variation. Genotypes were called with Applied Biosystems Array Power Tools using the Best Practices Workflow at the National Center for Genome Medicine, Academia Sinica. After quality control (QC), 165,596 participants had been genotyped on TPMv1 and 321,360 on TPMv2, giving 486,956 participants with both genotype and EMR data. The cohort broadly covers Han Chinese subgroups as well as Indigenous Taiwanese populations. See the TPMI Nature paper (in References) for details of sample recruitment, genotype calling, imputation, and quality control.

+

+The source data for this track is the Axiom TPM1 chip annotation file TPM1_Array_Annotation.csv, distributed by Thermo Fisher Scientific (create date 2022-06-01). It stores the TPMI cohort allele frequency in a column named Allele Frequency alongside the probe-design metadata. The chip annotation uses hg38 coordinates, so no liftover was needed. We converted the CSV to VCF with the script tpmiToVcf.py, which dropped rows on alt or random contigs, rows flagged as TPMI blacklist, rows with no reported allele frequency, and rows with no defined reference allele. Indels encoded with - for the empty allele were rewritten in VCF-compatible form by prepending an anchor base read from the hg38 reference with twoBitToFa. The resulting VCF was sorted with bcftools sort and indexed with tabix. The full recipe is in the makeDoc file.
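Below is a minimal Python sketch of the row filters and the indel-anchoring step. It is not the contents of tpmiToVcf.py. Several details are assumptions: the function names, the `---` encoding for an empty CSV field, the blacklist flag passed as a boolean, and the position convention for `-` alleles (pos taken as the first deleted base, or the base just after an insertion point). The reference lookup is passed in as a callback so the sketch runs without hg38. In the actual pipeline the anchor base came from twoBitToFa.

```python
from typing import Callable, Tuple

# Hypothetical lookup type: returns the hg38 base at a 1-based position.
# In the real pipeline this base came from twoBitToFa; here it is a callback.
RefBaseLookup = Callable[[str, int], str]

MISSING_VALUES = {"", "---"}  # assumed encodings of an empty CSV field


def keep_row(chrom: str, ref: str, allele_freq: str, blacklisted: bool) -> bool:
    """Apply the row filters described above (column handling is hypothetical)."""
    if chrom.endswith("_alt") or chrom.endswith("_random"):
        return False  # alt or random contig
    if blacklisted:
        return False  # flagged as TPMI blacklist
    if allele_freq.strip() in MISSING_VALUES:
        return False  # no reported allele frequency
    if ref.strip() in MISSING_VALUES:
        return False  # no defined reference allele ("-" is a valid indel allele)
    return True


def anchor_indel(chrom: str, pos: int, ref: str, alt: str,
                 ref_base: RefBaseLookup) -> Tuple[int, str, str]:
    """Rewrite a '-'-encoded indel as VCF POS/REF/ALT with a leading anchor base.

    Assumes pos is the 1-based position of the first deleted base (deletion)
    or of the base immediately after the insertion point (insertion).
    """
    if ref != "-" and alt != "-":
        return pos, ref, alt  # SNV or MNV: already VCF-compatible
    anchor_pos = pos - 1
    # Upper-case in case the reference sequence is soft-masked.
    anchor = ref_base(chrom, anchor_pos).upper()
    new_ref = anchor + ("" if ref == "-" else ref)
    new_alt = anchor + ("" if alt == "-" else alt)
    return anchor_pos, new_ref, new_alt


# Toy reference standing in for hg38: chr1 position 99 is "G".
TOY_REF = {("chr1", 99): "G"}


def toy_lookup(chrom: str, pos: int) -> str:
    return TOY_REF[(chrom, pos)]


print(anchor_indel("chr1", 100, "-", "TT", toy_lookup))  # (99, 'G', 'GTT')
print(anchor_indel("chr1", 100, "AC", "-", toy_lookup))  # (99, 'GAC', 'G')
```

The anchor base is needed because a VCF record cannot have an empty REF or ALT. Prepending the preceding reference base to both alleles keeps the record's position and alleles consistent with the reference.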

+

+The source publishes only allele frequencies, not allele counts. To make the track usable in count-based aggregate views, we derived AC = round(AF * AN) with AN = 100,000. We chose this AN because every reported AF in the file is an exact integer multiple of 1/100,000, which indicates that the source frequencies were rounded to five decimal places. The TPMv1 chip was used on 165,596 participants (about 331,000 autosomal chromosomes), so the true AN may be roughly three times larger. The AC values published here are therefore approximately proportional to the true counts but not equal to them. This assumption is documented in the VCF header.
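The AC derivation, and the grid check that motivated AN = 100,000, can be sketched in a few lines of Python. The function names are hypothetical and this is not the track's conversion code. The sketch parses AF from its CSV text with `Decimal`, so the multiple-of-1/100,000 test is exact rather than subject to binary floating-point error.

```python
from decimal import Decimal

AN = 100_000  # assumed total allele number used for every site


def on_grid(af_text: str, an: int = AN) -> bool:
    """True if the AF string is an exact integer multiple of 1/an."""
    scaled = Decimal(af_text) * an
    return scaled == scaled.to_integral_value()


def derive_ac(af_text: str, an: int = AN) -> int:
    """AC = round(AF * AN) in Decimal (half-even rounding, like Python's round())."""
    return int((Decimal(af_text) * an).to_integral_value())


# The chip was used on 165,596 participants, i.e. at most 331,192 autosomal
# alleles per site, about 3.3 times the assumed AN.
TPMV1_PARTICIPANTS = 165_596
max_autosomal_an = 2 * TPMV1_PARTICIPANTS
scale = max_autosomal_an / AN

print(on_grid("0.12345"), on_grid("0.123456"))  # True False
print(derive_ac("0.12345"))                     # 12345
print(max_autosomal_an, round(scale, 2))        # 331192 3.31
```

Because every AF in the file passes `on_grid`, the rounding step in `derive_ac` never changes a value here. It only matters if the function is reused on data with finer precision.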

+ +

Caveats

+

+Of 752,921 rows in the source CSV, 672,843 were written to the VCF. The skipped rows are: 80,034 rows with no reported allele frequency (the chip carries probe annotations for some sites that were not typed in the TPMI cohort or were removed by quality filtering, including all of the chip's chrY content); 36 rows on alt or random contigs; and 8 rows with no defined reference allele in the source. About 61,000 rows are also flagged as TPMI blacklist. None of them has a published allele frequency, so they are already removed by the no-AF rule and are counted among the 80,034.
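As a check, the skipped-row categories account for the full difference between the source CSV and the VCF:

$$752{,}921 - (80{,}034 + 36 + 8) = 752{,}921 - 80{,}078 = 672{,}843$$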

+

+The TPM2 chip annotation (~755,000 SNPs) is not represented in this track because its public annotation has no TPMI cohort allele frequency column. It carries only the 1000 Genomes / HapMap CEU/CHB/JPT/YRI frequencies that come with all Affymetrix Axiom chip annotations, and those are already available through dbSNP. TPM1 and TPM2 share 234,255 SNPs, so the TPM1-only track still covers most of the cohort-typed content.

+

+The TPMI authors note that allele frequencies from the TPMv1 chip are reliable for variants with a minor allele frequency (MAF) above about 0.1%. Rarer sites are reported but should be interpreted with caution, because SNP arrays have higher genotyping error at low MAF.
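Below is a minimal sketch of how a user might flag such sites. The Python computes MAF from the track's AF value and compares it with the 0.1% cutoff. It assumes each VCF record has a single alternate allele, so that MAF = min(AF, 1 - AF). The names are hypothetical. The strict `<` at the cutoff is a choice, since the authors give the threshold only approximately.

```python
MAF_CUTOFF = 0.001  # 0.1%, the reliability threshold cited by the TPMI authors


def minor_allele_frequency(af: float) -> float:
    """MAF for a site with one alternate allele: the smaller of AF and 1 - AF."""
    return min(af, 1.0 - af)


def interpret_with_caution(af: float, cutoff: float = MAF_CUTOFF) -> bool:
    """True when the site's MAF falls below the reliability cutoff."""
    return minor_allele_frequency(af) < cutoff


# A rare alt allele and a near-fixed alt allele are both low-MAF sites.
for af in (0.0004, 0.05, 0.9995):
    print(af, interpret_with_caution(af))
```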

+ +

Data Access

+

+Due to license restrictions, the data for this track cannot be downloaded from the UCSC Genome Browser. The Table Browser, Data Integrator, and download server are not available for this track.

+

+The original Axiom TPM1 chip annotation CSV is distributed by Thermo Fisher Scientific; search their support site for "Axiom TPM1 Annotation" to download the matching version (we used the 2022-06-01 release).

+ +

Credits

+

+Thanks to the TPMI participants and to the Academia Sinica and Thermo Fisher Scientific teams that designed and curated the Axiom TPMv1 SNP array and published the chip annotation file.

+ +

References

+

+Yang HC, Kwok PY, Li LH, Liu YM, Jong YJ, Lee KY, Wang DW, Tsai MF, Yang JH, Chen CH et al. The Taiwan Precision Medicine Initiative provides a cohort for large-scale studies. Nature. 2025 Dec;648(8092):117-127. PMID: 41092961; PMC: PMC12675286

+