9bfd58221b1539193cb7f0a317b4e959c1c7e49a max Thu May 21 01:00:45 2026 -0700

varFreqs: AI generated text sounds bad, hard to read, so remove typical AI language. "humanizer" pass on all 31 varFreqs description pages: cut em dashes, copula avoidance ("serves as", "stands as"), "-ing" puffery, and boilerplate filler ("We provide documentation that indicates how..."). Title-case headings and meaningful emphasis preserved. No facts/URLs/counts/versions changed. tpmi.html added as a new file (was previously uncommitted). refs #36642

Co-Authored-By: Claude Sonnet 4.6

diff --git src/hg/makeDb/trackDb/human/topmed.html src/hg/makeDb/trackDb/human/topmed.html
index d67f8a75e61..cf4f6ebda7c 100644
--- src/hg/makeDb/trackDb/human/topmed.html
+++ src/hg/makeDb/trackDb/human/topmed.html
@@ -1,60 +1,60 @@

Description

NHLBI TOPMed (Trans-Omics for Precision Medicine) is a program launched by the U.S. National Heart, Lung, and Blood Institute that integrates whole-genome sequencing with molecular, clinical, and environmental data from large, well-phenotyped cohorts. Its goal is to uncover the biological mechanisms underlying heart, lung, blood, and sleep disorders to advance precision medicine and improve population health. Freeze 10 contains 868,581,653 variants from 150,899 whole genomes.

Data Access

Due to license restrictions, the data for this track cannot be downloaded from the UCSC Genome Browser. The Table Browser, Data Integrator, and download server are not available for this track.

VCFs with summarized allele frequencies are available from the TOPMED BRAVO website. They require a login. The VCFs were downloaded from BRAVO.

Methods

TOPMed whole genome sequencing was performed at multiple NHLBI-funded sequencing centers using PCR-free library preparation with 150 bp paired-end reads on Illumina short-read platforms, targeting ≥30x mean coverage. Reads were aligned to the GRCh38 reference genome (hs38DH, including decoy sequences) using BWA-MEM, followed by duplicate marking with Picard MarkDuplicates and base quality score recalibration (BQSR) with GATK. Variant calling was performed using the TOPMed GotCloud pipeline (developed at the Center for Statistical Genetics, University of Michigan), comprising: (1) per-sample candidate variant detection with vt discover2 and normalization with vt normalize; (2) cross-sample variant site consolidation using cramore vcf-merge-candidate-variants; (3) joint genotyping across all samples; and (4) variant filtering using a Support Vector Machine (SVM) classifier (libsvm) trained on positive labels derived from HapMap 3.3 and 1000 Genomes Omni2.5 array sites, and negative labels derived from Mendelian-inconsistent variants identified within the cohort's pedigree structure using vt milk-filter. Sample-level quality control included estimation of DNA contamination, genetic ancestry, and biological sex using cramore cram-verify-bam (verifyBamID2) and relative X/Y chromosomal depth. Full methods for TOPMed freeze 10 are available on the TOPMed WGS Methods page.

-We provide documentation that indicates how all source files of the varFreqs track were converted in the makeDoc file of the track.
+Documentation on how all source files of the varFreqs track were converted is in the makeDoc file of the track.
For some tracks, python scripts were necessary and are also available from GitHub.

References

Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, Taliun SAG, Corvelo A, Gogarten SM, Kang HM et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature. 2021 Feb;590(7845):290-299. PMID: 33568819; PMC: PMC7875770