aa61ebc800429515f9ced7e28f669c6042219f43 max Wed Mar 18 09:09:13 2026 -0700 varFreqs supertrack: add GREGoR track, update all HTML docs, move scripts to varFreqs/, refs #36642 Add GREGoR R04 WGS track to varFreqs superTrack. Update Data Access and Methods sections for all 20+ subtrack HTML files with consistent formatting, sequencing methods from source papers, and links to makeDoc and Github scripts. Move all varFreqs conversion scripts into scripts/varFreqs/ subdirectory and update makeDoc paths accordingly. Co-Authored-By: Claude Opus 4.6 diff --git src/hg/makeDb/trackDb/human/topmed.html src/hg/makeDb/trackDb/human/topmed.html new file mode 100644 index 00000000000..4d0fcf3d3b5 --- /dev/null +++ src/hg/makeDb/trackDb/human/topmed.html @@ -0,0 +1,53 @@ +

Description

+

+NHLBI TOPMed (Trans-Omics for Precision +Medicine) is a program launched by the U.S. National Heart, Lung, and Blood Institute that +integrates whole-genome sequencing with molecular, clinical, and environmental data from large, +well-phenotyped cohorts. Its goal is to uncover the biological mechanisms underlying heart, lung, +blood, and sleep disorders to advance precision medicine and improve population health. Freeze 10 +contains 868,581,653 variants from 150,899 whole genomes. +

+ +

Data Access

+

+The data can be explored interactively with the +Table Browser or the +Data Integrator. +For programmatic access, our REST API can be used; the +track name is topmed. +For bulk download, the VCF file can be obtained from +our download server. +

+

+VCFs with summarized allele frequencies are also available from +the TOPMed BRAVO website, which requires a +login. The VCFs used for this track were downloaded from +BRAVO. +
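As a rough illustration of the programmatic access mentioned above, the following sketch builds a query URL for the UCSC REST API `getData/track` endpoint and, optionally, fetches the JSON response. The track name `topmed` comes from the text. The assembly name `hg38` is an assumption based on the GRCh38 alignment described in Methods, and the helper names and the region used in the usage comment are purely illustrative.

```python
import json
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu"


def build_track_url(chrom, start, end, genome="hg38", track="topmed"):
    """Build a getData/track URL for one region.

    Assumption: the track lives on hg38. Per the UCSC API documentation,
    start is 0-based and end is 1-based.
    """
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    # The UCSC API documentation separates parameters with semicolons.
    query = ";".join(f"{key}={value}" for key, value in params.items())
    return f"{API_BASE}/getData/track?{query}"


def fetch_track(chrom, start, end, **kwargs):
    """Fetch the region and return the decoded JSON (needs network access)."""
    with urlopen(build_track_url(chrom, start, end, **kwargs), timeout=30) as resp:
        return json.load(resp)


# Example (hypothetical region, not executed here):
#   data = fetch_track("chr1", 1000000, 1001000)
#   print(list(data.keys()))
```

Keeping URL construction separate from the network call makes the query easy to inspect or paste into a browser before running it against the server.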

+ +

Methods

+

+TOPMed whole genome sequencing was performed at multiple NHLBI-funded sequencing centers +using PCR-free library preparation with 150 bp paired-end reads on Illumina short-read +platforms, targeting ≥30x mean coverage. Reads were aligned to the GRCh38 reference genome +(hs38DH, including decoy sequences) using BWA-MEM, followed by duplicate marking with +Picard MarkDuplicates and base quality score recalibration (BQSR) with GATK. Variant calling +was performed using the TOPMed GotCloud pipeline (developed at the Center for Statistical +Genetics, University of Michigan), comprising: (1) per-sample candidate variant detection with +vt discover2 and normalization with vt normalize; (2) cross-sample variant site +consolidation using cramore vcf-merge-candidate-variants; (3) joint genotyping across all +samples; and (4) variant filtering using a Support Vector Machine (SVM) classifier +(libsvm) trained on positive labels derived from HapMap 3.3 and 1000 Genomes Omni2.5 +array sites, and negative labels derived from Mendelian-inconsistent variants identified +within the cohort's pedigree structure using vt milk-filter. Sample-level quality +control included estimation of DNA contamination, genetic ancestry, and biological sex +using cramore cram-verify-bam (verifyBamID2) and relative X/Y chromosomal depth. Full +methods for TOPMed freeze 10 are available on the +TOPMed WGS Methods page. +
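To make the normalization step in (1) more concrete, here is a minimal sketch of the left-alignment and parsimony rules commonly used to normalize VCF variants. It is not the code of vt normalize. The function name, the in-memory reference string, and the toy sequence in the usage note are assumptions made for illustration.

```python
def normalize_variant(pos, ref, alts, reference):
    """Left-align and trim a variant.

    pos is 1-based; reference is the chromosome sequence as an uppercase
    string (an assumption: real tools read an indexed FASTA file instead).
    Returns (pos, ref, alts) in normalized form.
    """
    alleles = [ref.upper()] + [a.upper() for a in alts]
    changed = True
    while changed:
        changed = False
        # Right-trim: drop the last base while every allele ends with the same one.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # If trimming emptied an allele, extend all alleles one base to the left.
        if not all(alleles):
            if pos == 1:
                raise ValueError("cannot left-extend at the start of the sequence")
            pos -= 1
            alleles = [reference[pos - 1] + a for a in alleles]
            changed = True
    # Left-trim: drop a shared leading base while every allele keeps at least one base.
    while min(len(a) for a in alleles) >= 2 and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles[0], alleles[1:]
```

With the toy reference `GGGCACACACAGGG`, a two-base deletion in the CA repeat written as `CAC` to `C` at position 8 and the same deletion written as `GCA` to `G` at position 3 both normalize to the latter form, which is what allows candidate sites from different samples to be merged consistently.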

+ +

+The makeDoc file of the varFreqs track documents how all source files were converted. +For some tracks, Python scripts were necessary; these are also available from GitHub. +