3f88af395a070d9e1db62a0e95b8c0650dc958dc max Thu Feb 12 00:32:25 2026 -0800 doc updates for mrgb varFreqs track
diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html
index c86b4361e97..7ade4a73a09 100644
--- src/hg/makeDb/trackDb/human/varFreqs.html
+++ src/hg/makeDb/trackDb/human/varFreqs.html
@@ -414,39 +414,37 @@ sequencing chemistry.
 Raw whole-genome reads were aligned to the GRCh37 reference using BWA-MEM v0.7.12, then sorted and indexed with samtools v0.1.19 and assessed with qualimap v2.2.20; per-sample alignments from multiple lanes and flow cells were merged using Picard MergeSamFiles v1.120. Processing followed GATK best practices with GATK v3.3, including indel realignment (RealignerTargetCreator, IndelRealigner), duplicate marking (Picard MarkDuplicates v1.120), and base quality score recalibration (BaseRecalibrator), producing one finalized BAM per sample. Per-sample gVCFs were generated with GATK HaplotypeCaller v3.3 using reference files from the GATK v2.8 resource bundle, with all steps coordinated via Piper v1.4.0. Joint genotyping of 1,000 samples was performed by merging gVCFs in five batches of 200 using GATK CombineGVCFs, followed by cohort genotyping with GATK GenotypeGVCFs and variant quality score recalibration for SNVs and indels using VariantRecalibrator and ApplyRecalibration.
 <BR>At UCSC, the hg38 VCF was downloaded from <a target="_blank" href="https://swefreq.nbis.se/dataset/SweGen/download">SweFreq</a>.
 </p>
-<p><b>Australia MGRB:</b> MGRB samples underwent whole-genome sequencing on
+<p><b>Australia MGRB:</b> The 4,011 MGRB samples underwent whole-genome sequencing on
 Illumina HiSeq X instruments at KCCG under ISO 15189 accreditation, using
-paired-end TruSeq DNA Nano libraries sequenced one lane per sample. Reads were
-aligned to human reference genome Build 37 (GRCh37) and processed following
-GATK best practices, including indel realignment and base quality score
-recalibration, with variant calling performed using GATK HaplotypeCaller to
-generate g.vcf files. Data processing utilized the Genome One Discovery
-pipeline and analysis was conducted using the Hail framework.
-</p>
+paired-end TruSeq DNA Nano libraries sequenced one lane per sample. Alignment
+of sequence reads to the hg38 reference genome assembly was with bwa
+0.7.15-r1140. Variants were called following the Genome Analysis Toolkit
+(GATK) best practices procedure using GATK 4.1.4.0. A sites-only VCF with only
+passing variants (FILTER=PASS) was made with bcftools 1.20.</p>
 <p><b>NPM Singapore:</b> Whole Genome Sequencing (WGS) data processing followed GATK4 best practices. GATK4 germline variant analysis workflow written in WDL was adapted to use Nextflow and deployed at the National Supercomputing Centre, Singapore (NSCC). In short, WGS reads were aligned against GRCh38 using the BWA-MEM algorithm and used as input to GATK HaplotypeCaller to produce single sample gVCFs. The gVCF files were joint-called then loaded in Hail, an open-source python-based data analysis library suited to work with population-scale with genomic data collections. Low-quality WGS libraries and low-quality variants were removed. QC-ed variants were functionally annotated using Ensembl Variant Effect Predictor (VEP) (version 95). Functional annotations for variant impacting protein-coding were also complemented with information on the potential alteration to their cognate protein's 3D structure and drug binding ability. </p>