src/hg/makeDb/trackDb/human/varFreqs.html 3f88af395a070d9e1db62a0e95b8c0650dc958dc

3f88af395a070d9e1db62a0e95b8c0650dc958dc
max
  Thu Feb 12 00:32:25 2026 -0800
doc updates for MGRB varFreqs track

diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html
index c86b4361e97..7ade4a73a09 100644
--- src/hg/makeDb/trackDb/human/varFreqs.html
+++ src/hg/makeDb/trackDb/human/varFreqs.html
@@ -414,39 +414,37 @@
 sequencing chemistry. Raw whole-genome reads were aligned to the GRCh37 reference using BWA-MEM
 v0.7.12, then sorted and indexed with samtools v0.1.19 and assessed with qualimap v2.2.20;
 per-sample alignments from multiple lanes and flow cells were merged using Picard MergeSamFiles
 v1.120. Processing followed GATK best practices with GATK v3.3, including indel realignment
 (RealignerTargetCreator, IndelRealigner), duplicate marking (Picard MarkDuplicates v1.120), and
 base quality score recalibration (BaseRecalibrator), producing one finalized BAM per sample.
 Per-sample gVCFs were generated with GATK HaplotypeCaller v3.3 using reference files from the GATK
 v2.8 resource bundle, with all steps coordinated via Piper v1.4.0. Joint genotyping of 1,000 samples
 was performed by merging gVCFs in five batches of 200 using GATK CombineGVCFs, followed by cohort
 genotyping with GATK GenotypeGVCFs and variant quality score recalibration for SNVs and indels using
 VariantRecalibrator and ApplyRecalibration.
 <BR>At UCSC, the hg38 VCF was downloaded 
 from <a target="_blank" href="https://swefreq.nbis.se/dataset/SweGen/download">SweFreq</a>.
 </p>
 
-<p><b>Australia MGRB:</b> MGRB samples underwent whole-genome sequencing on
+<p><b>Australia MGRB:</b> The 4,011 MGRB samples underwent whole-genome sequencing on
 Illumina HiSeq X instruments at KCCG under ISO 15189 accreditation, using
-paired-end TruSeq DNA Nano libraries sequenced one lane per sample. Reads were
-aligned to human reference genome Build 37 (GRCh37) and processed following
-GATK best practices, including indel realignment and base quality score
-recalibration, with variant calling performed using GATK HaplotypeCaller to
-generate g.vcf files. Data processing utilized the Genome One Discovery
-pipeline and analysis was conducted using the Hail framework.
-</p>
+paired-end TruSeq DNA Nano libraries sequenced one lane per sample. Sequence
+reads were aligned to the hg38 reference genome assembly with bwa
+0.7.15-r1140. Variants were called following the Genome Analysis Toolkit
+(GATK) best practices procedure using GATK 4.1.4.0. A sites-only VCF containing
+only passing variants (FILTER=PASS) was made with bcftools 1.20.</p>
 
 <p><b>NPM Singapore:</b> Whole Genome Sequencing (WGS) data processing followed
 GATK4 best practices. GATK4 germline variant analysis workflow written in WDL
 was adapted to use Nextflow and deployed at the National Supercomputing Centre,
 Singapore (NSCC). In short, WGS reads were aligned against GRCh38 using the
 BWA-MEM algorithm and used as input to GATK HaplotypeCaller to produce single
 sample gVCFs. The gVCF files were joint-called then loaded in Hail, an
 open-source python-based data analysis library suited to work with
 population-scale genomic data collections. Low-quality WGS libraries and
 low-quality variants were removed.  QC-ed variants were functionally annotated
 using Ensembl Variant Effect Predictor (VEP) (version 95). Functional
 annotations for variants impacting protein-coding genes were also complemented with
 information on the potential alteration to their cognate protein's 3D structure
 and drug binding ability.
 </p>