src/hg/makeDb/trackDb/human/varFreqs.html 6413446eff01d749cfd23e206d4d3da059902759

6413446eff01d749cfd23e206d4d3da059902759
max
  Fri Jan 30 04:49:02 2026 -0800
fixing a bug in the gnomad converter and track docs update for the varFreqs track, refs #36642

diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html
index 6dece34ca07..1a73429546f 100644
--- src/hg/makeDb/trackDb/human/varFreqs.html
+++ src/hg/makeDb/trackDb/human/varFreqs.html
@@ -60,33 +60,33 @@
         are also available publicly on the <a href="https://genomes.sfari.org/" target=_blank>SFARI Genome Browser</a>. 
        See (SPARK et al, Neuron 2018) for details or the methods below on this page.
     </li>
 
     <li>
         <b><a href="https://www.genomeasia100k.org/"
         target="_blank">GenomeAsia Pilot (GAsP)</a></b>:
         Whole-genome sequencing data of 1,739 individuals from 219 population groups across Asia.
         See (GenomeAsia Consortium, Nature 2019) for details.
     </li>
 
     <li>
         <b><a href="https://www.genomeasia100k.org/"
         target="_blank">Australia MRGB</a></b>:
         The Australian Medical Genome Reference Bank collected
-        whole-genome sequencing data of 4,011 healthy elderly individuals, to make sure 
-        that the dataset is depleted of damaging genetic variants.
-        Age and sex summary graphs are available from 
+        whole-genome sequencing data of 4,011 healthy elderly individuals who
+        lived to age 70 or older, so that the dataset is depleted of damaging
+        genetic variants. Age and sex summary graphs are available from 
         <a href="https://sgc.garvan.org.au/initiatives/mgrb/index.html">the MGRB website</a>.
         See (Lacaze Eur J Humn Genet 2019) for details.
     </li>
 
     <li>
         <b><a href="https://www.ncbi.nlm.nih.gov/snp/docs/gsr/alfa/" target="_blank">ALFA</a></b>:
         The NCBI ALlele Frequency Aggregator pipeline computes allele frequencies from
         approved, unrestricted dbGaP studies and makes them publicly available through
         dbSNP. Its goal is to release frequency data from over one million dbGaP
         subjects to aid discoveries involving common and rare variants with biological
         or disease relevance. The R4 release includes 408,709 subjects and allele
         frequencies for 15.5 million rs sites, including nearly one million ClinVar
         variants. We converted the NCBI track hub to VCF format, the data is freely available.
         Genotype and associated individual-level data are accessible through the dbGaP
         <a href="https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login"
@@ -400,30 +400,40 @@
 sequencing chemistry. Raw whole-genome reads were aligned to the GRCh37 reference using BWA-MEM
 v0.7.12, then sorted and indexed with samtools v0.1.19 and assessed with qualimap v2.2.20;
 per-sample alignments from multiple lanes and flow cells were merged using Picard MergeSamFiles
 v1.120. Processing followed GATK best practices with GATK v3.3, including indel realignment
 (RealignerTargetCreator, IndelRealigner), duplicate marking (Picard MarkDuplicates v1.120), and
 base quality score recalibration (BaseRecalibrator), producing one finalized BAM per sample.
 Per-sample gVCFs were generated with GATK HaplotypeCaller v3.3 using reference files from the GATK
 v2.8 resource bundle, with all steps coordinated via Piper v1.4.0. Joint genotyping of 1,000 samples
 was performed by merging gVCFs in five batches of 200 using GATK CombineGVCFs, followed by cohort
 genotyping with GATK GenotypeGVCFs and variant quality score recalibration for SNVs and indels using
 VariantRecalibrator and ApplyRecalibration.
 <BR>At UCSC, the hg38 VCF was downloaded 
 from <a target="_blank" href="https://swefreq.nbis.se/dataset/SweGen/download">SweFreq</a>.
 </p>
 
+<p><b>Australia MGRB:</b> MGRB samples underwent whole-genome sequencing on
+Illumina HiSeq X instruments at the Kinghorn Centre for Clinical Genomics (KCCG)
+under ISO 15189 accreditation, using paired-end TruSeq DNA Nano libraries
+sequenced at one lane per sample. Reads were aligned to the GRCh37 reference
+genome and processed following GATK best practices, including indel realignment
+and base quality score recalibration. Variants were called with GATK
+HaplotypeCaller to generate gVCF files. Data processing used the Genome One
+Discovery pipeline, and analysis was conducted with the Hail framework.
+</p>
+
 <p><b>NPM Singapore:</b> Whole Genome Sequencing (WGS) data processing followed
 GATK4 best practices. GATK4 germline variant analysis workflow written in WDL
 was adapted to use Nextflow and deployed at the National Supercomputing Centre,
 Singapore (NSCC). In short, WGS reads were aligned against GRCh38 using the
 BWA-MEM algorithm and used as input to GATK HaplotypeCaller to produce single
 sample gVCFs. The gVCF files were joint-called then loaded in Hail, an
 open-source python-based data analysis library suited to work with
 population-scale with genomic data collections. Low-quality WGS libraries and
 low-quality variants were removed.  QC-ed variants were functionally annotated
 using Ensembl Variant Effect Predictor (VEP) (version 95). Functional
 annotations for variant impacting protein-coding were also complemented with
 information on the potential alteration to their cognate protein's 3D structure
 and drug binding ability.
 </p>
 
@@ -670,15 +680,28 @@
 population</a>.
 <em>Eur J Hum Genet</em>. 2017 Nov;25(11):1253-1260.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/28832569" target="_blank">28832569</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5765326/" target="_blank">PMC5765326</a>
 </p>
 
 <p>
 SPARK Consortium. Electronic address: pfeliciano@simonsfoundation.org, SPARK Consortium.
 <a href="https://linkinghub.elsevier.com/retrieve/pii/S0896-6273(18)30018-7" target="_blank">
 SPARK: A US Cohort of 50,000 Families to Accelerate Autism Research</a>.
 <em>Neuron</em>. 2018 Feb 7;97(3):488-493.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/29420931" target="_blank">29420931</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7444276/" target="_blank">PMC7444276</a>
 </p>
 
+
+
+<p>
+Lacaze P, Pinese M, Kaplan W, Stone A, Brion MJ, Woods RL, McNamara M, McNeil JJ, Dinger ME, Thomas
+DM.
+<a href="https://doi.org/10.1038/s41431-018-0279-z" target="_blank">
+The Medical Genome Reference Bank: a whole-genome data resource of 4000 healthy elderly individuals.
+Rationale and cohort design</a>.
+<em>Eur J Hum Genet</em>. 2019 Feb;27(2):308-316.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/30353151" target="_blank">30353151</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6336775/" target="_blank">PMC6336775</a>
+</p>
+