676b58d841804f049f720cc9ba3fddec216dae61
max
  Tue Dec 2 06:22:46 2025 -0800
adding Saudi Arabia to variant frequencies track

diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html
index 8dc145fd00a..f47b5d89e6b 100644
--- src/hg/makeDb/trackDb/human/varFreqs.html
+++ src/hg/makeDb/trackDb/human/varFreqs.html
@@ -114,30 +114,37 @@
         1,896 whole genome sequencing and 3,409 whole exome sequencing data from healthy individuals
         of Korean ethnicity.
         Most of the samples originated from normal tissue of cancer
         patients (40.16 %), healthy parents of rare disease patients (28.4 %),
         or healthy volunteers (31.44 %). Japanese ancestry is broken down
         in the INFO field. Coverage 100x for WES, 30x for WGS.
         For details see (Lee et al, Exp Mol Med 2022).</li>
     <li>
         <b><a href="https://www.npm.sg/"
         target="_blank">NPM Singapore</a></b>:
         9,770 whole genomes, mostly of Chinese, Indian and Malay ancestry. 
         A minimum allele count cutoff of &gt; 5 was applied.
         Data is available for download from the CHORUS browser, see "Data access" below.
         For details see (Wong et al, Nat Genetics 2023). CNV data is also available there.
     </li>
+    <li>
+        <b><a href="https://www.vision2030.gov.sa/en/explore/projects/the-saudi-genome-program"
+        target="_blank">Saudi Genome Program</a></b>:
+        Variant frequencies from 302 whole genomes sequenced at 30x coverage from Saudi Genome Program samples.
+        The genotyping data and imputations from 3,352 individuals do not appear to be publicly available.
+        For details see (Malomane et al, bioRxiv 2025).
+    </li>
 </ul>
 </p>
 
 <h2>Display Conventions</h2>
 
 <p>Most tracks only show the variant and allele frequencies on mouseover or clicks.
 When zoomed in, tracks display alleles with base-specific coloring. Homozygote
 data are shown as one letter, while heterozygotes will be displayed with both
 letters.
 </p>
 
 <p>
 For <b>NCBI ALFA:</b> This track has no single VCF with INFO fields, but uses multiple subtracks
 instead, one per ancestry.
 </p>
@@ -224,30 +231,35 @@
 <p><b>NPM Singapore:</b> Whole Genome Sequencing (WGS) data processing followed
 GATK4 best practices. GATK4 germline variant analysis workflow written in WDL
 was adapted to use Nextflow and deployed at the National Supercomputing Centre,
 Singapore (NSCC). In short, WGS reads were aligned against GRCh38 using the
 BWA-MEM algorithm and used as input to GATK HaplotypeCaller to produce single
 sample gVCFs. The gVCF files were joint-called then loaded in Hail, an
 open-source python-based data analysis library suited to work with
 population-scale genomic data collections. Low-quality WGS libraries and
 low-quality variants were removed.  QC-ed variants were functionally annotated
 using Ensembl Variant Effect Predictor (VEP) (version 95). Functional
 annotations for variants impacting protein-coding sequence were also complemented with
 information on the potential alteration to their cognate protein's 3D structure
 and drug binding ability.
 </p>
 
+<p><b>Saudi Genome Program:</b> Data was downloaded
+from <a href="https://figshare.com/articles/dataset/A_list_of_Saudi_Arabian_variants_and_their_allele_frequencies/28059686/1?file=51297884" target="_blank">Figshare</a>
+and converted to VCF.
+</p>
+
 <h2>Credits</h2>
 <p>
 <b>MXB:</b> We thank the Center for Research and Advanced Studies (Cinvestav) of Mexico for
 generating and providing the frequency data, the National Institute of Medical
 Sciences and Nutrition (INCMNSZ) for DNA extraction, and the Ministry of Health
 together with the National Institute of Public Health (INSP) for the design and
 implementation of the National Health Survey 2000 (ENSA 2000). We also thank
 the ENSA-Genomics Consortium for their contributions to sample collection and
 data processing that made possible the construction of the MXB genomic
 resource.
 </p>
 <p>
 <b>MCPS:</b> Data produced by Regeneron RGC and collaborators, which are the
 University of Oxford, Universidad Nacional Aut&oacute;noma de M&eacute;xico (UNAM) and
 National Institute of Genomic Medicine in Mexico.
@@ -434,15 +446,26 @@
 FinnGen provides genetic insights from a well-phenotyped isolated population</a>.
 <em>Nature</em>. 2023 Jan;613(7944):508-518.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/36653562" target="_blank">36653562</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9849126/" target="_blank">PMC9849126</a>
 </p>
 
 <p>
 Wong E, Bertin N, Hebrard M, Tirado-Magallanes R, Bellis C, Lim WK, Chua CY, Tong PML, Chua R, Mak K
 <em>et al</em>.
 <a href="https://doi.org/10.1038/s41588-022-01274-x" target="_blank">
 The Singapore National Precision Medicine Strategy</a>.
 <em>Nat Genet</em>. 2023 Feb;55(2):178-186.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/36658435" target="_blank">36658435</a>
 </p>
 
+
+
+<p>
+Malomane DK, Williams MP, Huber CD, Mangul S, Abedalthagafi M, Chiang CWK.
+<a href="https://doi.org/10.1101/2025.01.10.632500" target="_blank">
+Patterns of population structure and genetic variation within the Saudi Arabian population</a>.
+<em>bioRxiv</em>. 2025 Jan 13;.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/39868174" target="_blank">39868174</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11761371/" target="_blank">PMC11761371</a>
+</p>
+