aa61ebc800429515f9ced7e28f669c6042219f43 max Wed Mar 18 09:09:13 2026 -0700
varFreqs supertrack: add GREGoR track, update all HTML docs, move scripts to varFreqs/, refs #36642

Add GREGoR R04 WGS track to varFreqs superTrack.
Update Data Access and Methods sections for all 20+ subtrack HTML files with
consistent formatting, sequencing methods from source papers, and links to
makeDoc and Github scripts.
Move all varFreqs conversion scripts into scripts/varFreqs/ subdirectory and
update makeDoc paths accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/mxbFreq.html src/hg/makeDb/trackDb/human/mxbFreq.html
new file mode 100644
index 00000000000..26d93951cb2
--- /dev/null
+++ src/hg/makeDb/trackDb/human/mxbFreq.html
@@ -0,0 +1,78 @@
+<h2>Description</h2>
+<p>
+The <a href="https://www.mxbiobank.org/" target="_blank">Mexico Biobank (MXB)</a> project
+genotyped 6,011 individuals sampled across all 32 states of Mexico during the 2000 National
+Health Survey (ENSA 2000) conducted by the National Institute of Public Health (INSP).
+Genotyping was performed with the Illumina Multi-Ethnic Global Array (MEGA, ~1.8M SNPs),
+optimized for admixed populations and enriched for ancestry-informative and medically relevant
+variants. Only autosomal, biallelic SNPs passing quality control are included. Samples were
+selected from 898 recruitment sites, prioritizing speakers of Indigenous languages.
+</p>
+
+<p>
+This track shows allele frequencies computed from the phased genotypes. The full
+phased genotype data, with a haplotype clustering display, are available in the
+<a href="hgTrackUi?g=mexbb">Mexico Biobank track</a> under Phased Variants.
+Frequencies can also be plotted onto a map on the
+<a href="https://morenolab.shinyapps.io/mexvar/" target="_blank">MexVar platform</a>.
+The hg38 data were lifted from hg19 by UCSC (see Methods).
+</p>
+
+<h2>Data Access</h2>
+<p>
+We are not permitted to redistribute the VCF file.
+Allele frequencies by geographical state and ancestry are available via
+the <a href="https://morenolab.shinyapps.io/mexvar/" target="_blank">MexVar platform</a>.
+Raw genotype data are available under controlled access at the
+EGA (Study: EGAS00001005797; Dataset: EGAD00010002361). To obtain the VCFs, email
+andres.moreno@cinvestav.mx.
+</p>
+
+<h2>Methods</h2>
+<p>
+Data processing included GenomeStudio → PLINK conversion, strand alignment, removal
+of duplicates and of low-quality variants/individuals, update of map positions using
+dbSNP Build 151, and relatedness filtering.
+At UCSC, the phased VCF was lifted from hg19 to hg38 with CrossMap; allele counts
+(AC, AF, AN) were then computed with bcftools fill-tags, and genotypes were stripped to
+produce a sites-only frequency VCF.
+</p>
+
+<p>
+The conversion of all varFreqs source files is documented in the track's <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt" target="_blank">makeDoc file</a>.
+Some tracks required Python scripts, which are also available on <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/scripts/varFreqs" target="_blank">GitHub</a>.
+</p>
+
+<h2>Credits</h2>
+<p>
+We thank the Center for Research and Advanced Studies (Cinvestav) of Mexico for
+generating and providing the frequency data, the National Institute of Medical
+Sciences and Nutrition (INCMNSZ) for DNA extraction, and the Ministry of Health
+together with the National Institute of Public Health (INSP) for the design and
+implementation of the National Health Survey 2000 (ENSA 2000). We also thank
+the ENSA-Genomics Consortium for their contributions to sample collection and
+data processing, which made the construction of the MXB genomic resource possible.
+</p>
+
+<h2>References</h2>
+<p>
+Barberena-Jonas C, Medina-Muñoz SG, Cedillo-Castelán V, Sepúlveda-Morales T,
+Gonzaga-Jáuregui C, ENSA Genomics Consortium, García-García L, Ioannidis AG,
+Moreno-Estrada A.
+<a href="https://doi.org/10.1038/s41591-025-04100-z" target="_blank">
+Clinical genetic variation across Hispanic populations in the Mexican Biobank</a>.
+<em>Nat Med</em>. 2026 Jan 21.
+DOI: <a href="https://doi.org/10.1038/s41591-025-04100-z"
+target="_blank">10.1038/s41591-025-04100-z</a>; PMID: <a
+href="https://www.ncbi.nlm.nih.gov/pubmed/41566040" target="_blank">41566040</a>
+</p>
+
+<p>
+Sohail M, Palma-Martínez MJ, Chong AY, Quinto-Cortés CD, Barberena-Jonas C, Medina-Muñoz SG,
+Ragsdale A, Delgado-Sánchez G, Cruz-Hervert LP, Ferreyra-Reyes L <em>et al</em>.
+<a href="https://doi.org/10.1038/s41586-023-06560-0" target="_blank">
+Mexican Biobank advances population and medical genomics of diverse ancestries</a>.
+<em>Nature</em>. 2023 Oct;622(7984):775-783.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/37821706" target="_blank">37821706</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10600006/" target="_blank">PMC10600006</a>
+</p>
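
To make the AC/AN/AF step in the Methods section concrete, here is a minimal Python sketch of what filling those tags and stripping genotypes does to a single VCF record. It is an illustration under stated assumptions, not the UCSC conversion script: it assumes a tab-separated, biallelic data line whose FORMAT column contains GT, counts every non-reference allele as the single ALT, skips missing ('.') alleles when computing AN, and assumes INFO does not already carry AC/AN/AF. The function names `allele_counts` and `to_sites_only` are hypothetical.

```python
"""Minimal sketch: fill AC/AN/AF from GT calls and drop sample columns.

Hypothetical helper, not the UCSC conversion script. Assumes biallelic
records, a GT key in FORMAT, and no existing AC/AN/AF tags in INFO.
"""


def allele_counts(genotypes: list[str]) -> tuple[int, int]:
    """Return (AC, AN) for one biallelic site from GT strings like '0|1'."""
    ac = 0
    an = 0
    for gt in genotypes:
        # Treat phased ('|') and unphased ('/') separators the same way.
        for allele in gt.replace("/", "|").split("|"):
            if allele == ".":
                continue  # missing alleles do not count toward AN
            an += 1
            if allele != "0":
                ac += 1  # any non-reference allele is the single ALT here
    return ac, an


def to_sites_only(vcf_line: str) -> str:
    """Turn one tab-separated VCF data line into a sites-only line with AC/AN/AF."""
    fields = vcf_line.rstrip("\n").split("\t")
    site, fmt, samples = fields[:8], fields[8], fields[9:]
    gt_index = fmt.split(":").index("GT")
    ac, an = allele_counts([s.split(":")[gt_index] for s in samples])
    af = f"{ac / an:.6g}" if an else "."
    tags = f"AC={ac};AN={an};AF={af}"
    # Use the new tags as INFO if it was empty ('.'), otherwise append them.
    site[7] = tags if site[7] == "." else f"{site[7]};{tags}"
    return "\t".join(site)


if __name__ == "__main__":
    # Hypothetical record: three phased samples at one biallelic SNP.
    record = "chr1\t12345\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1\t0|0"
    print(to_sites_only(record))
```

Running the file prints the record with the sample columns removed and `AC=3;AN=6;AF=0.5` in INFO. For real data, bcftools itself (the fill-tags plugin, plus `bcftools view -G` to drop genotypes) is the practical route, since it also updates the header and handles multiallelic sites.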