ec5c73f4dc3ef4beae16fa1c12b7e5bf872bb73d lrnassar Tue May 5 15:04:39 2026 -0700

varFreqs: fix gaspIndel bigDataUrl after Max's GenomeAsia hg38 lift; add Tishkoff180 to combined-track filter UI; sync databases.tsv with deployed bigBed; minor description-page corrections. refs #36642

GenomeAsia hg38 lift (May 5 2026, by Max):
- gaspIndel.bigDataUrl was pointing at the old GRCh37 filename "All.indels.annot.cont_withmaf.vcf.gz" which was renamed to "ga100k.indels.vcf.gz" during the lift; this left the gaspIndel track broken on the sandbox until the trackdb stanza was updated to match.
- gasp/gaspIndel dataVersion strings updated from "Pilot 2019 (GRCh37 - to be lifted)" to "Pilot 2019 (lifted to hg38, May 2026)".
- databases.tsv: also updated GenomeAsiaIndel path to ga100k.indels.vcf.gz so the next varFreqsAll rebuild reads from the lifted file.

Tishkoff180 in varFreqsAll.bb but unfilterable (fresh-eyes audit finding):
- Added Tishkoff180 to filterValues.sources and added filterByRange.Tishkoff180AF / Tishkoff180AC entries.
- Added Tishkoff180 (and SVatalog) rows to databases.tsv to match the deployed bigBed (which already has those columns).

Description-page corrections:
- varFreqsAll.html: "20 population databases" -> "25 source databases" (matches actual count); HGDP+1kG bullet "European" -> "Non-Finnish European" to disambiguate from Finnish (gnomAD's nfe).
- varFreqs.html: GenomeAsia row in the Available Datasets table updated from 3 to 7 sub-populations (NEA/SEA/SAS plus the previously hidden OCE/AMR/AFR/WER) so the table matches what the data exposes once Max's rebuild populates the new filter columns.
- KOVA longLabel: "1.9k WGS+3.5k WES" -> "1.9k WGS+3.4k WES" (3.4k is correct per Lee 2017 and kova.html).
diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html
index 34a4119aca8..dbdb68feb5e 100644
--- src/hg/makeDb/trackDb/human/varFreqs.html
+++ src/hg/makeDb/trackDb/human/varFreqs.html
@@ -1,593 +1,594 @@
<h2>Description</h2> <p> This supertrack collects variant allele frequencies from population-scale sequencing and genotyping projects worldwide, from a total of ~1.7 million genomes/exomes/arrays. The data was not reprocessed in a harmonized way; instead, the variant VCFs were collected as released by each project. The goal is to provide a single place to compare how common a variant is across different populations, ancestries, and cohorts, for projects that cannot be recomputed by gnomAD soon. The main <a href="hgTrackUi?g=varFreqsAll">combined track</a> merges all databases into a single summary track, with filters, summed population frequencies and recalculated protein-effect annotations. In addition, there is one subtrack per project with the original VCF data and all the annotations that the project provides. The projects use different pipelines and sequencing technologies; click any of the projects above or below for a summary of their sample selection, sequencing assay and software pipeline. Many projects do not allow us to distribute the data, but we document how the data can be requested and provide all converters.</p> <p> Data from projects that provide haplotype-phased genotypes can also be found elsewhere: 1000 Genomes is also a separate track, and the phased genotypes of HGDP, SGDP, HGDP+1000 Genomes and Mexico Biobank can also be found in the "Phased Variants" track. Their VCF versions below show only the isolate frequency per variant. </p> <p>Please contact us (<A HREF="mailto:genome@soe.ucsc.edu">genome@soe.ucsc.edu</A><!-- above address is genome at soe.ucsc.edu -->), if you know of a project that we should add. 
So far, we have requested the following: UK Biobank (pending for a year), Regeneron's Million Exomes and Mexico City Studies (request rejected), Taiwan Biobank (pending). </p> <h2>Combined Track (All Databases)</h2> <p> The "All Databases Combined" track merges variants from all individual databases into a single bigBed file with consequence annotations, totaling 1.17 billion variants from ~1.7 million individuals. The track supports filtering by variant type (SNV, insertion, deletion, MNV), predicted consequence (missense, synonymous, stop gained, frameshift, splice, intron, intergenic), source database, allele frequency (overall maximum and per-database), and allele count (total or per-database). This track is useful either in dense mode, for a quick overview of variant density across all projects, or with filters, to find variants present in specific databases or within certain frequency ranges. Note that with the "clone track" feature you can clone this track and have multiple versions, each with different filters activated. You can also use our "Density mode" checkbox on the track configuration page to show a plot with the density of variants passing a filter, one per track clone. 
</p> <h3>Available Datasets</h3> <table class="stdTbl"> <tr> <th>Database</th> <th>Region</th> <th>N</th> <th>Data Type</th> <th>Cohort</th> <th>Sub-populations</th> <th>Downloadable from UCSC</th> </tr> <tr> <td><a href="hgTrackUi?g=varFreqsAll">All Databases combined</a></td> <td>All below</td> <td>1.7mil</td> <td>WGS/WES/imputed</td> <td></td> <td></td> <td>No</td> </tr> <tr> <td><a href="hgTrackUi?g=allofus">AllOfUs v7</a></td> <td>USA</td> <td>245k</td> <td>WGS</td> <td>General population, diverse</td> <td>African, Indigenous American, East Asian, European, Oceanian, South Asian (<b>local ancestry</b>; see Notes below)</td> <td>Yes</td> </tr> <tr> <td><a href="hgTrackUi?g=topmed">TOPMED Freeze 10</a></td> <td>USA</td> <td>151k</td> <td>WGS</td> <td>Heart, lung, blood, sleep disorder cohorts</td> <td>—</td> <td>Yes</td> </tr> <tr> <td><a href="hgTrackUi?g=sfariSparkExomes">SFARI SPARK WES</a></td> <td>USA</td> <td>140k</td> <td>WES</td> <td>Autism families (parents + affected children)</td> <td>—</td> <td>No</td> </tr> <tr> <td><a href="hgTrackUi?g=sfariSparkWgs">SFARI SPARK WGS</a></td> <td>USA</td> <td>12.5k</td> <td>WGS</td> <td>Autism families (parents + affected children)</td> <td>—</td> <td>No</td> </tr> <tr> <td><a href="hgTrackUi?g=alfaVcf">NCBI ALFA R4</a></td> <td>USA</td> <td>408k</td> <td>WGS/WES/array mix</td> <td>Aggregated dbGaP studies, mixed phenotypes</td> <td>—</td> <td>Yes</td> </tr> <tr> <td><a href="hgTrackUi?g=finngen">FinnGen R12</a></td> <td>Finland</td> <td>500k</td> <td>Imputed (8.5k WGS ref panel)</td> <td>National biobank, ~10% of population</td> <td>—</td> <td>Yes</td> </tr> <tr> <td><a href="hgTrackUi?g=swefreq">SweGen</a></td> <td>Sweden</td> <td>1k</td> <td>WGS</td> <td>Cross-section of Swedish population</td> <td>—</td> <td>No</td> </tr> <tr> <td><a href="hgTrackUi?g=schema">SCHEMA</a></td> <td>Multi-national</td> <td>121k</td> <td>WES</td> <td>Schizophrenia: 24k cases, 97k controls</td> <td>—</td> <td>Yes</td> </tr> <tr> 
<td><a href="hgTrackUi?g=tommo60kjpn">Japan ToMMO 61k</a></td> <td>Japan</td> <td>61k</td> <td>WGS</td> <td>General population</td> <td>—</td> <td>Yes</td> </tr> <tr> <td><a href="hgTrackUi?g=mgrb">Australia MGRB</a></td> <td>Australia</td> <td>4k</td> <td>WGS</td> <td>Healthy elderly (age ≥70)</td> <td>—</td> <td>No</td> </tr> <tr> <td><a href="hgTrackUi?g=gasp">GenomeAsia Pilot</a></td> <td>Asia (219 groups)</td> <td>1.7k</td> <td>WGS</td> <td>Diverse populations across Asia</td> - <td>Northeast Asian, Southeast Asian, South Asian</td> + <td>Northeast Asian, Southeast Asian, South Asian, Oceanian, American, African, + Western European Reference</td> <td>Yes</td> </tr> <tr> <td><a href="hgTrackUi?g=abraom">ABraOM Brazil</a></td> <td>Brazil</td> <td>1.2k</td> <td>WGS</td> <td>Elderly admixed individuals (São Paulo)</td> <td>—</td> <td>Yes</td> </tr> <tr> <td><a href="hgTrackUi?g=indigenomes">IndiGenomes</a></td> <td>India</td> <td>1k</td> <td>WGS</td> <td>Healthy individuals</td> <td>—</td> <td>Yes</td> </tr> <tr> <td><a href="hgTrackUi?g=kova">KOVA Korea</a></td> <td>Korea</td> <td>5.3k</td> <td>1.9k WGS + 3.4k WES</td> <td>Normal tissue from cancer patients, healthy parents, volunteers</td> <td>—</td> <td>No</td> </tr> <tr> <td><a href="hgTrackUi?g=npm">NPM Singapore</a></td> <td>Singapore</td> <td>9.8k</td> <td>WGS</td> <td>Chinese, Indian, Malay ancestry</td> <td>—</td> <td>No</td> </tr> <tr> <td><a href="hgTrackUi?g=saudi">Saudi Genome</a></td> <td>Saudi Arabia</td> <td>302</td> <td>WGS (30x)</td> <td>Saudi population</td> <td>—</td> <td>Yes</td> </tr> <tr> <td><a href="hgTrackUi?g=hrc">HRC</a></td> <td>Multi-national</td> <td>~30k</td> <td>Low-coverage WGS (7x)</td> <td>Imputation reference panel (excl. 
1000 Genomes)</td> <td>—</td> <td>Yes</td> </tr> <tr> <td><a href="hgTrackUi?g=mxbFreq">MXB Mexico Biobank</a></td> <td>Mexico</td> <td>6k</td> <td>Genotyping array</td> <td>Diverse Mexican ancestries, 898 recruitment sites</td> <td>By state, by ancestry</td> <td>No</td> </tr> <tr> <td><a href="hgTrackUi?g=sgdpFreq">SGDP</a></td> <td>Global</td> <td>279</td> <td>WGS</td> <td>142 diverse populations worldwide</td> <td>By population</td> <td>Yes</td> </tr> <tr> <td><a href="hgTrackUi?g=gregor">GREGoR R4</a></td> <td>USA</td> <td>3.6k</td> <td>WGS</td> <td>Rare disease families (10.7k participants, 4.4k families)</td> <td>—</td> <td>No</td> </tr> <tr> <td><a href="hgTrackUi?g=hgdp1kFreq">gnomAD HGDP+1kG</a></td> <td>Global</td> <td>4k</td> <td>WGS</td> <td>80 populations (HGDP + 1000 Genomes reprocessed)</td> <td>4k-cohort total AF only; per-population AF columns are <b>full gnomAD v3.1.2</b> release values (~76k genomes), see Notes below</td> <td>Yes</td> </tr> <tr> <td><a href="hgTrackUi?g=ga4kSnv">GA4K</a></td> <td>USA</td> <td>552</td> <td>PacBio HiFi long-read WGS</td> <td>Genomic Answers for Kids: pediatric rare-disease probands and families (Children's Mercy)</td> <td>—</td> <td>Yes</td> </tr> <tr> <td><a href="hgTrackUi?g=colorsDbSnv">CoLoRSdb v1.2.0</a></td> <td>Multi-national</td> <td>1,027</td> <td>PacBio HiFi long-read WGS</td> <td>Consortium of Long Read Sequencing: aggregated population-consented samples across multiple research cohorts</td> <td>—</td> <td>Yes</td> </tr> <tr> <td><a href="hgTrackUi?g=svatalogSnv">SVatalog 101</a></td> <td>Canada (SickKids)</td> <td>101</td> <td>10X Genomics linked short-read WGS</td> <td>GWAS SVatalog cohort: 101 samples with matched long-read SVs (see <a href="hgTrackUi?g=chirmade101Sv">chirmade101Sv</a>)</td> <td>—</td> <td>Yes</td> </tr> <tr> <td><a href="hgTrackUi?g=tishkoff180">Indigenous Africans 180</a></td> <td>Africa (Ethiopia, Tanzania, Cameroon, Botswana)</td> <td>180</td> <td>WGS (>30x)</td> <td>12 indigenous 
populations across all four African language phyla (Khoesan, Niger-Congo, Nilo-Saharan, Afroasiatic)</td> <td>—</td> <td>No</td> </tr> </table> <h2>Notes on Specific Sub-tracks</h2> <h3>AllOfUs — local-ancestry-stratified frequencies</h3> <p> The AllOfUs subtrack ships <b>local-ancestry-stratified</b> allele frequencies, not the global ancestry categories used in the All of Us Research Program 2024 Nature paper (see References). Each variant's per-ancestry AF/AC counts only the haplotypes whose inferred local ancestry at that exact genomic position belongs to the named group (strict-both-haps mode). The six ancestry classes (African, Indigenous American, East Asian, European, Oceanian, South Asian) match HGDP-derived local-ancestry reference panels and so include Oceanian, which is not one of the paper's six global Rye categories (those are AFR, AMR, EAS, EUR, Middle Eastern, SAS). For an admixed individual, the local-ancestry AF at a position can therefore differ substantially from the AF among self-reported members of the same ancestry group. The pipeline that produced this VCF was developed by the Ioannidis lab (Phoenix, UCSC) and applied to the AllOfUs v7 release; only variants with cohort allele count ≥ 20 were retained. </p> <h3>gnomAD HGDP+1kG — cohort vs full-release frequencies</h3> <p> This subtrack derives from the gnomAD v3.1.2 release, which embeds the 4,094-genome jointly-called HGDP+1kG cohort (Koenig et al. 2024) inside the larger gnomAD aggregation. To save space, only INFO fields useful for clinical and population-genetic interpretation were retained. Two distinct allele-frequency sets are exposed: </p> <ul> <li>The <b>cohort-level</b> AC/AF/AN fields (no prefix) are computed across the ~3,400 unrelated HGDP+1kG individuals (allele number ≈ 6,800).</li> <li>The <b>per-population</b> filter fields (gnomAD v3.1.2 African AF, gnomAD v3.1.2 Latino AF, etc.) 
are values from the <b>full gnomAD v3.1.2 release</b> (~76,000 genomes), not just the 4,094-genome HGDP+1kG cohort. The corresponding allele numbers are typically tens of thousands per population.</li> </ul> <p> The trackUI labels and bigBed field descriptions reflect this distinction. Per-population HGDP+1kG-cohort frequencies are not exposed because the cohort is too small to give stable per-population estimates for many populations. </p> <h2>Display Conventions</h2> <p>Most tracks only show the variant and allele frequencies on mouseover or clicks. When zoomed in, tracks display alleles with base-specific coloring. Homozygote data are shown as one letter, while heterozygotes will be displayed with both letters. All VCF files are normalized, with one single allele per annotation (no multi-allele lines). </p> <h2>Methods</h2> <p> Each subtrack ships the upstream project's VCF largely as-released; per-subtrack pipelines (coordinate liftover, format conversion, header normalization) are documented on each subtrack's own description page and recorded in the <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt" target="_blank">build documentation</a>. The conversion scripts (<em>e.g.</em> <code>finngen_to_vcf.py</code>, <code>kovaToVcf.py</code>, <code>schema_addAcAnAf.py</code>, <code>svatalogFreqToVcf.py</code>) live alongside the makedoc in the <a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/varFreqs" target="_blank">scripts directory</a>. </p> <p> The combined "All Databases" subtrack is built by a separate pipeline: each per-subtrack VCF is normalized (<code>bcftools norm</code>), all sites are merged into a single multi-sample callset, consequence annotations are recomputed against Ensembl with <code>bcftools csq</code>, and the result is converted to bigBed via <code>vcfToBigBed.py</code> + <code>bedToBigBed</code>. 
The mapping from upstream INFO fields to bigBed columns is driven by two configuration files in the scripts directory: <code>databases.tsv</code> (one row per source dataset) and <code>populations.tsv</code> (per-population AC/AF columns within each source). Editing those two files and rerunning <code>mergeAndAnnotate.sh</code> followed by <code>vcfToBigBed.py</code> rebuilds the combined track. </p> <h2>Data Access</h2> <p>All the data is publicly available. The table above indicates if we are allowed to distribute it in VCF format. Most of the databases do not allow us to redistribute the data files directly from our website, but it can always be downloaded from the original websites in some form. Click the database link in the table above and see the "Data Access" section of the respective track for a description of where to download the data. When the data is freely available from our website, the Data Access section will also indicate the VCF file location on our download server. Because it contains some licensed data, the combined track is not available for download, but can be recreated using the conversion scripts in our <a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/varFreqs" target="_blank">GitHub repository</a> and the accompanying <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt" target="_blank">documentation file</a>. </p> <h2>Credits</h2> <p>This track is only possible thanks to the data from millions of volunteers around the world, who donated blood, signed consent forms and provided health information about themselves and sometimes their families. Click on any of the tracks in the list above to see the specific credits for each project. Thanks to Alex Ioannidis, UCSC, for the motivation for this track and to Andreas Lahner, MGZ, for feedback.</p> <h2>References</h2> <p> All of Us Research Program Genomics Investigators. 
<a href="https://doi.org/10.1038/s41586-023-06957-x" target="_blank"> Genomic data in the All of Us Research Program</a>. <em>Nature</em>. 2024 Mar;627(8003):340-346. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/38374255" target="_blank">38374255</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10937371/" target="_blank">PMC10937371</a> </p> <p> Ameur A, Dahlberg J, Olason P, Vezzi F, Karlsson R, Martin M, Viklund J, Kahari AK, Lundin P, Che H <em>et al</em>. <a href="https://doi.org/10.1038/ejhg.2017.130" target="_blank"> SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish population</a>. <em>Eur J Hum Genet</em>. 2017 Nov;25(11):1253-1260. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/28832569" target="_blank">28832569</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5765326/" target="_blank">PMC5765326</a> </p> <p> Chirmade S, Wang Z, Mastromatteo S, Sanders E, Thiruvahindrapuram B, Nalpathamkalam T, Pellecchia G, Lin F, Keenan K, Patel RV <em>et al</em>. <a href="https://doi.org/10.1038/s41437-025-00809-2" target="_blank"> GWAS SVatalog: a visualization tool to aid fine-mapping of GWAS loci with structural variations</a>. <em>Heredity (Edinb)</em>. 2025 Sep;135(3):199-210. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/41203876" target="_blank">41203876</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13031531/" target="_blank">PMC13031531</a> </p> <p> Cohen ASA, Farrow EG, Abdelmoity AT, Alaimo JT, Amudhavalli SM, Anderson JT, Bansal L, Bartik L, Baybayan P, Belden B <em>et al</em>. <a href="https://doi.org/10.1016/j.gim.2022.02.007" target="_blank"> Genomic answers for children: Dynamic analyses of >1000 pediatric rare disease genomes</a>. <em>Genet Med</em>. 2022 Jun;24(6):1336-1348. 
PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/35305867" target="_blank">35305867</a> </p> <p> Fan S, Spence JP, Feng Y, Hansen MEB, Terhorst J, Beltrame MH, Ranciaro A, Hirbo J, Beggs W, Thomas N <em>et al</em>. <a href="https://doi.org/10.1016/j.cell.2023.01.042" target="_blank"> Whole-genome sequencing reveals a complex African population demographic history and signatures of local adaptation</a>. <em>Cell</em>. 2023 Mar 2;186(5):923-939.e14. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/36868214" target="_blank">36868214</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10568978/" target="_blank">PMC10568978</a> </p> <p> Feliciano P, Daniels AM, Snyder LG, Beaumont A, Camba A, Esler A, Gulsrud AG, Mason A, Nicholson A, Paolicelli AM <em>et al</em>; The SPARK Consortium. <a href="https://doi.org/10.1016/j.neuron.2018.01.015" target="_blank"> SPARK: A US Cohort of 50,000 Families to Accelerate Autism Research</a>. <em>Neuron</em>. 2018 Feb 7;97(3):488-493. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/29420931" target="_blank">29420931</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7444276/" target="_blank">PMC7444276</a> </p> <p> GenomeAsia100K Consortium. <a href="https://doi.org/10.1038/s41586-019-1793-z" target="_blank"> The GenomeAsia 100K Project enables genetic discoveries across Asia</a>. <em>Nature</em>. 2019 Dec;576(7785):106-111. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/31802016" target="_blank">31802016</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7054211/" target="_blank">PMC7054211</a> </p> <p> Jain A, Bhoyar RC, Pandhare K, Mishra A, Sharma D, Imran M, Senthivel V, Divakar MK, Rophina M, Jolly B <em>et al</em>. <a href="https://doi.org/10.1093/nar/gkaa923" target="_blank"> IndiGenomes: a comprehensive resource of genetic variants from over 1000 Indian genomes</a>. <em>Nucleic Acids Res</em>. 2021 Jan 8;49(D1):D1225-D1232. 
PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/33095885" target="_blank">33095885</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7778947/" target="_blank">PMC7778947</a> </p> <p> Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alfoldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP <em>et al</em>. <a href="https://doi.org/10.1038/s41586-020-2308-7" target="_blank"> The mutational constraint spectrum quantified from variation in 141,456 humans</a>. <em>Nature</em>. 2020 May;581(7809):434-443. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/32461654" target="_blank">32461654</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7334197/" target="_blank">PMC7334197</a> </p> <p> Koenig Z, Yohannes MT, Nkambule LL, Zhao X, Goodrich JK, Kim HA, Wilson MW, Tiao G, Hao SP, Sahakian N <em>et al</em>. <a href="https://doi.org/10.1101/gr.278378.123" target="_blank"> A harmonized public resource of deeply sequenced diverse human genomes</a>. <em>Genome Res</em>. 2024 Jun 25;34(5):796-809. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/38749656" target="_blank">38749656</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11216312/" target="_blank">PMC11216312</a> </p> <p> Kurki MI, Karjalainen J, Palta P, Sipila TP, Kristiansson K, Donner KM, Reeve MP, Laivuori H, Aavikko M, Kaunisto MA <em>et al</em>. <a href="https://doi.org/10.1038/s41586-022-05473-8" target="_blank"> FinnGen provides genetic insights from a well-phenotyped isolated population</a>. <em>Nature</em>. 2023 Jan;613(7944):508-518. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/36653562" target="_blank">36653562</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9849126/" target="_blank">PMC9849126</a> </p> <p> Lacaze P, Pinese M, Kaplan W, Stone A, Brion MJ, Woods RL, McNamara M, McNeil JJ, Dinger ME, Thomas DM. 
<a href="https://doi.org/10.1038/s41431-018-0279-z" target="_blank"> The Medical Genome Reference Bank: a whole-genome data resource of 4000 healthy elderly individuals. Rationale and cohort design</a>. <em>Eur J Hum Genet</em>. 2019 Feb;27(2):308-316. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/30353151" target="_blank">30353151</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6336775/" target="_blank">PMC6336775</a> </p> <p> Lee S, Seo J, Park J, Nam JY, Choi A, Ignatius JS, Bjornson RD, Chae JH, Jang IJ, Lee S <em>et al</em>. <a href="https://doi.org/10.1038/s41598-017-04642-4" target="_blank"> Korean Variant Archive (KOVA): a reference database of genetic variations in the Korean population</a>. <em>Sci Rep</em>. 2017 Jun 27;7(1):4287. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/28655895" target="_blank">28655895</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5487339/" target="_blank">PMC5487339</a> </p> <p> Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S, Tandon A <em>et al</em>. <a href="https://doi.org/10.1038/nature18964" target="_blank"> The Simons Genome Diversity Project: 300 genomes from 142 diverse populations</a>. <em>Nature</em>. 2016 Oct 13;538(7624):201-206. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/27654912" target="_blank">27654912</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5161557/" target="_blank">PMC5161557</a> </p> <p> McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, Kang HM, Fuchsberger C, Danecek P, Sharp K <em>et al</em>. <a href="https://doi.org/10.1038/ng.3643" target="_blank"> A reference panel of 64,976 haplotypes for genotype imputation</a>. <em>Nat Genet</em>. 2016 Oct;48(10):1279-83. 
PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/27548312" target="_blank">27548312</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5388176/" target="_blank">PMC5388176</a> </p> <p> Naslavsky MS, Scliar MO, Yamamoto GL, Wang JYT, Zverinova S, Karp T, Nunes K, Ceroni JRM, de Carvalho DL, da Silva Simões CE <em>et al</em>. <a href="https://doi.org/10.1038/s41467-022-28648-3" target="_blank"> Whole-genome sequencing of 1,171 elderly admixed individuals from São Paulo, Brazil</a>. <em>Nat Commun</em>. 2022 Mar 4;13(1):1004. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/35246524" target="_blank">35246524</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8897431/" target="_blank">PMC8897431</a> </p> <p> Sohail M, Palma-Martínez MJ, Chong AY, Quinto-Cortés CD, Barberena-Jonas C, Medina-Muñoz SG, Ragsdale A, Delgado-Sánchez G, Cruz-Hervert LP, Ferreyra-Reyes L <em>et al</em>. <a href="https://doi.org/10.1038/s41586-023-06560-0" target="_blank"> Mexican Biobank advances population and medical genomics of diverse ancestries</a>. <em>Nature</em>. 2023 Oct;622(7984):775-783. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/37821706" target="_blank">37821706</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10600006/" target="_blank">PMC10600006</a> </p> <p> Singh T, Poterba T, Curtis D, Akil H, Al Eissa M, Barchas JD, Bass N, Bigdeli TB, Breen G, Bromet EJ <em>et al</em>. <a href="https://doi.org/10.1038/s41586-022-04556-w" target="_blank"> Rare coding variants in ten genes confer substantial risk for schizophrenia</a>. <em>Nature</em>. 2022 Apr;604(7906):509-516. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/35396579" target="_blank">35396579</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9805802/" target="_blank">PMC9805802</a> </p> <p> Tadaka S, Kawashima J, Hishinuma E, Saito S, Okamura Y, Otsuki A, Kojima K, Komaki S, Aoki Y, Kanno T <em>et al</em>. 
<a href="https://doi.org/10.1093/nar/gkad978" target="_blank"> jMorp: Japanese Multi-Omics Reference Panel update report 2023</a>. <em>Nucleic Acids Res</em>. 2024 Jan 5;52(D1):D622-D632. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/37930845" target="_blank">37930845</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10767895/" target="_blank">PMC10767895</a> </p> <p> Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, Taliun SAG, Corvelo A, Gogarten SM, Kang HM <em>et al</em>. <a href="https://doi.org/10.1038/s41586-021-03205-y" target="_blank"> Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program</a>. <em>Nature</em>. 2021 Feb;590(7845):290-299. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/33568819" target="_blank">33568819</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7875770/" target="_blank">PMC7875770</a> </p> <p> Wong E, Bertin N, Hebrard M, Tirado-Magallanes R, Bellis C, Lim WK, Chua CY, Tong PML, Chua R, Mak K <em>et al</em>. <a href="https://doi.org/10.1038/s41588-022-01274-x" target="_blank"> The Singapore National Precision Medicine Strategy</a>. <em>Nat Genet</em>. 2023 Feb;55(2):178-186. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/36658435" target="_blank">36658435</a> </p> <p> Wu D, Dou J, Chai X, Bellis C, Wilm A, Shih CC, Soon WWJ, Bertin N, Lin CB, Khor CC <em>et al</em>. <a href="https://doi.org/10.1016/j.cell.2019.09.019" target="_blank"> Large-scale whole-genome sequencing of three diverse Asian populations in Singapore</a>. <em>Cell</em>. 2019 Oct 17;179(3):736-749.e15. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/31626772" target="_blank">31626772</a> </p>