src/hg/makeDb/trackDb/human/varFreqs.html 1259dcfba3a263d92d2602665fd866dc44b47996

1259dcfba3a263d92d2602665fd866dc44b47996
lrnassar
  Sun Jun 21 11:17:10 2026 -0700
Clarify varFreqs description page wording per code review feedback. refs #37733

Reword the default_an sentence in the Pooled allele frequency sections of
varFreqsAffected.html and varFreqsBackground.html to explain that cohorts
publishing only AF are pooled via an assigned default_an, with per-arm AC
derived as round(AF * default_an). Change "tokens" to "terms" in the
Consequence filter section of varFreqs.html.

diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html
index 44840eb1a7c..fb6fc359c6c 100644
--- src/hg/makeDb/trackDb/human/varFreqs.html
+++ src/hg/makeDb/trackDb/human/varFreqs.html
@@ -1,766 +1,766 @@
 <h2>Description</h2>
 <p>
 This track collection gathers variant allele frequencies from population-scale sequencing
 and genotyping projects worldwide, from a total of ~1.7 million genomes/exomes/arrays.
 Unlike gnomAD, the data was not reprocessed in a harmonized way; the variant VCFs were collected from the
 projects as-is. The goal is a single place to compare how common a variant is across
 different populations, ancestries, and cohorts, for projects that gnomAD is unlikely to
 reprocess soon. Three combined tracks aggregate the source data along different lines, and
 there is also one subtrack per project with the original VCF data and all the annotations
 that the project provides. The different projects use different pipelines and sequencing
 technologies. Click any of the projects above or below for a summary of their sample
 selection, sequencing assay and software pipeline. Many projects do not allow us to
 distribute the data, but we document how to request it and provide all converters; see Data Access below.
 </p>
 
 <p>
 The browser has other tracks with variant frequencies. The
 <a href="hgTrackUi?g=gnomadVariants">gnomAD</a> data is in separate tracks. Projects that
 provide haplotype-phased genotypes can also be found in their own tracks:
 <a href="hgTrackUi?g=tgpArchive">1000 Genomes</a> is a separate track, and the phased
 genotypes of HGDP, SGDP, HGDP+1000 Genomes and Mexico Biobank are in the
 <a href="hgTrackUi?g=phasedVars">Phased Variants</a> track. Their VCF versions below show
 only the allele frequency per variant, not the phased genotypes.
 </p>
 
 <p>Please contact us (<a href="mailto:&#103;en&#111;&#109;&#101;&#64;&#115;&#111;&#101;.&#117;&#99;s&#99;.&#101;&#100;u">&#103;en&#111;&#109;&#101;&#64;&#115;&#111;&#101;.&#117;&#99;s&#99;.&#101;&#100;u</a><!-- above address is genome at soe.ucsc.edu -->) if you know of a project that we should add. So far,
 we have requested data from Regeneron&apos;s Million Exomes and the Mexico City studies (both requests rejected);
 Taiwan Biobank and the full UK Biobank WGS data requests are pending.</p>
 
 <h2>Combined Tracks</h2>
 <p>
 Three combined tracks merge variants from the individual subtracks into single bigBed files
 with predicted protein consequences and cross-database filtering. All three use the same
 filter conventions (variant type, consequence, source database, allele frequency, allele
 count, and per-database AF/AC).
 </p>
 <ul>
   <li><a href="hgTrackUi?g=varFreqsBackground"><b>Population reference</b></a> &mdash; the
       default summary view: variants seen in the population reference cohorts (gnomAD
       HGDP+1kG, TOPMed, ALFA, HRC and the national WGS projects) and in the
       unaffected/control arms of the disease cohorts. Excludes the genotyping-array
       cohorts.</li>
   <li><a href="hgTrackUi?g=varFreqsAffected"><b>Disease cohorts</b></a> &mdash;
       variants seen in the affected or case arm of five disease-study cohorts (SFARI SPARK
       WES and WGS autism probands, SCHEMA schizophrenia cases, GREGoR affected, GA4K
       rare-disease). Each variant also carries its background frequency, so case-enriched
       variants can be isolated by filtering Background AF.</li>
   <li><a href="hgTrackUi?g=varFreqsArray"><b>Genotyping Array Databases Combined</b></a>
       &mdash; 14.7 million variants from three array cohorts (TPMI Taiwan, Mexico Biobank,
       UK Biobank imputed). Kept separate because chip data has different per-variant
       confidence than sequencing.</li>
 </ul>
 
 <p>
 On the Disease and Population reference tracks, <b>Affected AF</b> and <b>Background AF</b>
 are pooled across contributing cohort arms (sum of allele counts divided by sum of allele
 numbers), not the maximum across arms, so the displayed frequency matches the carrier-count
 scale and a small cohort with a high local frequency does not dominate the value. See the
 &quot;Pooled allele frequency&quot; section on each combined track's description page for
 which cohorts contribute to the pool numerator and denominator.
 </p>
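 <p>
 As a minimal sketch of the pooling described above (not the track's actual build code; the
 <code>Arm</code> record, the arm names and the counts are invented for illustration), the
 following Python contrasts the pooled frequency with the maximum across arms:
 </p>

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """One contributing cohort arm (hypothetical record, for illustration only)."""
    name: str
    ac: int  # allele count observed in this arm
    an: int  # allele number (chromosomes genotyped) in this arm


def pooled_af(arms):
    """Sum of allele counts divided by sum of allele numbers."""
    total_ac = sum(a.ac for a in arms)
    total_an = sum(a.an for a in arms)
    return total_ac / total_an if total_an else 0.0


def max_af(arms):
    """Maximum per-arm frequency, shown only for contrast."""
    return max((a.ac / a.an for a in arms if a.an), default=0.0)


# Made-up numbers: a large arm where the variant is rare and a small arm
# where it is locally common.
arms = [Arm("large_cohort", 10, 100_000), Arm("small_cohort", 5, 100)]

print(f"pooled AF = {pooled_af(arms):.6f}")
print(f"max AF    = {max_af(arms):.6f}")
```

 <p>
 With these invented counts the small arm alone has a frequency of 5%, while the pooled value
 stays close to the frequency in the large arm, which is the behavior the paragraph above
 describes.
 </p>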
 
 <h3>Consequence filter &mdash; the &quot;Other&quot; bucket</h3>
 <p>
 All three combined tracks share the same Consequence filter (Missense, Synonymous, Stop
 Gained, Frameshift, Splice Donor, Splice Acceptor, Intron, 3' UTR, 5' UTR, Non-coding,
-Intergenic, Other). The filter uses OR logic across the comma-separated consequence tokens
+Intergenic, Other). The filter uses OR logic across the comma-separated consequence terms
 on each variant: a variant tagged <code>stop_gained,frameshift</code> is selected by either
 the &quot;Stop Gained&quot; or the &quot;Frameshift&quot; filter. The &quot;Other&quot;
 bucket catches the less common
 <a href="http://www.sequenceontology.org/" target="_blank">Sequence Ontology</a> consequence
 terms that don't fit the named buckets above. Examples
 include <code>splice_region</code> (variant near a splice site but outside the canonical
 donor/acceptor), <code>start_lost</code> / <code>stop_lost</code> (variant disrupts the
 start codon or replaces the stop codon with a coding amino acid),
 <code>stop_retained</code> (variant changes the stop codon but keeps it a stop),
 <code>inframe_insertion</code> / <code>inframe_deletion</code> (in-frame indel that adds or
 removes whole codons), and <code>coding_sequence</code> (CDS variant where the precise
 impact is undetermined). If you select every bucket, including &quot;Other&quot;, no
 records will be hidden by the consequence filter.
 </p>
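 <p>
 The OR logic can be sketched as follows. This is an illustration only, not the browser's
 filter code: the <code>BUCKET_OF</code> mapping is a hypothetical, partial term-to-bucket
 table, and any term missing from it is treated as &quot;Other&quot;.
 </p>

```python
# Hypothetical, partial mapping from consequence term to filter bucket.
# Only stop_gained and frameshift are taken from the text above; the other
# keys are assumed spellings used for illustration.
BUCKET_OF = {
    "missense": "Missense",
    "synonymous": "Synonymous",
    "stop_gained": "Stop Gained",
    "frameshift": "Frameshift",
}


def buckets(consequence):
    """Return the set of filter buckets for a comma-separated consequence string."""
    terms = (t.strip() for t in consequence.split(","))
    return {BUCKET_OF.get(t, "Other") for t in terms if t}


def passes(consequence, selected):
    """OR logic: a variant is shown if any of its terms falls in a selected bucket."""
    return bool(buckets(consequence) & set(selected))


print(passes("stop_gained,frameshift", {"Stop Gained"}))
print(passes("stop_gained,frameshift", {"Frameshift"}))
print(passes("stop_gained,frameshift", {"Missense"}))
print(passes("splice_region", {"Other"}))
```

 <p>
 Treating unmapped terms as &quot;Other&quot; is one simple way to make the buckets cover
 every variant; the real mapping may be defined differently.
 </p>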
 
 <h3>Available Datasets</h3>
 
 <style>
 /* varFreqs dataset table: the three combined tracks and the per-project datasets
    are logically two tables. Give the column headers a strong background so they
    stand out, and a light group-heading bar to separate the two sections. */
 #varFreqsTbl th {
   background-color: #00457c;
   color: #ffffff;
 }
 #varFreqsTbl tr.varFreqsGroup td {
   background-color: #d9e4f8;
   font-weight: bold;
   font-size: 1.05em;
 }
 </style>
 
 <table class="stdTbl" id="varFreqsTbl">
 <tr class="varFreqsGroup"><td colspan="7">Combined tracks</td></tr>
 <tr>
   <th>Database</th>
   <th>Region</th>
   <th>N</th>
   <th>Data Type</th>
   <th>Cohort</th>
   <th>Sub-populations</th>
   <th>Downloadable from UCSC</th>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=varFreqsAffected">Disease cohorts</a></td>
   <td>Sequencing-based disease cohorts</td>
   <td>~130k</td>
   <td>WGS/WES/long-read</td>
   <td>Affected/case arms of SFARI SPARK WES/WGS, SCHEMA, GREGoR, GA4K</td>
   <td>Affected/case AF and AC; background AF for contrast</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=varFreqsBackground">Population reference</a></td>
   <td>Sequencing-based, population + unaffected</td>
   <td>~1.5M</td>
   <td>WGS/WES/long-read</td>
   <td>Population cohorts + unaffected/control arms</td>
   <td>Background AF and AC; per-cohort and ancestry breakdowns</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=varFreqsArray">Genotyping Array Databases Combined</a></td>
   <td>TPMI, MexBB, UKBB</td>
   <td>~530k</td>
   <td>Array / imputed</td>
   <td>14.7M variants</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 <tr class="varFreqsGroup"><td colspan="7">Individual project datasets</td></tr>
 <tr>
   <th>Database</th>
   <th>Region</th>
   <th>N</th>
   <th>Data Type</th>
   <th>Cohort</th>
   <th>Sub-populations</th>
   <th>Downloadable from UCSC</th>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=allofus">AllOfUs v7</a></td>
   <td>USA</td>
   <td>245k</td>
   <td>WGS</td>
   <td>General population, diverse</td>
   <td>African, Indigenous American, East Asian, European, Oceanian, South Asian
       (<b>local ancestry</b>; see Notes below)</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=topmed">TOPMed Freeze 10</a></td>
   <td>USA</td>
   <td>151k</td>
   <td>WGS</td>
   <td>Heart, lung, blood, sleep disorder cohorts</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=sfariSparkExomes">SFARI SPARK WES</a></td>
   <td>USA</td>
   <td>140k</td>
   <td>WES</td>
   <td>Autism families (parents + affected children)</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=sfariSparkWgs">SFARI SPARK WGS</a></td>
   <td>USA</td>
   <td>12.5k</td>
   <td>WGS</td>
   <td>Autism families (parents + affected children)</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=alfaVcf">NCBI ALFA R4</a></td>
   <td>USA</td>
   <td>408k</td>
   <td>WGS/WES/array mix</td>
   <td>Aggregated dbGaP studies, mixed phenotypes</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=finngen">FinnGen R12</a></td>
   <td>Finland</td>
   <td>500k</td>
   <td>Imputed (8.5k WGS ref panel)</td>
   <td>National biobank, ~10% of population</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=ukbb">UK Biobank (Neale Lab v3)</a></td>
   <td>UK</td>
   <td>361k</td>
   <td>Imputed array (HRC+UK10K+1KGp3 ref panel)</td>
   <td>White British subset of UK Biobank, Neale Lab Round 2 GWAS</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=swefreq">SweGen</a></td>
   <td>Sweden</td>
   <td>1k</td>
   <td>WGS</td>
   <td>Cross-section of Swedish population</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=gonl">GoNL</a></td>
   <td>Netherlands</td>
   <td>498</td>
   <td>WGS (~13x)</td>
   <td>250 unrelated Dutch trios (parents only)</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=schema">SCHEMA</a></td>
   <td>Multi-national</td>
   <td>121k</td>
   <td>WES</td>
   <td>Schizophrenia: 24k cases, 97k controls (Singh 2022 primary); VCF aggregates up to ~73k/~182k</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=tommo60kjpn">Japan ToMMo 61k</a></td>
   <td>Japan</td>
   <td>61k</td>
   <td>WGS</td>
   <td>General population</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=wbbc">WBBC China</a></td>
   <td>China</td>
   <td>4.5k</td>
   <td>WGS</td>
   <td>Westlake BioBank for Chinese pilot (now part of China Precision BioBank), autosomes only</td>
   <td>North Han, Central Han, South Han, Lingnan Han (by recruitment region)</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=chinamap">ChinaMAP phase 1</a></td>
   <td>China</td>
   <td>10.5k</td>
   <td>WGS</td>
   <td>China Metabolic Analytics Project, ~40x depth, 27 provinces and 8 ethnic groups, autosomes only</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=tpmi">Taiwan TPMI</a></td>
   <td>Taiwan</td>
   <td>165k</td>
   <td>Axiom SNP array (TPM1)</td>
   <td>Taiwan Precision Medicine Initiative, Han Chinese</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=mgrb">Australia MGRB</a></td>
   <td>Australia</td>
   <td>4k</td>
   <td>WGS</td>
   <td>Healthy elderly (age &ge;70)</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=gasp">GenomeAsia Pilot</a></td>
   <td>Asia (219 groups)</td>
   <td>1.7k</td>
   <td>WGS</td>
   <td>Diverse populations across Asia</td>
   <td>Northeast Asian, Southeast Asian, South Asian, Oceanian, American, African,
       Western European Reference</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=abraom">ABraOM Brazil</a></td>
   <td>Brazil</td>
   <td>1.2k</td>
   <td>WGS</td>
   <td>Elderly admixed individuals (S&atilde;o Paulo)</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=indigenomes">IndiGenomes</a></td>
   <td>India</td>
   <td>1k</td>
   <td>WGS</td>
   <td>Healthy individuals</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=genomeindia">GenomeIndia 9.7k</a></td>
   <td>India</td>
   <td>9.8k</td>
   <td>WGS (&ge;23x)</td>
   <td>83 anthropologically defined endogamous populations across India</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=kova">KOVA Korea</a></td>
   <td>Korea</td>
   <td>5.3k</td>
   <td>1.9k WGS + 3.4k WES</td>
   <td>Normal tissue from cancer patients, healthy parents, volunteers</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=npm">NPM Singapore</a></td>
   <td>Singapore</td>
   <td>9.8k</td>
   <td>WGS</td>
   <td>Chinese, Indian, Malay ancestry</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=saudi">Saudi Genome</a></td>
   <td>Saudi Arabia</td>
   <td>302</td>
   <td>WGS (30x)</td>
   <td>Saudi population</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=hrc">HRC</a></td>
   <td>Multi-national</td>
   <td>~30k</td>
   <td>Low-coverage WGS (7x)</td>
   <td>Imputation reference panel (excl. 1000 Genomes)</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=mxbFreq">MXB Mexico Biobank</a></td>
   <td>Mexico</td>
   <td>6k</td>
   <td>Genotyping array</td>
   <td>Diverse Mexican ancestries, 898 recruitment sites</td>
   <td>By state, by ancestry</td>
   <td>No</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=sgdpFreq">SGDP</a></td>
   <td>Global</td>
   <td>279</td>
   <td>WGS</td>
   <td>142 diverse populations worldwide</td>
   <td>By population</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=gregor">GREGoR R4</a></td>
   <td>USA</td>
   <td>3.6k</td>
   <td>WGS</td>
   <td>Rare disease families (10.7k participants, 4.4k families)</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=hgdp1kFreq">gnomAD HGDP+1kG</a></td>
   <td>Global</td>
   <td>4k</td>
   <td>WGS</td>
   <td>80 populations (HGDP + 1000 Genomes reprocessed)</td>
   <td>4k-cohort total AF only; per-population AF columns are <b>full gnomAD v3.1.2</b>
       release values (~76k genomes), see Notes below</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=ga4kSnv">GA4K</a></td>
   <td>USA</td>
   <td>552</td>
   <td>PacBio HiFi long-read WGS</td>
   <td>Genomic Answers for Kids: pediatric rare-disease probands and families (Children's Mercy)</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=colorsDbSnv">CoLoRSdb v1.2.0</a></td>
   <td>Multi-national</td>
   <td>1,027</td>
   <td>PacBio HiFi long-read WGS</td>
   <td>Consortium of Long Read Sequencing: aggregated population-consented samples across multiple research cohorts</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=svatalogSnv">SVatalog 101</a></td>
   <td>Canada (SickKids)</td>
   <td>101</td>
   <td>10X Genomics linked short-read WGS</td>
   <td>GWAS SVatalog cohort: 101 samples with matched long-read SVs (see <a href="hgTrackUi?g=chirmade101Sv">chirmade101Sv</a>)</td>
   <td>&mdash;</td>
   <td>Yes</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=tishkoff180">Indigenous Africans 180</a></td>
   <td>Africa (Ethiopia, Tanzania, Cameroon, Botswana)</td>
   <td>180</td>
   <td>WGS (&gt;30x)</td>
   <td>12 indigenous populations across all four African language phyla (Khoesan, Niger-Congo, Nilo-Saharan, Afroasiatic)</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 </table>
 
 <h2>Display Conventions</h2>
 
 <p>Most tracks show the variant and allele frequencies only on mouseover or click.
 When zoomed in, tracks display alleles with base-specific coloring. Homozygote
 data are shown as one letter; heterozygotes are shown with both
 letters. All VCF files are normalized, with one allele per annotation (no multi-allelic
 lines).
 </p>
 
 <h2>Methods</h2>
 <p>
 Each subtrack includes the upstream project's VCF largely as-released,
 sometimes converted from other file formats; per-subtrack pipelines (coordinate
 liftover, format conversion, header normalization) are documented on each
 subtrack's own description page and recorded in the
 <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt" target="_blank">build documentation</a>.
 The conversion scripts live alongside the makedoc
 in the <a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/varFreqs" target="_blank">scripts directory</a>.
 </p>
 <p>
 The combined Disease cohorts and Population reference tracks are built by a separate
 pipeline: each per-subtrack VCF is normalized (<code>bcftools norm</code>), all sites are
 merged into a single callset, consequence annotations are recomputed against Ensembl with
 <code>bcftools csq</code>, and the merged callset is split by phenotype. Within each combined
 track, the <b>Affected AF</b> and <b>Background AF</b> columns are
 <i>pooled</i> across contributing cohort arms (sum of allele counts divided by sum of
 allele numbers, with the per-arm AN derived from each cohort's AC and AF), so the displayed
 frequency matches the carrier-count scale.
 The Genotyping Array Databases Combined track is built the same
 way from the array cohorts only.
 </p>
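 <p>
 The per-arm arithmetic can be sketched as below. This is an assumption-laden illustration,
 not the pipeline's code: the function names are hypothetical, the rounding rule for AN is
 assumed, and the <code>default_an</code> value is invented. The AF-only case follows the
 wording used for the per-track pages, where cohorts publishing only AF are pooled via an
 assigned <code>default_an</code> with AC derived as <code>round(AF * default_an)</code>.
 </p>

```python
def an_from_ac_af(ac, af):
    """Recover an arm's allele number from its AC and AF (AN is about AC / AF).

    Rounding to the nearest integer is an assumption of this sketch.
    """
    if af <= 0:
        raise ValueError("AF must be positive to derive AN")
    return round(ac / af)


def ac_from_af(af, default_an):
    """For an arm that publishes only AF: AC = round(AF * default_an).

    Note that Python's round() rounds exact halves to the nearest even integer.
    """
    return round(af * default_an)


# Made-up examples.
print(an_from_ac_af(5, 0.05))    # arm reporting AC=5 at AF=0.05
print(ac_from_af(0.0125, 2000))  # AF-only arm with an invented default_an of 2000
```

 <p>
 Once every arm has an AC and an AN, the pooled frequency is the sum of the ACs divided by the
 sum of the ANs.
 </p>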
 
 <h2>Data Access</h2>
 <p>Many of these databases have restrictions on redistribution and download.
 The table above indicates whether we are allowed to distribute each dataset in VCF format.
 Click the database link in the table above and see the &quot;Data Access&quot;
 section of the respective track for a description of where to download the
 data. When the data is freely available from our website, the Data Access
 section will also indicate the VCF file location on our download server.
 Because they contain some licensed data, the combined tracks are not available for
 download, but can be recreated using the conversion scripts in our <a
 href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/varFreqs"
 target="_blank">GitHub repository</a> and the accompanying <a
 href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt"
 target="_blank">documentation file</a>.</p>
 
 <h2>Credits</h2>
 
 <p>This track is only possible thanks to the data from millions of volunteers around the world, who donated blood, signed consent forms and provided health information about themselves and sometimes their families. Click any of the tracks in the list above to see the specific credits for each project. Thanks to Alex Ioannidis, UCSC, for the inspiration for this track and to Andreas Lahner, MGZ, for feedback.</p>
 
 <h2>References</h2>
 
 <p>
 All of Us Research Program Genomics Investigators.
 <a href="https://doi.org/10.1038/s41586-023-06957-x" target="_blank">
 Genomic data in the All of Us Research Program</a>.
 <em>Nature</em>. 2024 Mar;627(8003):340-346.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/38374255" target="_blank">38374255</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10937371/" target="_blank">PMC10937371</a>
 </p>
 
 <p>
 Ameur A, Dahlberg J, Olason P, Vezzi F, Karlsson R, Martin M, Viklund J, Kahari AK, Lundin P, Che H
 <em>et al</em>.
 <a href="https://doi.org/10.1038/ejhg.2017.130" target="_blank">
 SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish
 population</a>.
 <em>Eur J Hum Genet</em>. 2017 Nov;25(11):1253-1260.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/28832569" target="_blank">28832569</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5765326/" target="_blank">PMC5765326</a>
 </p>
 
 <p>
 Bhattacharyya C, Subramanian K, Uppili B, Biswas NK, Ramdas S, Tallapaka KB, Arvind P, Rupanagudi
 KV, Maitra A, Nagabandi T <em>et al</em>.
 <a href="https://doi.org/10.1038/s41588-025-02153-x" target="_blank">
 Mapping genetic diversity with the GenomeIndia project</a>.
 <em>Nat Genet</em>. 2025 Apr;57(4):767-773.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/40200122" target="_blank">40200122</a>
 </p>
 
 <p>
 Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, Motyer A, Vukcevic D, Delaneau O,
 O&#x27;Connell J <em>et al</em>.
 <a href="https://doi.org/10.1038/s41586-018-0579-z" target="_blank">
 The UK Biobank resource with deep phenotyping and genomic data</a>.
 <em>Nature</em>. 2018 Oct;562(7726):203-209.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/30305743" target="_blank">30305743</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6786975/" target="_blank">PMC6786975</a>
 </p>
 
 <p>
 Cao Y, Li L, Xu M, Feng Z, Sun X, Lu J, Xu Y, Du P, Wang T, Hu R <em>et al</em>.
 <a href="https://doi.org/10.1038/s41422-020-0322-9" target="_blank">
 The ChinaMAP analytics of deep whole genome sequences in 10,588 individuals</a>.
 <em>Cell Res</em>. 2020 Sep;30(9):717-731.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/32355288" target="_blank">32355288</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7609296/" target="_blank">PMC7609296</a>
 </p>
 
 <p>
 Chirmade S, Wang Z, Mastromatteo S, Sanders E, Thiruvahindrapuram B, Nalpathamkalam T, Pellecchia G,
 Lin F, Keenan K, Patel RV <em>et al</em>.
 <a href="https://doi.org/10.1038/s41437-025-00809-2" target="_blank">
 GWAS SVatalog: a visualization tool to aid fine-mapping of GWAS loci with structural variations</a>.
 <em>Heredity (Edinb)</em>. 2025 Sep;135(3):199-210.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/41203876" target="_blank">41203876</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13031531/" target="_blank">PMC13031531</a>
 </p>
 
 <p>
 Cohen ASA, Farrow EG, Abdelmoity AT, Alaimo JT, Amudhavalli SM, Anderson JT, Bansal L, Bartik L,
 Baybayan P, Belden B <em>et al</em>.
 <a href="https://doi.org/10.1016/j.gim.2022.02.007" target="_blank">
 Genomic answers for children: Dynamic analyses of &gt;1000 pediatric rare disease genomes</a>.
 <em>Genet Med</em>. 2022 Jun;24(6):1336-1348.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/35305867" target="_blank">35305867</a>
 </p>
 
 <p>
 Cong PK, Bai WY, Li JC, Yang MY, Khederzadeh S, Gai SR, Li N, Liu YH, Yu SH, Zhao WW <em>et al</em>.
 <a href="https://doi.org/10.1038/s41467-022-30526-x" target="_blank">
 Genomic analyses of 10,376 individuals in the Westlake BioBank for Chinese (WBBC) pilot project</a>.
 <em>Nat Commun</em>. 2022 May 26;13(1):2939.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/35618720" target="_blank">35618720</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9135724/" target="_blank">PMC9135724</a>
 </p>
 
 <p>
 Fan S, Spence JP, Feng Y, Hansen MEB, Terhorst J, Beltrame MH, Ranciaro A, Hirbo J, Beggs W, Thomas
 N <em>et al</em>.
 <a href="https://doi.org/10.1016/j.cell.2023.01.042" target="_blank">
 Whole-genome sequencing reveals a complex African population demographic history and signatures of
 local adaptation</a>.
 <em>Cell</em>. 2023 Mar 2;186(5):923-939.e14.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/36868214" target="_blank">36868214</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10568978/" target="_blank">PMC10568978</a>
 </p>
 
 <p>
 Feliciano P, Daniels AM, Snyder LG, Beaumont A, Camba A, Esler A, Gulsrud AG, Mason A, Nicholson A,
 Paolicelli AM <em>et al</em>; The SPARK Consortium.
 <a href="https://doi.org/10.1016/j.neuron.2018.01.015" target="_blank">
 SPARK: A US Cohort of 50,000 Families to Accelerate Autism Research</a>.
 <em>Neuron</em>. 2018 Feb 7;97(3):488-493.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/29420931" target="_blank">29420931</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7444276/" target="_blank">PMC7444276</a>
 </p>
 
 <p>
 Genome of the Netherlands Consortium.
 <a href="https://doi.org/10.1038/ng.3021" target="_blank">
 Whole-genome sequence variation, population structure and demographic history of the Dutch
 population</a>.
 <em>Nat Genet</em>. 2014 Aug;46(8):818-25.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/24974849" target="_blank">24974849</a>
 </p>
 
 <p>
 GenomeAsia100K Consortium.
 <a href="https://doi.org/10.1038/s41586-019-1793-z" target="_blank">
 The GenomeAsia 100K Project enables genetic discoveries across Asia</a>.
 <em>Nature</em>. 2019 Dec;576(7785):106-111.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/31802016" target="_blank">31802016</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7054211/" target="_blank">PMC7054211</a>
 </p>
 
 <p>
 Jain A, Bhoyar RC, Pandhare K, Mishra A, Sharma D, Imran M, Senthivel V, Divakar MK, Rophina M,
 Jolly B <em>et al</em>.
 <a href="https://doi.org/10.1093/nar/gkaa923" target="_blank">
 IndiGenomes: a comprehensive resource of genetic variants from over 1000 Indian genomes</a>.
 <em>Nucleic Acids Res</em>. 2021 Jan 8;49(D1):D1225-D1232.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/33095885" target="_blank">33095885</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7778947/" target="_blank">PMC7778947</a>
 </p>
 
 <p>
 Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alfoldi J, Wang Q, Collins RL, Laricchia KM,
 Ganna A, Birnbaum DP <em>et al</em>.
 <a href="https://doi.org/10.1038/s41586-020-2308-7" target="_blank">
 The mutational constraint spectrum quantified from variation in 141,456 humans</a>.
 <em>Nature</em>. 2020 May;581(7809):434-443.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/32461654" target="_blank">32461654</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7334197/" target="_blank">PMC7334197</a>
 </p>
 
 <p>
 Koenig Z, Yohannes MT, Nkambule LL, Zhao X, Goodrich JK, Kim HA, Wilson MW, Tiao G, Hao SP, Sahakian
 N <em>et al</em>.
 <a href="https://doi.org/10.1101/gr.278378.123" target="_blank">
 A harmonized public resource of deeply sequenced diverse human genomes</a>.
 <em>Genome Res</em>. 2024 Jun 25;34(5):796-809.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/38749656" target="_blank">38749656</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11216312/" target="_blank">PMC11216312</a>
 </p>
 
 <p>
 Kurki MI, Karjalainen J, Palta P, Sipila TP, Kristiansson K, Donner KM, Reeve MP, Laivuori H,
 Aavikko M, Kaunisto MA <em>et al</em>.
 <a href="https://doi.org/10.1038/s41586-022-05473-8" target="_blank">
 FinnGen provides genetic insights from a well-phenotyped isolated population</a>.
 <em>Nature</em>. 2023 Jan;613(7944):508-518.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/36653562" target="_blank">36653562</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9849126/" target="_blank">PMC9849126</a>
 </p>
 
 <p>
 Lacaze P, Pinese M, Kaplan W, Stone A, Brion MJ, Woods RL, McNamara M, McNeil JJ, Dinger ME,
 Thomas DM.
 <a href="https://doi.org/10.1038/s41431-018-0279-z" target="_blank">
 The Medical Genome Reference Bank: a whole-genome data resource of 4000 healthy elderly individuals.
 Rationale and cohort design</a>.
 <em>Eur J Hum Genet</em>. 2019 Feb;27(2):308-316.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/30353151" target="_blank">30353151</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6336775/" target="_blank">PMC6336775</a>
 </p>
 
 <p>
 Lee S, Seo J, Park J, Nam JY, Choi A, Ignatius JS, Bjornson RD, Chae JH, Jang IJ, Lee S
 <em>et al</em>.
 <a href="https://doi.org/10.1038/s41598-017-04642-4" target="_blank">
 Korean Variant Archive (KOVA): a reference database of genetic variations in the Korean
 population</a>.
 <em>Sci Rep</em>. 2017 Jun 27;7(1):4287.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/28655895" target="_blank">28655895</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5487339/" target="_blank">PMC5487339</a>
 </p>
 
 <p>
 Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S,
 Tandon A <em>et al</em>.
 <a href="https://doi.org/10.1038/nature18964" target="_blank">
 The Simons Genome Diversity Project: 300 genomes from 142 diverse populations</a>.
 <em>Nature</em>. 2016 Oct 13;538(7624):201-206.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/27654912" target="_blank">27654912</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5161557/" target="_blank">PMC5161557</a>
 </p>
 
 <p>
 Malomane DK, Williams MP, Huber CD, Mangul S, Abedalthagafi M, Chiang CWK.
 <a href="https://doi.org/10.1101/2025.01.10.632500" target="_blank">
 Patterns of population structure and genetic variation within the Saudi Arabian population</a>.
 <em>bioRxiv</em>. 2025 Jan 13.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/39868174" target="_blank">39868174</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11761371/" target="_blank">PMC11761371</a>
 </p>
 
 <p>
 McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, Kang HM, Fuchsberger C, Danecek P,
 Sharp K <em>et al</em>.
 <a href="https://doi.org/10.1038/ng.3643" target="_blank">
 A reference panel of 64,976 haplotypes for genotype imputation</a>.
 <em>Nat Genet</em>. 2016 Oct;48(10):1279-83.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/27548312" target="_blank">27548312</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5388176/" target="_blank">PMC5388176</a>
 </p>
 
 <p>
 Naslavsky MS, Scliar MO, Yamamoto GL, Wang JYT, Zverinova S, Karp T, Nunes K, Ceroni JRM,
 de Carvalho DL, da Silva Sim&otilde;es CE <em>et al</em>.
 <a href="https://doi.org/10.1038/s41467-022-28648-3" target="_blank">
 Whole-genome sequencing of 1,171 elderly admixed individuals from S&atilde;o Paulo, Brazil</a>.
 <em>Nat Commun</em>. 2022 Mar 4;13(1):1004.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/35246524" target="_blank">35246524</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8897431/" target="_blank">PMC8897431</a>
 </p>
 
 <p>
 Singh T, Poterba T, Curtis D, Akil H, Al Eissa M, Barchas JD, Bass N, Bigdeli TB, Breen G,
 Bromet EJ <em>et al</em>.
 <a href="https://doi.org/10.1038/s41586-022-04556-w" target="_blank">
 Rare coding variants in ten genes confer substantial risk for schizophrenia</a>.
 <em>Nature</em>. 2022 Apr;604(7906):509-516.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/35396579" target="_blank">35396579</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9805802/" target="_blank">PMC9805802</a>
 </p>
 
 <p>
 Sohail M, Palma-Mart&iacute;nez MJ, Chong AY, Quinto-Cort&eacute;s CD, Barberena-Jonas C,
 Medina-Mu&ntilde;oz SG, Ragsdale A, Delgado-S&aacute;nchez G, Cruz-Hervert LP, Ferreyra-Reyes L
 <em>et al</em>.
 <a href="https://doi.org/10.1038/s41586-023-06560-0" target="_blank">
 Mexican Biobank advances population and medical genomics of diverse ancestries</a>.
 <em>Nature</em>. 2023 Oct;622(7984):775-783.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/37821706" target="_blank">37821706</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10600006/" target="_blank">PMC10600006</a>
 </p>
 
 <p>
 Tadaka S, Kawashima J, Hishinuma E, Saito S, Okamura Y, Otsuki A, Kojima K, Komaki S, Aoki Y,
 Kanno T <em>et al</em>.
 <a href="https://doi.org/10.1093/nar/gkad978" target="_blank">
 jMorp: Japanese Multi-Omics Reference Panel update report 2023</a>.
 <em>Nucleic Acids Res</em>. 2024 Jan 5;52(D1):D622-D632.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/37930845" target="_blank">37930845</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10767895/" target="_blank">PMC10767895</a>
 </p>
 
 <p>
 Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, Taliun SAG, Corvelo A, Gogarten SM,
 Kang HM <em>et al</em>.
 <a href="https://doi.org/10.1038/s41586-021-03205-y" target="_blank">
 Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program</a>.
 <em>Nature</em>. 2021 Feb;590(7845):290-299.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/33568819" target="_blank">33568819</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7875770/" target="_blank">PMC7875770</a>
 </p>
 
 <p>
 Wong E, Bertin N, Hebrard M, Tirado-Magallanes R, Bellis C, Lim WK, Chua CY, Tong PML, Chua R, Mak K
 <em>et al</em>.
 <a href="https://doi.org/10.1038/s41588-022-01274-x" target="_blank">
 The Singapore National Precision Medicine Strategy</a>.
 <em>Nat Genet</em>. 2023 Feb;55(2):178-186.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/36658435" target="_blank">36658435</a>
 </p>
 
 <p>
 Wu D, Dou J, Chai X, Bellis C, Wilm A, Shih CC, Soon WWJ, Bertin N, Lin CB, Khor CC <em>et al</em>.
 <a href="https://doi.org/10.1016/j.cell.2019.09.019" target="_blank">
 Large-scale whole-genome sequencing of three diverse Asian populations in Singapore</a>.
 <em>Cell</em>. 2019 Oct 17;179(3):736-749.e15.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/31626772" target="_blank">31626772</a>
 </p>
 
 <p>
 Yang HC, Kwok PY, Li LH, Liu YM, Jong YJ, Lee KY, Wang DW, Tsai MF, Yang JH, Chen CH <em>et al</em>.
 <a href="https://doi.org/10.1038/s41586-025-09680-x" target="_blank">
 The Taiwan Precision Medicine Initiative provides a cohort for large-scale studies</a>.
 <em>Nature</em>. 2025 Dec;648(8092):117-127.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/41092961" target="_blank">41092961</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12675286/" target="_blank">PMC12675286</a>
 </p>