2f44ab096235810d5d621b7356bc64fbebe82494 lrnassar Wed May 13 13:02:57 2026 -0700

varFreqs: fix numeric-claim discrepancies on indigenomes.html and sgdpFreq.html.

indigenomes.html: clarify that the deployed VCF is the public release subset (18,016,257 records) of the larger Jain 2021 callset (55.8M variants), and note that the public release is sites-only with a VRT variant-type INFO field and no AC/AF.

sgdpFreq.html: update Methods to reflect the deployed file (44,756,737 SNV records, 601,775 multiallelic decomposed); drop the "34.4M SNPs + 2.1M indels" claim; clarify that the Mallick 2016 FermiKit indel callset is not carried in this track.

refs #36642

diff --git src/hg/makeDb/trackDb/human/indigenomes.html src/hg/makeDb/trackDb/human/indigenomes.html
index fd599135951..156cfc412b9 100644
--- src/hg/makeDb/trackDb/human/indigenomes.html
+++ src/hg/makeDb/trackDb/human/indigenomes.html
@@ -1,54 +1,63 @@
 <h2>Description</h2>
 <p>
 <a href="https://clingen.igib.res.in/indigen/" target="_blank">IndiGenomes</a> provides whole genome sequencing data of 1,029 healthy Indian individuals under the pilot phase of the
-"IndiGen" program. Only the allele frequency is available from this project. The website
-also provides SV call and Alu insertion VCFs.
+"IndiGen" program. The IndiGenomes website also provides SV call and Alu insertion VCFs.
+</p>
+<p>
+The deployed VCF shown in this track is the public release subset distributed by the
+IndiGenomes project (18,016,257 records). The full Jain 2021 callset reports 55.8 million
+variants from the 1,029-genome cohort; the public release is a curated subset of those
+sites. The deployed VCF is sites-only and carries a per-variant <em>VRT</em> (variant type)
+INFO field. Per-variant allele counts and allele frequencies are not distributed with the
+public release and therefore are not shown in this track.
 </p>
 <h2>Data Access</h2>
 <p>
 The data can be explored interactively with the <a href="../cgi-bin/hgTables">Table Browser</a> or the <a href="../cgi-bin/hgIntegrator">Data Integrator</a>. For programmatic access, our <a href="https://api.genome.ucsc.edu" target="_blank">REST API</a> can be used; the track name is <em>indigenomes</em>. For bulk download, the VCF file can be obtained from <a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/varFreqs/" target="_blank">our download server</a>.
 </p>
 <p>
 The original data can also be downloaded from the <a href="https://clingen.igib.res.in/indigen/" target="_blank">IndiGen website</a>.
 </p>
 <h2>Methods</h2>
 <p>
 Genomic DNA was extracted from 5 ml of peripheral blood collected via venipuncture from 1,029 self-declared healthy Indian individuals representing diverse geographic, ethnic, and linguistic groups, using the salting-out method. Whole-genome libraries were prepared using the TruSeq DNA PCR-free library preparation kit (Illumina). Sequencing was performed on the Illumina NovaSeq 6000 platform with 150×2 bp paired-end reads targeting ≥30× mean coverage. Alignment to the GRCh38 reference genome, post-processing, and default quality-filtered variant calling were performed end-to-end on the Illumina DRAGEN v3.4 Bio-IT platform, which uses field-programmable gate array (FPGA) logic for
-high-throughput processing. This yielded a compendium of 55,898,122 single allelic
+high-throughput processing. The full Jain 2021 callset comprises 55,898,122 single-allelic
 genetic variants (SNVs and indels), of which 32.23% were unique to the Indian samples and absent from global reference databases. Variants were annotated using ANNOVAR with RefGene, and allele frequencies were cross-referenced against gnomAD v3, 1000 Genomes,
-ExAC, ESP6500, and the Greater Middle East Variome Project. The dataset is accessible via
-the <a href="https://clingen.igib.res.in/indigen/" target="_blank">IndiGenomes database</a>
+ExAC, ESP6500, and the Greater Middle East Variome Project. The
+<a href="https://clingen.igib.res.in/indigen/" target="_blank">IndiGenomes database</a>
+distributes a public-release subset of these variants (18,016,257 records); that subset is
+the file used in this track.
 (Jain, Bhoyar, Scaria, Sivasubbu & the IndiGen Consortium, <a href="https://doi.org/10.1093/nar/gkaa923" target="_blank"><em>Nucleic Acids Research</em> 2021</a>).
 </p>
 <p>
 We provide documentation that indicates how all source files of the varFreqs track were converted in the <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt" target="_blank">makeDoc file</a> of the track. For some tracks, python scripts were necessary and are also available from <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/scripts/varFreqs" target="_blank">GitHub</a>.
 </p>
 <h2>References</h2>
 <p>
 Jain A, Bhoyar RC, Pandhare K, Mishra A, Sharma D, Imran M, Senthivel V, Divakar MK, Rophina M, Jolly B <em>et al</em>.
 <a href="https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkaa923" target="_blank">
 IndiGenomes: a comprehensive resource of genetic variants from over 1000 Indian genomes</a>.
 <em>Nucleic Acids Res</em>. 2021 Jan 8;49(D1):D1225-D1232.