src/hg/makeDb/trackDb/human/indigenomes.html 2f44ab096235810d5d621b7356bc64fbebe82494

2f44ab096235810d5d621b7356bc64fbebe82494
lrnassar
  Wed May 13 13:02:57 2026 -0700
varFreqs: fix numeric-claim discrepancies on indigenomes.html and sgdpFreq.html.

indigenomes.html: clarify that the deployed VCF is the public-release subset
(18,016,257 records) of the larger Jain 2021 callset (55.9M variants), and
note that the public release is sites-only with a VRT variant-type INFO field
and no AC/AF.

sgdpFreq.html: update Methods to reflect the deployed file (44,756,737 SNV
records, 601,775 multiallelic decomposed); drop the "34.4M SNPs + 2.1M indels"
claim; clarify that the Mallick 2016 FermiKit indel callset is not carried in
this track. refs #36642

diff --git src/hg/makeDb/trackDb/human/indigenomes.html src/hg/makeDb/trackDb/human/indigenomes.html
index fd599135951..156cfc412b9 100644
--- src/hg/makeDb/trackDb/human/indigenomes.html
+++ src/hg/makeDb/trackDb/human/indigenomes.html
@@ -1,54 +1,63 @@
 <h2>Description</h2>
 <p>
 <a href="https://clingen.igib.res.in/indigen/" target="_blank">IndiGenomes</a> provides
 whole genome sequencing data of 1,029 healthy Indian individuals under the pilot phase of the
-&quot;IndiGen&quot; program. Only the allele frequency is available from this project. The website
-also provides SV call and Alu insertion VCFs.
+&quot;IndiGen&quot; program. The IndiGenomes website also provides structural variant (SV) and Alu insertion VCFs.
+</p>
+<p>
+The deployed VCF shown in this track is the public-release subset distributed by the
+IndiGenomes project (18,016,257 records). The full Jain 2021 callset reports 55.9 million
+variants from the 1,029-genome cohort; the public release is a curated subset of those
+sites. The deployed VCF is sites-only (no per-sample genotype columns) and carries a
+per-variant <em>VRT</em> (variant type) INFO field. Per-variant allele counts and allele
+frequencies are not distributed with the public release and therefore are not shown in this track.
 </p>
 
 <h2>Data Access</h2>
 <p>
 The data can be explored interactively with the
 <a href="../cgi-bin/hgTables">Table Browser</a> or the
 <a href="../cgi-bin/hgIntegrator">Data Integrator</a>.
 For programmatic access, our <a href="https://api.genome.ucsc.edu" target="_blank">REST API</a> can be used; the
 track name is <em>indigenomes</em>.
 For bulk download, the VCF file can be obtained from
 <a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/varFreqs/" target="_blank">our download server</a>.
 </p>
 <p>
 The original data can also be downloaded from the <a href="https://clingen.igib.res.in/indigen/"
 target="_blank">IndiGen website</a>.
 </p>
 
 <h2>Methods</h2>
 <p>
 Genomic DNA was extracted from 5 ml of peripheral blood collected via venipuncture from
 1,029 self-declared healthy Indian individuals representing diverse geographic, ethnic, and
 linguistic groups, using the salting-out method. Whole-genome libraries were prepared using
 the TruSeq DNA PCR-free library preparation kit (Illumina). Sequencing was performed on the
 Illumina NovaSeq 6000 platform with 150&times;2 bp paired-end reads targeting &ge;30&times;
 mean coverage. Alignment to the GRCh38 reference genome, post-processing, and
 default quality-filtered variant calling were performed end-to-end on the Illumina DRAGEN
 v3.4 Bio-IT platform, which uses field-programmable gate array (FPGA) logic for
-high-throughput processing. This yielded a compendium of 55,898,122 single allelic
+high-throughput processing. The full Jain 2021 callset comprises 55,898,122 single-allelic
 genetic variants (SNVs and indels), of which 32.23% were unique to the Indian samples
 and absent from global reference databases. Variants were annotated using ANNOVAR with
 RefGene, and allele frequencies were cross-referenced against gnomAD v3, 1000 Genomes,
-ExAC, ESP6500, and the Greater Middle East Variome Project. The dataset is accessible via
-the <a href="https://clingen.igib.res.in/indigen/" target="_blank">IndiGenomes database</a>
+ExAC, ESP6500, and the Greater Middle East Variome Project. This track uses the
+public-release subset of these variants (18,016,257 records) that is distributed
+through the
+<a href="https://clingen.igib.res.in/indigen/" target="_blank">IndiGenomes database</a>
 (Jain, Bhoyar, Scaria, Sivasubbu &amp; the IndiGen Consortium,
 <a href="https://doi.org/10.1093/nar/gkaa923" target="_blank"><em>Nucleic Acids Research</em> 2021</a>).
 </p>
 <p>
 We provide documentation that indicates how all source files of the varFreqs track were converted in the <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt" target="_blank">makeDoc file</a> of the track.
 For some tracks, python scripts were necessary and are also available from <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/scripts/varFreqs" target="_blank">GitHub</a>.
 </p>
 
 <h2>References</h2>
 <p>
 Jain A, Bhoyar RC, Pandhare K, Mishra A, Sharma D, Imran M, Senthivel V, Divakar MK, Rophina M,
 Jolly B <em>et al</em>.
 <a href="https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkaa923" target="_blank">
 IndiGenomes: a comprehensive resource of genetic variants from over 1000 Indian genomes</a>.
 <em>Nucleic Acids Res</em>. 2021 Jan 8;49(D1):D1225-D1232.