9bfd58221b1539193cb7f0a317b4e959c1c7e49a max Thu May 21 01:00:45 2026 -0700

varFreqs: AI generated text sounds bad, hard to read, so remove typical AI language. "humanizer" pass on all 31 varFreqs description pages — cut em dashes, copula avoidance ("serves as", "stands as"), "-ing" puffery, and boilerplate filler ("We provide documentation that indicates how..."). Title-case headings and meaningful emphasis preserved. No facts/URLs/counts/versions changed. tpmi.html added as a new file (was previously uncommitted). refs #36642

Co-Authored-By: Claude Sonnet 4.6

diff --git src/hg/makeDb/trackDb/human/genomeindia.html src/hg/makeDb/trackDb/human/genomeindia.html
index ad13b6c2855..d54a061ff4c 100644
--- src/hg/makeDb/trackDb/human/genomeindia.html
+++ src/hg/makeDb/trackDb/human/genomeindia.html
@@ -1,32 +1,32 @@

Description

The GenomeIndia
-project is a national initiative coordinating academic and medical institutions
+project is a national initiative that coordinates academic and medical institutions
across India to characterize the genetic diversity of the Indian subcontinent. The
-release used by this track comprises whole-genome sequencing of 9,768 healthy adults
-sampled from 83 anthropologically defined endogamous populations spanning India's
-ethnolinguistic and biogeographic spectrum (Indo-European, Dravidian, Austroasiatic,
+release used by this track is whole-genome sequencing of 9,768 healthy adults
+sampled from 83 anthropologically defined endogamous populations across India's
+ethnolinguistic and biogeographic range (Indo-European, Dravidian, Austroasiatic,
and Tibeto-Burman language families, plus a continentally admixed outgroup). After joint genotyping and quality filtering, 129,938,889 high-confidence biallelic variants (~121M SNVs and ~8M indels) were reported, of which roughly one third are absent from gnomAD, 1000 Genomes, and GenomeAsia. This track shows the alternate allele frequency in that 9,768-sample autosomal call set.

-Because Indian populations are profoundly underrepresented in global variant
-databases, many globally rare alleles reach much higher frequencies in specific
+Indian populations are underrepresented in global variant
+databases, so many globally rare alleles are at much higher frequencies in specific
endogamous groups. The release ships only the cohort-wide alternate allele frequency (no per-population breakdown), so this track shows the overall GenomeIndia AF; AC is derived from AF (see Methods).

Display Conventions

Variants are shown as a VCF dense track. Each row reports the genomic position, ref/alt alleles, the GenomeIndia alternate allele frequency, and a synthesized allele count. The track only includes autosomal variants (chr1–chr22); chrX, chrY, and chrM are not in the current release.

Data Access

@@ -40,54 +40,54 @@ target="_blank">our download server.

The original per-chromosome TSV summary statistics can be downloaded directly from the GenomeIndia Data Centre at ibdc.dbtindia.gov.in (the 9768GI_SummaryStats.tar.gz bundle). Use of the data is subject to the GenomeIndia data-access policy listed on that page.

Methods

PCR-free whole-genome sequencing libraries were prepared from blood-derived DNA and sequenced on Illumina NovaSeq 6000 to a per-sample average depth of at least 23×. Reads were processed with the Illumina DRAGEN v4.0.3 germline pipeline against
-GRCh38, producing per-sample gVCFs that were then joint-genotyped with the Illumina
+GRCh38. The resulting per-sample gVCFs were then joint-genotyped with the Illumina
gVCF genotyper. Site-level filters retained only PASS variants with QUAL ≥ 30, posterior genotype probability ≥ 99.9%, GQ > 20 at every site (GQ > 40 for singletons and doubletons), heterozygous allele balance ≥ 0.2, call rate ≥ 98%, and Hardy–Weinberg equilibrium p > 1×10⁻¹¹; sites with an inbreeding coefficient of 1 were also excluded as technical artefacts. Variants were annotated for protein impact with Ensembl VEP v113 plus LOFTEE; details are in the published methods (Bhattacharyya et al. 2025, see References).
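To show how the site-level thresholds listed above combine, the sketch below writes them as a single Python predicate. It is an illustration only and is not the GenomeIndia consortium's pipeline code. The SiteStats record and its field names are hypothetical. Some criteria are per genotype in practice, such as GQ and heterozygous allele balance, and the sketch simplifies each of these to one site-level summary value.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    """Hypothetical per-site summary; field names are illustrative only."""
    filter_status: str          # VCF FILTER column
    qual: float                 # QUAL
    genotype_posterior: float   # posterior genotype probability as a fraction (0-1)
    min_gq: int                 # lowest GQ among called genotypes at the site (simplification)
    alt_count: int              # alternate allele count (1 = singleton, 2 = doubleton)
    het_allele_balance: float   # heterozygous allele balance, one value per site (simplification)
    call_rate: float            # fraction of samples with a called genotype
    hwe_p: float                # Hardy-Weinberg equilibrium p-value
    inbreeding_coeff: float     # F; sites with F == 1 are treated as artefacts


def passes_site_filters(s: SiteStats) -> bool:
    """Apply the thresholds listed in the Methods text to one site."""
    # Singletons and doubletons need GQ > 40; all other sites need GQ > 20.
    gq_cutoff = 40 if s.alt_count in (1, 2) else 20
    return (
        s.filter_status == "PASS"
        and s.qual >= 30
        and s.genotype_posterior >= 0.999
        and s.min_gq > gq_cutoff
        and s.het_allele_balance >= 0.2
        and s.call_rate >= 0.98
        and s.hwe_p > 1e-11
        and s.inbreeding_coeff < 1.0  # excludes F == 1 (F cannot exceed 1)
    )


site = SiteStats("PASS", 55.0, 0.9995, 35, 1, 0.45, 0.99, 0.3, 0.02)
print(passes_site_filters(site))  # False: a singleton needs GQ > 40
```

The stricter GQ cutoff for singletons and doubletons is applied before the combined check. As a result, the same GQ value of 35 passes at a common site but fails at a singleton.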

The release was downloaded from ibdc.dbtindia.gov.in as 9768GI_SummaryStats.tar.gz, which contains 22 per-chromosome TSV files of CHROM, POS, ID, REF, ALT, AF (no header). The TSV files were converted to a single sorted, bgzipped, tabix-indexed VCF by the script genomeindiaToVcf.py. The release ships only AF; AC and AN are synthesized as AN = 2 × 9768 = 19536 and
-AC = round(AF × AN). Because variants were retained only
-when called in ≥98% of samples, AN slightly overstates the true called allele
-count for some sites (worst case ~2%); the AC field should therefore be read as a
-close approximation rather than the exact observed count. The exact processing
+AC = round(AF × AN). Variants were kept only
+when called in ≥98% of samples, so AN slightly overstates the true called allele
+count for some sites (worst case ~2%); the AC field is a
+close approximation, not the exact observed count. The processing
steps are documented in the makeDoc file.
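The sketch below shows the AC/AN synthesis in Python for a single TSV row, together with the arithmetic behind the ~2% worst case. It is not the contents of genomeindiaToVcf.py, and the function names are hypothetical. It also rests on a few assumptions: QUAL and FILTER are written as ".", chromosome names pass through unchanged, and rounding uses Python's built-in round(), which rounds exact halves to even. The sorting, bgzip compression and tabix indexing described above are not shown.

```python
import math

N_SAMPLES = 9768        # GenomeIndia cohort size
AN = 2 * N_SAMPLES      # 19536 alleles, assuming every sample was called at every site


def tsv_line_to_vcf(line: str) -> str:
    """Turn one headerless TSV row (CHROM, POS, ID, REF, ALT, AF) into a VCF data line.

    Hypothetical helper for illustration; QUAL and FILTER are not in the
    summary statistics, so both are written as ".".
    """
    chrom, pos, vid, ref, alt, af = line.rstrip("\n").split("\t")
    # Synthesized allele count; Python's round() rounds exact halves to even.
    ac = round(float(af) * AN)
    info = f"AF={af};AC={ac};AN={AN}"
    return "\t".join([chrom, pos, vid, ref, alt, ".", ".", info])


def min_called_an(call_rate: float = 0.98) -> int:
    """Smallest AN a site can have if it was called in at least `call_rate` of samples."""
    return 2 * math.ceil(call_rate * N_SAMPLES)


row = "chr1\t12345\t.\tA\tG\t0.25\n"   # made-up example row, not real GenomeIndia data
print(tsv_line_to_vcf(row))
print(AN / min_called_an())              # roughly 1.02, i.e. the ~2% worst case
```

At the 98% call-rate cutoff, a site needs calls in at least 9,573 samples, so its true AN is at least 19,146. The fixed AN of 19,536 is therefore at most about 2% too high.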

Credits

We thank the GenomeIndia consortium for making the 9,768-sample summary statistics publicly available. The track was built at UCSC by Max Haeussler.

References

Bhattacharyya C, Subramanian K, Uppili B, Biswas NK, Ramdas S, Tallapaka KB, Arvind P, Rupanagudi KV, Maitra A, Nagabandi T et al.