aa61ebc800429515f9ced7e28f669c6042219f43 max Wed Mar 18 09:09:13 2026 -0700

varFreqs supertrack: add GREGoR track, update all HTML docs, move scripts to varFreqs/, refs #36642

Add GREGoR R04 WGS track to varFreqs superTrack. Update Data Access and
Methods sections for all 20+ subtrack HTML files with consistent formatting,
sequencing methods from source papers, and links to makeDoc and Github
scripts. Move all varFreqs conversion scripts into scripts/varFreqs/
subdirectory and update makeDoc paths accordingly.

Co-Authored-By: Claude Opus 4.6

diff --git src/hg/makeDb/trackDb/human/indigenomes.html src/hg/makeDb/trackDb/human/indigenomes.html
new file mode 100644
index 00000000000..2a07cf8c13a
--- /dev/null
+++ src/hg/makeDb/trackDb/human/indigenomes.html
@@ -0,0 +1,57 @@
+

Description

+

+IndiGenomes provides
+whole-genome sequencing data from 1,029 healthy Indian individuals sequenced in the pilot phase of the
+"IndiGen" program. Only allele frequencies are available from this project. The website
+also provides VCFs of SV calls and Alu insertions.
+

+ +

Data Access

+

+The data can be explored interactively with the
+Table Browser or the
+Data Integrator.
+For programmatic access, our REST API can be used; the
+track name is indigenomes.
+For bulk download, the VCF file can be obtained from
+our download server.
+
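As an illustration of the programmatic access mentioned above, the following minimal sketch builds a request URL for the UCSC REST API `getData/track` endpoint and optionally fetches it. The track name `indigenomes` comes from the text. The assembly `hg38`, the example region, and the helper names `build_track_url` and `fetch_track` are assumptions made for this example only. The shape of the returned JSON is not assumed here and should be checked against the API documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public UCSC REST API host.
API_BASE = "https://api.genome.ucsc.edu"


def build_track_url(genome, track, chrom, start, end):
    """Return a getData/track URL for one region of one track.

    The helper name and argument order are hypothetical; only the endpoint
    and its query parameters (genome, track, chrom, start, end) belong to
    the API itself.
    """
    query = urlencode({
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    })
    return f"{API_BASE}/getData/track?{query}"


def fetch_track(genome, track, chrom, start, end, timeout=30):
    """Fetch the region and return the decoded JSON (needs network access)."""
    url = build_track_url(genome, track, chrom, start, end)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Assumption: the track is queried on hg38; the region is arbitrary.
    print(build_track_url("hg38", "indigenomes", "chr1", 100000, 101000))
```

Calling `fetch_track("hg38", "indigenomes", "chr1", 100000, 101000)` would issue the request. Keeping the URL construction in its own function makes it possible to inspect or log the request before any network access takes place.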

+

+The original data can also be downloaded from the IndiGen website. +

+ +

Methods

+

+Genomic DNA was extracted from 5 ml of peripheral blood collected via venipuncture from
+1,029 self-declared healthy Indian individuals representing diverse geographic, ethnic, and
+linguistic groups, using the salting-out method. Whole-genome libraries were prepared using
+the TruSeq DNA PCR-free library preparation kit (Illumina). Sequencing was performed on the
+Illumina NovaSeq 6000 platform with 2×150 bp paired-end reads targeting ≥30×
+mean coverage. Alignment to the GRCh38 reference genome, post-processing, and
+variant calling with default quality filters were performed end-to-end on the Illumina DRAGEN
+v3.4 Bio-IT platform, which uses field-programmable gate array (FPGA) logic for
+high-throughput processing. This yielded a compendium of 55,898,122 single allelic
+genetic variants (SNVs and indels), of which 32.23% were unique to the Indian samples
+and absent from global reference databases. Variants were annotated using ANNOVAR with
+RefGene, and allele frequencies were cross-referenced against gnomAD v3, 1000 Genomes,
+ExAC, ESP6500, and the Greater Middle East Variome Project. The dataset is accessible via
+the IndiGenomes database
+(Jain, Bhoyar, Scaria, Sivasubbu & the IndiGen Consortium,
+Nucleic Acids Research 2021).
+

+

+The makeDoc file of the varFreqs track documents how all source files of the track were converted.
+For some tracks, Python scripts were necessary; these are also available from GitHub.
+

+ +

References

+

+Jain A, Bhoyar RC, Pandhare K, Mishra A, Sharma D, Imran M, Senthivel V, Divakar MK, Rophina M,
+Jolly B et al.
+
+IndiGenomes: a comprehensive resource of genetic variants from over 1000 Indian genomes.
+Nucleic Acids Res. 2021 Jan 8;49(D1):D1225-D1232.
+PMID: 33095885; PMC: PMC7778947
+