383da828477aad2b3c6053880a64fdbfc2a00cd9 max Thu Mar 19 02:30:41 2026 -0700
Fix varFreqs HTML issues and trexplorer citation, from AI code review 2026-03-19, refs #36642

Fix broken $db download URLs to hg38 in 14 HTML files, correct "Japanese" to "Korean" in kova.html, fix "area" typo in schema.html, fix "Finnland" to "Finland" in varFreqs.ra, normalize GREGoR capitalization, fix grammar, quote all target=_blank attributes, capitalize GitHub consistently, and fix bioRxiv citation formatting in trexplorer.html.

Co-Authored-By: Claude Opus 4.6

diff --git src/hg/makeDb/trackDb/human/alfaVcf.html src/hg/makeDb/trackDb/human/alfaVcf.html
index d4949705133..ee9d5f9ce22 100644
--- src/hg/makeDb/trackDb/human/alfaVcf.html
+++ src/hg/makeDb/trackDb/human/alfaVcf.html
@@ -4,41 +4,41 @@
Aggregator (ALFA) pipeline computes allele frequencies from approved, unrestricted dbGaP studies and makes them publicly available through dbSNP. Its goal is to release frequency data from over one million dbGaP subjects to aid discoveries involving common and rare variants with biological or disease relevance. The R4 release includes 408,709 subjects and allele frequencies for 15.5 million rs sites, including nearly one million ClinVar variants.

Data Access

The data can be explored interactively with the Table Browser or the Data Integrator. For programmatic access, our REST API can be used; the track name is alfaVcf. For bulk download, the VCF file can be obtained from
-our download server.
+our download server.

We converted the NCBI track hub to VCF format; the data is freely available. Genotype and associated individual-level data are accessible through the dbGaP authorized access request system.
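As an illustration of the programmatic access mentioned above, the following minimal sketch builds a request URL for the alfaVcf track. It assumes the public UCSC REST API host api.genome.ucsc.edu and its /getData/track endpoint with genome, track, chrom, start and end parameters. The helper name build_track_url and the example coordinates are hypothetical, and the sketch only constructs the URL; it does not confirm what the endpoint returns for this particular track.

```python
from urllib.parse import quote

# Assumption: the public UCSC REST API host; not stated in the page text itself.
API_BASE = "https://api.genome.ucsc.edu"


def build_track_url(genome, track, chrom, start, end):
    """Build a /getData/track URL (hypothetical helper, for illustration only)."""
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    # Parameters are joined with semicolons, one key=value pair each.
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params.items())
    return f"{API_BASE}/getData/track?{query}"


# Example region chosen arbitrarily for illustration.
url = build_track_url("hg38", "alfaVcf", "chr1", 1000000, 1001000)
print(url)
```

The resulting URL could then be fetched with any HTTP client; the response format is not shown here because it is not described in the page.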

Methods

The ALFA pipeline processes genotype data from approved, unrestricted dbGaP studies, including chip array, exome, and genomic sequencing data. Selected study data undergoes quality assurance and transformation to standard VCF format. Variants are converted to SPDI notation and normalized using VOCA, then aggregated, remapped, and clustered to existing dbSNP rs identifiers or assigned new ones. Sample ancestries are validated using GRAF-pop and assigned to 12 major populations. QC exclusions include variants and subjects with call rate <95%, datasets failing Ancestry Informative Markers consistency checks, and array datasets with conflicting or flipped allele orientation.
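The call-rate exclusion described above can be sketched as follows. This is not the ALFA pipeline's code: the function names, the VCF-style missing-genotype strings, and the choice to treat exactly 95% as passing are assumptions made for illustration, based only on the stated "call rate <95%" criterion.

```python
# Assumption: missing genotypes are written in VCF style.
MISSING = {"./.", ".|.", "."}


def call_rate(genotypes):
    """Return the fraction of genotype calls that are not missing."""
    if not genotypes:
        return 0.0
    called = sum(1 for gt in genotypes if gt not in MISSING)
    return called / len(genotypes)


def passes_call_rate(genotypes, threshold=0.95):
    """Keep a variant (or subject) unless its call rate is below the threshold."""
    return call_rate(genotypes) >= threshold


# 19 of 20 calls present: call rate 0.95, not below the cutoff, so kept.
kept_example = ["0/1"] * 19 + ["./."]
# 18 of 20 calls present: call rate 0.90, below the cutoff, so excluded.
dropped_example = ["0/0"] * 18 + ["./.", "./."]
print(passes_call_rate(kept_example), passes_call_rate(dropped_example))
```

The same function applies per variant (genotypes across subjects) or per subject (genotypes across variants), since the text names both as exclusion targets.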

The ALFA R4 bigBed files (904M variants) were converted to VCF using a custom script, retaining the 163M variants with non-zero allele frequency (146M SNPs, 17M indels).
-We provide documentation that indicates how all source files of the varFreqs track were converted in the makeDoc file of the track.
-For some tracks, python scripts were necessary and are also available from Github.
+We provide documentation that indicates how all source files of the varFreqs track were converted in the makeDoc file of the track.
+For some tracks, python scripts were necessary and are also available from GitHub.
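The filtering step described above, keeping only variants with non-zero allele frequency, might look like the sketch below. The actual conversion script is the one referenced in the makeDoc; this example is independent of it. It assumes the frequency is stored in an INFO key named AF with comma-separated values per alternate allele, which is a common VCF convention but is not confirmed for this track. The function names are hypothetical.

```python
def has_nonzero_af(vcf_line, key="AF"):
    """Return True if any value of the given INFO key is greater than zero."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = fields[7]  # INFO is the eighth column of a VCF record
    for entry in info.split(";"):
        name, _, value = entry.partition("=")
        if name == key and value:
            return any(float(v) > 0 for v in value.split(",") if v != ".")
    return False  # key absent: treated here as no usable frequency


def filter_vcf_lines(lines, key="AF"):
    """Yield header lines unchanged and records with a non-zero frequency."""
    for line in lines:
        if line.startswith("#") or has_nonzero_af(line, key):
            yield line


# Made-up records for illustration; not taken from the ALFA data.
sample = [
    "##fileformat=VCFv4.2",
    "\t".join(["chr1", "100", "rs1", "A", "G", ".", ".", "AF=0.0"]),
    "\t".join(["chr1", "200", "rs2", "C", "T,G", ".", ".", "AF=0.0,0.012"]),
    "\t".join(["chr1", "300", "rs3", "G", "GA", ".", ".", "AC=5"]),
]
kept = list(filter_vcf_lines(sample))
print(len(kept))
```

Treating a record as retained when any alternate allele has a non-zero frequency is one possible reading of "non-zero allele frequency"; a stricter per-allele filter would need to rewrite the ALT and INFO columns.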