aa61ebc800429515f9ced7e28f669c6042219f43 max Wed Mar 18 09:09:13 2026 -0700
varFreqs supertrack: add GREGoR track, update all HTML docs, move scripts to varFreqs/, refs #36642

Add GREGoR R04 WGS track to varFreqs superTrack. Update Data Access and Methods sections for all 20+ subtrack HTML files with consistent formatting, sequencing methods from source papers, and links to makeDoc and Github scripts. Move all varFreqs conversion scripts into scripts/varFreqs/ subdirectory and update makeDoc paths accordingly.

Co-Authored-By: Claude Opus 4.6
diff --git src/hg/makeDb/trackDb/human/alfaVcf.html src/hg/makeDb/trackDb/human/alfaVcf.html
new file mode 100644
index 00000000000..d4949705133
--- /dev/null
+++ src/hg/makeDb/trackDb/human/alfaVcf.html
@@ -0,0 +1,44 @@
+

Description

+

+The NCBI ALlele Frequency
+Aggregator (ALFA) pipeline computes allele frequencies from approved, unrestricted dbGaP studies
+and makes them publicly available through dbSNP. Its goal is to release frequency data from over
+one million dbGaP subjects to aid discoveries involving common and rare variants with biological
+or disease relevance. The R4 release includes 408,709 subjects and allele frequencies for
+15.5 million rs sites, including nearly one million ClinVar variants.
+

+ +

Data Access

+

+The data can be explored interactively with the
+Table Browser or the
+Data Integrator.
+For programmatic access, the data can be retrieved through our REST API using the
+track name alfaVcf.
+For bulk download, the VCF file can be obtained from
+our download server.
+
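As a minimal sketch of programmatic access, the Python snippet below builds a request URL for the UCSC REST API's getData/track endpoint and, optionally, fetches the JSON response. The hg38 assembly, the example region, and the helper names alfa_query_url and fetch_region are illustrative assumptions rather than part of this track's documentation; consult the REST API documentation to confirm that this track type is served and how coordinates are interpreted.

```python
"""Hypothetical helpers for querying the alfaVcf track via the UCSC REST API.

Assumptions: the hg38 assembly and UCSC's 0-based start coordinate
convention; the region in the demo is an arbitrary example.
"""
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def alfa_query_url(chrom: str, start: int, end: int, genome: str = "hg38") -> str:
    """Return a getData/track URL for one region of the alfaVcf track."""
    params = {
        "genome": genome,
        "track": "alfaVcf",
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    return API_BASE + "?" + urllib.parse.urlencode(params)


def fetch_region(chrom: str, start: int, end: int, genome: str = "hg38") -> dict:
    """Download the region and decode the JSON reply (needs network access)."""
    with urllib.request.urlopen(alfa_query_url(chrom, start, end, genome)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only builds and prints the URL; no network request is made here.
    print(alfa_query_url("chr1", 1_000_000, 1_001_000))
```

Keeping URL construction separate from the download makes the request easy to inspect or paste into a browser before fetching large regions.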

+

+We converted the NCBI track hub to VCF format; the resulting frequency data are freely available.
+Individual-level genotypes and associated data are accessible through the dbGaP
+authorized access request system.
+

+ +

Methods

+

+The ALFA pipeline processes genotype data from approved, unrestricted dbGaP studies, including
+chip array, exome, and genomic sequencing data. Selected study data undergoes quality assurance
+and transformation to standard VCF format. Variants are converted to SPDI notation and normalized
+using VOCA, then aggregated, remapped, and clustered to existing dbSNP rs identifiers or assigned
+new ones. Sample ancestries are validated using GRAF-pop and assigned to 12 major populations.
+QC exclusions include variants and subjects with call rates below 95%, datasets failing Ancestry
+Informative Marker consistency checks, and array datasets with conflicting or flipped allele
+orientation.
+
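To make two of the steps above more concrete, here is a simplified Python sketch of (1) rewriting a VCF-style allele as an SPDI string (sequence, 0-based position, deleted sequence, inserted sequence) with basic shared-base trimming, and (2) excluding variants and subjects whose call rate falls below 95%. This is not the ALFA implementation: VOCA normalization additionally expands ambiguous indels across repeated sequence, and the function names, the in-memory genotype matrix, and the order in which exclusions are applied are assumptions made for illustration.

```python
"""Simplified sketches of two ALFA-style processing steps (not the ALFA code).

Assumptions: VCF-style 1-based positions as input; genotypes held in memory
as a list of per-variant rows, with None marking a missing call.
"""
from typing import List, Optional, Sequence, Tuple


def vcf_to_spdi(seq_id: str, pos: int, ref: str, alt: str) -> str:
    """Return an SPDI string for one VCF allele, trimming shared bases.

    This is NOT full VOCA normalization, which also expands ambiguous
    indels across repeated sequence.
    """
    start = pos - 1  # VCF POS is 1-based; SPDI positions are 0-based
    # Drop bases shared at the right end of REF and ALT.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop shared leading bases (e.g. the VCF padding base), shifting the start.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        start += 1
    return f"{seq_id}:{start}:{ref}:{alt}"


def call_rate_filter(
    genotypes: Sequence[Sequence[Optional[int]]], threshold: float = 0.95
) -> Tuple[List[int], List[int]]:
    """Return indices of variants and subjects whose call rate is >= threshold.

    genotypes[i][j] is the call for variant i in subject j (None = missing).
    Both rates are computed on the unfiltered matrix, which is an assumption.
    """
    if not genotypes or not genotypes[0]:
        return [], []
    n_variants, n_subjects = len(genotypes), len(genotypes[0])
    keep_variants = [
        i for i, row in enumerate(genotypes)
        if sum(g is not None for g in row) / n_subjects >= threshold
    ]
    keep_subjects = [
        j for j in range(n_subjects)
        if sum(row[j] is not None for row in genotypes) / n_variants >= threshold
    ]
    return keep_variants, keep_subjects


if __name__ == "__main__":
    # NC_000001.11 is used only as an example sequence identifier.
    print(vcf_to_spdi("NC_000001.11", 12345, "A", "G"))  # prints NC_000001.11:12344:A:G
    print(vcf_to_spdi("NC_000001.11", 100, "AT", "A"))   # prints NC_000001.11:100:T:
```

A variant called in 19 of 20 subjects has a call rate of exactly 0.95 and is kept, while one called in 18 of 20 (0.90) is excluded.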

+

+The ALFA R4 bigBed files (904M variants) were converted to VCF using a custom script, retaining
+the 163M variants with non-zero allele frequency (146M SNPs, 17M indels).
+The makeDoc file for this track documents how each source file of the varFreqs track was converted.
+For some subtracks, the conversion required Python scripts, which are also available from GitHub.
+
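The actual conversion is documented in the makeDoc file; the Python sketch below only illustrates the filtering idea described above. It takes hypothetical pre-parsed records (the FreqRecord fields are invented here and are not the ALFA bigBed schema), drops alternate alleles with zero frequency, skips sites with no remaining alleles, tallies SNPs and indels, and emits VCF data lines carrying an AF INFO field. Whether the real script counts per site or per allele is not stated, so the per-allele tally is an assumption.

```python
"""Sketch: keep non-zero-frequency alleles and write VCF data lines.

FreqRecord is a hypothetical, already-parsed input record; the real ALFA
bigBed columns and the actual conversion script may differ.
"""
from collections import Counter
from typing import Iterable, List, NamedTuple, Sequence, Tuple

VCF_HEADER = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


class FreqRecord(NamedTuple):
    chrom: str
    pos: int              # 1-based, as written to the VCF POS column
    ref: str
    alts: Sequence[str]
    afs: Sequence[float]  # one frequency per alternate allele


def variant_class(ref: str, alt: str) -> str:
    """Classify an allele as SNP, indel, or other (e.g. multi-base substitution)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "other"


def convert(records: Iterable[FreqRecord]) -> Tuple[List[str], Counter]:
    """Return VCF data lines for alleles with AF > 0, plus per-class allele counts."""
    counts: Counter = Counter()
    lines: List[str] = []
    for rec in records:
        kept = [(alt, af) for alt, af in zip(rec.alts, rec.afs) if af > 0]
        if not kept:
            continue  # every alternate allele has zero frequency: drop the site
        for alt, _ in kept:
            counts[variant_class(rec.ref, alt)] += 1
        alt_field = ",".join(alt for alt, _ in kept)
        af_field = ",".join(f"{af:g}" for _, af in kept)
        lines.append("\t".join(
            [rec.chrom, str(rec.pos), ".", rec.ref, alt_field, ".", ".", f"AF={af_field}"]
        ))
    return lines, counts


if __name__ == "__main__":
    demo = [
        FreqRecord("chr1", 100, "A", ["G"], [0.25]),
        FreqRecord("chr1", 200, "C", ["T"], [0.0]),  # dropped: AF is zero
    ]
    body, tally = convert(demo)
    print("\n".join(VCF_HEADER + body))
    print(dict(tally))
```

Filtering per alternate allele, rather than per site, keeps multi-allelic sites whose other alleles still have non-zero frequency.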