aa61ebc800429515f9ced7e28f669c6042219f43 max Wed Mar 18 09:09:13 2026 -0700
varFreqs supertrack: add GREGoR track, update all HTML docs, move scripts to varFreqs/, refs #36642

Add GREGoR R04 WGS track to varFreqs superTrack. Update Data Access and Methods sections for all 20+ subtrack HTML files with consistent formatting, sequencing methods from source papers, and links to makeDoc and Github scripts. Move all varFreqs conversion scripts into scripts/varFreqs/ subdirectory and update makeDoc paths accordingly.

Co-Authored-By: Claude Opus 4.6
diff --git src/hg/makeDb/trackDb/human/allofus.html src/hg/makeDb/trackDb/human/allofus.html
new file mode 100644
index 00000000000..5423ddf05c6
--- /dev/null
+++ src/hg/makeDb/trackDb/human/allofus.html
@@ -0,0 +1,63 @@
+

Description

+

+The All of Us Research Program is a large-scale biomedical research initiative launched by the U.S. National Institutes of Health (NIH) in 2018. Its goal is to build one of the most diverse health databases by enrolling one million or more participants who reflect the full diversity of the United States, including groups that have historically been underrepresented in biomedical research. Participants contribute health surveys, electronic health records (EHR), physical measurements, and biosamples for genomic analysis.

+ +

+This track shows allele frequencies from the v7 short-read whole-genome sequencing (srWGS) release, which comprises 245,388 participants. Only variants with an allele count of at least 20 are included. Frequencies are provided both overall and by genetic ancestry, as determined by local ancestry inference: European (EUR), East Asian (EAS), African (AFR), Indigenous American (AMR), Oceanian (OCE), and South Asian (SAS). Variants that did not fall within a genomic window covered by the ancestry reference files are flagged with an "NW" (not in window) tag; for these variants, ancestry was assigned using the closest available position.
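Allele frequencies are conventionally computed as AF = AC / AN, the alternate allele count divided by the number of called alleles. The minimal Python sketch below illustrates that calculation per ancestry group together with the minimum allele count of 20 described above. It assumes the filter is applied to the overall allele count; the `VariantCounts` class, its field names, and the example numbers are hypothetical and do not correspond to the actual INFO keys in this track's VCF.

```python
from dataclasses import dataclass

# Minimum allele count stated in the track description.
MIN_AC = 20


@dataclass
class VariantCounts:
    """Hypothetical per-group counts; not the actual INFO keys of the track's VCF."""
    ac: int  # alternate allele count
    an: int  # total number of called alleles


def passes_ac_filter(counts: VariantCounts, min_ac: int = MIN_AC) -> bool:
    """Keep a variant only if its allele count is at least min_ac.

    Assumption: the filter is applied to the overall (all-ancestry) count.
    """
    return counts.ac >= min_ac


def allele_frequency(counts: VariantCounts) -> float:
    """Conventional allele frequency AF = AC / AN."""
    if counts.an <= 0:
        raise ValueError("AN must be positive")
    return counts.ac / counts.an


def frequencies_by_group(groups: dict[str, VariantCounts]) -> dict[str, float]:
    """Compute AF for each group label (e.g. 'ALL', 'EUR', 'AFR')."""
    return {label: allele_frequency(c) for label, c in groups.items()}


if __name__ == "__main__":
    # Made-up counts for a single variant, for illustration only.
    groups = {
        "ALL": VariantCounts(ac=25, an=1000),
        "EUR": VariantCounts(ac=20, an=400),
        "AFR": VariantCounts(ac=5, an=250),
    }
    if passes_ac_filter(groups["ALL"]):
        for label, af in frequencies_by_group(groups).items():
            print(f"{label}: {af:.4f}")
```

Note that a variant passing the overall filter can still have small counts in individual ancestry groups, as the AFR entry in the example shows.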

+ +

Data Access

+

+The data can be explored interactively with the Table Browser or the Data Integrator. For programmatic access, use our REST API with the track name allofus. For bulk download, the VCF file is available from our download server.
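As an illustration of programmatic access, the Python sketch below builds a request URL for the REST API's `/getData/track` endpoint, which takes `genome`, `track`, `chrom`, `start`, and `end` parameters, and can optionally download the JSON response. The assembly name `hg38`, the example region, and the helper names `build_track_url` and `fetch_track` are assumptions chosen for illustration; consult the REST API documentation for the exact parameters and response limits.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(genome: str, track: str, chrom: str, start: int, end: int) -> str:
    """Build a /getData/track request URL (hypothetical helper).

    Assumes UCSC's usual 0-based, half-open coordinates for start/end.
    """
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    params = {"genome": genome, "track": track, "chrom": chrom, "start": start, "end": end}
    return API_BASE + "?" + urllib.parse.urlencode(params)


def fetch_track(url: str) -> dict:
    """Download and decode the JSON response (performs a live network request)."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Assumed assembly and an arbitrary 1 kb example region.
    url = build_track_url("hg38", "allofus", "chr1", 1_000_000, 1_001_000)
    print(url)
```

Only the URL is printed by default; calling `fetch_track(url)` sends the request. For genome-wide analyses, the bulk VCF download is usually more practical than many small API queries.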

+

+Variant data and individual-level data are accessible through the All of Us Researcher Workbench, which requires registration and completion of a training program. Aggregate allele frequency data, such as those shown in this track, are freely available.

+ +

Methods

+

+Whole-genome sequencing was performed on the Illumina NovaSeq 6000 platform with PCR-free library preparation, targeting 30x coverage. Reads were aligned to GRCh38, and variants were called with the Illumina DRAGEN (Dynamic Read Analysis for GENomics) pipeline, which performs read mapping and alignment, sorting, duplicate marking, and variant calling (SNVs and indels) in a single hardware-accelerated workflow. Joint genotyping was performed across all samples. Quality control included sample-level filtering for contamination, sex discordance, and relatedness, and variant-level filtering using Variant Quality Score Recalibration (VQSR). Ancestry-specific allele frequencies were computed at UC Santa Cruz by the Ioannidis group using local ancestry inference. The breakdown into European, East Asian, African, Indigenous American, Oceanian, and South Asian components will be described in a forthcoming publication.

+

+The makeDoc file of the track documents how all source files of the varFreqs track were converted at UCSC. Where Python scripts were needed for the conversion, they are also available on GitHub.

+ +

Credits

+

+The All of Us Research Program is supported by the National Institutes of Health. We thank the participants and the program for making these frequency data available. Local ancestry inference was performed by Qudsi Aljabiri and Cole Shanks in the group of Prof. Alexander Ioannidis at UC Santa Cruz.