f43f1239645183b88a00b3de83f74fd5553e6ce1 max Thu Mar 12 07:52:35 2026 -0700
Add gnomAD STR genotype track under gnomadVariants supertrack

87 disease-associated STR loci from gnomAD v3.1.3, aggregated from ~1.4M individual genotypes (18,511 WGS samples, ExpansionHunter v5). Includes allele frequency distributions and population breakdowns. Added relatedTracks links to strVar supertrack, refs #35420, refs #36652

Co-Authored-By: Claude Opus 4.6

diff --git src/hg/makeDb/trackDb/human/hg38/gnomadStr.html src/hg/makeDb/trackDb/human/hg38/gnomadStr.html
new file mode 100644
index 00000000000..574b1d5548b
--- /dev/null
+++ src/hg/makeDb/trackDb/human/hg38/gnomadStr.html
@@ -0,0 +1,125 @@
+

Description

+

+The gnomAD STR track displays short tandem repeat (STR) genotypes at 87
+disease-associated loci from the
+Genome Aggregation
+Database (gnomAD) v3.1.3. The data include individual-level STR genotypes from
+18,511 whole-genome sequenced samples across 10 populations, aggregated
+into per-locus allele frequency distributions.

+ +

+These loci were selected because tandem repeat expansions at these sites have been
+reported to cause human genetic diseases, including Huntington disease (HTT),
+fragile X syndrome (FMR1), Friedreich ataxia (FXN), various
+spinocerebellar ataxias, myotonic dystrophies, and other neurological and
+neuromuscular disorders. Most loci (56) have motifs of 3–6 bp, while
+additional loci have longer motifs of 10–24 bp.

+ +

+The genotypes were generated using
+ExpansionHunter
+v5 on gnomAD v3.1 whole-genome sequencing data (150 bp read lengths). Of the
+samples, 64% were PCR-free, 13% PCR-plus, and 23% had an unknown PCR protocol.
+ExpansionHunter was selected because it had the best accuracy among existing tools
+for detecting expansions at disease-associated loci. Results were generated without
+off-target regions to minimize overestimation of repeat sizes.
+For each locus, the data show the distribution of repeat allele sizes observed
+across the gnomAD population, providing a reference for normal and expanded allele
+ranges. For more details on the methods, see the
+gnomAD blog post on STR calls.

+ +
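As a rough illustration of how a per-locus allele size distribution can serve as a reference range, the following Python sketch converts allele counts into frequencies and finds the repeat size at a given percentile. The allele counts, the function names, and the choice of the 99th percentile are hypothetical and are not taken from the gnomAD data or methods.

```python
def allele_frequencies(allele_counts):
    """Convert {repeat_count: n_alleles} into {repeat_count: frequency}."""
    total = sum(allele_counts.values())
    if total == 0:
        return {}
    return {size: n / total for size, n in sorted(allele_counts.items())}


def size_at_percentile(allele_counts, percentile):
    """Return the smallest repeat count whose cumulative allele count
    reaches the requested percentile of all observed alleles."""
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    threshold = total * percentile / 100.0
    cumulative = 0
    for size, n in sorted(allele_counts.items()):
        cumulative += n
        if cumulative >= threshold:
            return size
    return max(allele_counts)


# Made-up allele counts for one locus (repeat count -> number of alleles).
example_counts = {17: 600, 18: 250, 19: 100, 23: 40, 42: 10}

freqs = allele_frequencies(example_counts)
p99 = size_at_percentile(example_counts, 99)
print(freqs)
print("99th percentile repeat size:", p99)
```

Percentiles of this kind only describe what is common in the reference population; they are not clinical thresholds for any locus.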

Display Conventions

+

+Items are colored by the length of the repeat motif:

+ + +

+Each item is labeled with the gene name. Hovering over an item shows the repeat motif,
+gene, total sample count, and number of samples passing quality filters. Clicking an item
+links to the corresponding gnomAD STR locus page with interactive allele
+frequency histograms and detailed population breakdowns.

+ +

+The detail page for each locus shows:

+ + +

Methods

+

+The gnomAD STR genotype data file
+(gnomAD_STR_genotypes__2025_03_17.tsv.gz) was downloaded from the
+gnomAD downloads page. This file contains individual-level
+STR genotypes at 87 disease-associated loci generated using
+ExpansionHunter
+on gnomAD v3.1.3 whole-genome sequencing data.

+ +

+For the UCSC Genome Browser track, the individual genotype records (~1.4 million rows)
+were aggregated per locus to produce summary statistics: total sample count,
+PASS-filter count, allele size frequency distributions, and per-population sample counts.
+Coordinates were used as provided (0-based). Some loci include genotypes for multiple
+motif patterns (e.g., complex repeat structures) and for adjacent repeats; these are
+represented as separate records.

+ +
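The per-locus aggregation described above could look roughly like the following Python sketch. It is not the script used to build the track: the column names (LocusId, Population, Filter, Allele1, Allele2), the non-PASS filter value, and the sample rows are all assumptions made for illustration, and the real gnomAD TSV header may differ. The sketch also assumes that alleles are counted regardless of filter status, which the description above does not specify.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical rows standing in for the gnomAD genotype TSV.
ROWS = [
    ("LocusId", "Population", "Filter", "Allele1", "Allele2"),
    ("HTT", "nfe", "PASS", "17", "19"),
    ("HTT", "afr", "PASS", "17", "42"),
    ("HTT", "eas", "LowDepth", "18", ""),
    ("FXN", "nfe", "PASS", "9", "9"),
]
SAMPLE_TSV = "\n".join("\t".join(row) for row in ROWS) + "\n"


def aggregate_by_locus(handle):
    """Summarize genotype rows into per-locus counts."""
    summary = defaultdict(lambda: {
        "samples": 0,             # total genotype records for the locus
        "pass": 0,                # records whose filter value is PASS
        "alleles": Counter(),     # repeat count -> number of alleles
        "populations": Counter(), # population code -> number of samples
    })
    for row in csv.DictReader(handle, delimiter="\t"):
        stats = summary[row["LocusId"]]
        stats["samples"] += 1
        if row["Filter"] == "PASS":
            stats["pass"] += 1
        stats["populations"][row["Population"]] += 1
        for column in ("Allele1", "Allele2"):
            value = row[column]
            if value:  # the second allele may be absent in this toy input
                stats["alleles"][int(value)] += 1
    return dict(summary)


result = aggregate_by_locus(io.StringIO(SAMPLE_TSV))
for locus, stats in result.items():
    print(locus, stats["samples"], stats["pass"], dict(stats["alleles"]))
```

For the compressed download, the same function could be given a handle from `gzip.open(path, "rt")` in place of the in-memory string, once the column names have been adjusted to match the actual header.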

+The 10 populations represented are: African/African American (afr),
+Admixed American/Latino (amr), Amish (ami), Ashkenazi Jewish (asj),
+East Asian (eas), Finnish (fin), Middle Eastern (mid), Non-Finnish European (nfe),
+South Asian (sas), and Other (oth).

+ +

Data Access

+

+The raw data can be explored interactively with the
+Table Browser or the
+Data Integrator. For automated
+analysis, the data may be queried from our
+REST API. The underlying bigBed
+file can be downloaded from our
+download
+server.

+ +
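A REST API query for this track might be sketched in Python as follows. The track name `gnomadStr` is an assumption inferred from the name of this HTML file, and the chr4 window is an arbitrary example region; only the `/getData/track` endpoint and its `genome`, `track`, `chrom`, `start`, and `end` parameters come from the UCSC REST API itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu"


def track_data_url(track, chrom, start, end, genome="hg38"):
    """Build a /getData/track URL for one genomic window."""
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    return f"{API_BASE}/getData/track?{urlencode(params)}"


def fetch_track_data(track, chrom, start, end, genome="hg38", timeout=30):
    """Fetch the window and return the decoded JSON response."""
    url = track_data_url(track, chrom, start, end, genome)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# "gnomadStr" is an assumed track name; the coordinates are arbitrary.
url = track_data_url("gnomadStr", "chr4", 3_000_000, 3_200_000)
print(url)
# data = fetch_track_data("gnomadStr", "chr4", 3_000_000, 3_200_000)
```

The network call is left commented out so that the sketch runs offline; uncommenting it requires the track name to match the one actually configured in trackDb.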

+The complete gnomAD STR dataset, including individual-level genotypes, is available
+from the gnomAD downloads page. Interactive locus-level views with
+allele frequency histograms are available at the
+gnomAD STR browser.

+ +

Credits

+

+Thanks to the gnomAD
+production team at the Broad Institute for generating and distributing these data.

+ +

References

+

+Chen S, Francioli LC, Goodrich JK, Collins RL, Kanai M, Wang Q, Alföldi J,
+Watts NA, Vittal C, Gauthier LD et al.
+
+A genome-wide mutational constraint map quantified from variation in 76,156 human
+genomes.
+Nature. 2024;625:92–100.
+

+ +

+Dolzhenko E, Deshpande V, Schlesinger F, Krusche P, Petrovski R, Chen S,
+Emig-Agius D, Gross A, Narzisi G, Bowman B et al.
+
+ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem
+repeat regions.
+Bioinformatics. 2019;35(22):4754–4756.
+