6bb46ba4e8d91ab3670d354ef18d8bf5321ec9ee
max
Thu Mar 12 07:15:29 2026 -0700

Add WebSTR short tandem repeat track under new strVar supertrack

New track with 1.7M STR loci from WebSTR EnsembleTR panel (hg38), with allele
frequency data for 5 populations from 1000 Genomes (3,550 samples).
Includes conversion script, .as schema, trackDb, and full HTML documentation, refs #36652

Co-Authored-By: Claude Opus 4.6

diff --git src/hg/makeDb/trackDb/human/hg38/webstr.html src/hg/makeDb/trackDb/human/hg38/webstr.html
new file mode 100644
index 00000000000..35d8cae51b7
--- /dev/null
+++ src/hg/makeDb/trackDb/human/hg38/webstr.html
@@ -0,0 +1,115 @@
+

Description

+

+The WebSTR track displays 1,710,833 short tandem repeat (STR) loci across the
+human genome from the
+WebSTR database. STRs (also known
+as microsatellites) are consecutive repetitions of 1–6 nucleotide motifs that are
+highly polymorphic due to repeat unit insertions and deletions caused primarily by
+polymerase slippage during replication. Genetic variation at STRs has been shown to
+influence gene expression, cancer risk, and neurodevelopmental traits.

+ +

+This track is based on the EnsembleTR panel for the GRCh38/hg38 assembly,
+which represents a combined set of tandem repeats genotyped by four separate methods
+(HipSTR, GangSTR, ExpansionHunter, and AdVNTR) on data from the
+1000 Genomes Project
+and H3Africa.
+EnsembleTR
+was applied to jointly genotype all 3,550 samples, producing consensus calls at
+over 1.7 million autosomal tandem repeat loci.

+ +

+The track includes allele frequency distributions for five 1000 Genomes continental
+populations:

+ + +

+For each population, allele frequencies are defined as the number of copies of each allele
+divided by the total number of alleles in that population. Alleles are represented as
+the number of repeat unit copies.
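The frequency definition above can be illustrated with a minimal Python sketch. The function name, the diploid tuple input, and the choice to skip missing calls (`None`) are assumptions made for illustration; they are not taken from the WebSTR or EnsembleTR code.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {repeat_copies: frequency} for one population.

    genotypes: iterable of (allele1, allele2) tuples, where each allele is
    a repeat-unit copy number. None marks a missing call and is skipped
    (an assumption of this sketch, not a documented WebSTR rule).
    """
    counts = Counter(
        allele
        for genotype in genotypes
        for allele in genotype
        if allele is not None
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    # Frequency = copies of this allele / all called alleles in the population.
    return {allele: n / total for allele, n in sorted(counts.items())}


# Hypothetical genotypes for four samples at a single locus.
example_genotypes = [(10, 10), (10, 12), (12, 13), (10, None)]
example_freqs = allele_frequencies(example_genotypes)
```

In this hypothetical input, seven alleles are called in total and the 10-copy allele is seen four times, so its frequency is 4/7.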

+ +

Display Conventions

+

+Items are colored by the length of the repeat motif (period):

+ + +

+Each item is labeled by its WebSTR repeat ID. Hovering over an item shows the repeat
+motif, number of reference copies, and motif period. Clicking an item links to the
+corresponding
+WebSTR locus page, which provides
+interactive allele frequency histograms and additional annotations.

+ +

Methods

+

+The EnsembleTR reference panel was constructed as follows:

+
  1. Tandem repeat reference sets from four genotyping tools (HipSTR, GangSTR,
     ExpansionHunter, and AdVNTR) were merged.
  2. Each tool was run independently on 1000 Genomes and H3Africa whole-genome
     sequencing data.
  3. EnsembleTR was used to produce joint consensus genotype calls across all four methods.
  4. Loci called in fewer than 75% of samples were removed, yielding 1,710,833 loci.
  5. Allele frequencies were computed per population.
+ +
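The call-rate filter in the list above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the locus IDs, and the per-locus called-sample counts are hypothetical, and the actual EnsembleTR pipeline may apply the threshold differently.

```python
def passes_call_rate(n_called, n_samples, min_rate=0.75):
    """True if a locus was called in at least min_rate of the samples."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return n_called >= min_rate * n_samples


def filter_loci(called_counts, n_samples, min_rate=0.75):
    """Keep loci whose call rate meets the threshold.

    called_counts: dict mapping a locus ID to the number of samples with
    a genotype call at that locus (hypothetical input layout).
    """
    return {
        locus: n_called
        for locus, n_called in called_counts.items()
        if passes_call_rate(n_called, n_samples, min_rate)
    }


# Hypothetical counts for three loci, using the 3,550-sample panel size.
example_counts = {"locus_a": 3550, "locus_b": 2663, "locus_c": 2662}
kept_loci = filter_loci(example_counts, n_samples=3550)
```

With 3,550 samples, 75% corresponds to 2,662.5 samples, so a locus called in 2,663 samples is kept and one called in 2,662 is removed.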

+For the UCSC Genome Browser track, the source data were converted from CSV to bigBed
+format. The 1-based start coordinates from the WebSTR database were converted to 0-based
+half-open coordinates for the BED format. Per-population allele frequency distributions
+are stored as extra bigBed fields.
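The coordinate conversion described above amounts to subtracting one from the start and leaving the end unchanged. The sketch below assumes the source end coordinate is 1-based and inclusive, which the text does not state explicitly; the function name is hypothetical and this is not the conversion script from the commit.

```python
def to_bed_interval(start_1based, end_1based):
    """Convert a 1-based inclusive interval to 0-based half-open BED coordinates.

    Assumes the source end coordinate is inclusive. Under that assumption the
    BED end equals the source end, and only the start shifts by one.
    """
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    return start_1based - 1, end_1based


# A 12-bp repeat at 1-based positions 101..112 becomes BED 100..112.
bed_start, bed_end = to_bed_interval(101, 112)
bed_length = bed_end - bed_start
```

In BED coordinates the interval length is simply `end - start`, which is one reason the half-open convention is convenient.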

+ +

Data Access

+

+The raw data can be explored interactively with the
+Table Browser or the
+Data Integrator. For automated
+analysis, the data may be queried from our
+REST API. The underlying bigBed
+file can be downloaded from our
+download
+server.
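As a rough sketch of automated access, the Python below builds a request URL for the UCSC REST API `getData/track` endpoint and fetches the JSON response. The track name `webstr` is an assumption based on the file name in this commit and may differ from the released track name; the helper function names and the region in the example are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

UCSC_API = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(chrom, start, end, genome="hg38", track="webstr"):
    """Build a getData/track URL for one region.

    The default track name "webstr" is assumed, not confirmed by the source.
    Parameters are joined with semicolons, as in the UCSC API examples.
    """
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", start),
        ("end", end),
    ]
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params)
    return f"{UCSC_API}?{query}"


def fetch_track_items(chrom, start, end, **kwargs):
    """Fetch and decode the JSON response for one region (needs network access)."""
    with urlopen(build_track_url(chrom, start, end, **kwargs)) as response:
        return json.load(response)


# Hypothetical region; only the URL is built here, nothing is fetched.
example_url = build_track_url("chr1", 1000000, 1010000)
```

Calling `fetch_track_items("chr1", 1000000, 1010000)` would issue the request; the layout of the returned JSON should be checked against the REST API documentation rather than assumed.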

+ +

+The complete WebSTR dataset, including additional cohorts and data types not included in
+this track, is available from the
+WebSTR web portal. Programmatic
+access to the full WebSTR database is available through the
+WebSTR REST API.

+ +

Credits

+

+Thanks to Melissa Gymrek (UC San Diego), Oxana Sachenkova Lundström
+(Stockholm University / ZHAW), and the WebSTR team for providing the data for this track.

+ +

References

+

+Sachenkova Lundström O, Adriaan Verbiest M, Xia F, Jam HZ, Zlobec I,
+Anisimova M, Gymrek M.
+WebSTR: A Population-wide Database of Short Tandem Repeat Variation in Humans.
+J Mol Biol. 2023 Oct 15;435(20):168260.
+PMID: 37678708
+

+ +

+Jam HZ, Revoir P, Gadgil R, Sun Y, Gymrek M.
+EnsembleTR: a tool for combining tandem repeat genotyping results.
+Nat Biotechnol. 2024.
+