6bb46ba4e8d91ab3670d354ef18d8bf5321ec9ee max Thu Mar 12 07:15:29 2026 -0700

Add WebSTR short tandem repeat track under new strVar supertrack

New track with 1.7M STR loci from WebSTR EnsembleTR panel (hg38), with allele frequency data for 5 populations from 1000 Genomes (3,550 samples). Includes conversion script, .as schema, trackDb, and full HTML documentation, refs #36652

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

diff --git src/hg/makeDb/doc/hg38/webstr.txt src/hg/makeDb/doc/hg38/webstr.txt
new file mode 100644
index 00000000000..796f3fdc157
--- /dev/null
+++ src/hg/makeDb/doc/hg38/webstr.txt
@@ -0,0 +1,40 @@
+# WebSTR Short Tandem Repeat track (part of strVar supertrack)
+# 2026-03-12 (max)
+
+# Data provided by Melissa Gymrek lab (UC San Diego) via WebSTR
+# https://webstr.ucsd.edu/
+# Paper: Sachenkova Lundstrom et al. J Mol Biol 2023, PMID 37678708
+# EnsembleTR panel, hg38 coordinates, 1,710,833 STR loci
+# Allele frequency data from 1000 Genomes (AFR, AMR, EAS, EUR, SAS cohorts)
+# 3,550 individuals total
+
+# Source files in WebSTRDataDumpForMax/:
+#   hg38_repeats_withlinks.csv.gz - repeat loci with coordinates and metadata
+#   hg38_afreqs.csv.gz - allele frequency distributions per repeat per cohort
+# Note: afreqs file has typo "repeadid" in header (should be "repeatid")
+# Note: source coordinates are 1-based; script converts to 0-based BED
+
+mkdir -p /hive/data/genomes/hg38/bed/str/webstr
+cd /hive/data/genomes/hg38/bed/str/webstr
+
+# Convert CSV data to BED9+ format with allele frequency fields
+# Colors items by motif period, encodes per-population allele freqs as extra fields
+python3 ~/kent/src/hg/makeDb/scripts/webstr/webstrToBed.py WebSTRDataDumpForMax > webstr.bed
+
+# Sort and convert to bigBed
+bedSort webstr.bed webstr.bed
+bedToBigBed webstr.bed /hive/data/genomes/hg38/chrom.sizes webstr.bb \
+    -type=bed9+ -tab -as=$HOME/kent/src/hg/makeDb/scripts/webstr/webstr.as
+
+# Symlink into /gbdb
+mkdir -p /gbdb/hg38/webstr
+ln -sf /hive/data/genomes/hg38/bed/str/webstr/webstr.bb /gbdb/hg38/webstr/webstr.bb
+
+# trackDb: webstr track is inside the strVar supertrack
+# trackDb entry: ~/kent/src/hg/makeDb/trackDb/human/hg38/webstr.ra
+# HTML docs: ~/kent/src/hg/makeDb/trackDb/human/hg38/webstr.html (full)
+#            ~/kent/src/hg/makeDb/trackDb/human/hg38/strVar.html (supertrack summary)
+
+# Load trackDb
+cd ~/kent/src/hg/makeDb/trackDb
+make DBS=hg38
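The makedoc notes that WebSTR source coordinates are 1-based and that webstrToBed.py converts them to 0-based BED. The conversion itself is not part of this diff. Below is a minimal Python sketch of the usual transformation. It assumes the source start and end are both inclusive 1-based positions, which the makedoc does not state explicitly. The function name one_based_to_bed is hypothetical and is not taken from webstrToBed.py.

```python
def one_based_to_bed(start_1based: int, end_1based: int) -> tuple[int, int]:
    """Convert a 1-based, fully closed interval to a 0-based, half-open BED interval.

    Assumption (not confirmed by the makedoc): the WebSTR start and end are
    both inclusive 1-based positions.
    """
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError(f"invalid 1-based interval: {start_1based}-{end_1based}")
    # Only the start moves. BED chromEnd is exclusive, so an inclusive
    # 1-based end has the same numeric value as a 0-based exclusive end.
    return start_1based - 1, end_1based
```

Under that assumption, a 12 bp repeat at 1-based positions 100 to 111 becomes chromStart=99, chromEnd=111, so chromEnd - chromStart still equals the repeat length. Spot-checking a few converted loci against the WebSTR site is a cheap way to confirm the assumption.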
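According to the makedoc, items are colored by motif period, meaning the length of the repeat unit. The Python sketch below shows that idea and takes the period to be the motif length. This is an assumption, since the source CSV may carry the period as its own column. The palette, the single fallback color for periods above 6, and the name item_rgb_for_motif are all hypothetical, because the real colors live in webstrToBed.py, which this diff does not show. In BED, itemRgb is field 9 and is written as an "R,G,B" string.

```python
# Hypothetical palette keyed by motif period (repeat-unit length in bp).
# The colors actually used by webstrToBed.py are not shown in this diff.
PERIOD_COLORS = {
    1: "228,26,28",    # homopolymer
    2: "55,126,184",   # dinucleotide
    3: "77,175,74",    # trinucleotide
    4: "152,78,163",   # tetranucleotide
    5: "255,127,0",    # pentanucleotide
    6: "166,86,40",    # hexanucleotide
}
DEFAULT_COLOR = "128,128,128"  # assumed fallback for periods above 6 (or empty motifs)


def item_rgb_for_motif(motif: str) -> str:
    """Return a BED itemRgb string ("R,G,B") for a repeat motif such as "CAG".

    Assumes the period equals the motif length.
    """
    period = len(motif.strip())
    return PERIOD_COLORS.get(period, DEFAULT_COLOR)
```

Keying the colors on period rather than on the motif sequence keeps the legend small: there is one color per repeat class instead of one per distinct motif.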
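The header of hg38_afreqs.csv.gz contains the "repeadid" typo, and the per-population allele frequencies end up as extra BED fields. One way to handle both is to rename the column while reading, then emit one field per cohort in a fixed order. This Python sketch assumes hypothetical column names cohort, allele, and freq alongside repeadid, and a hypothetical "allele:freq,..." encoding. The real layout is defined by webstr.as and webstrToBed.py, neither of which appears in this diff.

```python
import csv
from collections import defaultdict

# Cohort order for the extra BED fields: the five 1000 Genomes
# super-populations listed in the makedoc.
COHORTS = ("AFR", "AMR", "EAS", "EUR", "SAS")


def read_afreqs(lines):
    """Group allele frequencies as {repeatid: {cohort: [(allele, freq), ...]}}.

    Apart from the 'repeadid' typo, the column names used here
    (cohort, allele, freq) are hypothetical placeholders.
    """
    reader = csv.DictReader(lines)
    # Rename the misspelled header column so later code can use 'repeatid'.
    reader.fieldnames = [
        "repeatid" if name == "repeadid" else name for name in reader.fieldnames
    ]
    freqs = defaultdict(lambda: defaultdict(list))
    for row in reader:
        freqs[row["repeatid"]][row["cohort"]].append(
            (row["allele"], float(row["freq"]))
        )
    return freqs


def encode_cohort_fields(per_cohort):
    """Return one 'allele:freq,allele:freq' string per cohort, in COHORTS order.

    Cohorts without data become an empty string so every BED line keeps the
    same number of columns (the real script may use a placeholder instead).
    """
    fields = []
    for cohort in COHORTS:
        pairs = per_cohort.get(cohort, [])
        fields.append(",".join(f"{allele}:{freq:g}" for allele, freq in pairs))
    return fields
```

For the real file, the reader can be fed from gzip.open("hg38_afreqs.csv.gz", "rt", newline=""), since csv.DictReader accepts any iterable of text lines.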
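bedToBigBed rejects input that is not sorted, which is why the makedoc runs bedSort first. bedSort orders records by chromosome name using plain string comparison, then by chromStart. The Python sketch below checks that order before the bigBed step. It is a hypothetical pre-flight helper, not part of the kent tools, and it compares only the first two columns.

```python
def is_bed_sorted(lines):
    """Return True if BED lines are ordered by chrom name (plain string
    comparison) and then by chromStart, the order bedSort produces.

    Blank lines, comments, and track/browser header lines are skipped.
    """
    prev = None
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], int(fields[1]))
        # Tuples compare chrom first, then start.
        if prev is not None and key < prev:
            return False
        prev = key
    return True
```

Usage: with open("webstr.bed") as fh: is_bed_sorted(fh). Under this comparison "chr10" sorts before "chr2", which is expected and matches the byte order that bedToBigBed asks for.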