17b7d3c37be41135afaf8e91e365e3847af96ca5
lrnassar
  Mon Jun 22 10:56:56 2026 -0700
Add TAD (topologically associating domains) track set on hg19, hg38, mm10, mm39. refs #21599

New "tads" superTrack collecting published TAD calls, alpha-gated via an
"include tad.ra alpha" line in each assembly's trackDb.ra.

hg38 (all five sources): Dixon 2012 domains, Schmitt 2016 boundaries, McArthur & Capra
2021 boundary stability, ENCODE contact domains (faceted composite over 117 biosamples),
and 3D Genome Browser 2.0 domains (faceted composite over 464 datasets).
hg19: the three sources with hg19-compatible data (Dixon, Schmitt, McArthur).
mm10/mm39 (domains only; the boundary sources have no mouse data): Dixon, ENCODE
(faceted, 16 biosamples), and 3D Genome Browser (faceted, 30 datasets). Dixon is lifted
from mm9 on both assemblies; mm39 ENCODE and 3D Genome Browser are lifted from mm10.
Each lift is noted in the long labels.

Faceted composites are organ-colored from a TAD-owned organ_colors.json symlinked into
/gbdb/<asm>/bbi/tad/. Build scripts and autoSql are version-controlled under
makeDb/scripts/tad/ and symlinked into the per-source build dirs. Provenance and fetch
for every dataset are documented in the makedocs (doc/hg38/tad.txt, doc/mm10/tad.txt,
doc/mm39/tad.txt, and the hg19 TAD section in doc/hg19.txt).

diff --git src/hg/makeDb/doc/mm10/tad.txt src/hg/makeDb/doc/mm10/tad.txt
new file mode 100644
index 00000000000..c3eb0e30c63
--- /dev/null
+++ src/hg/makeDb/doc/mm10/tad.txt
@@ -0,0 +1,105 @@
+# Mouse TADs supertrack (tads) - topologically associating domains, mm10 + mm39
+# Redmine #21599.  Mouse counterpart of human hg38 tads (domains only; the human Schmitt and
+# McArthur boundary tracks have no mouse data). Built by reusing the human pipelines plus liftOver.
+# This doc covers BOTH mm10 and mm39 (the build scripts emit both); mm39 trackDb mirrors mm10.
+# Build scripts and autoSql are version-controlled at ~/kent/src/hg/makeDb/scripts/tad/ and
+# symlinked into the per-source build dirs, so the commands below run the in-tree copies.
+# Organ colors come from the TAD-owned /hive/data/outside/tad/organ_colors.json (symlinked into
+# /gbdb/<asm>/bbi/tad/organ_colors.json) - a copy of the wgEncodeReg4 map, isolated so TAD
+# additions never touch the shared file.
+
+##############################################################################
+# Dixon 2012 TAD domains (tadsDixon)  2026-06-21 (lou)
+##############################################################################
+# Source: Dixon JR et al. 2012, Nature 485:376, PMID 22495300, doi:10.1038/nature11082.
+# Mouse half of the paper (the human half is the hg38 tadsDixon track).
+#
+# DATA PROVENANCE + FETCH:
+#   - Nature Supplementary Table S3 (Domains) = supplemental file MOESM330, staged at
+#     /hive/users/lrnassar/claude/RM21599/dixon2012/41586_2012_BFnature11082_MOESM330_ESM.xls .
+#   - Mouse sheets "mESC Combined" (2,200 domains) + "cortex Combined" (1,518), plain BED, 40 kb,
+#     assembly mm9 (verified by chrom-length test). Extracted to BED4 (name = cell type):
+#     /hive/users/lrnassar/claude/RM21599/lifttest/mm9_{mesc,cortex}_domains.bed (chr1-19 + X, no chrY).
+# BUILD (liftOver mm9 -> mm10 and mm9 -> mm39; no native mm10 calls exist, so BOTH assemblies are lifted):
+cd /hive/data/outside/tad/dixon2012/build
+bash buildDixonMouse.sh
+#   For each cell type x {mm10,mm39}: set col4 = cell-type name; liftOver -bedPlus=4 -tab with
+#   /gbdb/mm9/liftOver/mm9To{Mm10,Mm39}.over.chain.gz; drop *_random/_alt; bedClip; sort;
+#   bedToBigBed -type=bed4 -tab -as=/hive/data/outside/tad/tadDomain.as ->
+#   build/{mm10,mm39}/tadsDixon{MESC,Cortex}.bb . Lift drop: mESC ~2%, cortex ~2% (reported by the script).
+# Symlink into gbdb:
+for asm in mm10 mm39; do mkdir -p /gbdb/$asm/bbi/tad
+  ln -sfn /hive/data/outside/tad/dixon2012/build/$asm/tadsDixonMESC.bb   /gbdb/$asm/bbi/tad/tadsDixonMESC.bb
+  ln -sfn /hive/data/outside/tad/dixon2012/build/$asm/tadsDixonCortex.bb /gbdb/$asm/bbi/tad/tadsDixonCortex.bb
+done
+# trackDb: tadsDixon composite in mouse/{mm10,mm39}/tad.ra (longLabel "lifted from mm9" on both);
+#   html tads.html + tadsDixon.html. Gated: include tad.ra alpha in mouse/{mm10,mm39}/trackDb.ra.
+cd ~/kent/src/hg/makeDb/trackDb && make DBS=mm10 FIND=find && make DBS=mm39 FIND=find
+
+##############################################################################
+# ENCODE contact domains - faceted composite (tadsEncode)  2026-06-21 (lou)
+##############################################################################
+# Source: ENCODE portal, mouse Hi-C contact domains (Arrowhead/Juicer). Native mm10; lifted to mm39.
+#
+# DATA PROVENANCE + FETCH (reproducible; ENCODE open):
+#   1. Manifest of mm10 contact-domain files (105, all bedpe) via the ENCODE search API:
+#        https://www.encodeproject.org/search/?type=File&output_type=contact+domains&assembly=mm10&format=json&limit=all
+#      saved /hive/data/outside/tad/encode/mouse/source/_cd_select.json (16 biosamples / 46 experiments).
+#   2. Per-file download (href from manifest):
+#        curl -L "https://www.encodeproject.org/files/<ENCFF>/@@download/<ENCFF>.bedpe.gz" -o <ENCFF>.bedpe.gz
+#      -> /hive/data/outside/tad/encode/mouse/contact_domains/ (105 files).
+#   3. Per-experiment facet metadata + perturbed flag via the ENCODE search API ->
+#      mouse/source/encode_meta_all.json + encode_perturbed.json.
+#
+# BUILD - same script as human, assembly-aware. The mm10 branch: SLIM2KEY adds lymph node->Lymphoid
+# tissue; the Calls value is "Arrowhead (mm10)"; the chrom-size guard drops out-of-bounds/degenerate
+# domains (3 T-cell biosamples have spurious chrY domains past the mm10 chrY end):
+cd /hive/data/outside/tad/encode/build
+python3 buildTadsEncode.py mm10
+#   -> build/mm10/tadsEncode/<sym>.bb (16, bigBed 4+5) + tadsEncode_metadata.tsv + tadsEncode.ra
+#      (primaryKey Accession -> ENCODE portal via subtrackUrls; default-on: mouse ESC, CH12.LX, heart,
+#       left cerebral cortex).
+# mm39 = lift the finished mm10 bigBeds (-bedPlus=4 -tab carries the 5 score cols), copy TSV, transform
+# the stanza (gbdb path + "lifted from mm10" longLabels):
+bash liftEncodeMouse.sh
+# gbdb + trackDb:
+for asm in mm10 mm39; do
+  ln -sfn /hive/data/outside/tad/encode/build/$asm/tadsEncode              /gbdb/$asm/bbi/tad/tadsEncode
+  ln -sfn /hive/data/outside/tad/encode/build/$asm/tadsEncode_metadata.tsv /gbdb/$asm/bbi/tad/tadsEncode_metadata.tsv
+  cp /hive/data/outside/tad/encode/build/$asm/tadsEncode.ra ~/kent/src/hg/makeDb/trackDb/mouse/$asm/tadsEncode.ra
+done
+#   mouse/{mm10,mm39}/tad.ra: include tadsEncode.ra ; html tadsEncode.html.
+cd ~/kent/src/hg/makeDb/trackDb && make DBS=mm10 FIND=find && make DBS=mm39 FIND=find
+
+##############################################################################
+# 3D Genome Browser TAD domains - faceted composite (tads3dgb)  2026-06-21 (lou)
+##############################################################################
+# Source: 3D Genome Browser 2.0 (Yu et al. 2026, NAR 54:D48-D54, PMID 41206958). CC BY-NC.
+# Native mm10; lifted to mm39.
+#
+# DATA PROVENANCE + FETCH (download endpoint reachable from hgwdev; verified):
+#   1. Catalog: /hive/users/lrnassar/claude/RM21599/3dgenome/datasets_api.json (the full 3DGB catalog).
+#      Mouse selection: species=Mouse, assembly=mm10, dataType in (Hi-C,Micro-C) -> 53 candidate ids.
+#   2. Per-dataset download + extract the TAD bed (only 30 of the 53 actually ship a *_tad.bed; the
+#      other 23 provide only a compartment bigWig and have no TAD calls):
+#        curl -L "http://3dgenome.fsm.northwestern.edu/api/data/download?dataset_ids=<id>" -o <id>.zip
+#        # zip holds <name>.bedpe (loops) + <name>_cis_pc1.bw + <name>_tad.bed (when present)
+#      extract <name>_tad.bed -> /hive/data/outside/tad/3dgenome/mouse/tad_beds/<id>.bed (30 files).
+#      Metadata (organ, cellType, dataType, year, refNo, description) comes from datasets_api.json.
+#
+# BUILD - same script as human, assembly-aware (mm10 branch iterates the 30 mouse datasets from the
+# API; the human-only Condition/Treatment/Provenance facets are omitted; all subtracks default-off):
+cd /hive/data/outside/tad/3dgenome/build
+python3 buildTads3dgb.py mm10
+#   -> build/mm10/tads3dgb/<id>.bb (30, bed4) + tads3dgb_metadata.tsv (facets Organ/Cell_type/Assay/
+#      Year/Study) + tads3dgb.ra (primaryKey DatasetId).
+# mm39 = lift the finished mm10 bed4 bigBeds, copy TSV, transform stanza ("lifted from mm10"):
+bash liftTads3dgbMouse.sh
+# gbdb + trackDb:
+for asm in mm10 mm39; do
+  ln -sfn /hive/data/outside/tad/3dgenome/build/$asm/tads3dgb              /gbdb/$asm/bbi/tad/tads3dgb
+  ln -sfn /hive/data/outside/tad/3dgenome/build/$asm/tads3dgb_metadata.tsv /gbdb/$asm/bbi/tad/tads3dgb_metadata.tsv
+  cp /hive/data/outside/tad/3dgenome/build/$asm/tads3dgb.ra ~/kent/src/hg/makeDb/trackDb/mouse/$asm/tads3dgb.ra
+done
+#   mouse/{mm10,mm39}/tad.ra: include tads3dgb.ra ; html tads3dgb.html.
+cd ~/kent/src/hg/makeDb/trackDb && make DBS=mm10 FIND=find && make DBS=mm39 FIND=find
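
The three "ln -sfn" loops in the makedoc above populate /gbdb/<asm>/bbi/tad/ for both mouse assemblies. A quick check that every expected entry resolves could catch a dangling link before the trackDb make. The following is a minimal sketch and not part of this commit: the function name check_tad_links is hypothetical, and the entry list is simply collected from the symlink commands and the organ_colors.json symlink described in the doc.

```shell
# Hypothetical helper (not part of the commit): report whether each expected
# TAD entry exists under <root>/<asm>/bbi/tad/ and resolves to a real target.
# Usage: check_tad_links <root> <asm> [<asm> ...]
# The body runs in a subshell, so its variables do not leak into the caller.
check_tad_links() (
  root=$1
  shift
  missing=0
  for asm in "$@"; do
    for name in tadsDixonMESC.bb tadsDixonCortex.bb \
                tadsEncode tadsEncode_metadata.tsv \
                tads3dgb tads3dgb_metadata.tsv \
                organ_colors.json; do
      path="$root/$asm/bbi/tad/$name"
      # [ -e ] follows symlinks, so a dangling link is reported as MISSING.
      if [ -e "$path" ]; then
        echo "OK      $path"
      else
        echo "MISSING $path"
        missing=$((missing + 1))
      fi
    done
  done
  # Exit status is 0 only when nothing was missing.
  [ "$missing" -eq 0 ]
)
```

On hgwdev this would be invoked as "check_tad_links /gbdb mm10 mm39"; a nonzero exit status means at least one entry is absent or points at a target that does not exist.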