17b7d3c37be41135afaf8e91e365e3847af96ca5
lrnassar
  Mon Jun 22 10:56:56 2026 -0700
Add TAD (topologically associating domains) track set on hg19, hg38, mm10, mm39. refs #21599

New "tads" superTrack collecting published TAD calls, alpha-gated via include tad.ra
alpha in each assembly's trackDb.ra.

hg38 (all five sources): Dixon 2012 domains, Schmitt 2016 boundaries, McArthur & Capra
2021 boundary stability, ENCODE contact domains (faceted composite over 117 biosamples),
and 3D Genome Browser 2.0 domains (faceted composite over 464 datasets).
hg19: the three sources with hg19-compatible data (Dixon, Schmitt, McArthur).
mm10/mm39 (domains only; the boundary sources have no mouse data): Dixon, ENCODE
(faceted, 16 biosamples), and 3D Genome Browser (faceted, 30 datasets); mm39 lifted
from mm10, lift noted in the long labels.

Faceted composites are organ-colored from a TAD-owned organ_colors.json symlinked into
/gbdb/<asm>/bbi/tad/. Build scripts and autoSql are version-controlled under
makeDb/scripts/tad/ and symlinked into the per-source build dirs. Provenance and fetch
for every dataset are documented in the makedocs (doc/hg38/tad.txt, doc/mm10/tad.txt,
doc/mm39/tad.txt, and the hg19 TAD section in doc/hg19.txt).

diff --git src/hg/makeDb/trackDb/human/hg38/tads.html src/hg/makeDb/trackDb/human/hg38/tads.html
new file mode 100644
index 00000000000..f171171d054
--- /dev/null
+++ src/hg/makeDb/trackDb/human/hg38/tads.html
@@ -0,0 +1,112 @@
+<h2>Description</h2>
+<p>
+This track set displays <b>topologically associating domains (TADs)</b> and TAD
+<b>boundaries</b> in the human genome, assembled from several published Hi-C and Micro-C studies.
+TADs are self-interacting regions of the genome, typically hundreds of kilobases
+to about a megabase in size. TADs are themselves nested, with smaller contact domains contained within
+larger top-level TADs. Their boundaries (frequently bound by CTCF and cohesin) insulate
+neighboring regions and constrain enhancer-promoter contacts. Disruption of a TAD boundary
+can rewire gene regulation and cause disease, and TADs are widely used to nominate candidate
+target genes for non-coding variants.
+</p>
+<p>The set contains five complementary sources:</p>
+<ul>
+  <li><b>Dixon 2012 TADs</b> &ndash; the original TAD <em>domains</em> in hESC and IMR90
+      cells (lifted from hg18).</li>
+  <li><b>ENCODE contact domains</b> &ndash; uniformly called TAD <em>domains</em> across 117
+      ENCODE human biosamples (hg38 only), browsable by a faceted selector (organ, biosample
+      type, assay). These are finer-resolution (5 kb) sub-TAD contact domains.</li>
+  <li><b>3D Genome Browser domains</b> &ndash; TAD <em>domains</em> across 464 human
+      Hi-C/Micro-C datasets (normal and cancer, baseline and perturbation), exactly as
+      called and published by the 3D Genome Browser, browsable by a faceted selector
+      (organ, cell type, assay, condition, treatment, year, study).</li>
+  <li><b>Schmitt 2016 boundaries</b> &ndash; TAD <em>boundaries</em> across 21 human
+      tissues and cell lines.</li>
+  <li><b>TAD boundary stability</b> &ndash; how recurrent each boundary is across 37
+      cell-type maps (McArthur &amp; Capra 2021).</li>
+</ul>
+
+<h2>How to Use These Tracks</h2>
+<p>
+The <b>domain</b> tracks (Dixon, ENCODE, 3D Genome Browser) answer &quot;are my variant
+and a candidate gene in the same TAD?&quot; and help prioritize target genes at
+non-coding GWAS loci. The <b>boundary</b> tracks (Schmitt, stability) answer &quot;does my
+structural variant disrupt an insulating boundary?&quot; and help interpret
+the regulatory impact of deletions, duplications, and inversions. Because domains are nested
+and the tracks capture different levels (ENCODE calls smaller sub-TAD contact domains; Dixon and
+the 3D Genome Browser call larger top-level TADs), &quot;which TAD?&quot; is answered at
+different scales by different tracks.
+</p>
+
+<h2>Display Conventions and Configuration</h2>
+<p>
+Each source is shown as a separate track because TAD calls are <b>not directly
+comparable across studies</b>: different algorithms (directionality index/HMM,
+insulation score, Arrowhead) and resolutions (5&ndash;100 kb) produce different calls
+for the same underlying biology. <b>Domains</b> are drawn as boxes spanning each
+self-interacting region; <b>boundaries</b> are drawn as the short bins that divide
+adjacent domains. Because calls are made on binned data, domain edges are uncertain to within
+roughly the caller's bin size (from a few kilobases for the ENCODE 5 kb calls up to about
+&plusmn;50 kb for the 100 kb stability bins), and the bin width of a boundary feature
+reflects this localization precision, not a measured physical width. Domains do not
+tile the genome end to end; the gaps between domain boxes are inter-domain or unorganized
+regions, not display artifacts. The <b>ENCODE</b> and <b>3D Genome Browser</b> tracks each
+contain many biosamples and are browsable with a faceted selector on their track
+configuration pages; a small default set is shown, and the remaining subtracks can be
+enabled through the facets.
+</p>
+
+<h2>Methods</h2>
+<p>
+See the individual subtrack description pages for full methods, source publications, and
+assembly/liftOver details for each dataset. In brief: Dixon domains were called with the
+directionality-index HMM at 40 kb; Schmitt boundaries with the insulation-score method at
+40 kb; ENCODE contact domains with Arrowhead (Juicer) on the ENCODE uniform Hi-C
+pipeline; the 3D Genome Browser domains are that resource's own per-dataset TAD calls
+(25 kb) across 464 human datasets, shown verbatim (format normalization only); and the
+boundary-stability track counts, per 100 kb window, how many of 37 re-processed cell-type
+maps share a boundary (McArthur &amp; Capra 2021).
+</p>
+
+<h2>Data Access</h2>
+<p>
+The raw data can be explored interactively with the
+<a href="hgTables" target="_blank">Table Browser</a> or the
+<a href="hgIntegrator" target="_blank">Data Integrator</a>. For programmatic access, the
+tracks can be queried through the Genome Browser's
+<a href="https://genome.ucsc.edu/goldenPath/help/api.html" target="_blank">REST API</a>.
+The underlying bigBed files can be downloaded from our
+<a href="https://hgdownload.soe.ucsc.edu/gbdb/$db/bbi/tad/" target="_blank">download server</a>.
+</p>
+
+<h2>References</h2>
+<p>
+Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, Hu M, Liu JS, Ren B.
+Topological domains in mammalian genomes identified by analysis of chromatin
+interactions. <em>Nature</em>. 2012;485(7398):376-80.
+<a href="https://doi.org/10.1038/nature11082" target="_blank">doi:10.1038/nature11082</a>
+</p>
+<p>
+McArthur E, Capra JA. Topologically associating domain boundaries that are stable across
+diverse cell types are evolutionarily constrained and enriched for heritability.
+<em>Am J Hum Genet</em>. 2021;108(2):269-83.
+<a href="https://doi.org/10.1016/j.ajhg.2021.01.001" target="_blank">doi:10.1016/j.ajhg.2021.01.001</a>
+</p>
+<p>
+Rao SS, Huntley MH, Durand NC, Stamenova EK, <em>et al.</em>
+A 3D map of the human genome at kilobase resolution reveals principles of chromatin
+looping. <em>Cell</em>. 2014;159(7):1665-80.
+<a href="https://doi.org/10.1016/j.cell.2014.11.021" target="_blank">doi:10.1016/j.cell.2014.11.021</a>
+</p>
+<p>
+Schmitt AD, Hu M, Jung I, Xu Z, <em>et al.</em>
+A Compendium of Chromatin Contact Maps Reveals Spatially Active Regions in the Human
+Genome. <em>Cell Rep</em>. 2016;17(8):2042-59.
+<a href="https://doi.org/10.1016/j.celrep.2016.10.061" target="_blank">doi:10.1016/j.celrep.2016.10.061</a>
+</p>
+<p>
+Yu S, Fu Y, Wong JH, Wang J, Zhao H, Zhao J, Yue F.
+The 3D Genome Browser 2.0: an enhanced online platform for visualizing and analyzing 3D
+genome architecture. <em>Nucleic Acids Res</em>. 2026;54(D1):D48-D54.
+<a href="https://doi.org/10.1093/nar/gkaf1109" target="_blank">doi:10.1093/nar/gkaf1109</a>
+</p>
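
The same-TAD question posed under "How to Use These Tracks" in the page above can be sketched in Python as follows. This is a minimal illustration only, not part of the patch or of any browser code: the function name, the tuple layout, and all coordinates are made up for the example, and intervals are assumed to be BED-style half-open (start inclusive, end exclusive).

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout for this sketch: (chrom, start, end), half-open.
Domain = Tuple[str, int, int]


def shared_domains(domains: Iterable[Domain], chrom: str,
                   variant_pos: int, gene_start: int, gene_end: int) -> List[Domain]:
    """Return the domains that contain both the variant and the whole gene."""
    hits = []
    for d_chrom, d_start, d_end in domains:
        if d_chrom != chrom:
            continue
        has_variant = d_start <= variant_pos < d_end
        has_gene = d_start <= gene_start and gene_end <= d_end
        if has_variant and has_gene:
            hits.append((d_chrom, d_start, d_end))
    return hits


# Made-up coordinates: a top-level domain, a nested contact domain, a neighbor.
example_domains = [
    ("chr1", 1_000_000, 2_000_000),   # larger top-level TAD
    ("chr1", 1_200_000, 1_400_000),   # smaller nested contact domain
    ("chr1", 2_040_000, 2_900_000),   # neighboring TAD
]

# Variant inside the nested domain; gene elsewhere in the top-level domain.
same = shared_domains(example_domains, "chr1", 1_250_000, 1_600_000, 1_650_000)
print(same)
```

In this toy input the variant and gene share the top-level domain but not the nested one, which mirrors the page's point that the answer depends on the scale of the track supplying the domain list.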