d9568e415754516b14bff72b0daf67a0f7f69999
braney
  Tue May 12 16:53:34 2026 -0700
quickLiftBench: add nSweep + posSweep orchestrators and paper Table 1, refs #37445

nSweep.py rebuilds testHub at each (N, BW_STEP) point and runs the bench,
tagging every row with N and bw_step so the merged sweep.tsv can be plotted
as "quickLift overhead vs. feature count". buildTestHub.sh now also takes a
FEATURE_W env var so the BED12 block model scales down at high N
(nSweep auto-picks it so the requested N features fit in the region
without overlapping).

posSweep.py has the same orchestrator structure as nSweep.py but varies the
viewed window on a fixed hub; its built-in canonical positions span 0 to 5000
in-window features against the default testHub.

paper_table1.md collects the headline Mode C cells from the
2026-05-12 N and position sweeps: the bigBed quickLift ratio grows from
4.3x to 10.3x with feature count while bigWig stays flat at ~5x, and
sparse windows show near-zero quickLift overhead.

diff --git src/utils/qa/quickLiftBench/README.md src/utils/qa/quickLiftBench/README.md
index dd0eab6c151..1027c91bea1 100644
--- src/utils/qa/quickLiftBench/README.md
+++ src/utils/qa/quickLiftBench/README.md
@@ -143,30 +143,108 @@
 2. Add a stanza to `cases.yaml` using string variants of the form
    `user/sessionName`.
 
 **Hub variants** (Mode C, same assembly + same position):
 
 1. Build (or pick) a hub where two trackDb stanzas reference the same
    conceptual data, one with `quickLiftUrl` and one without. The included
    `testHub/buildTestHub.sh` is a working example: it generates 5000
    synthetic BED12 features on hg38, lifts them to hs1, copies the
    hg38→hs1 quickLift chain in alongside, and writes a 2-stanza hub.txt.
 2. Add a stanza to `cases.yaml` using mapping variants (see schema above).
 
 Either way, smoke-test with `--cases <new_id> --iterations 1 --warmup 0 -v`
 to verify the URL works and timings parse out.
 
+## Density sweep (`nSweep.py`)
+
+`nSweep.py` rebuilds `testHub/` at a sequence of (N, BW_STEP) sizes and runs
+the bench at each one, tagging every row with `N` and `bw_step`. The merged
+`sweep.tsv` is the raw data behind the paper's "quickLift overhead vs.
+density" curve.
+
+```
+./nSweep.py [--n-values 500,1000,5000,10000,20000]
+            [--bw-step-values 1000]
+            [--cases mode_b_bb,mode_b_bw,mode_c_hs1_bb,mode_c_hs1_bw]
+            [--iterations 10] [--warmup 1]
+            [--region-start 15000000] [--region-end 50000000]
+            [--feature-w AUTO]
+            [--hub-dest-base ~/public_html/quickLiftBench/sweep]
+            [--out DIR] [--clean-builds] [--skip-existing] [--dry-run]
+```
+
+Per (N, BW_STEP) point the script:
+
+1. Auto-picks `FEATURE_W` so N features fit in the region without overlap
+   (clamped to [50, 5000]; override with `--feature-w`), then runs
+   `testHub/buildTestHub.sh` with `N`, `BW_STEP`, `FEATURE_W`, and the
+   region bounds, writing the hub into `<hub-dest-base>/N{N}_S{S}_W{W}/`.
+2. Loads `cases.yaml`, filters to `--cases`, rewrites each hub variant's
+   `hubUrl` from `.../testHub/...` to `.../sweep/N{N}_S{S}_W{W}/...`, and
+   drops the rewritten config into the output dir.
+3. Invokes `quickLiftBench.py` with that config; per-point outputs land in
+   `<out>/N{N}_S{S}_W{W}/`.
+4. Appends the point's `results.tsv` rows to a single `sweep.tsv`, with
+   `N` and `bw_step` prepended to each row.
+
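+A minimal sketch of the per-point bookkeeping in steps 1 and 2, assuming
+the auto-picked width is half the feature stride (the README only states
+the no-overlap goal and the [50, 5000] clamp). `pick_feature_w`,
+`point_dir`, and `rewrite_hub_url` are hypothetical names, not
+`nSweep.py`'s actual functions.
+
+```python
+# Hypothetical sketch of nSweep.py steps 1-2. Assumption: FEATURE_W is half
+# the stride between evenly spaced features; only the [50, 5000] clamp, the
+# region defaults, and the N{N}_S{S}_W{W} naming come from this README.
+FEATURE_W_MIN, FEATURE_W_MAX = 50, 5000
+
+
+def pick_feature_w(n, region_start=15_000_000, region_end=50_000_000):
+    """Pick a BED12 feature width so n evenly spaced features don't overlap."""
+    stride = (region_end - region_start) // n
+    # Half the stride leaves a gap between neighbours; the 50 bp floor can
+    # only cause overlap once the stride itself drops below 50 bp.
+    return max(FEATURE_W_MIN, min(FEATURE_W_MAX, stride // 2))
+
+
+def point_dir(n, bw_step, feature_w):
+    """Per-point directory name under --hub-dest-base and --out."""
+    return f"N{n}_S{bw_step}_W{feature_w}"
+
+
+def rewrite_hub_url(hub_url, point):
+    """Redirect a .../testHub/... hubUrl to .../sweep/<point>/..."""
+    if "/testHub/" not in hub_url:
+        raise ValueError(f"not a testHub URL: {hub_url}")
+    return hub_url.replace("/testHub/", f"/sweep/{point}/", 1)
+
+
+if __name__ == "__main__":
+    for n in (500, 1000, 5000, 10000, 20000):
+        w = pick_feature_w(n)
+        print(n, w, point_dir(n, 1000, w))
+```
+
+Under that assumption the default N values map to widths of 5000, 5000,
+3500, 1750, and 875 bp over the 35Mb default region.
+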
+After all points run, `sweep_summary.tsv` has two sections:
+
+- Per (N, bw_step, case, variant): n_ok, total/load/draw median + p90.
+- Per (N, bw_step, case): native vs. lifted total medians and the
+  `lifted/native` ratio — the headline curve.
+
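+The headline `lifted/native` ratio can be recomputed from `sweep.tsv`
+roughly as below. This is a sketch, not the script's code: it assumes the
+result columns are named `case`, `variant`, `status_code`, `error`, and
+`total_ms`, that the compared variants are literally named `native` and
+`lifted`, and that an iteration counts as ok when it returned HTTP 200
+with an empty `error`.
+
+```python
+import csv
+import io
+import statistics
+from collections import defaultdict
+
+
+def lifted_native_ratios(sweep_tsv_text):
+    """Return {(N, bw_step, case): median lifted total / median native total}."""
+    totals = defaultdict(list)  # (N, bw_step, case, variant) -> [total_ms, ...]
+    for row in csv.DictReader(io.StringIO(sweep_tsv_text), delimiter="\t"):
+        # Assumption: only HTTP 200 rows with no error count toward n_ok.
+        if row["status_code"] != "200" or row["error"]:
+            continue
+        key = (int(row["N"]), int(row["bw_step"]), row["case"], row["variant"])
+        totals[key].append(float(row["total_ms"]))
+
+    ratios = {}
+    for (n, step, case, variant), lifted in totals.items():
+        if variant != "lifted":
+            continue
+        native = totals.get((n, step, case, "native"))
+        if native:  # skip points where every native iteration failed
+            ratios[(n, step, case)] = statistics.median(lifted) / statistics.median(native)
+    return ratios
+```
+
+Taking the ratio of the two medians, rather than the median of
+per-iteration ratios, matches the summary's description of comparing
+native and lifted total medians.
+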
+`--skip-existing` reuses hub dirs that already contain `hub_hs1.txt`, which
+is handy when iterating on bench config without rebuilding identical hubs.
+
+## Position sweep (`posSweep.py`)
+
+`posSweep.py` keeps the hub fixed and varies the *viewed window*, so each
+row measures quickLift overhead at a specific in-window feature density.
+With the default testHub (N=5000 features spaced uniformly across
+chr22:15M-50M at a 7kb stride), the built-in canonical positions exercise:
+
+| name | position | in-window features at N=5000 |
+| --- | --- | --- |
+| `sparse` | chr22:1M-2M | 0 (1Mb window, 13Mb before the feature region) |
+| `narrow_dense` | chr22:25M-25.1M | ~14 (100kb inside region) |
+| `medium` | chr22:20M-25M | ~714 (5Mb inside region) |
+| `wide` | chr22:15M-50M | ~5000 (full 35Mb region) |
+
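+The in-window counts in the table follow from the 7kb stride. The sketch
+below recomputes them by brute force in the coordinates the features were
+generated in, assuming features start at `region_start + i * stride` and,
+hypothetically, are 3500 bp wide; a feature counts if it overlaps the
+window at all. `in_window_features` is an illustrative helper, not part of
+`posSweep.py`.
+
+```python
+# Hypothetical sketch: count testHub features overlapping a viewed window.
+# Assumes features start at region_start + i * stride and are feature_w
+# bases wide (3500 is an assumed width, not taken from buildTestHub.sh).
+def in_window_features(win_start, win_end, n=5000,
+                       region_start=15_000_000, region_end=50_000_000,
+                       feature_w=3500):
+    stride = (region_end - region_start) // n  # 7000 bp at the defaults
+    count = 0
+    for i in range(n):
+        start = region_start + i * stride
+        # Half-open intervals [start, start + feature_w) and [win_start, win_end)
+        if start < win_end and start + feature_w > win_start:
+            count += 1
+    return count
+
+
+CANONICAL = {
+    "sparse": (1_000_000, 2_000_000),
+    "narrow_dense": (25_000_000, 25_100_000),
+    "medium": (20_000_000, 25_000_000),
+    "wide": (15_000_000, 50_000_000),
+}
+
+if __name__ == "__main__":
+    for name, (start, end) in CANONICAL.items():
+        print(name, in_window_features(start, end))
+```
+
+With those assumptions it returns 0, 14, 715, and 5000 for the four
+windows; the extra feature in `medium` is one that straddles the 20M
+boundary, well within the table's `~714` approximation.
+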
+```
+./posSweep.py [--positions name1:chr:start-end,name2:chr:start-end,...]
+              [--cases mode_b_bb,mode_b_bw,mode_c_hs1_bb,mode_c_hs1_bw]
+              [--config cases.yaml]
+              [--iterations 10] [--warmup 1]
+              [--out DIR] [--dry-run]
+```
+
+Per position the script:
+
+1. Loads `cases.yaml`, filters to `--cases`, and rewrites every hub
+   variant's `position` field with the swept position. Saved-session
+   variants are left as-is (the saved session's position can't be
+   overridden) and a warning is printed.
+2. Invokes `quickLiftBench.py`; per-point outputs land in
+   `<out>/<position_name>/`.
+3. Appends the point's `results.tsv` rows to a single `sweep.tsv`, with
+   `position_name` and `position` prepended to each row.
+
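+Step 1 amounts to parsing `--positions` and rewriting a deep copy of the
+loaded config. A minimal sketch, assuming the loaded `cases.yaml` maps
+each case id to a `variants` mapping whose values are either
+`user/sessionName` strings (saved sessions) or mappings (hub variants);
+the `variants` key and both helper names are hypothetical.
+
+```python
+# Hypothetical sketch of posSweep.py step 1. Assumed cases.yaml shape once
+# loaded: {case_id: {"variants": {name: "user/sessionName" | {...mapping...}}}}.
+import copy
+import sys
+
+
+def parse_positions(arg):
+    """Split a --positions value into (name, 'chr:start-end') pairs."""
+    pairs = []
+    for item in arg.split(","):
+        name, position = item.split(":", 1)  # name is before the first colon
+        pairs.append((name, position))
+    return pairs
+
+
+def rewrite_positions(cases, position):
+    """Return a copy of cases with every hub variant viewing `position`."""
+    cases = copy.deepcopy(cases)  # leave the loaded config reusable per point
+    for case_id, case in cases.items():
+        for name, variant in case["variants"].items():
+            if isinstance(variant, str):
+                # Saved session: its stored position can't be overridden.
+                print(f"warning: {case_id}/{name} is a saved session; "
+                      f"position left unchanged", file=sys.stderr)
+            else:
+                variant["position"] = position
+    return cases
+```
+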
+`sweep_summary.tsv` mirrors the N sweep's two-section format: per
+(position_name, case, variant) medians/p90, then per (position_name, case)
+`lifted/native` ratio.
+
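+Both orchestrators' append step (step 4 of the N sweep, step 3 of the
+position sweep) amounts to prefixing each `results.tsv` row with the
+point's tag columns. A minimal sketch with a hypothetical helper name; it
+works on open file objects so the caller decides when the header is
+written (only for the first point).
+
+```python
+import csv
+import io
+
+
+def append_tagged(src, dst, tags, write_header):
+    """Copy results.tsv rows from src to dst, prefixing each with tag values.
+
+    tags is an ordered dict of column name -> value, e.g.
+    {"N": 5000, "bw_step": 1000} or {"position_name": ..., "position": ...}.
+    """
+    reader = csv.reader(src, delimiter="\t")
+    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
+    header = next(reader)
+    if write_header:  # only for the first point written to sweep.tsv
+        writer.writerow([*tags, *header])
+    for row in reader:
+        writer.writerow([*tags.values(), *row])
+```
+
+With real files, `src` would be `open(results_path, newline="")` and `dst`
+would be `open(sweep_path, "a", newline="")`, as the `csv` module
+recommends for file objects.
+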
 ## Output
 
 Two TSVs are written to `results/<YYYYMMDD-HHMMSS>/`:
 
 - `results.tsv` — one row per (case, variant, iteration) with
   http_ms, load_ms_sum, draw_ms_sum, n_tracks, total_ms, status_code, error.
 - `summary.tsv` — two sections:
   1. per (case, variant): n, n_ok, http/load_sum/draw_sum/total median and p90.
   2. per (case, compare-pair): left vs right total medians and the
      `right/left` ratio for each metric.
 - `phases.tsv` (only with `--phases`) — long-form rows of every
   `<span class='timing'>label: NNN millis</span>` marker emitted by
   hgTracks (chromAliasSetup, trackDbLoad, parallel data fetch, image
   generation, cart write, etc.), one row per (case, variant, iteration,
   phase). A per-(case, variant, phase) median+p90 summary is appended.