d9568e415754516b14bff72b0daf67a0f7f69999 braney Tue May 12 16:53:34 2026 -0700
quickLiftBench: add nSweep + posSweep orchestrators and paper Table 1, refs #37445

nSweep.py rebuilds testHub at each (N, BW_STEP) point and runs the bench,
tagging every row with N + bw_step so the merged sweep.tsv plots
"quickLift overhead vs feature count". buildTestHub.sh now also takes a
FEATURE_W env var so the BED12 block model scales down at high N
(auto-picked by nSweep so requested N fits the region without overlap).

posSweep.py mirrors the orchestrator shape but varies the viewed window on
a fixed hub; built-in canonical positions cover 0..5000 in-window features
against the default testHub.

paper_table1.md collects the headline Mode C cells from the 2026-05-12
N + position sweeps: bigBed quickLift ratio scales 4.3x -> 10.3x with
feature count, bigWig stays flat ~5x; sparse windows show near-zero
quickLift overhead.

diff --git src/utils/qa/quickLiftBench/README.md src/utils/qa/quickLiftBench/README.md
index dd0eab6c151..1027c91bea1 100644
--- src/utils/qa/quickLiftBench/README.md
+++ src/utils/qa/quickLiftBench/README.md
@@ -143,30 +143,108 @@
 2. Add a stanza to `cases.yaml` using string variants of the form
    `user/sessionName`.
 
 **Hub variants** (Mode C, same assembly + same position):
 
 1. Build (or pick) a hub where two trackDb stanzas reference the same
    conceptual data, one with `quickLiftUrl` and one without. The included
    `testHub/buildTestHub.sh` is a working example: it generates 5000
    synthetic BED12 features on hg38, lifts them to hs1, copies the
    hg38→hs1 quickLift chain in alongside, and writes a 2-stanza hub.txt.
 2. Add a stanza to `cases.yaml` using mapping variants (see schema above).
 
 Either way, smoke-test with `--cases --iterations 1 --warmup 0 -v` to
 verify the URL works and timings parse out.
 
+## Density sweep (`nSweep.py`)
+
+`nSweep.py` rebuilds `testHub/` at a sequence of (N, BW_STEP) sizes and runs
+the bench at each one, tagging every row with `N` and `bw_step`. The merged
+`sweep.tsv` is the raw data behind the paper's "quickLift overhead vs.
+density" curve.
+
+```
+./nSweep.py [--n-values 500,1000,5000,10000,20000]
+            [--bw-step-values 1000]
+            [--cases mode_b_bb,mode_b_bw,mode_c_hs1_bb,mode_c_hs1_bw]
+            [--iterations 10] [--warmup 1]
+            [--region-start 15000000] [--region-end 50000000]
+            [--feature-w AUTO]
+            [--hub-dest-base ~/public_html/quickLiftBench/sweep]
+            [--out DIR] [--clean-builds] [--skip-existing] [--dry-run]
+```
+
+Per (N, BW_STEP) point the script:
+
+1. Auto-picks `FEATURE_W` so N features fit in the region without overlap
+   (clamped to [50, 5000]; override with `--feature-w`). Runs
+   `testHub/buildTestHub.sh` with `N`, `BW_STEP`, `FEATURE_W`, and the
+   region bounds into `/N{N}_S{S}_W{W}/`.
+2. Loads `cases.yaml`, filters to `--cases`, rewrites each hub variant's
+   `hubUrl` from `.../testHub/...` to `.../sweep/N{N}_S{S}_W{W}/...`, and
+   drops the rewritten config into the output dir.
+3. Invokes `quickLiftBench.py` with that config; per-point outputs land in
+   `/N{N}_S{S}_W{W}/`.
+4. Appends `results.tsv` to a single `sweep.tsv` with `N` and `bw_step`
+   prepended to each row.
+
+After all points run, `sweep_summary.tsv` has two sections:
+
+- Per (N, bw_step, case, variant): n_ok, total/load/draw median + p90.
+- Per (N, bw_step, case): native vs. lifted total medians and the
+  `lifted/native` ratio, which is the headline curve.
+
+`--skip-existing` reuses hub dirs that already contain `hub_hs1.txt`, which
+is handy when iterating on bench config without rebuilding identical hubs.
+
+## Position sweep (`posSweep.py`)
+
+`posSweep.py` keeps the hub fixed and varies the *viewed window*, so each
+row measures quickLift overhead at a specific in-window feature density.
+At the default testHub (N=5000 features uniformly distributed across
+chr22:15M-50M at 7kb stride), the built-in canonical positions exercise:
+
+| name | position | in-window features at N=5000 |
+| --- | --- | --- |
+| `sparse` | chr22:1M-2M | 0 (1Mb outside feature region) |
+| `narrow_dense` | chr22:25M-25.1M | ~14 (100kb inside region) |
+| `medium` | chr22:20M-25M | ~714 (5Mb inside region) |
+| `wide` | chr22:15M-50M | ~5000 (full 35Mb region) |
+
+```
+./posSweep.py [--positions name1:chr:start-end,name2:chr:start-end,...]
+              [--cases mode_b_bb,mode_b_bw,mode_c_hs1_bb,mode_c_hs1_bw]
+              [--config cases.yaml]
+              [--iterations 10] [--warmup 1]
+              [--out DIR] [--dry-run]
+```
+
+Per position the script:
+
+1. Loads `cases.yaml`, filters to `--cases`, and rewrites every hub
+   variant's `position` field with the swept position. Saved-session
+   variants are left as-is (the saved session's position can't be
+   overridden) and a warning is printed.
+2. Invokes `quickLiftBench.py`; per-point outputs land in
+   `//`.
+3. Appends `results.tsv` to a single `sweep.tsv` with `position_name` and
+   `position` prepended.
+
+`sweep_summary.tsv` mirrors the N sweep's two-section format: per
+(position_name, case, variant) medians/p90, then per (position_name, case)
+`lifted/native` ratio.
+
 ## Output
 
 Two TSVs are written to `results//`:
 
 - `results.tsv`: one row per (case, variant, iteration) with http_ms,
   load_ms_sum, draw_ms_sum, n_tracks, total_ms, status_code, error.
 - `summary.tsv`: two sections:
   1. per (case, variant): n, n_ok, http/load_sum/draw_sum/total median
      and p90.
   2. per (case, compare-pair): left vs right total medians and the
      `right/left` ratio for each metric.
 - `phases.tsv` (only with `--phases`): long-form rows of every
   `label: NNN millis` marker emitted by hgTracks (chromAliasSetup,
   trackDbLoad, parallel data fetch, image generation, cart write, etc.),
   one row per (case, variant, iteration, phase). A per-(case, variant,
   phase) median+p90 summary is appended.
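
Reviewer note: the `lifted/native` ratio described for `sweep_summary.tsv` can be cross-checked by hand from `sweep.tsv`. The sketch below is not the code in `nSweep.py`; it is a minimal illustration that assumes `sweep.tsv` carries `N`, `bw_step`, `case`, `variant`, `total_ms`, and `error` columns as the README describes, and that the two variants are labelled `native` and `lifted` (both labels, and the function name `lifted_native_ratios`, are hypothetical). The sample rows are made up and are not bench measurements.

```python
import csv
import io
import statistics
from collections import defaultdict


def lifted_native_ratios(tsv_text, native="native", lifted="lifted"):
    """Map (N, bw_step, case) to median(lifted total_ms) / median(native total_ms)."""
    totals = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # Assumption: a non-empty error column marks a failed iteration.
        if row.get("error"):
            continue
        key = (int(row["N"]), int(row["bw_step"]), row["case"], row["variant"])
        totals[key].append(float(row["total_ms"]))

    ratios = {}
    for (n, step, case, variant), native_ms in totals.items():
        if variant != native:
            continue
        # .get() avoids inserting keys into the defaultdict while iterating it.
        lifted_ms = totals.get((n, step, case, lifted))
        if lifted_ms:
            ratios[(n, step, case)] = (
                statistics.median(lifted_ms) / statistics.median(native_ms)
            )
    return ratios


# Made-up timings for illustration only; not measurements from the bench.
SAMPLE = "\n".join(
    "\t".join(fields)
    for fields in [
        ("N", "bw_step", "case", "variant", "iteration", "total_ms", "error"),
        ("500", "1000", "mode_c_hs1_bb", "native", "0", "100", ""),
        ("500", "1000", "mode_c_hs1_bb", "native", "1", "110", ""),
        ("500", "1000", "mode_c_hs1_bb", "native", "2", "120", ""),
        ("500", "1000", "mode_c_hs1_bb", "lifted", "0", "200", ""),
        ("500", "1000", "mode_c_hs1_bb", "lifted", "1", "220", ""),
        ("500", "1000", "mode_c_hs1_bb", "lifted", "2", "240", ""),
        ("500", "1000", "mode_c_hs1_bb", "lifted", "3", "9999", "timeout"),
    ]
)

print(lifted_native_ratios(SAMPLE))
# {(500, 1000, 'mode_c_hs1_bb'): 2.0}
```

Using the median rather than the mean keeps a single slow iteration from skewing the ratio, which matches the median-based summaries the README describes. For the position sweep, the same grouping would presumably key on `position_name` instead of `N` and `bw_step`.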