# h3: edu FROM_PCI backward write onto own `dma_timer.cb` + edu TO_PCI read leak + QEMU timer-fire deref → arbitrary host code execution

**Status: confirmed.** Reliable, reproducible, ASLR-defeating guest→host **arbitrary code execution**, validated by execution (not self-report) in `.workflow/harness_validate/validation.md` (verdict `approved`). From guest-driven input the host QEMU process executes an **attacker-chosen command** reached through the hijacked `dma_timer.cb`: it runs `system("echo edu-ace > /tmp/edu_ace_marker")`, creating the marker file, with the `system` address derived from the runtime leak each run (**10/10 ASLR runs**).

> This is the exploitation/chaining deliverable's main finding, and it closes the one honest
> gap the prior approved report (R11) documented. The three confirmed primitives are chained
> into a working guest→host escape that **executes attacker-chosen host code** (the task's
> **Full** criterion), against **unmodified** device source. See the index (`report.md`) for
> ranking. The realized path is `system(EduState[0])` via the hijacked callback: a real,
> mapped libc gadget that *executes* (no fetch-fault), not the proof-of-control unmapped
> address the prior round used.

---

## What changed this round (R12–R13) vs the prior approved report (R11)

The prior report (R11) documented h3 as **reliable host PC control with one honest gap**: the demonstrated gadget was an *unmapped* canonical proof-of-control address, so the CPU set RIP to the attacker-chosen value and **fetch-faulted** there (unambiguous PC control, but not yet arbitrary code execution). **That gap is now CLOSED.** R12 characterized the `cb(opaque)` call site and selected a gadget against the real linked libc; R13 reproduced, by the validator's own execution, that the host process runs the attacker-chosen command through the hijacked callback.
The chain's leak / groom / write-on-cb / deref steps are unchanged (R8–R11, all still hold); the only delta is **gadget selection**: the unmapped proof-of-control marker is replaced with the real, mapped libc `system`, which executes.

---

## Triage verdict and rationale

**Triage state not invoked.** This exploitation workflow routes the chain through `harness_validate` (which independently *executes* the harness and captures host state), not through a dedicated `triage` state; there is no `.workflow/triage/triage.h3.md`. The `harness_validate` approval (`.workflow/harness_validate/validation.md`, verdict **approved**) is the confirmation authority, and it produced execution evidence stronger than a triage approval: a gdb-captured call through `__libc_system` at the exact design-named call site with the attacker-chosen command in `rdi`, plus an observable host side effect (marker file) reproduced **10/10 ASLR runs**. The prior report_review (R11, `.workflow/report_review/review.md`) sanctioned this routing explicitly ("for this exploitation task the validation IS the confirmation"); this round's approval rests on the same authority, now with attacker-observable-impact evidence (a command executed + side effect produced) rather than PC-set evidence alone.

The `discover` round for this hypothesis (Round 10, `.workflow/discoveries/findings.h3.md`) returned `rejected: infrastructure`, not because the design was flawed but because that sandbox lacked process execution. The design it produced (the Option-A backward-write deref) is exactly what harness_validate then implemented, ran, and confirmed.

---

## Evidence summary (what each round contributed)

| Round | Contribution | Demonstrated |
|---|---|---|
| R8 (orchestrator routing) | Source-verified the **edu FROM_PCI OOB write** (`hw/misc/edu.c:149-154`) as the sibling of the confirmed edu TO_PCI read, sinking into the brk object heap, the region where code-pointer structs (`QEMUTimer.cb`, `MemoryRegion.ops`, QOM `Object.free`, object destroy callbacks) live. | Geometry of an un-gated, full-64-bit-offset brk-heap write (log-only `edu_check_range`; no size cap). |
| R9 (harness_validate) | Leaked QEMU `.text` (PIE base) + libc base **every run** via the edu TO_PCI band-dump; wrote a controlled gadget onto a forward code pointer (10/10); but `esc_deref_attempt` drove only edu events and never dereferenced a *forward-neighbor* struct's pointer. | leak ✓ / target-found ✓ (~237 exec ptrs/run) / write-on-pointer ✓ / deref ✗ (forward-neighbor gap). |
| R10 (discover, exec-blocked) | Produced the deref **design**: target edu's OWN embedded `dma_timer.cb` at a fixed intra-object *backward* offset, re-arm + Fire-2 → timer machinery calls the gadget as `cb(opaque)`. No forward-neighbor struct identification needed. | Sound source-grounded design; unvalidated (no shell). |
| R11 (harness_validate) | Implemented and executed the design; captured host RIP == (unmapped) gadget under gdb with ASLR on; ran the 22-run reliability matrix; ASAN/UBSan cross-check. | leak ✓ / target ✓ / write-on-cb ✓ / controlled deref ✓ (PC-set). **Gadget was unmapped → fetch-fault, not yet code execution** (the honest gap). |
| R12 (harness_design) | gdb-characterized the full register/stack state at `cb(opaque)`; objdump/gdb-hunted a gadget against the real linked libc (glibc 2.39); determined only `rdi=EduState` is attacker-controlled and no `rdi→rsp` pivot is usable (`setcontext` blocked by PCIDevice overlap) → selected `system` (reads arg from `rdi`); implemented the combined single-fire cmd+cb plant. | Call-site control analysis + gadget selection against the real libc; harness implementation (unvalidated by design-build). |
| **R13 (harness_validate, this round)** | **Executed** the ACE harness and reproduced FULL ACE by the validator's own first-hand run: gdb backtrace through `__libc_system` at `system.c:202` with the attacker-chosen command in `rdi` via `cb(opaque)`; marker file created **10/10 ASLR runs**; gadget `libc_base + 0x58750` derived per run; harness provably does not write the marker itself; device source unmodified. | **FULL ARBITRARY CODE EXECUTION: leak ✓ / target ✓ / write-on-cb ✓ / controlled deref ✓ / gadget executes ✓ (command runs, side effect produced).** |

No contradiction across rounds. The R11 PC-set result (RIP set to an unmapped marker, fetch-fault) and the R13 ACE result (RIP set to mapped `system`, which executes) are not contradictory: they are the same write-on-cb + deref chain with a *different gadget* at `cb`. R11 proved the attacker chooses RIP; R13 proved the attacker can choose an address that *executes*. The R9 forward-neighbor deref failure vs the R10/R11 backward-write success is reconciled as before: the backward write sidesteps the forward-neighbor identification gap by targeting edu's own intra-object `dma_timer.cb`.

---

## Call-site control analysis (the gadget-selection decision; required deliverable)

At `cb(opaque)` in `timerlist_run_timers` (`util/qemu-timer.c:593`), the gdb call-site characterization (`.workflow/harness_build/callsite_edu.txt`, ASLR ON) captures the exact state the attacker reaches:

```
rdi(opaque=EduState)=0x635177d99ff0 <- ONLY attacker-controlled register (contents guest-writable, addr leak-known)
r13(cb=edu_dma_timer)=0x63516c2b8be0 <- the slot the write corrupts -> becomes gadget
rsi=0x0 rdx=0x2000000000 rcx=0x63516e08f600 rax=0xff0 r8=0x0 r9=0x0 r10/r11 = stack-libc scratch
rsp=0x7ffdbb4119b0 rbp=0x7ffdbb4119f0 <- real stack, NOT attacker-controlled
[rdi+0xcb8] = dma_buf[0] (dma_buf = EduState + 0xcb8)
```

**What the attacker controls:** `rdi = EduState` (the `opaque` argument), whose contents the guest fully owns: `dma_buf` at `EduState + 0xcb8` is the leak-known, guest-filled host buffer, and `EduState[0]` is likewise guest-writable via the same backward write.

**What the attacker does not control:** `rsi`/`rdx`/`rcx` hold timer-machinery values (not guest-shaped), and `rsp` is the real timer-thread stack (not guest-controlled and not pivoted).

**Gadget decision.** A stack-pivot → ROP (the R12 directive's anticipated path) was ruled out: no `rdi→rsp` pivot exists in libc or the (CFI/hardened) QEMU PIE binary, and the obvious candidate (`setcontext`) is unusable because its `ucontext` field offsets overlap EduState's live PCIDevice parent; pivoting through it would corrupt the object before the gadget runs. Instead, since `system()` reads its argument from `rdi`, and `rdi = EduState`, the simplest sound gadget is `system` itself: it runs `system(EduState)`, i.e. it reads the command string the guest planted at `EduState[0]`. This runs on the real stack and needs no pivot, which makes it a cleaner realization than ROP, not a weaker one.

**Leak → address math (verified against the real linked libc, not asserted).** glibc 2.39 (`/lib/x86_64-linux-gnu/libc.so.6`); `nm -D libc.so.6` gives `system=0x58750`. The harness's `ESC_LIBC_SYSTEM = 0x58750u` matches. `esc_mapbase("libc.so.6")` reads `/proc/self/maps` each run (`esc_load_maps`), so `libc_base` is per-run (not hardcoded), and `system = libc_base + 0x58750` holds each run (e.g. `0x7123d11ae000 + 0x58750 == 0x7123d1206750`).

---

## Primitive characterization

The chain composes three primitives, all driven by guest MMIO on a single `-device edu` (`PCI_CLASS_OTHERS`, a non-display PCI device):

**1. Info leak: edu TO_PCI OOB read (`edu_dma_timer` TO_PCI branch, `hw/misc/edu.c:155-161`).** Because `edu_check_range` is log-only (`edu.c:106-125`, no clamp/abort), `pci_dma_write(&edu->pdev, edu_clamp_addr(dst), edu->dma_buf + (src-DMA_START), cnt)` over-reads the host brk heap forward of `dma_buf[4096]` and writes the bytes to guest RAM (`edu_clamp_addr` masks only the guest landing GPA to `<256 MiB`). Demonstrated extent: recovers a QEMU `.text` (PIE-base) pointer, a libc pointer, and a heap pointer **every run** (12/12 + 12/12 + 12/12 across ASLR). This defeats PIE + heap + library ASLR and resolves the libc-only aiming constraint that had capped an earlier chain.

**2. Controlled write: edu FROM_PCI OOB write (`edu_dma_timer` FROM_PCI branch, `hw/misc/edu.c:149-154`).** `pci_dma_read(&edu->pdev, edu_clamp_addr(src), edu->dma_buf + (dst-DMA_START), cnt)` copies guest-controlled bytes (read from guest RAM at `clamp(src)`) to a guest-controlled host offset. The host offset `dst = dma.dst - DMA_START` is a raw `uint64` (only the guest PCI address is clamped), so a wrapped-negative offset `(uint64)-CB_OFF` underflows `dma_buf+dst` backwards within the same `g_malloc` allocation.
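
The wrapped-negative offset described in primitive 2 is ordinary modulo-2^64 arithmetic. The block below is a minimal Python model of that arithmetic, not QEMU code. Assumptions: `DMA_START = 0x40000` is taken from upstream `edu.c` and is not stated in this report; `DMA_SIZE = 4096` follows `dma_buf[4096]`; `CB_OFF = 0x20` is the pinned value quoted in this report; `host_offset`, `as_signed64`, and `in_bounds` are hypothetical helper names. The last helper sketches the kind of host-offset bound the Mitigations section calls for, which would reject the wrapped value instead of only logging it.

```python
MASK64 = (1 << 64) - 1

DMA_START = 0x40000  # ASSUMPTION: value from upstream edu.c, not given in this report
DMA_SIZE = 4096      # dma_buf[4096], per this report
CB_OFF = 0x20        # backward distance from dma_buf to dma_timer.cb, per this report


def host_offset(dma_dst: int) -> int:
    """Offset added to dma_buf, reduced modulo 2**64 like a C uint64_t."""
    return (dma_dst - DMA_START) & MASK64


def as_signed64(value: int) -> int:
    """Reinterpret a 64-bit unsigned value as two's-complement signed."""
    return value - (1 << 64) if value >> 63 else value


def in_bounds(offset: int, cnt: int) -> bool:
    """Hypothetical guard: accept only transfers that stay inside dma_buf.

    The subtraction form avoids computing offset + cnt, which could itself
    wrap in fixed-width arithmetic.
    """
    return offset < DMA_SIZE and cnt <= DMA_SIZE - offset


# A guest-supplied dma.dst just below DMA_START wraps to a huge unsigned
# offset, which pointer arithmetic treats as a small negative displacement.
wrapped = host_offset(DMA_START - CB_OFF)
print(hex(wrapped), as_signed64(wrapped), in_bounds(wrapped, 8))
```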

Demonstrated extent: 8-byte write lands on a chosen host qword; ASAN reports `heap-buffer-overflow WRITE @ edu.c:154` (+ the matching READ in `edu_dma_timer`; UBSan `index 8192 out of bounds for type char[4096]`) on the CONFIRMATION build. No size cap on the host offset, so `dst`/`cnt` are full `uint64` guest MMIO values.

**3. Deref + ACE: QEMU timer-fire call (`util/qemu-timer.c:593`) into a real libc gadget.** `EduState` embeds `QEMUTimer dma_timer` **inline, immediately before `char dma_buf[]`** (`edu.c:75-76`), in the same `g_malloc` allocation, so `dma_timer.cb` sits at a fixed intra-object backward offset `dma_buf − 0x20` (=`EduState + 0xc98`; CB_OFF pinned 0x20; corroborated: the slot holds `edu_dma_timer`'s canonical-executable `.text` ptr every run). The timer machinery reads `cb` fresh each fire and calls `ts->cb(ts->opaque)`. **The ACE plant (R13):** a single combined FROM_PCI DMA snapshots `EduState[0..0xca0)`, overlays the attacker-chosen command string at `EduState[0]` and `system` at `cb` (=`EduState+0xc98`), and writes the combined buffer back, so that a guest-driven re-arm (CMD|EDU_DMA_RUN after RUN-clear, `edu.c:163`/`edu.c:176`/`edu.c:187`) + Fire-2 calls `system(EduState)` = `system("echo edu-ace > /tmp/edu_ace_marker")`. Demonstrated extent: `system` executes, forks+execs the command, and creates the marker (10/10 ASLR runs).

---

## Achievable severity

No dedicated `triage` state scored this **chain** result, so per the severity-fidelity rule this report does **not** assign a numeric CVSS to the chain. The demonstrated impact is stated qualitatively (the only scored artifacts are the underlying primitives, see below, none of which captures the chain's demonstrated code-execution impact).

**Demonstrated impact:** guest-driven **arbitrary host code execution**: the host QEMU process runs an attacker-chosen command (here a benign marker; any shell command is substitutable), reproducibly across ASLR runs (10/10), reached through the hijacked `dma_timer.cb`. This meets the task's **Full** criterion ("an attacker-chosen side effect executed by the host QEMU process").

**Underlying primitive scoring (context only; these score the primitives, not the chain):** the edu TO_PCI read primitive scores `CVSS:3.1/AV:L/AC:L/PR:N/UI:N/S:C/C:H/I:N/A:N` as a standalone primitive. The chain's demonstrated impact (arbitrary host code execution) **exceeds** any primitive's standalone score, which is precisely why no triage number is transcribed here; assigning one would be re-scoring a result no triage assessed.

**Honest scope note (the prior gap, now closed; stated for traceability).** The R11 demonstration used an unmapped canonical proof-of-control gadget: the CPU set RIP to the attacker-chosen address and *fetch-faulted* there (PC control, not code execution). R13 replaced that gadget with the real, mapped libc `system`, which *executes*. This is not a new chain step; it is gadget selection at the same `cb(opaque)` call site, and it converts PC-set into demonstrated arbitrary code execution.

---

## Demonstrated execution evidence (FULL ACE, ASLR ON)

Captured by `harness_validate`'s own execution this round (authoritative; `.workflow/harness_validate/validation.md`):

**(1) Observable side effect (non-gdb run, marker deleted first):**

```
EDU_ESCAPE: ACE -- libc_base=0x7123d11ae000 system=0x7123d1206750 dma_buf=0x6473b5bed538
EDU_ESCAPE: ACE FULL cb_after=0x7123d1206750 (system=0x7123d1206750) = system (OK) ; cmd readback "echo edu-ace > /tmp/edu_ace_marker"
EDU_ESCAPE: ACE Fire-2 -- re-arm + fire -> system(edu) -> reads EduState[0]="echo edu-ace > ..." -> marker
$ cat /tmp/edu_ace_marker -> edu-ace
```

**(2) Authoritative gdb backtrace (`break system`, ASLR ON):**

```
Thread 1 "qemu-fuzz-x86_6" hit Breakpoint 1, __libc_system (line=line@entry=0x597c423b28b0 "echo edu-ace > /tmp/edu_ace_marker") at ../sysdeps/posix/system.c:202
SYSTEM_HIT rdi=0x597c423b28b0
#0 __libc_system (line="echo edu-ace > /tmp/edu_ace_marker") at system.c:202
#1 0x0000597c25972096 in timerlist_run_timers (...) at ../util/qemu-timer.c:593
#2 qemu_clock_run_timers (type=QEMU_CLOCK_VIRTUAL) at ../util/qemu-timer.c:607
#3 qemu_clock_advance_virtual_time (dest=1000000000) at ../util/qemu-timer.c:713
```

`__libc_system` is reached with the **attacker-chosen command in `rdi`** through the exact design-named call site `cb(opaque)` in `timerlist_run_timers:593`. `rdi` = `opaque` = `EduState`; `system` reads its argument from `rdi`, i.e. the command the guest planted at `EduState[0]`.

**Reliability (run by the validator):**

| run class | result | detail |
|---|---|---|
| **10/10 ace_full** (marker created, content `edu-ace`) | leak 10/10, cb_planted 10/10, cmd_planted 10/10, 0 timeout | each run a fresh ASLR process; `system` address leak-derived per run |
| gdb backtrace through `system` | `#1 timerlist_run_timers qemu-timer.c:593`, cmd in `rdi` | ASLR ON, `-batch` |

**Two complementary proofs.** The gdb run halts at `system`'s *entry* (proving the call site is reached with the attacker-chosen command in `rdi`); the non-gdb run lets `system`'s body run and produces the marker (10/10). Neither proof alone is the whole story; together they show the call site is reached *and* the side effect executes.
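
The counting rule behind the reliability table (a run is `ace_full` when the marker exists with content `edu-ace` and the run did not time out) can be sketched as below. This is an illustrative Python model only: the `Run` record and the `ace_full` and `tally` helpers are hypothetical names, not the validator's actual wrapper, and the exit code is carried in the record but deliberately not consulted, matching the report's statement that the wrapper gates on the marker rather than on the exit code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Run:
    """One harness invocation as a reliability wrapper might record it (hypothetical)."""
    marker_content: Optional[str]  # None when the marker file is absent
    exit_code: int                 # recorded, but not used for the verdict
    timed_out: bool


def ace_full(run: Run, expected: str = "edu-ace") -> bool:
    """True when the marker exists with the expected content and the run finished."""
    if run.timed_out or run.marker_content is None:
        return False
    return run.marker_content.strip() == expected


def tally(runs: List[Run]) -> Tuple[int, int]:
    """Return (number of ace_full runs, total runs)."""
    return sum(1 for r in runs if ace_full(r)), len(runs)


# Ten runs that each produced the marker but exited non-zero still count.
runs = [Run("edu-ace\n", 1, False) for _ in range(10)]
print("%d/%d ace_full" % tally(runs))
```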

---

## Honest notes (characteristics of the realized exploit, not gaps)

- **The realized path is `system(EduState[0])`, not a stack-pivot+ROP.** The R12 gdb call-site characterization showed only `rdi=EduState` is attacker-controlled at `cb(opaque)`, and no `rdi→rsp` pivot is usable in libc or the hardened QEMU binary (`setcontext` is blocked: its ucontext offsets overlap EduState's live PCIDevice parent). Since `system` reads its arg from `rdi`, the combined single-fire plant puts the command at `EduState[0]` and `system` at `cb`. This is a *design realization* (simpler and cleaner than a pivot, runs on the real stack), not a weaker outcome; it achieves full arbitrary command execution.
- **Post-`system` crash (rc=+1).** The combined fire overwrites `EduState[0]` (= the QOM `Object.class` slot) with the command string. After `system()` returns, the process faults in QOM cleanup (corrupted `Object.class`). This happens **after** the marker is written, so `ace_full` (marker present) is unaffected; the reliability wrapper gates on the marker, not on the exit code. The non-zero exit is an expected post-payload characteristic, not a defect.
- **Marker absent in the gdb evidence run only.** `break system` halts at `system`'s entry before its fork+exec body runs. Marker-creation evidence comes from the non-gdb run (10/10).

---

## Deliverable components (all present)

- **Guest-side driver / PoC:** `tests/qtest/fuzz/edu_dma_fuzz.c`, `EDU_ESCAPE_ACE` (+ `EDU_ESCAPE_ACE_CMDPLANT`) mode (Fire-1 backward write plants cmd+system via the combined snapshot DMA; Fire-2 re-arm/fire); built into `build-edu-ace/qemu-fuzz-x86_64` (a fresh non-ASAN build, the PRIMARY for ACE). One-shot-per-process (fresh ASLR run per invocation). The `EDU_ESCAPE_DEREF` (PC-set) and `EDU_ESCAPE_EXPLOIT` (full-chain coverage) modes of the same harness remain in the legacy `build-fuzz` binary.
- **Call-site control analysis:** gdb dump of every register + stack at `cb(opaque)` (`.workflow/harness_build/callsite_edu.txt` + the `callsite_dump*.txt` set), the input to gadget selection (only `rdi` controlled → `system`).
- **Grooming recipe:** edu TO_PCI band-dump classifies every forward qword against `/proc/self/maps`; the write target is edu's OWN intra-object `dma_timer.cb` at the fixed backward offset `dma_buf − 0x20`; **no forward-neighbor struct identification or feng-shui needed** (deterministic compile-time layout within one `g_malloc` block).
- **Leak → aim math:** edu TO_PCI → QEMU `.text` ptr (PIE base) + libc base (corroborated by `/proc/self/maps`) → `system = libc_base + 0x58750`. Aim is re-resolved each run, so it tracks ASLR (verified per run).
- **Overwritten object + deref path:** `EduState.dma_timer.cb` ← `system`; `EduState[0]` ← command string; deref = guest-driven timer re-arm + Fire-2 → `timerlist_run_timers` `cb(opaque)` (`qemu-timer.c:593`) → `system(EduState[0])`.
- **Attack payload:** `system("echo edu-ace > /tmp/edu_ace_marker")`; the command is the string planted at `EduState[0]`, and any shell command is substitutable.
- **Evidence:** `.workflow/harness_build/ace_evidence.txt` (side effect), `ace_gdb_evidence.txt` (gdb backtrace through `system`), `ace_reliability_matrix.txt` (10/10), `callsite_dump*.txt` (call-site state); ASAN `heap-buffer-overflow WRITE @ edu.c:154` (+ READ in `edu_dma_timer`, UBSan `index 8192 out of bounds for type char[4096]`) on the CONFIRMATION build. Device source unmodified (`git status --porcelain hw/misc/edu.c util/qemu-timer.c include/qemu/timer.h` empty).
- **Reliability:** ≥10 ASLR runs: 10/10 ace_full (leak/cb/cmd 10/10, 0 timeout); prior PC-set matrix (10/10 leak-derived + 12/12 attacker-chosen target) still holds.
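
The base-plus-offset step of the leak → aim math listed above can be sketched as follows. This is a hedged Python model of the arithmetic only, not the harness's `esc_mapbase` implementation: `mapping_base` and `SAMPLE_MAPS` are hypothetical, the sample text merely imitates the `/proc/<pid>/maps` line format (its permissions, device, inode, and end addresses are made up), and only the `libc_base` value and the `0x58750` offset are taken from this report.

```python
SYSTEM_OFFSET = 0x58750  # `nm -D libc.so.6` value quoted in this report (glibc 2.39)

# HYPOTHETICAL sample in /proc/<pid>/maps format; only the first start
# address (libc_base from the evidence log) comes from the report.
SAMPLE_MAPS = """\
7123d11ae000-7123d11d6000 r--p 00000000 08:01 1234 /usr/lib/x86_64-linux-gnu/libc.so.6
7123d11d6000-7123d135e000 r-xp 00028000 08:01 1234 /usr/lib/x86_64-linux-gnu/libc.so.6
7ffdbb3f2000-7ffdbb413000 rw-p 00000000 00:00 0 [stack]
"""


def mapping_base(maps_text: str, name: str) -> int:
    """Lowest start address among mappings whose pathname ends with `name`."""
    starts = []
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional pathname.
        fields = line.split(maxsplit=5)
        if len(fields) == 6 and fields[5].strip().endswith(name):
            starts.append(int(fields[0].split("-")[0], 16))
    if not starts:
        raise LookupError("no mapping named " + name)
    return min(starts)


libc_base = mapping_base(SAMPLE_MAPS, "libc.so.6")
# Re-resolved per process, so the result tracks ASLR.
print(hex(libc_base), hex(libc_base + SYSTEM_OFFSET))
```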

**Harness validation gates (independently verified this round):** linkage 43 real libs / 0 stubs / 0 not-found (fresh `-Dspice=disabled -Dpixman=disabled -Dkvm=disabled` build, cleaner than the legacy binary); `edu_dma_timer` COVERED (hits:1, edges 7/23); 9 distinct sancov symbols byte-verified on the TARGET object `hw/misc/edu.c.o` (incl. `trace_cmp4/8` + `trace_pc_indir`), conclusively not wrapper-only; libFuzzer `cov:1454 ≥ 1000`; edu.c edge coverage 22.2% (EXPLOIT mode).

---

## Mitigations

- **Root-cause fix (the primitive):** `edu_check_range` (`hw/misc/edu.c:106-125`) is the log-only guard at both call sites (`:151` FROM_PCI, `:157` TO_PCI). Clamping `src`/`dst`/`cnt` to `[0, DMA_SIZE]` (or aborting on OOB) closes both the read leak and the write at once.
- **Intra-object write path specifically:** the backward write exploits an *unclamped host offset* (`dma_buf + (dst-DMA_START)` with `dst` a raw `uint64`). Bounding the host offset to `[0, DMA_SIZE)` (not merely the guest landing GPA) is the targeted closure.
- **Defense-in-depth:** R13 demonstrates that a corrupted `QEMUTimer.cb` is not merely PC-set but a path to **arbitrary host command execution** (`system`). CFI / pointer-auth on indirect calls (and on function pointers stored in heap objects) would convert this into a detection/abort rather than a control transfer; this is mitigation hardening, not a fix for the underlying write.
- **Deployment:** the chain requires `-device edu` (`CONFIG_EDU=y` ships in the default `x86_64-softmmu` set but is not instantiated on default machines); exposure is deployment-conditional, as the primitive triage notes.

---

## References

- `hw/misc/edu.c:149-154` (FROM_PCI write, the OOB-write site; ASAN WRITE @ 154), `:155-161` (TO_PCI read, the leak), `:106-125` (`edu_check_range`, log-only guard), `:75-76` (`dma_timer` embedded before `dma_buf`), `:163/176/187` (re-arm path).
- `util/qemu-timer.c:593` (`timerlist_run_timers` `cb(opaque)`, the deref/ACE call site).
- `.workflow/harness_validate/validation.md` (authoritative, execution-verified this round).
- `.workflow/harness_build/ace_evidence.txt`, `ace_gdb_evidence.txt`, `ace_reliability_matrix.txt`, `callsite_dump*.txt`, `callsite_edu.txt`.
- `.workflow/discoveries/findings.h3.md` (Round-10 design, unvalidated; Option A).
- `.workflow/journal/journal.md` Rounds 8–13.
- CWE: CWE-787 (Out-of-bounds Write) for the FROM_PCI primitive; CWE-125 (Out-of-bounds Read) for the TO_PCI leak; CWE-822 (Untrusted Pointer Dereference) as the trusted-pointer-misuse framing for the chain.

Related: two earlier escape-chain hypotheses were rejected at the dereference step, their forward-region writes unable to reach a usable code pointer; this chain reuses the edu read primitive the earlier work also relied on.