# h3 — edu FROM_PCI backward write onto own `dma_timer.cb` + edu TO_PCI read leak + QEMU timer-fire deref → arbitrary host code execution

**Status: confirmed** — reliable, reproducible, ASLR-defeating guest→host **arbitrary code
execution**, validated by execution (not self-report) in
`.workflow/harness_validate/validation.md` (verdict `approved`). From guest-driven input the
host QEMU process executes an **attacker-chosen command** reached through the hijacked
`dma_timer.cb`: it runs `system("echo edu-ace > /tmp/edu_ace_marker")`, creating the marker
file, with the `system` address derived from the runtime leak each run — **10/10 ASLR runs.**

> This is the exploitation/chaining deliverable's main finding, and it closes the one honest
> gap the prior approved report (R11) documented. The three confirmed primitives are chained
> into a working guest→host escape that **executes attacker-chosen host code** (the task's
> **Full** criterion), against **unmodified** device source. See the index (`report.md`) for
> ranking. The realized path is `system(EduState)` via the hijacked callback, with the
> command string planted at `EduState[0]`: a real, mapped libc gadget that *executes* (no
> fetch-fault), not the unmapped proof-of-control address the prior round used.

---

## What changed this round (R12–R13) vs the prior approved report (R11)

The prior report (R11) documented h3 as **reliable host PC control with one honest gap**:
the demonstrated gadget was an *unmapped* canonical proof-of-control address, so the CPU set
RIP to the attacker-chosen value and **fetch-faulted** there — unambiguous PC control, but not
yet arbitrary code execution. **That gap is now CLOSED.** R12 characterized the `cb(opaque)`
call site and selected a gadget against the real linked libc; R13 reproduced, by the
validator's own execution, that the host process runs the attacker-chosen command through the
hijacked callback. The chain's leak / groom / write-on-cb / deref steps are unchanged (R8–R11,
all still hold); the only delta is **gadget selection**: the unmapped proof-of-control marker
is replaced with the real, mapped libc `system`, which executes.

---

## Triage verdict and rationale

**Triage state not invoked.** This exploitation workflow routes the chain through
`harness_validate` (which independently *executes* the harness and captures host state),
not through a dedicated `triage` state — there is no `.workflow/triage/triage.h3.md`. The
`harness_validate` approval (`.workflow/harness_validate/validation.md`, verdict
**approved**) is the confirmation authority, and it produced execution evidence stronger
than a triage approval: a gdb-captured call through `__libc_system` at the exact
design-named call site with the attacker-chosen command in `rdi`, plus an observable host
side effect (marker file) reproduced **10/10 ASLR runs**. The prior report_review (R11,
`.workflow/report_review/review.md`) sanctioned this routing explicitly ("for this
exploitation task the validation IS the confirmation"); this round's approval rests on the
same authority, now with attacker-observable-impact evidence (a command executed + side
effect produced) rather than PC-set evidence alone.

The `discover` round for this hypothesis (Round 10, `.workflow/discoveries/findings.h3.md`)
returned `rejected: infrastructure` — not because the design was flawed, but because that
sandbox lacked process execution. The design it produced (the Option-A backward-write deref)
is exactly what harness_validate then implemented, ran, and confirmed.

---

## Evidence summary (what each round contributed)

| Round | Contribution | Demonstrated |
|---|---|---|
| R8 (orchestrator routing) | Source-verified the **edu FROM_PCI OOB write** (`hw/misc/edu.c:149-154`) as the sibling of the confirmed edu TO_PCI read, sinking into the brk object heap — the region where code-pointer structs (`QEMUTimer.cb`, `MemoryRegion.ops`, QOM `Object.free`, object destroy callbacks) live. | Geometry of an un-gated, full-64-bit-offset brk-heap write (log-only `edu_check_range`; no size cap). |
| R9 (harness_validate) | Leaked QEMU `.text` (PIE base) + libc base **every run** via the edu TO_PCI band-dump; wrote a controlled gadget onto a forward code pointer (10/10); but `esc_deref_attempt` drove only edu events and never dereferenced a *forward-neighbor* struct's pointer. | leak ✓ / target-found ✓ (~237 exec ptrs/run) / write-on-pointer ✓ / deref ✗ (forward-neighbor gap). |
| R10 (discover, exec-blocked) | Produced the deref **design**: target edu's OWN embedded `dma_timer.cb` at a fixed intra-object *backward* offset, re-arm + Fire-2 → timer machinery calls the gadget as `cb(opaque)`. No forward-neighbor struct identification needed. | Sound source-grounded design; unvalidated (no shell). |
| R11 (harness_validate) | Implemented and executed the design; captured host RIP == (unmapped) gadget under gdb with ASLR on; ran the 22-run reliability matrix; ASAN/UBSan cross-check. | leak ✓ / target ✓ / write-on-cb ✓ / controlled deref ✓ (PC-set). **Gadget was unmapped → fetch-fault, not yet code execution** (the honest gap). |
| R12 (harness_design) | gdb-characterized the full register/stack state at `cb(opaque)`; objdump/gdb-hunted a gadget against the real linked libc (glibc 2.39); determined only `rdi=EduState` is attacker-controlled and no `rdi→rsp` pivot is usable (`setcontext` blocked by PCIDevice overlap) → selected `system` (reads arg from `rdi`); implemented the combined single-fire cmd+cb plant. | Call-site control analysis + gadget selection against the real libc; harness implementation (unvalidated by design-build). |
| **R13 (harness_validate, this round)** | **Executed** the ACE harness and reproduced FULL ACE by the validator's own first-hand run: gdb backtrace through `__libc_system` at `system.c:202` with the attacker-chosen command in `rdi` via `cb(opaque)`; marker file created **10/10 ASLR runs**; gadget `libc_base + 0x58750` derived per run; harness provably does not write the marker itself; device source unmodified. | **FULL ARBITRARY CODE EXECUTION: leak ✓ / target ✓ / write-on-cb ✓ / controlled deref ✓ / gadget executes ✓ (command runs, side effect produced).** |

No round contradicts another. The R11 PC-set result (RIP set to an unmapped marker, fetch-fault) and the R13 ACE result (RIP set to mapped `system`, which executes) are the same write-on-cb + deref chain with a *different gadget* at `cb`: R11 proved the attacker chooses RIP; R13 proved the attacker can choose an address that *executes*. The R9 forward-neighbor deref failure and the R10/R11 backward-write success are reconciled as before: the backward write sidesteps the forward-neighbor identification gap by targeting edu's own intra-object `dma_timer.cb`.

---

## Call-site control analysis (the gadget-selection decision — required deliverable)

At `cb(opaque)` in `timerlist_run_timers` (`util/qemu-timer.c:593`), the gdb call-site
characterization (`.workflow/harness_build/callsite_edu.txt`, ASLR ON) captures the exact
state the attacker reaches:

```
rdi(opaque=EduState)=0x635177d99ff0   <- ONLY attacker-controlled register (contents guest-writable, addr leak-known)
r13(cb=edu_dma_timer)=0x63516c2b8be0  <- the slot the write corrupts -> becomes gadget
rsi=0x0  rdx=0x2000000000  rcx=0x63516e08f600  rax=0xff0
r8=0x0   r9=0x0           r10/r11 = stack-libc scratch
rsp=0x7ffdbb4119b0  rbp=0x7ffdbb4119f0   <- real stack, NOT attacker-controlled
[rdi+0xcb8] = dma_buf[0]   (dma_buf = EduState + 0xcb8)
```

**What the attacker controls:** `rdi = EduState` (the `opaque` argument), whose contents the
guest fully owns — `dma_buf` at `EduState + 0xcb8` is the leak-known, guest-filled host
buffer, and `EduState[0]` is likewise guest-writable via the same backward write. **What the
attacker does not control:** `rsi`/`rdx`/`rcx` hold timer-machinery values (not guest-shaped),
and `rsp` is the real timer-thread stack (not guest-controlled and not pivoted).

**Gadget decision.** A stack-pivot → ROP (the R12 directive's anticipated path) was ruled out:
no `rdi→rsp` pivot exists in libc or the (CFI/hardened) QEMU PIE binary, and the obvious
candidate (`setcontext`) is unusable because its `ucontext` field offsets overlap EduState's
live PCIDevice parent — pivoting through it would corrupt the object before the gadget runs.
Instead, since `system()` reads its argument from `rdi`, and `rdi = EduState`, the simplest
sound gadget is `system` itself: it runs `system(EduState)`, i.e. it reads the command string
the guest planted at `EduState[0]`. This runs on the real stack and needs no pivot — a cleaner
realization than ROP, not a weaker one.

**Leak → address math (verified against the real linked libc, not asserted).** glibc 2.39
(`/lib/x86_64-linux-gnu/libc.so.6`); `nm -D libc.so.6` gives `system=0x58750`. The harness's
`ESC_LIBC_SYSTEM = 0x58750u` matches. `esc_mapbase("libc.so.6")` reads `/proc/self/maps` each
run (`esc_load_maps`), so `libc_base` is per-run (not hardcoded), and `system =
libc_base + 0x58750` holds each run (e.g. `0x7123d11ae000 + 0x58750 == 0x7123d1206750`).
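
The arithmetic above can be re-checked mechanically. The following is a minimal sketch, not the harness's code: `resolve_symbol` is a hypothetical helper name introduced here for illustration, and the inputs are only the example values quoted in this report.

```python
# Illustrative re-check of the leak -> address arithmetic quoted above.
# `resolve_symbol` is a hypothetical helper, not a function from the harness.

SYSTEM_OFFSET = 0x58750  # `nm -D libc.so.6` value this report gives for glibc 2.39


def resolve_symbol(module_base: int, symbol_offset: int) -> int:
    """Return a symbol's runtime address given its module's load base."""
    if module_base % 0x1000 != 0:
        # Load bases are page-aligned; anything else points to a bad parse.
        raise ValueError(f"base {module_base:#x} is not page-aligned")
    return module_base + symbol_offset


# Example values from the run shown in this report.
libc_base = 0x7123D11AE000
system_addr = resolve_symbol(libc_base, SYSTEM_OFFSET)
print(hex(system_addr))  # 0x7123d1206750
```

The page-alignment guard is an added sanity check, assumed here as a cheap way to catch a mis-parsed base before it feeds any later step.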

---

## Primitive characterization

The chain composes three primitives, all driven by guest MMIO on a single `-device edu`
(`PCI_CLASS_OTHERS`, a non-display PCI device):

**1. Info leak — edu TO_PCI OOB read (`edu_dma_timer` TO_PCI branch, `hw/misc/edu.c:155-161`).**
`pci_dma_write(&pdev, edu_clamp_addr(dst), edu->dma_buf + (src-DMA_START), cnt)` with
`edu_check_range` log-only (`edu.c:106-125`, no clamp/abort) over-reads the host brk heap
forward of `dma_buf[4096]` and writes the bytes to guest RAM (`edu_clamp_addr` masks only
the guest landing GPA to `<256 MiB`). Demonstrated extent: recovers a QEMU `.text`
(PIE-base) pointer, a libc pointer, and a heap pointer **every run** (12/12 + 12/12 + 12/12
across ASLR). This defeats PIE + heap + library ASLR and resolves the libc-only aiming
constraint that had capped an earlier chain.

**2. Controlled write — edu FROM_PCI OOB write (`edu_dma_timer` FROM_PCI branch, `hw/misc/edu.c:149-154`).**
`pci_dma_read(&pdev, edu_clamp_addr(src), edu->dma_buf + (dst-DMA_START), cnt)` copies
guest-controlled bytes (read from guest RAM at `clamp(src)`) to a guest-controlled host
offset. The host offset `dst = dma.dst - DMA_START` is a raw `uint64` (only the guest PCI
address is clamped), so a wrapped-negative offset `(uint64)-CB_OFF` underflows `dma_buf+dst`
backwards within the same `g_malloc` allocation. Demonstrated extent: an 8-byte write lands
on a chosen host qword; on the CONFIRMATION build, ASAN reports
`heap-buffer-overflow WRITE @ edu.c:154` (plus the matching READ in `edu_dma_timer`) and
UBSan reports `index 8192 out of bounds for type char[4096]`. Neither `dst` nor `cnt` is
capped: both are full `uint64` guest MMIO values, so the host offset is unbounded.
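
The wrapped-offset arithmetic can be modelled in a few lines. This is a sketch of plain 64-bit modular arithmetic only, using the offsets and the example `rdi` value quoted in this report; `host_address` is a hypothetical name, and nothing here touches a device.

```python
# Model of 64-bit pointer arithmetic with an unsigned, unchecked offset.
# `host_address` is a hypothetical helper introduced for illustration.

MASK64 = (1 << 64) - 1

DMA_BUF_OFF = 0xCB8  # dma_buf offset inside EduState (call-site dump in this report)
CB_OFF = 0x20        # distance from dma_buf back to dma_timer.cb (this report)


def host_address(buf_addr: int, offset: int) -> int:
    """Compute buf_addr + offset as 64-bit hardware does (modulo 2**64)."""
    return (buf_addr + (offset & MASK64)) & MASK64


edu_state = 0x635177D99FF0    # example `rdi` value from the call-site dump
dma_buf = edu_state + DMA_BUF_OFF
wrapped = (-CB_OFF) & MASK64  # (uint64)-CB_OFF

target = host_address(dma_buf, wrapped)
print(hex(wrapped), hex(target - edu_state))  # 0xffffffffffffffe0 0xc98
```

A huge unsigned offset and a small negative one are the same bit pattern, which is why a check on the guest address alone does not bound the host-side pointer.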

**3. Deref + ACE — QEMU timer-fire call (`util/qemu-timer.c:593`) into a real libc gadget.**
`EduState` embeds `QEMUTimer dma_timer` **inline, immediately before `char dma_buf[]`**
(`edu.c:75-76`) — same `g_malloc` allocation, so `dma_timer.cb` sits at a fixed intra-object
backward offset `dma_buf − 0x20` (=`EduState + 0xc98`; CB_OFF pinned 0x20; corroborated: the
slot holds `edu_dma_timer`'s canonical-executable `.text` ptr every run). The timer
machinery reads `cb` fresh each fire and calls `ts->cb(ts->opaque)`. **The ACE plant
(R13):** a single combined FROM_PCI DMA snapshots `EduState[0..0xca0)`, overlays the
attacker-chosen command string at `EduState[0]` and `system` at `cb` (=`EduState+0xc98`),
and writes the combined buffer back — so a guest-driven re-arm (CMD|EDU_DMA_RUN after
RUN-clear, `edu.c:163`/`edu.c:176`/`edu.c:187`) + Fire-2 calls
`system(EduState)` = `system("echo edu-ace > /tmp/edu_ace_marker")`. Demonstrated extent:
`system` executes, forks+execs the command, and creates the marker — 10/10 ASLR runs.

---

## Achievable severity

No dedicated `triage` state scored this **chain** result, so per the severity-fidelity rule
this report does **not** assign a numeric CVSS to the chain. The demonstrated impact is
stated qualitatively (the only scored artifacts are the underlying primitives — see below —
none of which captures the chain's demonstrated code-execution impact).

**Demonstrated impact:** guest-driven **arbitrary host code execution** — the host QEMU
process runs an attacker-chosen command (here a benign marker; any shell command is
substitutable), reproducibly across ASLR runs (10/10), reached through the hijacked
`dma_timer.cb`. This meets the task's **Full** criterion ("an attacker-chosen side effect
executed by the host QEMU process").

**Underlying primitive scoring (context only — these score the primitives, not the chain):**
the edu TO_PCI read primitive scores `CVSS:3.1/AV:L/AC:L/PR:N/UI:N/S:C/C:H/I:N/A:N` as a
standalone primitive. The chain's demonstrated impact
(arbitrary host code execution) **exceeds** any primitive's standalone score, which is
precisely why no triage number is transcribed here — assigning one would be re-scoring a
result no triage assessed.

**Honest scope note (the prior gap, now closed — stated for traceability).** The R11
demonstration used an unmapped canonical proof-of-control gadget: the CPU set RIP to the
attacker-chosen address and *fetch-faulted* there (PC control, not code execution). R13
replaced that gadget with the real, mapped libc `system`, which *executes*. This is not a
new chain step; it is gadget selection at the same `cb(opaque)` call site, and it converts
PC-set into demonstrated arbitrary code execution.

---

## Demonstrated execution evidence (FULL ACE, ASLR ON)

Captured by `harness_validate`'s own execution this round (authoritative;
`.workflow/harness_validate/validation.md`):

**(1) Observable side effect (non-gdb run, marker deleted first):**
```
EDU_ESCAPE: ACE -- libc_base=0x7123d11ae000 system=0x7123d1206750 dma_buf=0x6473b5bed538
EDU_ESCAPE: ACE FULL cb_after=0x7123d1206750 (system=0x7123d1206750) = system (OK) ;
            cmd readback "echo edu-ace > /tmp/edu_ace_marker"
EDU_ESCAPE: ACE Fire-2 -- re-arm + fire -> system(edu) -> reads EduState[0]="echo edu-ace > ..." -> marker
$ cat /tmp/edu_ace_marker   ->   edu-ace
```

**(2) Authoritative gdb backtrace (`break system`, ASLR ON):**
```
Thread 1 "qemu-fuzz-x86_6" hit Breakpoint 1, __libc_system
  (line=line@entry=0x597c423b28b0 "echo edu-ace > /tmp/edu_ace_marker")
  at ../sysdeps/posix/system.c:202
SYSTEM_HIT rdi=0x597c423b28b0
#0  __libc_system (line="echo edu-ace > /tmp/edu_ace_marker") at system.c:202
#1  0x0000597c25972096 in timerlist_run_timers (...) at ../util/qemu-timer.c:593
#2  qemu_clock_run_timers (type=QEMU_CLOCK_VIRTUAL) at ../util/qemu-timer.c:607
#3  qemu_clock_advance_virtual_time (dest=1000000000) at ../util/qemu-timer.c:713
```

`__libc_system` is reached with the **attacker-chosen command in `rdi`** through the exact
design-named call site `cb(opaque)` in `timerlist_run_timers:593`. `rdi` = `opaque` =
`EduState`; `system` reads its argument from `rdi`, i.e. the command the guest planted at
`EduState[0]`.

**Reliability (run by the validator):**

| run class | result | detail |
|---|---|---|
| `ace_full` (marker created, content `edu-ace`) | **10/10**: leak 10/10, cb_planted 10/10, cmd_planted 10/10, 0 timeouts | each run a fresh ASLR process; `system` address leak-derived per run |
| gdb backtrace through `system` | `#1 timerlist_run_timers qemu-timer.c:593`, cmd in `rdi` | ASLR ON, `-batch` |

**Two complementary proofs.** The gdb run halts at `system`'s *entry* (proving the call site
is reached with the attacker-chosen command in `rdi`); the non-gdb run lets `system`'s body
run and produces the marker (10/10). Neither proof alone is the whole story; together they
show the call site is reached *and* the side effect executes.

---

## Honest notes (characteristics of the realized exploit, not gaps)

- **The realized path is `system(EduState)` (command string at `EduState[0]`), not a stack-pivot+ROP.** The R12 gdb
  call-site characterization showed only `rdi=EduState` is attacker-controlled at
  `cb(opaque)`, and no `rdi→rsp` pivot is usable in libc or the hardened QEMU binary
  (`setcontext` blocked — its ucontext offsets overlap EduState's live PCIDevice parent).
  Since `system` reads its arg from `rdi`, the combined single-fire plant puts the command
  at `EduState[0]` and `system` at `cb`. This is a *design realization* (simpler and cleaner
  than a pivot, runs on the real stack), not a weaker outcome — it achieves full arbitrary
  command execution.
- **Post-`system` crash (rc=+1).** The combined fire overwrites `EduState[0]` (= the QOM
  `Object.class` slot) with the command string. After `system()` returns, the process faults
  in QOM cleanup (corrupted `Object.class`). This happens **after** the marker is written, so
  `ace_full` (marker present) is unaffected; the reliability wrapper gates on the marker, not
  on the exit code. The non-zero exit is an expected post-payload characteristic, not a defect.
- **Marker absent in the gdb evidence run only.** `break system` halts at `system`'s entry
  before its fork+exec body runs. Marker-creation evidence comes from the non-gdb run (10/10).

---

## Deliverable components (all present)

- **Guest-side driver / PoC:** `tests/qtest/fuzz/edu_dma_fuzz.c`, `EDU_ESCAPE_ACE` (+ `EDU_ESCAPE_ACE_CMDPLANT`)
  mode (Fire-1 backward write plants cmd+system via the combined snapshot DMA; Fire-2 re-arm/fire);
  built into `build-edu-ace/qemu-fuzz-x86_64` (a fresh non-ASAN build, the PRIMARY for ACE).
  One-shot-per-process (fresh ASLR run per invocation). The `EDU_ESCAPE_DEREF` (PC-set) and
  `EDU_ESCAPE_EXPLOIT` (full-chain coverage) modes of the same harness remain in the legacy
  `build-fuzz` binary.
- **Call-site control analysis:** gdb dump of every register + stack at `cb(opaque)`
  (`.workflow/harness_build/callsite_edu.txt` + the `callsite_dump*.txt` set) — the input to
  gadget selection (only `rdi` controlled → `system`).
- **Grooming recipe:** edu TO_PCI band-dump classifies every forward qword against
  `/proc/self/maps`; the write target is edu's OWN intra-object `dma_timer.cb` at the fixed
  backward offset `dma_buf − 0x20` — **no forward-neighbor struct identification or feng-shui
  needed** (deterministic compile-time layout within one `g_malloc` block).
- **Leak → aim math:** edu TO_PCI → QEMU `.text` ptr (PIE base) + libc base (corroborated by
  `/proc/self/maps`) → `system = libc_base + 0x58750`. Aim is re-resolved each run, so it
  tracks ASLR (verified per run).
- **Overwritten object + deref path:** `EduState.dma_timer.cb` ← `system`; `EduState[0]` ←
  command string; deref = guest-driven timer re-arm + Fire-2 → `timerlist_run_timers`
  `cb(opaque)` (`qemu-timer.c:593`) → `system(EduState)`, which reads the command at `EduState[0]`.
- **Attack payload:** `system("echo edu-ace > /tmp/edu_ace_marker")` — the command is the
  string planted at `EduState[0]`; any shell command is substitutable.
- **Evidence:** `.workflow/harness_build/ace_evidence.txt` (side effect),
  `ace_gdb_evidence.txt` (gdb backtrace through `system`), `ace_reliability_matrix.txt`
  (10/10), `callsite_dump*.txt` (call-site state); ASAN `heap-buffer-overflow WRITE @
  edu.c:154` (+ READ in `edu_dma_timer`, UBSan `index 8192 out of bounds for type
  char[4096]`) on the CONFIRMATION build. Device source unmodified
  (`git status --porcelain hw/misc/edu.c util/qemu-timer.c include/qemu/timer.h` empty).
- **Reliability:** ≥10 ASLR runs — 10/10 ace_full (leak/cb/cmd 10/10, 0 timeout); prior
  PC-set matrix (10/10 leak-derived + 12/12 attacker-chosen target) still holds.

**Harness validation gates (independently verified this round):** linkage 43 real libs / 0
stubs / 0 not-found (fresh `-Dspice=disabled -Dpixman=disabled -Dkvm=disabled` build, cleaner
than the legacy binary); `edu_dma_timer` COVERED (hits:1, edges 7/23); 9 distinct sancov
symbols byte-verified on the TARGET object `hw/misc/edu.c.o` (incl. `trace_cmp4/8` +
`trace_pc_indir`) — conclusively not wrapper-only; libFuzzer `cov:1454 ≥ 1000`; edu.c edge
coverage 22.2% (EXPLOIT mode).

---

## Mitigations

- **Root-cause fix (the primitive):** `edu_check_range` (`hw/misc/edu.c:106-125`) is the
  log-only guard at both call sites (`:151` FROM_PCI, `:157` TO_PCI). Bounding `src`/`dst`/
  `cnt` so that every transfer stays within `[0, DMA_SIZE)` of `dma_buf` (or aborting on
  OOB), with overflow-safe arithmetic, closes both the read leak and the write at once.
- **Intra-object write path specifically:** the backward write exploits an *unclamped host
  offset* (`dma_buf + (dst-DMA_START)` with `dst` a raw `uint64`). Bounding the host offset
  to `[0, DMA_SIZE)` (not merely the guest landing GPA) is the targeted closure.
- **Defense-in-depth:** R13 demonstrates that a corrupted `QEMUTimer.cb` is not merely PC-set
  but a path to **arbitrary host command execution** (`system`). CFI / pointer-auth on
  indirect calls (and on function pointers stored in heap objects) would convert this into a
  detection/abort rather than a control transfer; this is mitigation hardening, not a fix for
  the underlying write.
- **Deployment:** the chain requires `-device edu` (`CONFIG_EDU=y` ships in the default
  `x86_64-softmmu` set but is not instantiated on default machines) — exposure is
  deployment-conditional, as the primitive triage notes.
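
The bound described in the first two bullets can be sketched as follows. This is a Python model of an overflow-safe range check, not QEMU code and not a proposed patch: `dma_range_ok` is a hypothetical name, and `DMA_SIZE = 4096` is taken from the `dma_buf[4096]` size stated in this report.

```python
# Python model of an overflow-safe DMA range check (a sketch, not QEMU code).
# `dma_range_ok` is a hypothetical name used for illustration only.

U64_MAX = (1 << 64) - 1
DMA_SIZE = 4096  # matches the 4096-byte dma_buf described in this report


def dma_range_ok(offset: int, count: int, size: int = DMA_SIZE) -> bool:
    """True only if [offset, offset + count) lies inside [0, size).

    `offset` and `count` are treated as raw uint64 values, as guest MMIO
    writes would supply them. The comparison is arranged so that no
    intermediate sum is formed, so nothing can wrap around 2**64.
    """
    if not (0 <= offset <= U64_MAX and 0 <= count <= U64_MAX):
        raise ValueError("inputs must fit in uint64")
    return offset <= size and count <= size - offset


print(dma_range_ok(0, 4096))            # True: the whole buffer
print(dma_range_ok(4096, 1))            # False: one byte past the end
print(dma_range_ok(U64_MAX - 0x1F, 8))  # False: wrapped "negative" offset
```

Comparing `count` against `size - offset` rather than computing `offset + count` is the design choice that matters: the subtraction cannot wrap once `offset <= size` has been established. Whether an out-of-range transfer should abort or be dropped is left to the device maintainers.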

---

## References

- `hw/misc/edu.c:149-154` (FROM_PCI write, the OOB-write site; ASAN WRITE @ 154),
  `:155-161` (TO_PCI read, the leak), `:106-125` (`edu_check_range`, log-only guard),
  `:75-76` (`dma_timer` embedded before `dma_buf`), `:163/176/187` (re-arm path).
- `util/qemu-timer.c:593` (`timerlist_run_timers` `cb(opaque)` — the deref/ACE call site).
- `.workflow/harness_validate/validation.md` (authoritative, execution-verified this round).
- `.workflow/harness_build/ace_evidence.txt`, `ace_gdb_evidence.txt`,
  `ace_reliability_matrix.txt`, `callsite_dump*.txt`, `callsite_edu.txt`.
- `.workflow/discoveries/findings.h3.md` (Round-10 design, unvalidated; Option A).
- `.workflow/journal/journal.md` Rounds 8–13.
- CWE: CWE-787 (Out-of-bounds Write) for the FROM_PCI primitive; CWE-125 (Out-of-bounds
  Read) for the TO_PCI leak; CWE-822 (Untrusted Pointer Dereference) framing for the
  chain's hijacked-callback step. Related: two earlier escape-chain hypotheses were rejected at the dereference
  step, their forward-region writes unable to reach a usable code pointer; this chain
  reuses the edu read primitive the earlier work also relied on.