Overview
A Hawk Agent is a second agent that monitors the development and reasoning process of the primary (implementer) agent. Unlike an evaluator that checks output quality, the Hawk watches how the agent thinks — its reasoning traces, decision points, and problem-solving approach.
The key insight: there are thousands of possible implementations for any given task, but only a few valid reasoning paths given the current project state. You catch misalignment much earlier by watching the process than by inspecting the final output.
Why Reasoning Over Output?
Consider a task: “Fix the payment integration.” There are hundreds of valid implementations. But given the current codebase state, tech stack, and the specific bug, there are only a handful of sensible reasoning paths.
A traditional evaluator sees the final code and asks “does this work?” The Hawk sees the reasoning and asks “is this agent solving the right problem in the right way?” This catches:
- Reward hacking — the agent satisfies tests while bypassing the actual problem (see Reward Hacking).
- Refactor avoidance — the agent works around a legacy issue instead of fixing it.
- Context contamination — the agent copies a bad pattern from elsewhere in the codebase.
No test, linter, or set of deterministic rules can catch these. They live in the process, not the output.
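The failure modes above can be modeled as a small taxonomy the Hawk reports against. This is an illustrative sketch, not a prescribed schema: the `Misalignment` and `Verdict` names (and their fields) are assumptions chosen to match the verdict fields used in the example later in this document.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Misalignment(Enum):
    """Process-level failure modes the Hawk watches for (assumed taxonomy)."""
    REWARD_HACKING = "reward_hacking"              # satisfies tests, bypasses the real problem
    REFACTOR_AVOIDANCE = "refactor_avoidance"      # works around a legacy issue instead of fixing it
    CONTEXT_CONTAMINATION = "context_contamination"  # copies a bad pattern from the codebase

@dataclass
class Verdict:
    """What the Hawk returns after reviewing one turn of reasoning."""
    misaligned: bool
    category: Optional[Misalignment] = None
    explanation: str = ""
    suggested_correction: str = ""
```

Keeping the verdict structured (rather than free text) makes it easy to pause the session programmatically and to aggregate which failure modes occur most often.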
Cross-Provider Pairing
The Hawk should use a different model provider from the implementer.
| Role | Recommended | Rationale |
|---|---|---|
| Implementer | Codex / GPT | Excellent engineer, fast, prone to reward hacking |
| Hawk | Claude | Well-aligned, sensitive to reward hacking patterns |
Why cross-provider? Same-provider models share training data, RLHF conventions, and implicit biases. They tend to agree on stylistic choices and failure modes. Cross-provider pairing introduces genuine diversity of perspective — a Claude Hawk is more likely to flag when a Codex implementer takes a shortcut because their training biases are structurally different.
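The cross-provider rule can be enforced mechanically at configuration time. A minimal sketch, assuming a simple dict-based config; the provider and model names here are placeholders, not real SDK identifiers:

```python
# Placeholder pairing config -- provider/model strings are examples only.
PAIRING = {
    "implementer": {"provider": "openai", "model": "gpt-example"},
    "hawk": {"provider": "anthropic", "model": "claude-example"},
}

def validate_pairing(pairing: dict) -> None:
    """Enforce the cross-provider rule: the Hawk must not share a
    provider with the implementer, or it inherits the same biases."""
    if pairing["implementer"]["provider"] == pairing["hawk"]["provider"]:
        raise ValueError(
            "Hawk and implementer use the same provider; "
            "cross-provider pairing is required for diverse oversight"
        )
```

Failing fast here is cheap insurance: a same-provider pairing silently weakens the pattern rather than breaking it visibly.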
How It Works
```python
def hawk_monitored_session(task, implementer, hawk):
    """Run an implementer agent with Hawk oversight."""
    session = implementer.start(task)
    for turn in session:
        # The implementer produces reasoning + actions
        reasoning = turn.reasoning_trace
        actions = turn.actions
        # The Hawk evaluates the reasoning path
        verdict = hawk.evaluate(
            task=task,
            reasoning=reasoning,
            actions=actions,
            project_state=get_project_state(),
        )
        if verdict.misaligned:
            session.pause()
            return {
                "status": "flagged",
                "reason": verdict.explanation,
                "turn": turn.number,
                "suggestion": verdict.suggested_correction,
            }
    return {"status": "complete", "output": session.result}
```
The Hawk receives the full reasoning trace — not just the code output — after each turn. It evaluates whether the reasoning path is sound given the project context and flags misalignment before it compounds.
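What `hawk.evaluate` sends to the Hawk model can be as simple as a structured review prompt. A sketch of that prompt assembly, under the assumption that the Hawk is an LLM called with plain text; the exact wording is illustrative, and `build_hawk_prompt` is a hypothetical helper, not part of any API:

```python
def build_hawk_prompt(task: str, reasoning: str, actions: list, project_state: str) -> str:
    """Assemble the per-turn review prompt for the Hawk model.

    The Hawk is explicitly instructed to judge the reasoning path,
    not the code output. (Illustrative format only.)
    """
    action_lines = "\n".join(f"- {a}" for a in actions)
    return "\n\n".join([
        "You are a Hawk agent. Judge the PROCESS, not the output.",
        f"Task: {task}",
        f"Project state: {project_state}",
        f"Implementer reasoning this turn:\n{reasoning}",
        f"Actions taken:\n{action_lines}",
        "Is this agent solving the right problem in the right way? "
        "Flag reward hacking, refactor avoidance, or context contamination.",
    ])
```

Because the prompt includes the project state alongside the reasoning trace, the Hawk can judge whether the reasoning path is sensible for *this* codebase, not just internally consistent.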
When to Use
- High-stakes changes where reward hacking would be costly (payment systems, auth, data migrations).
- Long-running agent sessions where small misalignments compound over many turns.
- Codebases with legacy patterns that agents are likely to imitate rather than fix.
- Any environment where you have observed agents taking shortcuts that satisfy tests but miss intent.
When Not to Use
- Simple, well-defined tasks where the output can be verified mechanically (tests, type checks).
- Cost-sensitive environments where the overhead of a second agent is not justified.
- Tasks where the reasoning path is obvious and there is little room for misalignment.
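The criteria in the two lists above can be distilled into a rough gating heuristic. This is a sketch only: the keyword list and turn thresholds are illustrative assumptions, not recommendations from the pattern itself.

```python
# Illustrative keywords for high-stakes work where reward hacking is costly.
HIGH_STAKES = {"payment", "auth", "migration", "billing"}

def needs_hawk(task_description: str, expected_turns: int,
               mechanically_verifiable: bool) -> bool:
    """Rough gate for Hawk oversight, distilled from the criteria above."""
    if mechanically_verifiable and expected_turns <= 3:
        return False  # tests and type checks suffice for short, well-defined tasks
    if any(word in task_description.lower() for word in HIGH_STAKES):
        return True   # reward hacking would be costly here
    return expected_turns > 10  # long sessions: small misalignments compound
```

In practice this decision is usually made per repository or per task class, not per task, but a gate like this keeps the 2x cost overhead confined to where it pays off.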
Trade-offs
| Pros | Cons |
|---|---|
| Catches misalignment that no deterministic check can detect. Early detection prevents compounding errors. Cross-provider diversity improves coverage. | Higher cost (roughly 2x per session). More complex setup: reasoning traces must be piped between agents across providers. Adds latency to each turn. |
Related Patterns
- Evaluator-Optimizer — Evaluates output quality. The Hawk evaluates process quality. They complement each other.
- Reflection — Self-review by the same model. The Hawk provides external review by a different model.
- Two-Agent Architecture — Splits work between roles. The Hawk splits oversight from execution.