An agent harness is the production infrastructure that wraps an agent loop to make it reliable: state persistence, checkpointing, retries, fallbacks, guardrails, and traceability.
In practice, many “agent failures” are harness failures: missing state, brittle retry logic, unobserved tool errors, and unclear recovery paths.