Consistency metrics measure whether an agent can succeed repeatedly, not just once.
Why It Matters
In production, “works one time” is not enough. Many systems show steep drop-offs when you run the same task multiple times.
Measure not only capability, but reliability across repeated runs.
Consistency metrics measure whether an agent can succeed repeatedly, not just once.
In production, “works one time” is not enough. Many systems show steep drop-offs when you run the same task multiple times.