Safety & Alignment

Alignment risks that emerge in agentic settings, and the research frontier of mitigations.

Agentic systems stress alignment methods in new ways: longer task horizons, tool access, and realistic reward processes can produce behaviors that do not appear in short chat interactions.

This section summarizes major safety research directions relevant to agents.