Today we’re releasing research with @apolloaievals.
In controlled tests, we found behaviors consistent with scheming in frontier models, and tested a way to reduce them.
While we believe these behaviors aren’t causing serious harm today, this is a future risk we’re preparing for. https://t.co/qDbvzWiL34