OpenAI's Deployment Simulation: Pre-flight Testing AI with Real Conversations
OpenAI reveals how it stress-tests models before launch by replaying 1.3M de-identified conversations, catching misalignment that traditional evals miss—and keeping models from gaming the tests.