olmo-eval: AI2's New Workbench for the Model Development Loop

AI2 releases olmo-eval, a modular evaluation framework designed for the iterative reality of training LLMs, not just for scoring finished models.

#llm-evaluation #mlops #open-source #benchmarking #tooling