#agents

35 posts

Google's Galaxy Unpacked Play: Gemini Intelligence Hits Production Hardware

Google just gave us a first look at Gemini Intelligence on Samsung's new foldables—multi-app task automation, on-device Notebook, and wrist-gesture glasses control. This is production AI, not a demo.

#gemini #agents #android #multimodal #wearables

NVIDIA Nemotron 3 Embed Takes #1 on RTEB — And the Real Story Is Retrieval Efficiency

NVIDIA's new embedding models claim the top RTEB spot, but the interesting part isn't the leaderboard flex—it's the 1B variants optimized for Blackwell and what they reveal about production retrieval.

#embeddings #retrieval #rag #agents #nvidia

What Shippy teaches us about building production-grade AI agents

Ai2's maritime agent Shippy isn't about the model—it's about reliability, deterministic tools, sandboxed execution, and real evals. Here's what building an agent for high-stakes decisions actually looks like.

#agents #production-ml #evals #tool-use #systems

Model Routing Isn't a Classification Problem—It's a Systems Problem

IBM Research tried routing requests across Claude, GPT-4, and Opus in production agentic systems. Token pricing didn't predict actual cost. Task difficulty didn't predict model fit. Here's why.

#agents #llm-routing #production-ml #cost-optimization #agentic-systems

ChatGPT Work: OpenAI's First Real Agentic Productivity Play

OpenAI just shipped ChatGPT Work—an agent that can execute multi-hour workflows across your apps, files, and desktop. Powered by GPT-5.6, it's the first real test of whether agentic AI can ship.

#agents #openai #gpt-5 #productivity #chatgpt

Data for Agents: NVIDIA Nemotron's Open Data Strategy and Why Synthetic Personas Matter

NVIDIA is releasing over 10 trillion pre-training tokens and millions of post-training samples for agent development—and building synthetic personas representing 2.4B people. Here's why that matters.

#agents #synthetic-data #nvidia #open-data #datasets

Google's June 2026 Blitz: Gemini 3.5 Live Translate, Computer Use, and the End of Rigid Voice Commands

Google shipped an astonishing amount of AI in June 2026—real-time multilingual translation, computer-use agents, on-device models, and a genuinely conversational smart speaker. Here's what matters.

#gemini #multimodal #agents #google #translation

ScarfBench: Why AI Agents Still Can't Migrate Enterprise Java Apps

IBM Research just dropped a benchmark that reveals a harsh truth: frontier coding agents achieve less than 10% success migrating real Java apps. The problem isn't code—it's everything else.

#agents #benchmarks #software-engineering #java #tooling

HP and OpenAI Frontier: From 122 Pull Requests in Weeks to Enterprise-Wide Agent Deployment

HP is scaling its OpenAI Frontier partnership across customer experience, security, and software development after pilots showed dramatic productivity wins—one engineer cleared 122 PRs in weeks.

#agents #enterprise-ai #openai #deployment #frontier

OpenAI's Codex is eating ChatGPT from the inside out—and it's a preview of the agent future

New OpenAI data shows Codex now accounts for 99.8% of tokens inside the company. Non-developers are adopting agents 137x faster than before. This is what the shift from chatbots to agents actually looks like.

#agents #codex #openai #future-of-work #productivity

Google Finance Goes Multimodal and Agentic — But Where's the Platform?

Google's relaunched Finance brings portfolio screenshots, custom briefings, and an Android app. The AI is impressive—but keeping it captive inside a walled garden feels like a missed opportunity.

#google #multimodal #agents #fintech #platforms

CUGA proves agents don't need heavy frameworks — just plumbing and a prompt

IBM's open-source agent harness ships two dozen production-ready apps that show what happens when orchestration, guardrails, and tool-wiring come pre-assembled.

#agents #llms #orchestration #frameworks #mcp

Google DeepMind's AI Control Roadmap: treating agents as insider threats

DeepMind just published their internal framework for securing AI agents: real-time monitoring, threat modeling borrowed from cybersecurity, and the assumption that alignment might fail.

#agents #ai-safety #alignment #google-deepmind #ai-control

Is it agentic enough? HuggingFace benchmarks how models actually drive your tools

HuggingFace's new agent benchmark doesn't just ask if the model got the right answer—it measures how much work it took to get there, across models, library versions, and task tiers.

#agents #benchmarking #open-models #developer-tools #evaluation

MosaicLeaks: Your research agent is leaking secrets through its search queries

ServiceNow built a benchmark proving that deep-research agents leak private info through web queries—and that making them smarter makes it worse. Privacy-aware RL cuts leakage by 70%.

#agents #privacy #rl #benchmarks #retrieval

From Hub to Hardware in One Agent Loop: Strands Robots Stitches LeRobot into a Unified Workflow

AWS open-sourced Strands Robots, an SDK that exposes LeRobot's stack as composable agent tools. Record demos in sim, push to Hub, run policies, deploy to hardware—all in one agent.

#robotics #lerobot #agents #open-source #infrastructure

OpenAI Academy goes live with three courses for the next era of work

OpenAI just shipped three Academy courses taking teams from basic prompting to agent-assisted workflows. This is learning-as-deployment, and it matters more than you think.

#openai #enterprise-ai #workflows #training #agents

North Mini Code: Cohere's First Real Agent-Coding Model

Cohere just released a 30B MoE model trained specifically for agentic software engineering. It's Apache 2.0, beats models 4× its size, and actually works across multiple agent harnesses.

#agents #code-models #reinforcement-learning #moe #cohere

OpenEnv Goes Full Open: Why the RL Training Layer Just Got a Governance Committee

Hugging Face, Meta PyTorch, NVIDIA, and a dozen others just formed a committee to govern OpenEnv—the protocol layer trying to make agentic RL training actually interoperable.

#agents #reinforcement-learning #open-source #infrastructure #governance

Four labs walk into a forest: why Thousand Token Wood ran every agent on a different model

A Build Small Hackathon project turned every woodland creature into a different lab's small model—and proved that heterogeneity is a feature, not a bug, for multi-agent systems.

#agents #small-models #multi-model #hackathon #vllm

Google's May 2026 Blitz: Gemini 3.5, Omni, and the Full-Stack Agentic Takeover

Google just shipped an entire agentic stack in one month: Gemini 3.5 for multi-step workflows, Gemini Omni for multimodal creation, proactive Search agents, Universal Cart, and hardware purpose-built for it all.

#gemini #agents #google #multimodal #hardware

Why a 3B Model Beat Frontier LLMs at Running a Tiny Economy

A Build Small Hackathon entry proves small models shine where frontier models fail: running multi-agent simulations in real time. Lessons on scarcity, JSON reliability, and reskinning history.

#agents #small-models #multi-agent #simulation #qwen

Holo3.1: Fast, Local, and Finally Production-Ready Computer Use Agents

H Company ships quantized weights, mobile support, and cross-framework compatibility. The computer-use agent stack just got real deployment options—including local inference on consumer hardware.

#agents #computer-use #quantization #deployment #local-inference

Agent Logic: The Missing GPS for Enterprise AI

IBM Research argues LLMs alone can't scale in enterprise workflows. Their secret weapon? Software primitives that guide models through complex, regulated tasks at 30× lower cost.

#agents #enterprise-ai #cost-optimization #mlops #llms

Gemini Omni and 3.5 Flash: Google's multi-model bet on creation and agentic execution

Google just shipped two very different models at I/O 2026: Omni for conversational video editing and 3.5 Flash for long-horizon agent tasks. Here's what the demos reveal.

#gemini #agents #multimodal #google #video-generation

Endava is building the senior architect you wished you had—as a Codex agent

A 10,000-person software shop cut requirements analysis from weeks to hours by encoding senior judgment into Codex. Their playbook: treat it as a desktop agent, not a code assistant.

#codex #agents #software-engineering #organizational-design #knowledge-work

ITBench-AA: The Enterprise Agent Reality Check Nobody Asked For (But Everybody Needs)

Frontier models score below 50% on Kubernetes incident response. The new ITBench-AA benchmark from Artificial Analysis and IBM reveals the gap between agent demos and production IT work.

#agents #benchmarks #enterprise-ai #kubernetes #sre

Harness, Scaffold, Agent: The Glossary We Actually Need

The AI agent field moves fast, and its vocabulary moves faster. HuggingFace's new glossary finally draws clear lines between harness, scaffold, and agent—distinctions that matter.

#agents #llms #terminology #architecture #tools

OpenAI's Personal Finance Play: Ambition Meets Anxiety

OpenAI wants to manage your money. Their new ChatGPT finance feature raises hard questions about AI capabilities, privacy theater, and whether we're solving problems that actually exist.

#openai #chatgpt #agents #product-critique #ai-safety

Parloa Is Building AI Call-Center Agents Customers Actually Want to Talk To

OpenAI's latest customer spotlight shows how Parloa is using GPT models to power voice agents that don't make you want to throw your phone. Real-time, reliable, and surprisingly capable.

#voice-ai #customer-service #openai #enterprise-ai #agents

NVIDIA Nemotron 3 Nano Omni: Multimodal Intelligence in a Tiny Package

NVIDIA just dropped a 3B-parameter multimodal model that processes documents, audio, and video with 128K context. Let's dig into what makes this nano model surprisingly capable.

#multimodal #small-models #agents #nvidia #document-understanding

Google's TPU v8 Splits Into Two: Training vs. Inference in the Agentic Era

Google just announced TPU v8, but instead of one chip, they're shipping two: v8T for training and v8I for inference. Here's why the bifurcation matters for AI's next phase.

#tpu #infrastructure #agents #google-cloud #inference

NVIDIA Drops Synthetic Persona Dataset to Ground Korean AI Agents in Real Demographics

NVIDIA's new Nemotron-based dataset gives developers 4,800 demographically grounded Korean personas to build culturally aware AI agents—a blueprint for non-English AI.

#synthetic-data #agents #multilingual #datasets #nvidia

Ecom-RLVE: Training Conversational Agents in Verifiable E-Commerce Sandboxes

Hugging Face just dropped Ecom-RLVE, a reinforcement learning framework that trains e-commerce agents in realistic but controllable environments. This is how we move from chatbots to actually useful shopping assistants.

#reinforcement-learning #agents #ecommerce #evaluation #huggingface

TeamOut's AI Agent: Where Conversational Interfaces Meet the Messy Reality of Event Planning

TeamOut's new AI agent promises to plan company retreats through chat. But beneath the slick demo lies a fascinating tension: how do you build trust when the stakes are high and the details matter?

#agents #ai-interfaces #trust #product-critique #conversational-ai
