The invisible leak channel
Your research agent is quietly shipping your secrets to the internet, one innocuous search query at a time.
ServiceNow just dropped MosaicLeaks, a benchmark that exposes a failure mode most teams aren't even measuring: when deep-research agents combine private documents with web search, their external queries become a side channel. No single query gives away the game, but string them together and an adversary watching traffic can reconstruct sensitive enterprise information—the mosaic effect.
The kicker? When they trained agents to be better at research, leakage got worse. Task-only RL increased answer/full-information leakage from 34.0% to 51.7% while boosting task success. The fix required a privacy-aware training method that treats query construction as a learned skill, not a prompt-engineering problem.
What the mosaic effect looks like in practice
Here's the canonical example from the benchmark. A healthcare research agent is answering a multi-hop question. It fires off searches about cloud migration milestones, a January 2024 security disclosure, and vendor details. Each query looks benign in isolation.
But stitch them together and you get: MediConn had migrated 70% of its infrastructure to the cloud by January 2025, a fact that existed only in private internal documents.
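A toy sketch makes the arithmetic of the mosaic effect concrete. The private fragments and query strings below are invented for illustration (they are not the benchmark's actual queries), and verbatim substring matching is a deliberately crude stand-in for what a real adversary can infer:

```python
# Toy illustration of the mosaic effect. The fragments and queries are
# invented; they are NOT taken from the MosaicLeaks benchmark.
PRIVATE_FRAGMENTS = {"mediconn", "70%", "cloud migration", "january 2025"}

QUERIES = [
    "cloud migration milestones healthcare providers 70%",
    "MediConn security disclosure",
    "healthcare vendor infrastructure January 2025",
]


def fragments_in(text: str) -> set[str]:
    """Private fragments that appear verbatim (case-insensitive) in text."""
    lowered = text.lower()
    return {frag for frag in PRIVATE_FRAGMENTS if frag in lowered}


def per_query_exposure(queries: list[str]) -> list[set[str]]:
    """What an observer learns from each query in isolation."""
    return [fragments_in(q) for q in queries]


def log_exposure(queries: list[str]) -> set[str]:
    """What an observer learns from the whole log: the union of all fragments."""
    exposed: set[str] = set()
    for found in per_query_exposure(queries):
        exposed |= found
    return exposed


if __name__ == "__main__":
    for query, found in zip(QUERIES, per_query_exposure(QUERIES)):
        print(f"{len(found)}/{len(PRIVATE_FRAGMENTS)} fragments in {query!r}")
    print("log exposes every fragment:", log_exposure(QUERIES) == PRIVATE_FRAGMENTS)
```

No query carries more than half of the private fact, yet the log as a whole carries all of it.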
MosaicLeaks measures three escalating levels of leakage, all from observing just the web query log:
- Intent leakage: The adversary can infer what private questions the agent is investigating
- Answer leakage: Given a question about private info, the adversary can answer it from the queries alone
- Full-information leakage: The adversary can state verifiably true private claims without even being told what questions to ask
Full-information leakage is the nightmare scenario. The observer discovers and articulates your secrets autonomously.
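One way to record these levels when scoring runs, sketched under assumptions: the judge fields are hypothetical, each chain is tagged with the highest level it reaches, and the combined "answer/full-information" rate is assumed to count chains at answer level or worse. This is not the benchmark's released scoring code.

```python
from dataclasses import dataclass
from enum import IntEnum


class Leakage(IntEnum):
    """Escalating leakage levels; a higher value is worse."""
    NONE = 0
    INTENT = 1     # adversary infers which private questions are being investigated
    ANSWER = 2     # adversary answers a given private question from the query log alone
    FULL_INFO = 3  # adversary states verifiably true private claims unprompted


@dataclass
class ChainJudgment:
    # Hypothetical verdicts from an adversary that sees only the web query log.
    intent_inferred: bool
    answer_recovered: bool
    claims_verified: bool

    def level(self) -> Leakage:
        """Highest level reached (assumes the levels are reported as nested)."""
        if self.claims_verified:
            return Leakage.FULL_INFO
        if self.answer_recovered:
            return Leakage.ANSWER
        if self.intent_inferred:
            return Leakage.INTENT
        return Leakage.NONE


def answer_or_full_info_rate(judgments: list[ChainJudgment]) -> float:
    """Fraction of chains at ANSWER level or worse (assumed metric definition)."""
    if not judgments:
        return 0.0
    leaked = sum(1 for j in judgments if j.level() >= Leakage.ANSWER)
    return leaked / len(judgments)
```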
The benchmark design
MosaicLeaks contains 1,001 multi-hop research chains that deliberately interleave local enterprise documents and a controlled web corpus. The construction is clever: answers from one hop become bridge entities for the next, forcing the agent to retrieve private information before it can form useful web queries.
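The bridge-entity structure can be sketched as a list of hops where each answer fills the next question's template. `Hop`, `walk_chain`, and the example entities are hypothetical, and a real agent has to retrieve each answer rather than read it off the hop:

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Hop:
    source: Literal["local", "web"]  # corpus that holds this hop's answer
    question_template: str           # "{bridge}" is replaced by the previous answer
    answer: str                      # gold answer, becomes the next bridge entity


def walk_chain(seed: str, hops: list[Hop]) -> list[tuple[str, str, str]]:
    """Resolve hops in order, feeding each answer forward as the next bridge.

    Returns (source, concrete question, answer) for every hop.
    """
    bridge = seed
    trace: list[tuple[str, str, str]] = []
    for hop in hops:
        question = hop.question_template.format(bridge=bridge)
        trace.append((hop.source, question, hop.answer))
        bridge = hop.answer
    return trace


if __name__ == "__main__":
    # Invented two-hop chain: a private (local) answer bridges into a web lookup.
    chain = [
        Hop("local", "Which vendor runs {bridge}'s cloud migration?", "AcmeCloud"),
        Hop("web", "Where is {bridge} headquartered?", "Austin"),
    ]
    for source, question, answer in walk_chain("MediConn", chain):
        print(f"[{source}] {question} -> {answer}")
```

The printed trace shows the trap: the web question names `AcmeCloud`, an entity the agent only knows from a private document, so filling the template naively puts a private fragment straight into the query log.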
Local documents come from DRBench-style enterprise tasks. Web documents come from BrowseComp-Plus. The final split: 559 training chains, 98 validation, 344 held-out test chains.
Each chain is built to make leakage likely while remaining solvable without leaking. That design constraint is what makes the benchmark interesting: privacy and performance aren't strictly opposed; they're just under-optimized.
Prompting doesn't fix it
The obvious first move: just tell the agent not to leak. Add a line to the planning prompt instructing it not to issue web queries that expose local information.
It barely worked. For Qwen3-4B, the prompt dropped answer/full-information leakage from 34.0% to 25.5%—but strict chain success fell from 48.7% to 44.5%. The behavioral change was fewer web queries overall, not consistently safer query construction.
Substantial leakage remained, and task performance regressed. You can't prompt privacy in.
The central tension: better agents leak more
Before building a privacy-aware solution, the team tried the obvious baseline: train the agent purely for task success, with no privacy term in the reward.
It worked beautifully. Strict chain success jumped from 48.7% to 59.3%.
Answer/full-information leakage climbed right alongside it, from 34.0% to 51.7%.
The agent had learned to pack more context into web queries—specific metrics, dates, entity names—which helped retrieve the right documents but handed fragments to any observer. This is the core problem MosaicLeaks exposes: richer queries are often better for retrieval and worse for privacy.
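A toy retrieval score shows why task-only training drifts toward revealing queries. Jaccard token overlap stands in for a real retriever here, and the target document and private terms are invented:

```python
import re

# Invented public target document and private terms, for illustration only.
TARGET_DOC = "Case study: healthcare provider reaches 70% cloud migration milestone in January 2025"
PRIVATE_TERMS = {"70%", "january", "2025"}

SPARSE = "healthcare cloud migration milestone"
RICH = "healthcare cloud migration milestone 70% January 2025"


def tokens(text: str) -> set[str]:
    """Lowercased word tokens; '%' is kept so '70%' stays one token."""
    return set(re.findall(r"[\w%]+", text.lower()))


def retrieval_overlap(query: str) -> float:
    """Jaccard overlap with the target doc: a crude stand-in for retrieval quality."""
    q, d = tokens(query), tokens(TARGET_DOC)
    return len(q & d) / len(q | d)


def exposed_terms(query: str) -> set[str]:
    """Private terms an observer would see in the query text."""
    return tokens(query) & PRIVATE_TERMS


if __name__ == "__main__":
    for name, query in (("sparse", SPARSE), ("rich", RICH)):
        print(f"{name}: overlap={retrieval_overlap(query):.2f} exposed={sorted(exposed_terms(query))}")
```

The rich query scores higher against the target (7/12 vs. 4/12 tokens shared) precisely because it carries the private specifics, so a reward that only sees retrieval success pushes toward it.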
You need a training objective that optimizes both.
Privacy-Aware Deep Research (PA-DR)
PA-DR combines two rewards. The first is a situational task reward. A single research trajectory can involve dozens of model calls, so giving them all the same final score is extremely weak credit assignment. Instead, PA-DR judges each call against other calls made at the same stage and hop, with the same information available.
A Plan call is rewarded for searching the correct source and retrieving the right document. If that document is already in hand, it's rewarded for not searching again. A Choose call is rewarded for selecting the document holding the answer.
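A minimal sketch of situational credit assignment, assuming calls are grouped by (stage, hop, context fingerprint) and that rewards are normalized within each group the way group-relative policy optimization does it. The `Call` fields, the 0/1 reward scale, and the normalization are assumptions; the source describes only the comparison rule and the Plan/Choose criteria:

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Call:
    stage: str                 # "plan" or "choose"
    hop: int
    context_key: str           # hypothetical fingerprint of the information available
    searched: bool = False
    searched_correct_source: bool = False
    retrieved_target: bool = False
    target_already_in_hand: bool = False
    chose_answer_doc: bool = False


def raw_reward(call: Call) -> float:
    """Binary per-call reward for the Plan/Choose criteria (0/1 scale is assumed)."""
    if call.stage == "plan":
        if call.target_already_in_hand:
            return 0.0 if call.searched else 1.0  # reward not searching again
        return 1.0 if call.searched_correct_source and call.retrieved_target else 0.0
    if call.stage == "choose":
        return 1.0 if call.chose_answer_doc else 0.0
    raise ValueError(f"unknown stage: {call.stage}")


def situational_advantages(calls: list[Call]) -> list[float]:
    """Score each call only against calls in the same (stage, hop, context) situation."""
    groups: dict[tuple[str, int, str], list[int]] = defaultdict(list)
    for i, call in enumerate(calls):
        groups[(call.stage, call.hop, call.context_key)].append(i)

    advantages = [0.0] * len(calls)
    for indices in groups.values():
        rewards = [raw_reward(calls[i]) for i in indices]
        mu, sigma = mean(rewards), pstdev(rewards)
        for i, r in zip(indices, rewards):
            # Group-normalized advantage; a lone or unanimous group gets 0.
            advantages[i] = (r - mu) / sigma if sigma > 0 else 0.0
    return advantages
```

Because every call in a group shares the same stage, hop, and available information, the comparison isolates the decision itself, which is why no learned value model is needed.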
The second reward is a learned privacy classifier. Whenever the agent produces web queries, a Qwen3-4B classifier estimates two risks: whether current queries leak private info directly, and whether adding them to the existing query log creates a new mosaic leak. PA-DR penalizes the larger of the two, landing the privacy cost on the exact planning decision that made the log more revealing.
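The privacy term reduces to a max over two risk estimates. In this sketch the two stub functions are hypothetical keyword heuristics standing in for the trained Qwen3-4B classifier, and subtracting a weighted penalty from the task reward is an assumed combination rule, not one the source specifies:

```python
from typing import Callable, Sequence

DirectModel = Callable[[Sequence[str]], float]                 # new queries -> P(direct leak)
MosaicModel = Callable[[Sequence[str], Sequence[str]], float]  # (log, new) -> P(new mosaic leak)

SENSITIVE = ("mediconn", "70%", "2025")  # hypothetical private terms for the stubs


def _covered(queries: Sequence[str]) -> set[str]:
    return {t for t in SENSITIVE for q in queries if t in q.lower()}


def stub_direct(new_queries: Sequence[str]) -> float:
    """Hypothetical stand-in: high risk if one query alone combines 2+ sensitive terms."""
    worst = max((len(_covered([q])) for q in new_queries), default=0)
    return 0.9 if worst >= 2 else 0.1


def stub_mosaic(log: Sequence[str], new_queries: Sequence[str]) -> float:
    """Hypothetical stand-in: high risk if new queries push the log to 2+ sensitive terms."""
    before = _covered(log)
    after = _covered(list(log) + list(new_queries))
    return 0.8 if len(after) >= 2 and after != before else 0.1


def privacy_penalty(log: Sequence[str], new_queries: Sequence[str],
                    p_direct: DirectModel = stub_direct,
                    p_mosaic: MosaicModel = stub_mosaic) -> float:
    """Penalty for this planning step: the larger of the direct and mosaic risks."""
    return max(p_direct(new_queries), p_mosaic(log, new_queries))


def combined_reward(task_reward: float, penalty: float, weight: float = 1.0) -> float:
    # Assumption: a weighted subtraction; the exact combination isn't specified.
    return task_reward - weight * penalty
```

A query that is harmless on its own (`"cloud migration 70% milestone"`) still draws the high mosaic penalty when the log already mentions MediConn, which is the case a per-query filter would miss.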
The results
| Method | Strict chain success | Answer/full-info leakage |
|---|---|---|
| Base Qwen3-4B | 48.7% | 34.0% |
| Task reward only | 59.3% | 51.7% |
| Task + PA-DR | 58.7% | 9.9% |
PA-DR keeps almost all the task-performance gain (58.7% vs. 59.3%) while cutting leakage by roughly 80% relative to task-only training (9.9% vs. 51.7%). That 9.9% is also lower than the untrained base model's 34.0%: training for privacy didn't just cancel the harm from task training, it left the agent safer than it started.
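The headline comparisons, worked through from the results table (the only computation is relative change):

```python
# (strict chain success %, answer/full-info leakage %) from the results table
RESULTS = {
    "base": (48.7, 34.0),
    "task_only": (59.3, 51.7),
    "task_plus_pa_dr": (58.7, 9.9),
}


def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from before to after."""
    return (before - after) / before


base_s, base_l = RESULTS["base"]
task_s, task_l = RESULTS["task_only"]
pa_s, pa_l = RESULTS["task_plus_pa_dr"]

gain_kept = (pa_s - base_s) / (task_s - base_s)
print(f"share of task-only success gain kept: {gain_kept:.0%}")              # 94%
print(f"leakage cut vs. task-only: {relative_reduction(task_l, pa_l):.0%}")  # 81%
print(f"leakage cut vs. base:      {relative_reduction(base_l, pa_l):.0%}")  # 71%
```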
It doesn't search less, it searches smarter
PA-DR actually issues more web queries than the base model. The improvement isn't defensive silence—it's learned query construction. The trained agent drops revealing details like specific percentages, years, and answer-type hints while still finding the right public documents.
It retrieves what it needs without carrying private fragments in the query text.
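To make "drops revealing details" concrete, here is a hand-written filter for two of the detail types mentioned. It is not what PA-DR does: the agent learns this behavior, and no regex can judge mosaic risk across a whole query log or catch answer-type hints. The example query is invented.

```python
import re

# Hand-written patterns for two detail types the trained agent learned to drop.
# A crude illustration only, not PA-DR's learned query construction.
PATTERNS = [
    r"\b\d+(?:\.\d+)?\s?%",  # specific percentages, e.g. "70%"
    r"\b(?:19|20)\d{2}\b",   # four-digit years, e.g. "2025"
]


def strip_revealing_details(query: str) -> str:
    """Remove percentages and years, then collapse leftover whitespace."""
    for pattern in PATTERNS:
        query = re.sub(pattern, "", query)
    return " ".join(query.split())


if __name__ == "__main__":
    print(strip_revealing_details("healthcare cloud migration 70% complete January 2025"))
```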
Sample efficiency as a bonus
Situational rewards pay off during training itself. Because they compare matching calls instead of scoring whole rollouts, they assign credit far more precisely—no separate value model, no index alignment across trajectories.
They're also dramatically more sample-efficient. The situational task reward reaches outcome-RL-level task performance (55% strict success) with roughly 6 to 7× fewer generated samples, and PA-DR keeps most of that efficiency (about 5× fewer) while adding the privacy gain.
| Training reward | Generated samples (total) | Strict chain success | Samples to reach 55% success |
|---|---|---|---|
| Outcome reward | 963k | 55.4% | 963k |
| Situational task | 842k | 59.3% | 146k |
| Task + PA-DR | 706k | 58.7% | 183k |
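The efficiency ratios follow directly from the "Samples to 55% success" column:

```python
# Thousands of generated samples needed to reach 55% strict success (from the table).
SAMPLES_TO_55 = {
    "outcome": 963,
    "situational_task": 146,
    "task_plus_pa_dr": 183,
}


def speedup(method: str, baseline: str = "outcome") -> float:
    """How many times fewer samples `method` needs than `baseline`."""
    return SAMPLES_TO_55[baseline] / SAMPLES_TO_55[method]


for m in ("situational_task", "task_plus_pa_dr"):
    print(f"{m}: {speedup(m):.1f}x fewer samples")
# situational_task: 6.6x fewer samples
# task_plus_pa_dr: 5.3x fewer samples
```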
Situational credit assignment is both more data-efficient and more aligned with the granular decision points where privacy actually breaks.
What this doesn't show
MosaicLeaks is a controlled benchmark. The enterprise documents are synthetic, the web corpus is fixed, the chains span three company contexts, and everything runs through a single agent harness designed for multi-hop QA.
That control is what makes hop-by-hop leakage measurement possible, but broader tasks, real deployments, retrieval-augmented generation in the wild, and other agent architectures still need their own study. This is a proof of concept for the failure mode and a training method, not a deployment-readiness claim.
The takeaway
You can't prompt privacy in. You have to train it in.
Telling agents to be careful barely moves the needle. Training them only for task success makes leakage worse. Privacy has to be part of the reward signal, landed on the specific actions—query construction, tool selection—where information disclosure happens.
MosaicLeaks shows the mosaic effect is real, measurable, and unmitigated by standard task-only agentic training. PA-DR shows it's solvable with the right RL setup.
If you're building research agents that touch private data and external tools, this benchmark should be on your eval suite. The privacy risk isn't theoretical—it's structural, and it scales with capability unless you train against it explicitly.