Observability was built for humans. AI agents need something different


For the past decade, observability vendors have been locked in an interface war, competing on how their tools look and feel. As data ingestion became commoditized through standards like OpenTelemetry, the real differentiation moved up the stack into the user interface (UI).

Mike Shi, Arno van Driel

At ClickHouse, Mike Shi is Principal Product Manager, and Arno van Driel is VP EMEA.

While vendors competed on visualization, dashboards and workflows, a deeper shift toward unifying logs, traces and metrics into a single, exploratory experience started to happen.

This meant teams could see all activity in a single view and more easily understand what was happening across their systems in real time. Observability became, in many ways, a UI problem, as value centered on how effectively humans could navigate and interrogate complex data to turn it into actionable insights.

The consumer is changing

As agentic AI systems emerge across the enterprise, the primary consumer of observability data is shifting from a human operator to a machine.

When that happens, the value of polished workflows with signal unification diminishes and the center of gravity moves down the stack.

The question is no longer how efficiently a person can navigate telemetry and reach a root cause. It is whether the underlying system has the right data, the right retention and the right properties for machines to reason over it.

This transition is not fully here yet, but the direction is unmistakable. AI agents are already capable of identifying patterns and correlations across large volumes of telemetry, even if they still struggle with true causal reasoning.

That gap is being actively closed. Every major cloud provider and AI lab is investing in agent capabilities that go beyond chat interfaces into autonomous decision-making. The more pressing question is whether existing observability platforms are designed for this future.

The trade-offs that made sense no longer do

Today's dominant observability platforms were built around assumptions that held when humans were the only operators in the loop. In that environment, systems were designed around how engineers investigate and troubleshoot issues manually.

As a result, data retention windows are short, sometimes only days, because engineers rarely need to look back further.

Similarly, sampling and rollups are aggressive because a skilled operator can fill in the gaps with experience and intuition. Even pricing models reflect this reality: they are optimized for human-driven, relatively infrequent queries rather than continuous analysis.

Each of these trade-offs was rational for humans. They become liabilities the moment machines are expected to do the analytical work.

Short retention windows prevent AI agents from spotting trends, seasonality and relationships across incidents. An AI agent that can only see the last 72 hours of data cannot learn, for example, that a particular traffic spike recurs on a predictable seasonal cycle.

Without long-term context, AI agents are stuck in the same reactive loop that observability was supposed to help organizations escape.
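A minimal Python sketch can make the retention point concrete. It is an illustration only: the spike times, the weekly cadence, the 30-day window and the function name recurring_interval are hypothetical, and the cycle check is deliberately naive. The same spike history is inspected through a 72-hour window and a 30-day window.

```python
def recurring_interval(spike_hours, now_hour, retention_hours):
    """Return the constant gap between retained spikes, or None.

    Naive check: at least three spikes must fall inside the retention
    window, and every consecutive pair must be the same distance apart.
    """
    cutoff = now_hour - retention_hours
    visible = sorted(h for h in spike_hours if cutoff <= h <= now_hour)
    if len(visible) < 3:
        return None  # not enough history to call anything a cycle
    gaps = {later - earlier for earlier, later in zip(visible, visible[1:])}
    return gaps.pop() if len(gaps) == 1 else None


# Hypothetical data: a traffic spike every 168 hours (weekly),
# expressed as hours since the start of collection.
spikes = [0, 168, 336, 504]
now = 510

short_view = recurring_interval(spikes, now, retention_hours=72)
long_view = recurring_interval(spikes, now, retention_hours=720)

print(short_view)  # None: only the most recent spike is still retained
print(long_view)   # 168: the weekly cycle is visible
```

With 72 hours of retention, only the latest spike survives, so it looks like an isolated event. With 30 days, the same data reveals a weekly pattern.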

Aggressive sampling creates a different problem. Rollups and pre-aggregation remove the detailed signals that machines need for accurate reasoning.

A human reviewing a latency chart can make a judgement call about whether the underlying distribution matters. An AI agent cannot afford that shortcut. It needs full-fidelity data, because the signals it depends on are precisely the ones that sampling discards.
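A short Python sketch shows how a rollup can erase the signal. The latency values are hypothetical and chosen so the arithmetic is easy to follow; mean and p99 are illustrative helpers, not part of any observability product. Two one-minute windows have the same average latency, but only the raw data shows that one of them has a slow tail.

```python
def mean(values):
    """Arithmetic mean, the kind of value a per-minute rollup would keep."""
    return sum(values) / len(values)


def p99(values):
    """Nearest-rank 99th percentile, computed from the raw samples."""
    ordered = sorted(values)
    rank = -(-99 * len(ordered) // 100)  # ceil(0.99 * n) using integers
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds.
healthy = [100] * 100                # every request takes 100 ms
degraded = [80] * 98 + [1080] * 2    # most are fast, two are very slow

print(mean(healthy), mean(degraded))  # 100.0 100.0
print(p99(healthy), p99(degraded))    # 100 1080
```

A store that kept only the per-minute average would record these two windows as identical. The difference is recoverable only from the individual samples.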

Then there is the economics question. Platforms that charge per query, cap concurrency or tie access to named human users are fundamentally misaligned with how AI agents work. AI agents do not run one query and study a graph. They run continuous, parallel analysis across multiple dimensions simultaneously.

A pricing model that penalizes high-volume machine access will either drive unsustainable costs or force teams to artificially constrain the very capabilities they are trying to enable.

These patterns are already influencing how the underlying data infrastructure evolves. Database management systems that treat observability as a first-class workload are emerging.

Instead of separating logs, metrics and traces into different systems or heavily sampling data, they are designed to store and query complete telemetry datasets at scale within a single database layer.

This makes it possible to retain and analyze all data, rather than working from reduced views of system behavior.

Preparing now for what is clearly coming

The good news is that organizations do not need to wait for fully autonomous observability to start preparing. The requirements are already visible and they map to decisions that leaders can make today.

Retention matters more than it used to. If a platform only keeps a few days of high-resolution data, it creates a hard ceiling on future AI agent capabilities before they are even deployed.

Full-fidelity data is not a luxury. The trend toward sampling made sense when storage and compute were the bottleneck and humans were the only consumer. As the cost of storing and querying raw telemetry continues to fall, keeping the original data becomes the more defensible choice.

Economics need to align with machine access patterns. This means evaluating not just the sticker price of an observability platform, but how it charges for the kind of high-concurrency, continuous workloads that AI agents generate.

Organizations that get this right will be able to deploy AI agents confidently. Those that do not will hold their own capabilities back.


This article was produced as part of TechRadar Pro Perspectives, our channel to feature the best and brightest minds in the technology industry today.

The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing, find out more here: https://www.techradar.com/pro/perspectives-how-to-submit
