
AG-UI and A2UI: The Contracts Agent Apps Actually Need

Mar 13, 2026

7 min read


Most agent apps start the same way: a chat box, a backend, and a pile of custom glue.

At first it feels fine. Stream some text. Show a spinner. Maybe render tool calls. Maybe show intermediate steps. Then the agent gets more capable and the frontend-backend contract falls apart.

Now you need to handle token streaming, tool lifecycle events, partial state updates, interrupts, human approval, and maybe even UI generated by the agent itself.

This is where AG-UI and A2UI matter.

They solve two different problems in the same stack.

AG-UI standardizes how agent backends and user-facing apps talk.

A2UI standardizes how agents describe UI safely.

If you separate those two concerns, agent apps get much easier to build.


The Problem: Agents Break Normal API Design

Traditional app architecture assumes a simple loop:

  1. Client sends request
  2. Server returns response
  3. Client renders
  4. Interaction ends

That works for CRUD apps. It does not work well for agents.

Agent runs are long-lived. They stream partial text. They call tools. They update state over time. They may pause for approval. They may emit structured data and plain language in the same run. Sometimes they need the UI to react before the run is over.

You can keep solving this with custom WebSocket payloads and frontend conditionals. Many teams do.

The result is usually brittle.

Every backend invents its own event format. Every frontend invents its own parser. Tool calls become UI-specific hacks. Streaming becomes transport-specific glue. Swapping frameworks becomes painful because the "protocol" only exists inside your app.

That is the real problem these protocols are trying to solve.


What AG-UI Solves

AG-UI stands for Agent-User Interaction Protocol.

It is an open, event-based protocol for the connection between an agentic backend and a user-facing application. Instead of forcing every team to invent its own stream format, AG-UI defines a shared contract for the flow of agent events.

The core idea is simple:

  • The backend emits typed events
  • The frontend consumes those events in real time
  • Both sides agree on the lifecycle of a run

That contract covers the hard parts normal APIs do badly:

  • streaming text
  • tool call lifecycle
  • state updates
  • frontend actions
  • interrupts
  • custom events
  • multimodal attachments

This is why AG-UI feels useful immediately. It is not trying to replace HTTP or WebSockets. It sits on top of them and gives agent apps a vocabulary they usually lack.
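To make the contract concrete, here is a minimal sketch of a frontend folding a typed event stream into renderable state. The event names mirror AG-UI's documented lifecycle events, but the payload shapes are simplified assumptions for illustration, not the official SDK types.

```python
# Illustrative sketch: consuming a typed agent event stream.
# Event names echo AG-UI's lifecycle events; payload shapes are
# simplified assumptions, not the official SDK types.

def consume(events):
    """Fold a stream of typed events into UI-ready state."""
    text, tool_calls, state = [], [], {}
    for event in events:
        kind = event["type"]
        if kind == "TEXT_MESSAGE_CONTENT":
            text.append(event["delta"])            # token-level streaming
        elif kind == "TOOL_CALL_START":
            tool_calls.append(event["toolCallName"])
        elif kind == "STATE_DELTA":
            state.update(event["delta"])           # partial state update
        elif kind == "RUN_FINISHED":
            break
    return {"text": "".join(text), "tool_calls": tool_calls, "state": state}

run = [
    {"type": "RUN_STARTED"},
    {"type": "TEXT_MESSAGE_CONTENT", "delta": "Searching"},
    {"type": "TOOL_CALL_START", "toolCallName": "web_search"},
    {"type": "STATE_DELTA", "delta": {"results": 3}},
    {"type": "TEXT_MESSAGE_CONTENT", "delta": " done."},
    {"type": "RUN_FINISHED"},
]
print(consume(run))
```

The point is not the dispatch loop itself; it is that both sides agree on the event vocabulary, so the frontend never has to guess what a chunk means.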

According to the official docs, AG-UI was born from CopilotKit's early work with LangGraph and CrewAI and then opened up into a broader protocol. That origin makes sense. The protocol came from practical frontend pain, not from theory.

If you read the AG-UI docs or the Hackernoon breakdown of AG-UI events, the same theme keeps showing up: agent UX is really event orchestration.

Not just messages. Events.


What A2UI Solves

A2UI solves a different problem.

Sometimes text is not enough. The agent does not just want to say something. It wants to show something interactive:

  • a search results panel
  • a form
  • a dashboard card
  • a chart
  • a review/approval surface

The bad way to do this is to let the model generate raw code and somehow execute it in the client.

That is unsafe and hard to control.

A2UI takes the opposite approach. Instead of executable code, the agent sends declarative UI descriptions. The client renders those descriptions using its own native component catalog.

That gives you a safer model:

  • the agent describes intent
  • the app decides what components are allowed
  • the renderer stays in control

The official A2UI site describes it as a protocol for agent-driven interfaces that render across web, mobile, and desktop without arbitrary code execution. It was created by Google with contributions from CopilotKit and the open source community.

That "without arbitrary code execution" part is the key design choice.

It means the agent can generate interfaces, but only within the boundaries your application accepts.
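A toy version of that boundary looks like this. The component names and payload shape below are assumptions for illustration, not the official A2UI schema; the design point is only that the client checks every node against its own catalog before rendering.

```python
# Illustrative sketch of the A2UI idea: the agent sends a declarative
# component tree, and the client renders only components it has
# registered. Names and shapes are assumptions, not the A2UI schema.

ALLOWED = {"Card", "Text", "Button"}  # the app's component catalog

def render(node, depth=0):
    """Walk a declarative UI tree, rejecting unknown components."""
    if node["component"] not in ALLOWED:
        raise ValueError(f"component not in catalog: {node['component']}")
    pad = "  " * depth
    lines = [f"{pad}<{node['component']}>"]
    for child in node.get("children", []):
        lines.extend(render(child, depth + 1))
    return lines

ui = {
    "component": "Card",
    "children": [
        {"component": "Text"},
        {"component": "Button"},
    ],
}
print("\n".join(render(ui)))
```

A payload asking for a component outside the catalog simply fails validation; the agent never gets to execute anything.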

So if AG-UI standardizes interaction, A2UI standardizes presentation.


How They Work Together

This is the clean mental model:

  • AG-UI is the event pipe
  • A2UI is the UI description format

AG-UI answers: "How do frontend and backend communicate during an agent run?"

A2UI answers: "How does an agent safely ask the app to render UI?"

Those are related questions, but not the same question.

In practice, they fit together well because agent apps usually need both:

  1. A runtime contract for streamed execution
  2. A UI contract for structured rendering

An agent can stream progress, tool activity, and state changes through AG-UI.

Then, when it needs richer interaction than plain text, it can emit declarative UI payloads that the client interprets through an A2UI-style renderer.
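One way the two layers can meet, sketched with assumed event and payload shapes: the AG-UI stream carries everything, and a UI payload rides inside one of its events, where the client routes it to a separate renderer instead of the chat log.

```python
# Sketch of the two layers together: an AG-UI-style event stream
# carries the run, and an A2UI-style payload rides inside one event.
# Event and payload shapes here are illustrative assumptions.

def route(events):
    """Split a run's events into chat text and declarative UI payloads."""
    chat, surfaces = [], []
    for event in events:
        if event["type"] == "TEXT_MESSAGE_CONTENT":
            chat.append(event["delta"])
        elif event["type"] == "CUSTOM" and event.get("name") == "a2ui":
            surfaces.append(event["value"])   # hand off to the UI renderer
    return "".join(chat), surfaces

events = [
    {"type": "TEXT_MESSAGE_CONTENT", "delta": "Here are your results:"},
    {"type": "CUSTOM", "name": "a2ui", "value": {"component": "Card"}},
]
print(route(events))
```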

That separation is healthy.

Without it, teams often overload one mechanism to do everything:

  • text messages carrying JSON blobs
  • tool results pretending to be UI
  • frontend-specific flags hidden inside backend responses

It works for demos. It does not scale cleanly.


How This Helped in My Setup

This clicked for me when I used assistant-ui on the frontend and LangGraph Deep Agents on the backend.

On the backend side, LangGraph already has strong streaming primitives. Its streaming docs make the model explicit: stream updates for state changes, messages for token-level output, and custom for app-specific signals. That is exactly the kind of runtime behavior a frontend needs to observe in real time.
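When several stream modes are requested at once, LangGraph's stream yields (mode, chunk) pairs. The stand-in below demultiplexes such a stream the way a frontend adapter would; the chunks are fabricated examples, not real LangGraph output.

```python
# Demultiplexing a multi-mode stream. With several stream modes,
# LangGraph yields (mode, chunk) pairs; this plain-Python stand-in
# routes them the way a frontend adapter would. Chunks are fabricated.

def demux(stream):
    """Route multiplexed (mode, chunk) pairs to per-mode buckets."""
    routed = {"updates": [], "messages": [], "custom": []}
    for mode, chunk in stream:
        routed[mode].append(chunk)
    return routed

stream = [
    ("updates", {"node": "research", "state": {"step": 1}}),  # state change
    ("messages", "Plan: "),                                   # token output
    ("custom", {"progress": 0.5}),                            # app signal
    ("messages", "search first."),
]
print(demux(stream)["messages"])
```

Each bucket maps naturally onto a different part of the UI: state panels, the chat transcript, and app-specific indicators.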

On top of that, ag-ui-langgraph provides a concrete AG-UI integration for LangGraph. So instead of inventing another custom adapter layer, I can expose a backend that speaks a standardized agent-user event protocol.

On the frontend side, assistant-ui already has runtime abstractions for different backends, including AG-UI support. That matters because the frontend no longer has to know every LangGraph detail directly. It can consume a cleaner runtime contract.

The result is better separation:

  • LangGraph focuses on execution
  • AG-UI focuses on transport semantics
  • assistant-ui focuses on rendering and interaction

That made the frontend-backend boundary much clearer for me.

It also removed a common source of agent-app mess: arguments about what a "message" means.

Text is text. Tool events are tool events. State updates are state updates. UI rendering has its own layer.

That clarity is more valuable than it sounds.


Why These Protocols Exist

The reason both protocols exist is simple: agent apps are becoming multi-step, stateful, and interactive, but most web app contracts are still shaped like request/response APIs.

That gap creates repeated infrastructure work.

AG-UI exists because agent backends need a standard way to stream execution and interaction semantics to the UI.

A2UI exists because agents need a safe way to describe rich interfaces without shipping raw frontend code across trust boundaries.

Both protocols are attempts to turn ad hoc integration pain into explicit standards.

That is why they matter.

Not because protocols are exciting. Usually they are not.

They matter because once the contract is clear, the rest of the app gets simpler.


The Practical Takeaway

If you are building agent apps, do not treat the frontend as a dumb chat shell.

You need to think in layers:

  • execution
  • event transport
  • rendering
  • interaction

AG-UI gives you a better contract for the transport and interaction layer.

A2UI gives you a better contract for the rendering layer when agents need to drive UI.

Used together, they make the frontend-backend boundary much less ambiguous.

That has been the biggest benefit in my own setup.

Not hype. Not novelty.

Just clearer data flow, clearer UI flow, and less glue code.


Gopal Khadka