The Hard Part of AI Agents Isn't Code

Jan 29, 2026

5 min read

Everyone's building agents. Most are building them wrong.

Not because they can't code. Because they skip the decisions that matter.

I've built agents for research, data extraction, web scraping, and orchestration. The ones that failed didn't fail in implementation. They failed in design — questions I didn't ask before writing a single line.

Here's what I learned: the hard part of AI agents isn't code. It's decisions.


The Real Work Happens Before Code

Agent tutorials teach you frameworks. They don't teach you when to use them.

Should this be an agent or a workflow? Big model or small? How much autonomy? What happens when it fails?

These questions shape everything downstream. Skip them and you'll build something that works in demos and breaks in production.


Agent vs Workflow: The First Fork

This is the decision most people get wrong.

An agent decides what to do next. It has autonomy. It picks tools, loops, adjusts. Powerful — but unpredictable.

A workflow follows a fixed path. Step one, then step two, then step three. Predictable — but rigid.

The rule: if the steps are known upfront, use a workflow. If the LLM must decide what to do based on context, use an agent.

Most people default to agents when a workflow would do. They add complexity they don't need, then spend weeks debugging autonomy they never wanted.

Start with a workflow. Graduate to an agent only when you need the flexibility.
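The difference is easier to see in code. Here's a minimal sketch of the two shapes, with a stubbed `call_llm` standing in for whatever model client you actually use (the stub and its behavior are illustrative, not a real API):

```python
def call_llm(prompt: str) -> str:
    """Stub: pretend the model asks to search once, then finishes."""
    return "search" if "step 1" in prompt else "finish"

def workflow(query: str) -> str:
    # Fixed path: every run executes the same steps in the same order.
    results = f"results for {query}"    # step 1: fetch
    summary = f"summary of {results}"   # step 2: summarize
    return f"report: {summary}"         # step 3: format

def agent(query: str, max_steps: int = 5) -> str:
    # The model decides the next action each turn; we only bound the loop.
    history = []
    for step in range(max_steps):
        action = call_llm(f"step {step + 1}: {query} so far={history}")
        if action == "finish":
            return f"report: {history}"
        history.append(f"ran {action}")
    return f"report (step budget hit): {history}"
```

The workflow is boring and debuggable. The agent is flexible, but even this toy version needs a step budget to stay safe.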


The Guardrail Paradox

Too few guardrails: the agent breaks on the basics. Wrong formats. Wrong types. Missed expectations you assumed were obvious. You expected it to handle simple things. It didn't.

Too many guardrails: the agent stops thinking. Give it strict examples and hard rules and it copies instead of learning. It imitates your patterns instead of understanding your intent.

The balance isn't a formula. It's taste — developed through iteration.

My approach: start loose, tighten where it fails. Don't over-specify upfront. Let the agent show you where it needs constraints.


The Questions That Matter

Before building any agent, answer these:

Architecture

Should this be an agent or a workflow? If steps are predictable, workflow. If decisions depend on context, agent. Don't add autonomy you don't need.

Does each agent have a single, clear responsibility? One agent, one job. Researcher researches. Parser parses. Orchestrator coordinates. Blur the lines and you'll debug for days.

Is an LLM call necessary here? If a Python function can do it precisely and deterministically, skip the LLM. Not everything needs intelligence. Some things need reliability.
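As a hypothetical example: pulling order IDs out of support emails. A model can do it, but a regex does it deterministically, instantly, and for free (the `ORD-` format here is made up for illustration):

```python
import re

# Matches IDs like ORD-123456 (hypothetical format).
ORDER_ID = re.compile(r"\bORD-\d{6}\b")

def extract_order_ids(text: str) -> list[str]:
    # Deterministic extraction: same input, same output, every time.
    return ORDER_ID.findall(text)
```

If a task looks like this, it doesn't need a model. It needs a function.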

Guardrails & Structure

Should the output be structured or prose? Prefer structured (JSON, schema, typed objects) when downstream code consumes it. Use prose only when humans read it directly.
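One way to enforce this, sketched with the standard library only (the `Finding` schema is a made-up example): parse and type-check the model's output at the boundary, so failures surface loudly instead of three modules later.

```python
import json
from dataclasses import dataclass

@dataclass
class Finding:
    title: str
    score: float

def parse_finding(raw: str) -> Finding:
    # Validate before anything downstream touches it. A bad field
    # raises here, at the boundary, with an obvious stack trace.
    data = json.loads(raw)
    return Finding(title=str(data["title"]), score=float(data["score"]))
```

Schema libraries give you the same idea with less code, but the principle is the boundary check, not the library.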

How much autonomy should this agent have? Define the boundaries. Can it pick tools? Trigger subagents? Loop indefinitely? Autonomy without boundaries is chaos.
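Boundaries can be literal. A sketch of a bounded tool loop, assuming a hypothetical setup where `pick_tool` is the LLM's choice each turn and `run_tool` executes it:

```python
ALLOWED_TOOLS = {"search", "fetch_page"}  # no writes, no subagents
MAX_TURNS = 8                             # hard cap on looping

def run_bounded(pick_tool, run_tool) -> list[str]:
    log = []
    for _ in range(MAX_TURNS):
        tool = pick_tool(log)             # the model's choice this turn
        if tool == "done":
            break
        if tool not in ALLOWED_TOOLS:
            log.append(f"refused: {tool}")  # autonomy ends at the whitelist
            continue
        log.append(run_tool(tool))
    return log
```

A whitelist and a turn cap won't make an agent smart, but they make its worst case boring.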

Are we providing enough context without overloading? Feed only what's needed. Too little context and it guesses wrong. Too much and it gets confused. Structure long context with clear sections or XML tags.

Failure Modes

What happens when parsing fails? It will fail. Plan for it. Retry? Fallback parser? Ask the LLM to try again? Decide before it happens.
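A sketch of that decision made upfront: strict parse first, then salvage, then one retry. The `retry` callback stands in for whatever re-prompting mechanism you use:

```python
import json
import re

def parse_with_fallback(raw: str, retry) -> dict:
    # Attempt 1: strict JSON.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Attempt 2: salvage the first {...} block — models love to wrap
    # JSON in prose like "Sure! Here's your data:".
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            return json.loads(match.group())
        except json.JSONDecodeError:
            pass
    # Attempt 3: one retry, asking the model to re-emit valid JSON.
    return json.loads(retry("Output valid JSON only:\n" + raw))
```

Three attempts, then fail loudly. The exact ladder matters less than having decided it before production did.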

How will we know if this agent is working? Define success. Latency? Accuracy? Completion rate? If you can't measure it, you can't improve it.


The Model Decision

Not every task needs your biggest model.

Small models for simple, deterministic tasks. Classification. Extraction. Formatting. Fast, cheap, reliable.

Large models for reasoning, planning, multi-step decisions. When the agent needs to think, let it think — but only then.

"Thinking mode" is expensive. Enable it for complex orchestration. Disable it for everything else.
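In practice this becomes a routing table. A sketch with placeholder model names (the names and task labels are illustrative, not any vendor's API):

```python
# Hypothetical routes: cheap model by default, reasoning only where named.
ROUTES = {
    "classify":    {"model": "small-fast",  "thinking": False},
    "extract":     {"model": "small-fast",  "thinking": False},
    "format":      {"model": "small-fast",  "thinking": False},
    "plan":        {"model": "large-smart", "thinking": True},
    "orchestrate": {"model": "large-smart", "thinking": True},
}

def route(task: str) -> dict:
    # Unknown tasks fall through to the cheap default — you opt in
    # to expensive reasoning, never into it by accident.
    return ROUTES.get(task, {"model": "small-fast", "thinking": False})
```

The default direction is the point: paying for reasoning should be a deliberate choice, not the fallback.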


Small Agents, Clear Responsibilities

The best agent architectures look simple.

One orchestrator coordinates. Small specialized agents handle specific tasks — research, parsing, fetching, analysis. Each one does one thing well.

The orchestrator delegates and combines. The subagents execute and return. Clean boundaries. Predictable behavior.
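Stripped to its skeleton, the pattern is just functions with one job each. The agent names and outputs here are illustrative:

```python
def researcher(topic: str) -> str:
    # One job: gather. Returns raw notes, nothing else.
    return f"notes on {topic}"

def parser(notes: str) -> dict:
    # One job: structure. Consumes notes, returns typed output.
    return {"summary": notes.upper()}

def orchestrator(topic: str) -> dict:
    # Delegates and combines — does no research or parsing itself.
    notes = researcher(topic)
    parsed = parser(notes)
    parsed["topic"] = topic
    return parsed
```

When each boundary is this clean, a failure points at exactly one function.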

When an agent does too much, it fails in unpredictable ways. When responsibilities are clear, failures are obvious and fixable.


The Engineers Who Struggle

They're not bad at coding. They're skipping the design phase.

They start with frameworks instead of questions. They add agents when workflows would do. They over-engineer autonomy, then fight to constrain it.

The engineers who build reliable agents slow down before they speed up. They answer the hard questions first. Then the code writes itself.


The Checklist

Before you build, ask:

  1. Agent or workflow?
  2. Single responsibility per agent?
  3. LLM call or Python function?
  4. Structured output or prose?
  5. How much autonomy? What are the limits?
  6. Enough context? Too much?
  7. What's the fallback when parsing fails?
  8. How do we measure success?

Answer these first. Then write code.


The hard part of AI agents isn't code. It's knowing what to build before you build it.

Frameworks won't save you. Tutorials won't save you. The right questions will.

Ask them early. Your future self will thank you.

Gopal Khadka