
Agents

Agents are multi-step LLM workflows defined declaratively as a graph spec. A graph spec is JSON: a set of nodes connected by edges, where each node is either an LLM call, a tool call, or an end marker. The agent runtime executes the graph step by step, persists checkpoints between steps so runs can resume, and streams events back to the caller over SSE.

Agents are stored per-app. They can be invoked from your dashboard, the CLI, the MCP server, the SDK, or — for public agents — anonymous HTTP.

{
  "spec_version": "1",
  "entry": "classify",
  "nodes": {
    "classify": {
      "type": "llm",
      "model": "anthropic/claude-3.5-haiku",
      "system_prompt": "Classify the user's request as 'billing', 'support', or 'sales'.",
      "input_template": "{{ input.message }}",
      "output_key": "category",
      "tools": []
    },
    "respond": {
      "type": "llm",
      "model": "anthropic/claude-3.5-sonnet",
      "system_prompt": "You are a {{ state.category }} agent.",
      "input_template": "{{ input.message }}",
      "output_key": "reply",
      "tools": [
        { "source": "function", "name": "lookup_account" }
      ]
    },
    "done": { "type": "end", "output_template": "{{ state.reply }}" }
  },
  "edges": [
    { "from": "classify", "to": "respond" },
    { "from": "respond", "to": "done" }
  ],
  "tools": {
    "builtin": ["select_rows"],
    "mcp_servers": [],
    "functions": ["lookup_account"]
  },
  "limits": {
    "max_steps": 20,
    "max_tool_calls": 50,
    "max_parallel_tools": 4,
    "timeout_seconds": 300,
    "human_timeout_seconds": 86400
  }
}

The spec is versioned (spec_version: "1"), validated server-side, and rejected at run time if it references a tool that does not exist.

| Type | Purpose |
| --- | --- |
| llm | Call a model. Renders system_prompt and input_template against current state, optionally exposes tools to the model, writes the response (or any tool results) into output_key. |
| tool | Invoke a tool directly without an LLM. Useful for deterministic preprocessing (select_rows, a known function call, etc). |
| end | Render output_template against state and emit it as the run’s final output. |
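
The rendering step that llm and end nodes perform can be pictured with a small sketch. The platform's actual template engine is not documented here, so this assumes only the `{{ dotted.path }}` placeholders seen in the example spec; the `render` helper is hypothetical.

```python
import re

# Matches placeholders such as {{ state.category }} or {{ input.message }}.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Substitute dotted-path placeholders with values from context.

    Hypothetical helper: raises KeyError if a path is missing.
    """
    def resolve(match: re.Match) -> str:
        value = context
        for part in match.group(1).split("."):
            value = value[part]  # walk one level per path segment
        return str(value)

    return _PLACEHOLDER.sub(resolve, template)


context = {
    "input": {"message": "My invoice is wrong"},
    "state": {"category": "billing"},
}
prompt = render("You are a {{ state.category }} agent.", context)
```

Here `prompt` becomes "You are a billing agent.", which is how the respond node in the example spec would pick up the category written by classify.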

Edges describe sequencing — each node, when it finishes, transitions to its outgoing edge’s target. The graph must be a DAG that reaches end from entry.
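
The DAG requirement can be checked locally before a spec is submitted. The following is a minimal sketch, not the server's validator: `validate_graph` is a hypothetical helper that uses Kahn's algorithm for cycle detection and a breadth-first search for reachability, and it reads only the entry, nodes, and edges fields shown in the example spec.

```python
from collections import deque


def validate_graph(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the graph looks valid."""
    nodes = spec["nodes"]
    entry = spec["entry"]
    errors: list[str] = []
    if entry not in nodes:
        return [f"entry node {entry!r} is not defined"]

    # Build adjacency lists, reporting edges that point at unknown nodes.
    successors = {name: [] for name in nodes}
    indegree = {name: 0 for name in nodes}
    for edge in spec.get("edges", []):
        src, dst = edge["from"], edge["to"]
        if src not in nodes or dst not in nodes:
            errors.append(f"edge {src!r} -> {dst!r} references an unknown node")
            continue
        successors[src].append(dst)
        indegree[dst] += 1

    # Kahn's algorithm: nodes that are never emitted sit on a cycle.
    queue = deque(n for n, d in indegree.items() if d == 0)
    emitted = 0
    while queue:
        node = queue.popleft()
        emitted += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if emitted != len(nodes):
        errors.append("graph contains a cycle")

    # Breadth-first search from entry to confirm an end node is reachable.
    seen, frontier = {entry}, deque([entry])
    while frontier:
        node = frontier.popleft()
        for nxt in successors[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    if not any(nodes[n].get("type") == "end" for n in seen):
        errors.append("no end node is reachable from entry")
    return errors
```

Running it on the example spec returns an empty list; adding an edge from done back to classify yields a cycle error.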

Tools from three sources can be exposed to an agent: built-in tools, MCP tools, and function tools.

Built-in tools are platform-managed tools that talk to your app’s database, storage, KV, etc. List them in tools.builtin by name. The runtime enforces row-level security (RLS) by mapping the caller to a role (end_user → butterbase_user, otherwise butterbase_service).

MCP tools are served by a Model Context Protocol server you’ve registered via POST /v1/{app_id}/mcp-servers. Reference them in tools.mcp_servers by server_id plus the subset of tool names you want exposed.

Function tools are serverless functions you’ve deployed and marked as agent tools; any such function is callable by the agent. To mark a function:

POST /v1/{app_id}/functions
{
  "name": "lookup_account",
  "code": "...",
  "agent_tool": true,
  "agent_tool_description": "Look up a customer by email or account ID.",
  "agent_tool_mode": "read_only",
  "agent_tool_exposed_to": "developer_only"
}

Then list it in the agent spec’s tools.functions array. If you reference a function that does not have agent_tool: true, the runtime silently skips it — the dashboard editor warns you about this before save.
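
Because unmarked functions are skipped silently at run time, a local check can catch the problem outside the dashboard editor. This is a rough sketch: `unmarked_function_tools` and the `deployed` mapping (function name to its metadata) are hypothetical, and how you obtain that metadata is left open.

```python
def unmarked_function_tools(spec: dict, deployed: dict[str, dict]) -> list[str]:
    """Names in tools.functions that are undeployed or lack agent_tool: true.

    `deployed` is a hypothetical mapping of function name to the metadata
    you registered it with; it is not fetched from any API here.
    """
    referenced = spec.get("tools", {}).get("functions", [])
    return [
        name
        for name in referenced
        if not deployed.get(name, {}).get("agent_tool", False)
    ]


spec = {"tools": {"functions": ["lookup_account", "refund_order"]}}
deployed = {
    "lookup_account": {"agent_tool": True, "agent_tool_mode": "read_only"},
    "refund_order": {"agent_tool": False},
}
skipped = unmarked_function_tools(spec, deployed)
```

In this example `skipped` contains only refund_order, the function the runtime would skip.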

agent_tool_mode:

  • read_only: no approval needed.
  • read_write: the runtime pauses the run and emits a run_paused event with an approval payload; resume the run after a human approves.

agent_tool_exposed_to:

  • developer_only: usable only from dashboard / CLI test runs.
  • end_user: also usable from public agent invocations.

queued → running → (paused →) running → completed | failed | cancelled

A run progresses through the graph one step at a time. After each node, a checkpoint is written. If the runtime restarts, the run resumes from the last checkpoint.

A run pauses when a read_write tool is invoked or when the agent explicitly calls a human-in-the-loop (HITL) primitive. The next POST /runs/{id}/resume (or the dashboard’s “Approve” button) continues from the last checkpoint.
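
The lifecycle above can be written down as a transition table, which is handy when checking a client's view of run status. This sketch encodes only the transitions shown in the state line; whether the runtime allows others (for example cancelling a queued or paused run) is not stated, so they are left out. `TRANSITIONS` and `is_valid_history` are hypothetical names.

```python
# Transitions taken from the documented state line only.
TRANSITIONS: dict[str, set[str]] = {
    "queued": {"running"},
    "running": {"paused", "completed", "failed", "cancelled"},
    "paused": {"running"},
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
}


def is_valid_history(states: list[str]) -> bool:
    """True if the sequence starts at queued and follows TRANSITIONS."""
    if not states or states[0] != "queued":
        return False
    return all(
        nxt in TRANSITIONS.get(cur, set())
        for cur, nxt in zip(states, states[1:])
    )


ok = is_valid_history(["queued", "running", "paused", "running", "completed"])
bad = is_valid_history(["queued", "paused"])
```

The first history is accepted and the second is rejected, because a run can only pause while it is running.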

The complete event stream over SSE:

| Event | When |
| --- | --- |
| run_start | Run accepted, before any node runs. |
| node_start / node_end | A node is about to run / has finished. Includes node_id and step. |
| tool_call_start / tool_call_end | A tool is about to be invoked / has returned. Includes tool_name, args, result. |
| llm_token_usage | Per-LLM-call token counts. |
| run_paused | A read_write tool needs human approval. |
| run_cancelled | The caller cancelled the run. |
| run_failed | An error terminated the run. |
| run_end | Run completed successfully; payload contains the rendered output_template. |
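
A client consuming this stream needs to split it into events. The sketch below assumes the standard SSE wire format, with the event name in the `event:` field and a JSON payload in `data:`; the exact framing and payload fields used by the platform are not specified here, so `parse_sse` and the sample field names (`run_id`, `output`) are illustrative only.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) pairs from raw SSE lines.

    Assumes JSON payloads. An event is emitted when a blank line closes it;
    a trailing event with no closing blank line is discarded.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


stream = [
    "event: run_start\n",
    'data: {"run_id": "r1"}\n',
    "\n",
    ": keep-alive\n",
    "event: run_end\n",
    'data: {"output": "hi"}\n',
    "\n",
]
events = list(parse_sse(stream))
```

A caller would typically stop reading once it sees run_end, run_failed, or run_cancelled, and treat run_paused as a signal to wait for approval.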

Every POST /runs carries an implicit idempotency key — a SHA-256 hash of the request body. Re-posting the same body returns the existing run (HTTP 200) instead of creating a duplicate. Submitting a different body with the same agent name within the dedupe window returns 409 conflict.
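
To see why two requests might or might not dedupe, it helps to compute the hash yourself. This is a sketch under one assumption: that the key is the SHA-256 of the raw request bytes, with no server-side canonicalization. Under that assumption, key order and whitespace matter, so the hypothetical `encode_body` helper serializes deterministically.

```python
import hashlib
import json


def encode_body(payload: dict) -> bytes:
    """Serialize deterministically so equal payloads give equal bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def idempotency_key(body: bytes) -> str:
    """SHA-256 hex digest of the request body (assumed to be the raw bytes)."""
    return hashlib.sha256(body).hexdigest()


first = idempotency_key(encode_body({"agent": "triage", "input": {"message": "hi"}}))
retry = idempotency_key(encode_body({"input": {"message": "hi"}, "agent": "triage"}))
other = idempotency_key(encode_body({"agent": "triage", "input": {"message": "bye"}}))
```

With this encoding, `first` and `retry` match even though the dictionaries were built in a different key order, while `other` differs. The `agent` and `input` field names are placeholders, not the documented request schema.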

Each spec declares its own limits. The runtime hard-caps them server-side:

| Limit | Cap |
| --- | --- |
| max_steps | 200 graph traversals per run |
| max_tool_calls | 500 tool invocations per run |
| max_parallel_tools | 16 tools in flight |
| timeout_seconds | 1 hour wall-clock |
| human_timeout_seconds | 7 days while paused for approval |
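
A spec's limits can be compared against these caps before submission. The sketch below converts the caps to the units used in the spec (seconds for the two timeouts). Whether the server clamps or rejects an over-cap value is not stated, so the hypothetical `over_cap` helper only reports violations.

```python
# Hard caps from the table above, expressed in the spec's units.
HARD_CAPS: dict[str, int] = {
    "max_steps": 200,
    "max_tool_calls": 500,
    "max_parallel_tools": 16,
    "timeout_seconds": 60 * 60,                 # 1 hour
    "human_timeout_seconds": 7 * 24 * 60 * 60,  # 7 days
}


def over_cap(limits: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Map each over-cap limit name to (requested, cap)."""
    return {
        name: (value, HARD_CAPS[name])
        for name, value in limits.items()
        if name in HARD_CAPS and value > HARD_CAPS[name]
    }


example_limits = {
    "max_steps": 20,
    "max_tool_calls": 50,
    "max_parallel_tools": 4,
    "timeout_seconds": 300,
    "human_timeout_seconds": 86400,
}
violations = over_cap(example_limits)
too_long = over_cap({"timeout_seconds": 7200})
```

The limits from the example spec produce no violations, while a two-hour timeout_seconds is reported against the one-hour cap.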

Apps also have per-agent access controls — max_runs_per_user_per_hour, daily_budget_usd, max_concurrent_runs, etc. — set via the dashboard’s “Access and limits” section or PATCH /v1/{app_id}/agents/{name}.

| Visibility | Who can start runs |
| --- | --- |
| private | Owner of the app (dashboard, CLI, MCP). |
| authenticated | Any logged-in user of the app. |
| public | Anyone with the app’s public anon key. Use with caution: combine with strict rate limits and a daily_budget_usd. |