
Shift From API Keys to Per-Request Payments: Solving Agent Loop Spending
How x402, ampersend, and BlockRun Prevent Runaway AI Agent Costs
A multi-agent system mistakenly burned $47,000+ in API costs. No hacker. No breach. Just bad infrastructure controls.
Two AI agents were stuck in a recursive loop for 11 days, each one asking the other for clarification, each one convinced it was making progress. Nobody noticed until the invoice arrived.
If you're building with LLMs today, this is not an edge case. It's a problem many teams will eventually face.
This is known as the agent loop problem, and it exposes a deeper issue in AI infrastructure.
These agents were handed API keys, the equivalent of giving them corporate credit cards, with no real-time spending governance. When the loop started, nothing existed at the infrastructure layer to stop it.
The good news: Edge & Node has built an open-source system called ampersend that makes this type of failure impossible.
With ampersend, every LLM call becomes a real USDC payment with spending limits enforced at the wallet level instead of application code. That means when the agent's budget runs out, it stops spending money even if the code keeps running.
This post explains why the problem is harder than it looks, and how to solve it at the protocol level by combining x402, BlockRun.ai, and ampersend.
Agent loops are an infrastructure problem, not a code problem
Most teams building agent systems know all the usual advice: add step limits, set token caps, monitor for repeated outputs.
These are all sound practices, but they're not enough to prevent an agent loop.
Here's why.
Step limits don't survive composition.
Agent A calls Agent B, which calls Agent C. Step limits are local to each agent.
If each agent is allowed 50 steps, a simple three-agent chain already executes 150 total steps, and when agents invoke each other on every step, the totals multiply rather than add. In a system where each call has a cost, the budget can be blown even though every individual step limit worked exactly as configured.
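A toy calculation makes the compounding concrete. The numbers below are illustrative: each agent respects a local 50-step limit, but every step delegates a full run to the next agent in the chain.

```python
# Illustrative only: each agent has a local 50-step limit, and every
# step of one agent triggers a full run of the next agent in line.

def total_steps(step_limit: int, depth: int) -> int:
    """Total steps executed by a chain of `depth` agents when every
    step of an agent triggers a full run of the next agent."""
    if depth == 0:
        return 0
    # this agent's own steps, plus one full downstream run per step
    return step_limit + step_limit * total_steps(step_limit, depth - 1)

print(total_steps(50, 1))  # 50: a single agent behaves exactly as capped
print(total_steps(50, 3))  # 127550: three "50-step" agents composed
```

Every local limit held, yet the system executed over 127,000 steps. No per-agent configuration fixes this; only a shared budget does.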
Token caps are estimates, not enforcement.
Most LLM APIs let you set max_tokens on a response. This limits output length, not spending.
An agent that sends 50 requests with modest outputs can still accumulate serious spend, and setting max_tokens doesn't limit the number of requests, which is the actual vector for runaway costs.
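A back-of-the-envelope sketch, using a hypothetical flat token price, shows how capped responses still add up:

```python
# Hypothetical flat price for illustration; real per-token pricing varies
# by model and by input vs. output tokens.
PRICE_PER_1K_TOKENS = 0.01  # dollars

def request_cost(prompt_tokens: int, max_tokens: int) -> float:
    # max_tokens bounds the response length of THIS call only
    return (prompt_tokens + max_tokens) / 1000 * PRICE_PER_1K_TOKENS

# 50 "modest" requests: a 2,000-token prompt with a 500-token cap each.
total = sum(request_cost(2000, 500) for _ in range(50))
print(f"${total:.2f}")  # $1.25 for one burst, and nothing stops the next burst
```

The cap held on every single response; the spend came from the request count, which max_tokens never touches.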
Monitoring is reactive.
Observability dashboards tell you what happened, but by the time you see a cost spike, the money has already been spent.
In the $47K incident, monitoring was in place, but it reported outcomes rather than intervening.
Application-level budget checks can be bypassed.
If your code checks a counter before each API call, that counter lives in the same trust domain as the agent.
A bug that causes the loop can also break the counter. The agent that doesn't know it's looping also doesn't know it should stop spending.
In other words, anything that depends on the agent's own logic to limit its spend will fail in exactly the scenarios where limits matter most: when the agent is misbehaving.
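A minimal sketch of that failure mode: the budget counter and the bug live in the same process, so one orchestration mistake (here, a hypothetical retry path that re-arms the budget) silently defeats the check.

```python
# Sketch: an in-process budget counter shares a trust domain with the
# agent, so the bug that causes the loop can also disable the check.

class Agent:
    def __init__(self, budget_calls: int):
        self.calls_made = 0
        self.budget_calls = budget_calls
        self.actual_calls = 0   # ground truth, tracked here for illustration

    def reset(self):
        # intended to be called once, at the start of a new task
        self.calls_made = 0

    def call_llm(self):
        if self.calls_made >= self.budget_calls:
            raise RuntimeError("budget exhausted")
        self.calls_made += 1
        self.actual_calls += 1   # what actually hit the paid API
        return "response"

agent = Agent(budget_calls=100)
# Orchestration bug: the retry path treats every retry as a fresh task,
# so it re-arms the budget before each call.
for _ in range(500):
    agent.reset()
    agent.call_llm()

print(agent.calls_made)    # 1   -- the budget counter noticed almost nothing
print(agent.actual_calls)  # 500 -- the API was billed for every call
```

The check itself was correct; the surrounding code simply stopped feeding it accurate state. An enforcement point outside the process can't be disarmed this way.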
You need a control layer that is external to the agent, that can't be circumvented by application bugs, and that enforces hard economic boundaries on every single request.
The solution: make every LLM call a payment
The budget problems described above share a root cause: payment and execution are decoupled.
The x402 protocol addresses this by redefining how agents can access LLM inference. Instead of authenticating with an API key and settling costs later via an invoice, each request is a discrete payment transaction.
BlockRun is a platform that replaces the traditional payment model by enabling pay-per-use access to many mainstream LLMs via the x402 payment protocol.
No API key.
No subscription tier.
No monthly bill.
Each request either pays or it doesn't execute.
This is a fundamental shift.
With API keys, spending authority is granted once (when the key is issued) and revoked manually (when someone notices a problem).
With x402, spending authority is exercised and verified on every single request. The feedback loop is immediate: if the payment doesn't go through, the inference doesn't happen.
But pay-per-request alone doesn't prevent runaway spending. An agent stuck in a loop will keep paying as long as it has funds.
Introducing ampersend: the wallet that enforces your budget
This is the gap ampersend was built to address.
ampersend is agentic payment infrastructure that gives autonomous agents programmable wallets with built-in spending controls and real-time observability.
With ampersend enabled, when an agent requests a payment signature:
- If the agent's daily spend is under the limit configured in ampersend, the wallet signs the transaction and the request proceeds.
- If the daily spend has reached the limit, the wallet refuses to sign and the request fails. The agent is economically dead: it can keep running, but ampersend won't let it pay for anything.
This is the critical distinction: with ampersend, the spending limit is not in the application code. It lives in the wallet policy. The agent's code cannot override it, bypass it, or accidentally skip it. Even if the agent is stuck in an infinite loop, prompt-injected, or broken by orchestration bugs, the wallet remains the final authority.
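As a sketch of what a wallet-side policy looks like (field names here are hypothetical, not the ampersend SDK's actual API):

```python
# Hypothetical wallet policy in the spirit described above: the checks
# run on the signer's side, outside the agent's trust domain.
from dataclasses import dataclass, field

@dataclass
class WalletPolicy:
    daily_limit_usdc: float
    per_tx_cap_usdc: float
    allowlist: set = field(default_factory=set)
    spent_today: float = 0.0

    def authorize(self, amount: float, destination: str) -> bool:
        if destination not in self.allowlist:
            return False                    # unknown destination: refuse
        if amount > self.per_tx_cap_usdc:
            return False                    # single payment too large: refuse
        if self.spent_today + amount > self.daily_limit_usdc:
            return False                    # daily budget exhausted: refuse
        self.spent_today += amount          # record the spend, then sign
        return True

policy = WalletPolicy(daily_limit_usdc=10.0, per_tx_cap_usdc=0.05,
                      allowlist={"0xBLOCKRUN"})
print(policy.authorize(0.02, "0xBLOCKRUN"))  # True: within caps, allowlisted
print(policy.authorize(0.02, "0xUNKNOWN"))   # False: destination not allowlisted
```

Nothing the agent does can reach into this object: it only ever sees a signed payment or a refusal.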

BlockRun: LLM inference that accepts x402 payments
BlockRun is redefining how developers access LLM inference.
Instead of issuing API keys, accounts, or subscriptions, BlockRun uses x402 as its native payment layer. Every request is authenticated by payment: if no payment is attached, the API returns 402 Payment Required. You pay per query in USDC.
No keys to manage.
No credit cards on file.
No billing surprises.
BlockRun supports both mainnet (Base) and testnet (Base Sepolia) with OpenAI-compatible endpoints, so existing code that uses the OpenAI SDK can point at BlockRun with minimal changes.
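For illustration, here is the shape of such a request built with nothing but the standard library. The base URL and model id are assumptions, not BlockRun's documented values, and nothing is sent:

```python
# Sketch only: the request body is the standard OpenAI-style
# chat-completions shape; the difference is what's absent (no
# Authorization header), because the first attempt will come back
# 402 Payment Required until a payment proof is attached.
import json
import urllib.request

BASE_URL = "https://api.blockrun.ai/v1"   # hypothetical endpoint, for shape only

payload = {
    "model": "gpt-4o",                    # illustrative model id
    "messages": [{"role": "user", "content": "hello"}],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},  # note: no API key anywhere
    method="POST",
)
print(req.full_url)   # the familiar chat-completions path, unchanged
```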
BlockRun doesn't just improve billing; it replaces the traditional LLM API model:
| Traditional API | BlockRun (x402) |
|---|---|
| API key authentication | Payment is the authentication |
| Post-hoc billing (monthly invoice) | Pre-paid per request (instant settlement) |
| Spending limit = credit card limit | Spending limit = wallet policy |
| Revocation requires key rotation | Revocation is automatic (wallet limit) |
| Cost attribution is manual | Cost is on-chain and auditable |
For agent builders, this means you can give an agent access to GPT-class models without giving it an API key that could be leaked, shared, or exploited beyond your intended budget.
How ampersend, BlockRun, and x402 break the agent loop problem
- Agent sends an inference request to an LLM API (BlockRun).
- BlockRun responds with HTTP 402 Payment Required and includes payment details (amount, destination address, network).
- The agent's ampersend treasurer checks the request against the wallet's spending policy: daily limit, per-transaction cap, allowlisted destinations. If the policy allows it, the treasurer signs a USDC payment. If the limit has been reached, the treasurer refuses to sign and the request dies here.
- The agent retries the request with proof of payment attached.
- BlockRun verifies the on-chain payment and returns the inference result.
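The five steps above can be simulated in-process. BlockRun's API and the ampersend treasurer are stood in for by hypothetical stubs; only the control flow (402, sign, retry, 200) is meant to be accurate:

```python
# Stand-ins for illustration: not BlockRun's real API or the ampersend SDK.

def blockrun(request):
    """Stub for BlockRun: demand payment first, then serve inference."""
    if "payment_proof" not in request:
        return {"status": 402,
                "pay": {"amount_usdc": 0.002, "to": "0xBLOCKRUN", "network": "base"}}
    return {"status": 200, "body": "inference result"}

class Treasurer:
    """Stub for the ampersend treasurer: enforces a daily USDC limit."""
    def __init__(self, daily_limit):
        self.daily_limit = daily_limit
        self.spent = 0.0

    def sign(self, amount, to):
        if self.spent + amount > self.daily_limit:
            return None                       # refuse: the request dies here
        self.spent += amount
        return f"signed:{amount}->{to}"

def agent_call(treasurer, prompt):
    resp = blockrun({"prompt": prompt})                          # step 1
    if resp["status"] == 402:                                    # step 2
        pay = resp["pay"]
        proof = treasurer.sign(pay["amount_usdc"], pay["to"])    # step 3
        if proof is None:
            return None                                          # wallet blocked it
        resp = blockrun({"prompt": prompt, "payment_proof": proof})  # step 4
    return resp["body"]                                          # step 5

t = Treasurer(daily_limit=0.005)      # enough for two 0.002 USDC calls
print(agent_call(t, "hi"))            # inference result
print(agent_call(t, "hi"))            # inference result
print(agent_call(t, "hi"))            # None: the third payment is refused
```

Note where the refusal happens: inside the signer, before any request reaches the model. The agent never gets a chance to overspend.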
Does ampersend really stop runaway payments for agents stuck in a loop?
We built a load test that deliberately simulates a disaster scenario: it fires requests in an infinite loop, as fast as possible and with configurable concurrency, until something stops it.
With a traditional API key, nothing stops it. The loop runs until the credit card is maxed out or someone manually intervenes.
With ampersend:
- The first N requests succeed. Each one is a real USDC payment. The spend counter increments.
- When the agent's daily limit is reached, the ampersend treasurer refuses to sign the next payment.
- The request fails. The agent receives an error instead of an LLM response.
- After several consecutive failures (the load test uses 10 as the threshold), the system recognizes the wallet is blocked and exits.
- The total spend is exactly the daily limit you configured, not a dollar more.

The loop may continue because logically the code still wants to send requests, but financially, it's dead. The wallet, not the code, is the circuit breaker.
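The load test's behavior can be simulated so it runs instantly. The daily limit and per-request price below are illustrative (in integer cents, to keep the arithmetic exact); the ten-consecutive-failure exit rule matches the description above:

```python
# Simulated load test: an infinite request loop that only stops when
# the wallet blocks it. Limit and price are illustrative values.

DAILY_LIMIT_CENTS = 100    # $1.00 daily limit
PRICE_CENTS = 2            # $0.02 per request
FAIL_THRESHOLD = 10        # consecutive refusals before the harness exits

spent = 0
failures = 0
requests_sent = 0

while True:                # the disaster: the loop never stops on its own
    requests_sent += 1
    if spent + PRICE_CENTS <= DAILY_LIMIT_CENTS:
        spent += PRICE_CENTS       # wallet signs: real USDC leaves
        failures = 0
    else:
        failures += 1              # wallet refuses: the request fails
        if failures >= FAIL_THRESHOLD:
            break                  # harness concludes the wallet is blocked

print(spent)           # 100 cents: exactly the configured limit, not a cent more
print(requests_sent)   # 60: 50 paid requests, then 10 refused
```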
Why this matters for agent builders
If you're building systems where AI agents call LLM APIs, whether that's a single coding agent, a multi-agent pipeline or an autonomous agent swarm, the loop spending problem will eventually find you. It might not be thousands of dollars. It might be $200 on a weekend when nobody's watching. But the structural vulnerability is the same: unbounded financial authority granted via API keys, with no infrastructure-level enforcement.
The shift we're advocating is simple:
- Replace API keys with per-request payments. x402 makes every LLM call an explicit economic transaction.
- Enforce budgets at the wallet layer, not the application layer. ampersend's spending limits can't be bypassed by bugs in your agent code.
- Make costs on-chain and auditable. Every payment is a USDC transaction, visible on-chain. No more guessing where the spend went.
This isn't about crypto ideology. It's about using programmable money to solve a real engineering problem: how do you give an autonomous system access to expensive resources without giving it unlimited spending authority?
The answer is the same one that every other infrastructure domain has learned: governance belongs at the platform layer, not the application layer. Kubernetes doesn't trust your containers to self-limit CPU usage. Rate limiters don't trust your services to self-throttle. Your agent infrastructure shouldn't trust your agents to self-budget.
Try it yourself
The full reference implementation is open source:
- Repository: ampersend-blockrun-agentops
- Demo: https://youtu.be/92nyRNCFVZ8
- ampersend SDK: github.com/edgeandnode/ampersend-sdk
- BlockRun: blockrun.ai
- x402 Protocol: x402.org
Request beta access at https://ampersend.ai.