Token Efficiency
Learn how AgentML minimizes redundant token usage through system prompt caching and runtime snapshots.
Efficient Token Usage
Large Language Model calls can be expensive, so AgentML is designed to minimize redundant token usage. Two key mechanisms help achieve this: system prompt caching and runtime snapshots.
System Prompt Caching
How It Works
The AgentML interpreter constructs a system prompt that includes the static context of the agent – essentially the SCXML/AgentML document itself (which defines states, transitions, and schemas) plus the current runtime snapshot.
Because the agent definition rarely changes at runtime, this system prompt stays largely constant across interactions, with only small updates to the snapshot portion. Many LLM providers can therefore cache it or reuse the same conversation context, so the agent definition does not need to be re-sent in full on every call beyond the initial prompt.
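As an illustration, here is a minimal sketch of that split using the Anthropic Python SDK, one provider that supports explicit prompt caching. The `agent_document` and `snapshot` values, and the way the prompt is assembled, are assumptions for the example; only the `cache_control` mechanism is the provider's real API.

```python
import anthropic

client = anthropic.Anthropic()

# Static part: the AgentML/SCXML document itself. It rarely changes,
# so it is marked cacheable and served from cache on subsequent calls.
agent_document = open("travel_agent.aml").read()  # hypothetical file

# Dynamic part: the small runtime snapshot, re-sent on every call.
snapshot = '{"active_states": ["awaiting_intent"]}'  # illustrative

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[
        # Cached prefix: the agent definition.
        {
            "type": "text",
            "text": agent_document,
            "cache_control": {"type": "ephemeral"},
        },
        # Uncached suffix: the current snapshot.
        {"type": "text", "text": snapshot},
    ],
    messages=[{"role": "user", "content": "I need to book a hotel."}],
)
```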
Runtime Snapshots
AgentML provides the LLM with a snapshot of the agent's state at runtime. This snapshot includes:
- The active state configuration (which states are currently active, including the full hierarchy)
- The values of all `<data>` variables in the datamodel
- The event queues (pending internal and external events)
- The available transitions, i.e. which events the agent is expecting or can handle next, including their schemas
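The exact wire format of the snapshot is an interpreter detail; the shape below is a hypothetical sketch (field names assumed for illustration) of what the LLM might see for a travel agent waiting on an intent.

```python
# Hypothetical runtime snapshot. Field names are illustrative,
# not AgentML's actual serialization.
snapshot = {
    # Active state configuration, including the ancestor hierarchy.
    "active_states": ["root", "conversation", "awaiting_intent"],
    # Current values of the <data> variables in the datamodel.
    "datamodel": {"user_name": "Ada", "trip": None},
    # Pending internal and external events.
    "internal_queue": [],
    "external_queue": [],
    # Events the agent can handle next, with their payload schemas.
    "available_transitions": [
        {
            "event": "intent.hotel",
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
        # ... intent.flight, intent.car, etc. elided
    ],
}
```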
Benefits of Runtime Snapshots
Because the LLM is informed of exactly what state it's in and what events it can trigger next, you do not need to include lengthy descriptions of all possible actions in your user prompt.
The prompt can remain minimal (e.g. "Classify the user's request into one of the available intents") because the LLM already knows from context what the valid intents are. This greatly reduces prompt size and eliminates repetition.
Token Minimization Example
Without AgentML (Naive Approach)
Suppose your agent has 5 possible next events. You might list them every time:
"If the user wants to search a flight, output JSON X; if they want to book a hotel, output JSON Y; if they want to rent a car, output JSON Z; ..."
With AgentML
The agent runtime already provided those possibilities and schemas in the context, so your user prompt might be as short as:
"User said: 'I need to book a hotel.' Determine the user's intent."
The LLM sees the state snapshot, which includes the `intent.hotel` event schema among others, so it will output the `intent.hotel` event with the proper structure, without you explicitly restating the schema in the prompt each time.
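Concretely, the exchange might look like the sketch below. The event envelope and the `intent.hotel` payload field are assumptions for illustration; the real shape comes from the schemas in your agent document.

```python
import json

# Minimal user prompt: no list of intents or schemas needed, because
# the cached system prompt and the snapshot already carry them.
user_prompt = "User said: 'I need to book a hotel.' Determine the user's intent."

# The structured event the LLM is expected to emit, conforming to the
# intent.hotel schema from the snapshot (payload field illustrative).
raw_output = '{"event": "intent.hotel", "data": {"city": "Lisbon"}}'
event = json.loads(raw_output)
assert event["event"] == "intent.hotel"
```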
Next Steps
Learn how namespaces extend AgentML with additional functionality like LLM integration and memory storage.
Namespaces →