Token Efficiency
Learn how AgentML minimizes redundant token usage through system prompt caching and runtime snapshots.
Efficient Token Usage
Large Language Model calls can be expensive, so AgentML is designed to minimize redundant token usage. Two key mechanisms help achieve this: system prompt caching and runtime snapshots.
System Prompt Caching
How It Works
The AgentML interpreter constructs a system prompt that includes the static context of the agent – essentially the SCXML/AgentML document itself (which defines states, transitions, and schemas) plus the current runtime snapshot.
Because the agent definition rarely changes at runtime, this system prompt stays largely constant across interactions, with only small updates to the snapshot portion. Many LLM providers can therefore cache it or reuse the same conversation context, so the agent definition does not need to be re-sent in full on every call beyond the initial prompt.
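As an illustration, here is a minimal sketch of that split using the Anthropic Python SDK, one provider that supports explicit prompt caching. The `agent_document` and `snapshot` values, and the way the prompt is assembled, are assumptions for the example; only the `cache_control` mechanism is the provider's real API.

```python
import anthropic

client = anthropic.Anthropic()

# Static part: the AgentML/SCXML document itself. It rarely changes,
# so it is marked cacheable and served from cache on subsequent calls.
agent_document = open("travel_agent.aml").read()  # hypothetical file

# Dynamic part: the small runtime snapshot, re-sent on every call.
snapshot = '{"active_states": ["awaiting_intent"]}'  # illustrative

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[
        # Cached prefix: the agent definition.
        {
            "type": "text",
            "text": agent_document,
            "cache_control": {"type": "ephemeral"},
        },
        # Uncached suffix: the current snapshot.
        {"type": "text", "text": snapshot},
    ],
    messages=[{"role": "user", "content": "I need to book a hotel."}],
)
```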
Runtime Snapshots
AgentML provides the LLM with a snapshot of the agent's state at runtime. This snapshot includes:
- The active state configuration (which states are currently active, including the full hierarchy)
- The values of all `<data>` variables in the datamodel
- The event queues (pending internal and external events)
- The available transitions, i.e. which events the agent is expecting or can handle next, including their schemas
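The exact wire format of the snapshot is an interpreter detail; the shape below is a hypothetical sketch (field names assumed for illustration) of what the LLM might see for a travel agent waiting on an intent.

```python
# Hypothetical runtime snapshot. Field names are illustrative,
# not AgentML's actual serialization.
snapshot = {
    # Active state configuration, including the ancestor hierarchy.
    "active_states": ["root", "conversation", "awaiting_intent"],
    # Current values of the <data> variables in the datamodel.
    "datamodel": {"user_name": "Ada", "trip": None},
    # Pending internal and external events.
    "internal_queue": [],
    "external_queue": [],
    # Events the agent can handle next, with their payload schemas.
    "available_transitions": [
        {
            "event": "intent.hotel",
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
        # ... intent.flight, intent.car, etc. elided
    ],
}
```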
Benefits of Runtime Snapshots
Because the LLM is informed of exactly what state it's in and what events it can trigger next, you do not need to include lengthy descriptions of all possible actions in your user prompt.
The prompt can remain minimal (e.g. "Classify the user's request into one of the available intents") because the LLM already knows from context what the valid intents are. This greatly reduces prompt size and eliminates repetition.
Token Minimization Example
Without AgentML (Naive Approach)
Suppose your agent has 5 possible next events. You might list them every time:
"If the user wants to search a flight, output JSON X; if they want to book a hotel, output JSON Y; if they want to rent a car, output JSON Z; ..."
With AgentML
The agent runtime already provided those possibilities and schemas in the context, so your user prompt might be as short as:
"User said: 'I need to book a hotel.' Determine the user's intent."
The LLM sees the state snapshot, which includes the `intent.hotel` event schema among others, so it will output the `intent.hotel` event with the proper structure, without you explicitly restating the schema in the prompt each time.
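Concretely, the exchange might look like the sketch below. The event envelope and the `intent.hotel` payload field are assumptions for illustration; the real shape comes from the schemas in your agent document.

```python
import json

# Minimal user prompt: no list of intents or schemas needed, because
# the cached system prompt and the snapshot already carry them.
user_prompt = "User said: 'I need to book a hotel.' Determine the user's intent."

# The structured event the LLM is expected to emit, conforming to the
# intent.hotel schema from the snapshot (payload field illustrative).
raw_output = '{"event": "intent.hotel", "data": {"city": "Lisbon"}}'
event = json.loads(raw_output)
assert event["event"] == "intent.hotel"
```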
Next Steps
Learn how namespaces extend AgentML with additional functionality like LLM integration and memory storage.
Namespaces →