Akshay Parkhi's Weblog

How AWS Strands Agent Loop Works

7th March 2026

The Pattern: ReAct (Reason + Act) — Not Prompt Chaining

Strands uses the ReAct pattern — a recursive loop where the LLM reasons, optionally calls tools, observes results, then reasons again. It is NOT prompt chaining (where you have a fixed sequence of prompts). The loop is open-ended and driven by the model’s decisions.
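The loop can be sketched in a few lines of Python. This is illustrative pseudocode of the pattern, not Strands source code: `fake_model` and the `get_time` tool are invented stand-ins for a real LLM client and tool registry.

```python
# Minimal ReAct loop sketch (NOT Strands source code).
# fake_model stands in for an LLM: it asks for one tool, then finishes.
def fake_model(messages):
    if not any("toolResult" in str(m) for m in messages):
        return {"stop_reason": "tool_use", "tool": "get_time"}
    return {"stop_reason": "end_turn", "text": "It is noon."}

TOOLS = {"get_time": lambda: "12:00"}  # invented example tool

def event_loop_cycle(messages):
    response = fake_model(messages)            # 1. reason
    if response["stop_reason"] == "end_turn":  # 2a. done
        return response["text"]
    result = TOOLS[response["tool"]]()         # 2b. act
    messages.append({"role": "user",
                     "content": [{"toolResult": result}]})  # observe
    return event_loop_cycle(messages)          # reason again (recurse)

print(event_loop_cycle([{"role": "user", "content": "What time is it?"}]))
# prints: It is noon.
```

Note there is no fixed number of steps: the model's `stop_reason` alone decides whether the loop recurses or returns.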


The Core Flow

User prompt
    │
    ▼
┌──────────────────────────────────┐
│  event_loop_cycle()              │
│                                  │
│  1. Send to LLM:                 │
│     - system_prompt              │
│     - messages (full history)    │
│     - tool_specs (all tools)     │
│                                  │
│  2. LLM responds with either:    │
│     a) stop_reason="end_turn"  ──┼──► Done! Return response
│     b) stop_reason="tool_use"  ──┼──► Execute tools, then RECURSE ↓
│     c) stop_reason="max_tokens"──┼──► Exception
└──────────────────────────────────┘                    │
          ▲                                             │
          └─────────────────────────────────────────────┘

What Exactly Gets Sent to the LLM Each Turn

Every single call to the LLM sends the ENTIRE conversation history. Here’s exactly what stream_messages() sends:

model.stream(
    messages,          # THE FULL message history (all turns)
    tool_specs,        # All available tool definitions
    system_prompt,     # System prompt (same every time)
)

So for an interaction involving two tool calls, the sequence of LLM calls looks like:

Call #   What's sent
1        system_prompt + [user_msg]
2        system_prompt + [user_msg, assistant_msg_with_toolUse, user_msg_with_toolResult]
3        system_prompt + [user_msg, asst_toolUse, toolResult, asst_toolUse_2, toolResult_2]

Yes — the full history grows with each cycle. The system prompt is sent every time (LLMs are stateless).


How Context is Maintained

Context is maintained through agent.messages — a single mutable list that accumulates all turns:

  1. User message → appended at invocation start (_append_messages)
  2. Assistant message (LLM response) → appended in _handle_model_execution after streaming completes
  3. Tool result message → appended in _handle_tool_execution after tools run

# In _handle_model_execution:
agent.messages.append(message)  # Assistant's response

# In _handle_tool_execution:
tool_result_message = {
    "role": "user",
    "content": [{"toolResult": result} for result in tool_results],
}
agent.messages.append(tool_result_message)  # Tool results as "user" role

The conversation format follows the standard alternating role pattern that LLMs expect:

user → assistant (with toolUse) → user (with toolResult) → assistant (with toolUse) → user (with toolResult) → assistant (final answer)
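Concretely, after a single-tool interaction the accumulated agent.messages list has this shape (the field values here are invented for illustration; the role/content-block structure follows the format shown above):

```python
messages = [
    {"role": "user", "content": [{"text": "What's the weather in Pune?"}]},
    {"role": "assistant", "content": [
        {"toolUse": {"toolUseId": "t1", "name": "get_weather",
                     "input": {"city": "Pune"}}},
    ]},
    {"role": "user", "content": [
        {"toolResult": {"toolUseId": "t1",
                        "content": [{"text": "31C, sunny"}],
                        "status": "success"}},
    ]},
    {"role": "assistant", "content": [{"text": "It's 31C and sunny in Pune."}]},
]

# Roles strictly alternate: user, assistant, user, assistant, ...
assert [m["role"] for m in messages] == ["user", "assistant", "user", "assistant"]
```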

The Recursion Mechanism

The loop is recursive, not iterative:

# In _handle_tool_execution, after tools complete:
events = recurse_event_loop(agent=agent, invocation_state=invocation_state)

Each recursion is a new event_loop_cycle() call. The recursion continues until the LLM returns stop_reason="end_turn" (meaning it’s done and doesn’t want to call more tools).
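A back-of-the-envelope sketch of the termination behavior (illustrative only, not Strands code): an interaction with N tool calls runs N + 1 cycles, since the last cycle is the one where the model returns "end_turn".

```python
# Each "tool_use" response triggers exactly one more recursive cycle.
def make_model(n_tool_calls):
    state = {"remaining": n_tool_calls}
    def model():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            return "tool_use"   # model wants another tool executed
        return "end_turn"       # model is done
    return model

def count_cycles(model, depth=1):
    if model() == "end_turn":
        return depth                       # final answering cycle
    return count_cycles(model, depth + 1)  # tool executed, recurse

print(count_cycles(make_model(3)))  # prints: 4
```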


Context Window Management

When the accumulated messages exceed the model’s context window:

# In _execute_event_loop_cycle:
except ContextWindowOverflowException as e:
    self.conversation_manager.reduce_context(self, e=e)  # Trim history
    # Then retry with reduced context

The ConversationManager strategies handle this:

  • SlidingWindowConversationManager (default): Drops oldest messages
  • SummarizingConversationManager: Summarizes old messages into a compact form
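A rough sketch of the sliding-window idea. Assumptions worth flagging: this is not the real SlidingWindowConversationManager; the window size and the orphaned-toolResult rule below are my simplification of why naive truncation is unsafe.

```python
def reduce_context(messages, window_size=4):
    """Keep the most recent messages, never starting the window on a
    toolResult (a toolResult with no preceding toolUse confuses the model)."""
    if len(messages) <= window_size:
        return messages
    trimmed = messages[-window_size:]
    while trimmed and "toolResult" in str(trimmed[0].get("content", "")):
        trimmed = trimmed[1:]  # drop orphaned tool results
    return trimmed

history = [
    {"role": "user", "content": [{"text": "q1"}]},
    {"role": "assistant", "content": [{"toolUse": {"name": "t"}}]},
    {"role": "user", "content": [{"toolResult": {"status": "success"}}]},
    {"role": "assistant", "content": [{"text": "a1"}]},
    {"role": "user", "content": [{"text": "q2"}]},
    {"role": "assistant", "content": [{"text": "a2"}]},
]
print(len(reduce_context(history)))  # prints: 3 (window slid past the orphan)
```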

Key LLM Concepts Used

Concept               How Strands Uses It
ReAct Loop            LLM reasons → calls tools → observes → reasons again (recursive)
Tool Calling          LLM returns structured toolUse blocks; framework executes and returns results
Stateless LLM Calls   Full history + system prompt sent on every call (LLMs have no memory between calls)
Streaming             Responses streamed chunk-by-chunk via model.stream() → process_stream()
Context Window        Managed by ConversationManager — sliding window or summarization when full

What It’s NOT

  • Not prompt chaining: There’s no fixed pipeline of prompts. The LLM decides when to stop.
  • Not RAG: Though tools can do retrieval, the loop itself is pure ReAct.
  • Not a separate memory system: There's no persistent store. Context carries across agent() calls only because the same agent instance reuses its agent.messages list (which it does by default).

Summary

Strands implements a pure ReAct agent loop: send the full conversation (system prompt + all history + tool specs) to the LLM every turn, let it decide whether to call tools or answer, execute tools if requested, append results to history, and recurse. The agent.messages list IS the context — it grows with every turn and gets sent in full each time. Context window overflow is handled by the conversation manager trimming or summarizing history.

This is How AWS Strands Agent Loop Works by Akshay Parkhi, posted on 7th March 2026.
