A while back, I built what I thought was the perfect AI agent. It could research, analyze, and solve complex problems. There was just one tiny issue: I wanted to murder it every time I used it.
The problem? After diving deep into frameworks, papers, blogs, and tutorials, I kept running into the same fundamental issue: most sources focus on how to call tools, how to retrieve responses, and how to quietly loop until a final answer emerges.
So there I was, with a similar approach implemented, asking my agent to do some research, and I'd watch a loading spinner for 30 seconds while it silently went through files, making tool calls without keeping me in the loop.
The only thing that partially kept me grounded was a terminal and a list of tool calls in my UI. But if you're familiar with tool calls, you know they're a nightmare to analyze. These things are huge, so while I was still focused on the first one, the agent could have already made ten more. At that point, how am I supposed to know what's going on?
In the end, the agent would finally respond with a perfect, thorough analysis. But by then? I was convinced the thing had crashed, and I was already drafting some weird posts about it on X.
The ReAct Trap
Most tutorials beat the drum of what I’ve dubbed the “Tool-then-Answer” model. It’s the classic ReAct (Reasoning and Acting) framework, and it looks deceptively clean when you just gaze at the code:
```python
while True:
    response = model.generate()
    if response.type == "tool_call":
        execute_tool(response.tool)
    else:
        # It's content for the user. Yup, we're done here!
        return response.content
```
See? The logic seems super simple. Call a tool, or return content and bail. That’s precisely why it’s the go-to pattern for most tutorial authors.
This works great for:
- Simple, single-shot commands: When you need a quick answer or a single, well-defined action performed, like “Summarize this article” or “Find the capital of France.” The interaction is brief and doesn’t require any mid-task discussion.
- Automated background tasks: For operations that run without a human staring at a screen, like data clean-up, report generation, or system monitoring. Here, transparency during execution isn’t critical; only the final result matters.
- Rapid prototyping and initial development: When you’re just trying to get a basic agent up and running. This model offers predictable control flow, making it easier to debug the core logic without the added complexity of managing stream interruptions or LLM-driven termination signals.
- High-throughput batch processing: In scenarios where the agent processes large volumes of input without human supervision, such as transforming datasets or running analyses on vast numbers of documents.
But here are the problems that come with this approach:
- The Silent Treatment Problem: Picture this: I ask my agent to “research the latest trends in AI safety”. It immediately starts making API calls to search engines, fetching papers, analyzing content… and I see absolutely nothing. Just a cursor blinking at me mockingly. Is it still working? Maybe.
- The Loop Killer: Even worse, what if the agent just wants to yield a normal status update? Something like “I found what they call the best coffee nearby, but there is also one more place I have to check. Wait a sec.” It gets terminated the moment the LLM returns content, and the agent won’t be able to continue unless you reply. Yikes.
- The Streaming Nightmare: You can’t stream responses as they’re being generated. You have to wait for the entire chain of tool calls to complete before seeing the first word. This means you either need to implement real-time display of tool calls, or explicitly block the agent on every tool call, forcing you to confirm before it proceeds.
Explicit Termination to the Rescue
I soon realized I was treating the agent like a black-box tool that either calls functions or spits out final answers. But that’s not what I wanted at all. I wanted a collaborative agent, one that thought out loud, showed its work, and actually communicated with me mid-task. You know, like a human pairing partner who actually bothers to explain themselves.
The core idea involved shifting from implicit termination, which relies solely on the output type, to explicit termination via a predefined stop marker. It’s like teaching your agent to say, “Okay, I’m genuinely done now”.
Architecturally, this means a custom termination evaluator that checks for this marker:
```python
class AgentResponseMarkerTerminationEvaluator(AgentTerminationEvaluator):
    def should_terminate(self, context: "AgentContext", current_step: "AgentPartialStep") -> bool:
        """
        Evaluates termination based on the presence of an end response marker
        in the LLM's content.

        Termination occurs only if the `current_step` is of
        `AgentStepType.CONTENT` and its content contains the configured
        `end_response_marker`.

        Args:
            context: The current `AgentContext`.
            current_step: The `AgentPartialStep` to evaluate.

        Returns:
            True if the end response marker is detected in the current
            content step, False otherwise.
        """
        if current_step.type != AgentStepType.CONTENT or not current_step.content:
            return False

        # Retrieve the configured end response marker
        if get_config().end_response_marker in str(current_step.content):
            return True

        return False
```
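The evaluator leans on `get_config().end_response_marker`, which comes from my own configuration layer. If you’re wiring this up yourself, all it really needs is something like the following sketch; the names and shape here are my assumptions, not a real library:

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class AgentConfig:
    # Marker the model must append when it is genuinely finished.
    # "<|im_done|>" is simply the token I settled on; anything unlikely
    # to appear in normal output works.
    end_response_marker: str = "<|im_done|>"


@lru_cache(maxsize=1)
def get_config() -> AgentConfig:
    # One shared config instance for the whole agent runtime.
    return AgentConfig()
```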
Now, here’s the kicker and the part most tutorials conveniently ignore: implementing this explicit termination isn’t just a code change. I also had to dive deep into prompt engineering, meticulously crafting instructions to make the agent understand and consistently use this new communication flow. Getting an LLM to reliably output a specific marker at the right time is notoriously tricky. It highlights why creating a robust prompt that works across various models is a whole art form in itself.
If you want to implement it, brace yourself for more things to go sideways. Yeah.
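To make this concrete, here’s roughly the flavor of instruction I ended up baking into the system prompt. Treat it as a sketch of my own wording, not a drop-in recipe; every model needed its own tweaks:

```python
# Appended to the agent's system prompt; the wording is mine and model-specific.
TERMINATION_INSTRUCTIONS = """
You may send messages to the user at any time: progress updates, partial
findings, and clarifying questions. These messages do NOT end your turn.

End a message with the marker <|im_done|> only when the task is fully
complete and there is nothing left for you to do.

Rules:
- Never include <|im_done|> in progress updates or questions.
- Never stop working just because you sent a message without the marker.
- If you are waiting on the user, say so explicitly and do not emit the marker.
"""
```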
But what does this look like in the main loop? By ditching the restrictive “Tool-then-Answer” pattern for something far more flexible, the agent can now explicitly signal completion on its own. Here’s a super simplified version of that termination logic in action:
```python
while True:
    response = model.generate()
    if response.type == "tool_call":
        execute_tool(response.tool)
        continue
    elif response.type == "content":
        if response.content.endswith("<|im_done|>"):
            # Explicit termination signal
            return response.content.replace("<|im_done|>", "")
        else:
            # Show progress and continue
            stream_progress(response.content)
            continue
```
Now my agent can:
- Provide real-time updates and context: No more staring at a blank screen wondering if it crashed. My agent can narrate its process, explaining what it’s doing and why, keeping me fully in the loop.
- Ask clarifying questions mid-task: If it encounters ambiguity or needs a decision, it can pause, communicate the issue, and wait for my input instead of making assumptions or silently failing. This is a game-changer for complex prompts.
- Yield partial results and insights: For long research tasks, it can share interesting findings as it discovers them, allowing me to review, redirect, or even terminate the task if I’ve already got what I need.
- Adapt dynamically based on my input: Since it’s constantly communicating, I can jump in, provide new instructions, or course-correct its direction, turning a monologue into a genuine dialogue (there’s a sketch of this loop right after this list).
- Feel like a true collaborative partner: It’s less like an opaque automated script and more like pair programming with an incredibly efficient, transparent colleague.
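Here’s the dialogue-style loop I hinted at above. It’s a toy sketch that reuses the `model.generate()` / `execute_tool()` / `stream_progress()` pseudocode from earlier and assumes a plain `conversation` list of message dicts; your framework’s types will differ:

```python
def run_collaborative_task(conversation: list[dict]) -> str:
    """Toy loop: the agent narrates as it works, and I can jump in between steps."""
    while True:
        response = model.generate(conversation)

        if response.type == "tool_call":
            result = execute_tool(response.tool)
            conversation.append({"role": "tool", "content": str(result)})
            continue

        # Content without the marker is a progress update or a question,
        # not the final answer.
        if "<|im_done|>" not in response.content:
            stream_progress(response.content)
            conversation.append({"role": "assistant", "content": response.content})

            # Optional human input: answer a question, redirect, or just press Enter.
            reply = input("you (Enter to let it keep going) > ").strip()
            if reply:
                conversation.append({"role": "user", "content": reply})
            continue

        # Marker present: the agent says it's genuinely done.
        return response.content.replace("<|im_done|>", "").strip()
```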
The difference was night and day. I went from frustrated to actually enjoying the collaboration. Watching my agent work became genuinely entertaining, like pair programming with someone who reads documentation at superhuman speed.
But here’s the catch (because there’s always a catch): it requires a compliant model and a damn good prompt. The model has to remember to use the stop marker consistently. Miss it once, and you’re back to infinite loop hell.
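One safety net I’d recommend on top of the marker (my own habit, not something any framework prescribes): cap the number of steps and periodically nudge the model back toward the convention, so a forgotten marker degrades into a reminder or a clean failure instead of an endless loop. `remind_model()` here is a hypothetical helper that appends a reminder message to the conversation:

```python
MAX_STEPS = 50  # arbitrary budget; tune it for your workloads

for step in range(MAX_STEPS):
    response = model.generate()

    if response.type == "tool_call":
        execute_tool(response.tool)
    elif "<|im_done|>" in response.content:
        final_answer = response.content.replace("<|im_done|>", "").strip()
        break
    else:
        stream_progress(response.content)
        # Every few content-only steps, nudge a forgetful model.
        if step % 10 == 9:
            remind_model("End your final answer with <|im_done|> once the task is complete.")
else:
    raise RuntimeError("Step budget exhausted without a completion marker.")
```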
Why Tutorials Get This Wrong
Most AI agent tutorials are written by people who:
- Treat agents as academic exercises, not daily tools: They focus on proving a concept in isolation rather than building something designed for continuous, iterative use by a human. A demo that runs for 3 seconds on carefully curated input is worlds apart from an agent handling real-world ambiguity for hours.
- Assume an ideal, compliant model: They sidestep the messy reality that LLMs aren’t perfect. Real agents need robust termination and communication mechanisms precisely because models can hallucinate, get stuck, or simply forget a convention.
- Lack real-world deployment experience: They haven’t endured the frustration of watching a production agent silently spin for minutes, leading to user abandonment or forced restarts. Their understanding often stops at “it works”, not “it’s actually usable”.
- Don’t understand the human element of collaboration: They fail to grasp that for complex tasks, humans desire and need to understand the agent’s thought process, not just its final output.
They teach the ReAct model because it’s easier to explain in a Medium article and doesn’t require handling the messy reality of the agent being in charge of finishing its own response.
The Verdict
Look, the “Tool-then-Answer” model? It’s foundational, sure. Like learning to tie your shoes before you run a marathon. But let’s be real, while it offers a straightforward conceptual model, it often falls short in real-world, collaborative scenarios. It’s the equivalent of having a genius friend who only communicates by sending you a finished painting instead of, you know, talking to you while they’re working. You eventually get the message, but the journey is agonizing.