The AI landscape is evolving at an unprecedented pace. With OpenAI’s recent unveiling of their dedicated agent product, the spotlight on autonomous agents and their transformative potential has never been brighter. This pivotal development underscores a fundamental shift in how we interact with AI—moving beyond mere conversational interfaces to intelligent entities capable of executing complex tasks.

Building on this momentum, we are witnessing an even more profound transformation: the emergence of sophisticated multi-agent systems. These systems, where specialized agents collaborate seamlessly, are poised to tackle increasingly intricate challenges that single agents simply cannot. Recently, I embarked on an exciting challenge: constructing my own multi-agent architecture using the SmolAgent framework on Hugging Face. This endeavor was inspired by the comprehensive Agents Course, which provided an invaluable foundation.
This project, while not yet a production system, serves as a crucial learning ground. Its primary objectives are to deepen my understanding of:
- How agents effectively interact and delegate responsibilities.
- The mechanics behind dynamic tool invocation.
- Potential design limitations and breakdowns within such architectures.
Through this blog post, I aim to share a concise overview of my system, highlight key successes, and distill the early lessons learned. My hope is that these insights will prove beneficial to fellow enthusiasts on their AI journey.
Laying the Groundwork: Before Multi-Agent Systems
Before delving into the intricacies of multi-agent systems, I recognized the importance of gaining a holistic perspective on the broader AI ecosystem. The following mind map, which I developed, illustrates my foundational understanding of core technologies—ranging from fundamental machine learning principles to large language models and API orchestration. This structured overview provided essential context, clarifying where agent-based architectures fit within the grander scheme. With this baseline established, I then pivoted my focus toward autonomous agents.

Subsequently, I progressed through the course, moving to the single-agent paradigm. During this phase, I meticulously mapped out the typical architecture and message flow of a single-agent system, as depicted in the diagram below.

Visualizing each step—from receiving a user query, to generating Python code, executing it, and finally returning intermediate and conclusive results—offered profound clarity into the agent’s reasoning loop. This enhanced understanding significantly streamlined debugging efforts, pinpointed potential tool failures, and underscored the pivotal role of prompts in influencing each decision. Crucially, this foundational knowledge also paved the way for scaling into more complex multi-agent coordination later on.
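To make this loop concrete, here is a minimal single-agent sketch using smolagents. The model choice and task are illustrative, and an OpenAI-compatible API key is assumed to be configured in the environment:

```python
from smolagents import CodeAgent, OpenAIServerModel

# A single CodeAgent: the LLM writes Python, the framework executes it,
# and each observation is fed back until a final answer is produced.
agent = CodeAgent(
    tools=[],  # no extra tools; the agent relies on generated Python alone
    model=OpenAIServerModel(model_id="gpt-4.1-mini"),
)
result = agent.run("How many seconds are there in a leap year?")
print(result)
```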
Pioneering Collaboration: An Architectural Deep Dive into My Multi-Agent System
My system comprises four specialized agents, each designed for a specific role:
- A Manager Agent: Responsible for overarching task planning and intelligent delegation.
- A Web Agent: Capable of conducting web searches and navigating web pages.
- A Calculation Agent: Dedicated to performing diverse mathematical operations.
- A File Agent: Engineered for downloading and parsing various file types, such as `.xlsx` and `.txt`.

Each agent is instantiated using CodeAgent and is equipped with its own bespoke set of tools. All agents operate on the same OpenAIServerModel (GPT-4.1-mini), ensuring consistency, while their `prompt_templates` are modularly configured to align with their distinct roles.
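As a rough sketch (not my exact configuration), the instantiation looks something like this; `web_agent_prompt`, `calc_agent_prompt`, and the math `@tool` functions are illustrative placeholders:

```python
from smolagents import CodeAgent, OpenAIServerModel, VisitWebpageTool

# One shared model instance keeps behavior consistent across all agents.
model = OpenAIServerModel(model_id="gpt-4.1-mini")

# Each specialist agent gets its own tools and role-specific prompt templates.
web_agent_instance = CodeAgent(
    tools=[VisitWebpageTool()],
    model=model,
    prompt_templates=web_agent_prompt,  # placeholder: the web agent's templates
)
calc_agent_instance = CodeAgent(
    tools=[add, subtract, multiply],  # placeholder custom @tool math functions
    model=model,
    prompt_templates=calc_agent_prompt,  # placeholder: the calc agent's templates
)
```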
A pivotal design decision was the integration of a tool dispatcher:
```python
from smolagents import tool

@tool
def run_agent_by_name(name: str, task: str) -> str:
    """Run a managed sub-agent by name and return its result.

    Args:
        name: Sub-agent to invoke: "calc_agent", "web_agent", or "file_agent".
        task: The task description to delegate to that sub-agent.
    """
    agent_map = {
        "calc_agent": calc_agent_instance,
        "web_agent": web_agent_instance,
        "file_agent": file_agent_instance,
    }
    agent = agent_map.get(name)
    if agent is None:
        return f"Unknown agent: {name}"
    return agent.run(task=task)
```
This approach allows the manager agent to invoke sub-agents by name dynamically—a critical mechanism for scaling tool diversity without hardcoding logic into a monolithic agent.
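Wiring up the manager is then straightforward. A hedged sketch, reusing the `model` and template variables from the snippets above:

```python
# The manager sees only the dispatcher tool; the sub-agents stay behind it.
manager_agent = CodeAgent(
    tools=[run_agent_by_name],
    model=model,
    prompt_templates=manager_agent_prompt,
)

answer = manager_agent.run("Download data.xlsx and sum the values in column B.")
print(answer)
```

Because the dispatcher is just another tool, adding a new specialist only requires registering it in `agent_map` and mentioning it in the manager's prompt.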
The Art of Communication: Elevating Agent Prompts
Prior to this project, I significantly underestimated the profound importance of prompts in the realm of agents and Large Language Models (LLMs). Through rigorous experimentation and analysis of sample templates, I devised an optimized prompt structure tailored for my specific use case. Below is an illustrative example of the prompt for the manager agent, which meticulously defines a structured and modular orchestration strategy for managing multiple expert sub-agents:
```python
from smolagents.agents import (
    PromptTemplates,
    PlanningPromptTemplate,
    ManagedAgentPromptTemplate,
    FinalAnswerPromptTemplate,
)

manager_agent_prompt = PromptTemplates(
    system_prompt=f"""
You are a high-level orchestration agent responsible for coordinating a team of expert sub-agents to solve complex user questions.
Your task is to break down the input problem, identify which sub-agent(s) can best solve each part, and delegate tasks accordingly.

Available agents:
- "calc_agent": for all math/calculation problems
- "web_agent": for all web research/information queries
- "file_agent": for all queries on file downloading and file content parsing

Rules:
{COMMON_RULES}
- Use `run_agent_by_name(agent_name, task=...)` to call a sub-agent. Do not call `.run()` directly.

Minimal example:
Thought: I need to calculate something, so I will call the calc_agent.
<code>
result = run_agent_by_name("calc_agent", task="Calculate 2 + 2")
print(result)
</code>
Thought: Now I can provide the final answer.
<code>
final_answer(result)
</code>

Now begin your reasoning and solve the task step-by-step.
""",
    planning=PlanningPromptTemplate(
        initial_plan="Analyze the user's task and decide which managed agent to assign. Create a step-by-step plan.",
        update_plan_pre_messages="",
        update_plan_post_messages="",
    ),
    managed_agent=ManagedAgentPromptTemplate(
        task="",
        report="",
    ),
    final_answer=FinalAnswerPromptTemplate(
        pre_messages="You have collected all agent outputs. Please synthesize and present the final answer clearly.",
        post_messages="",
    ),
)
```
This prompt follows a logical sequence: it begins by clearly defining the agent's role, lists the available sub-agents and their responsibilities, establishes usage rules, provides a concise minimal example, and concludes with distinct templates for planning, task execution, and final answer formatting. This structure noticeably improves the manager agent's delegation decisions and the reliability of the overall system.
Navigating Challenges and Charting Future Directions
Throughout this project, I gained invaluable insights, often learning from subtle yet critical mistakes, such as the distinction between a tool's `tool.name` attribute and the Python variable holding a `VisitWebpageTool()` instance (see the snippet after the list below). While several tools have been successfully integrated and are fully functional, there remain exciting opportunities for further development:
- Currently Supported: `VisitWebpageTool`, `WikipediaLoader`, custom simple math tools, and file downloading and parsing.
- To Be Supported: MP3 file parsing (audio-to-text conversion), advanced image recognition, complex matrix calculations, and more.
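For illustration, here is the distinction that tripped me up: the tool's registered name is what the agent must reference, not the Python variable:

```python
from smolagents import VisitWebpageTool

browser = VisitWebpageTool()  # "browser" is only the Python variable name
print(browser.name)           # prints the tool's registered name (e.g. "visit_webpage")
```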
Furthermore, my multi-agent system passed the GAIA level 1 test, albeit by a narrow margin. This leaves significant room for performance enhancements and optimization.
Looking ahead, several promising avenues for future exploration and development include:
- Persistent Memory Integration: Introducing persistent memory across agent runs to enable more coherent and context-aware interactions over time.
- Continuous Improvement of Scripts: Ongoing refinement and optimization of existing codebases for enhanced efficiency and reliability.
- Practical Application: Deploying the multi-agent system in real-world use cases to demonstrate its practical utility and impact.
- Deep Dive into MCP: Exploring the Model Context Protocol (MCP) and related methodologies to standardize how agents access tools and external context.
- Robust Backend Hosting: Implementing a robust backend solution (e.g., FastAPI + asynchronous queues) to ensure scalability and reliable operation.
- Advanced Planning Layers: Integrating streaming or ReAct-style planning layers (via frameworks such as LangGraph or CrewAI) for more dynamic and adaptive agent behavior.
Concluding Thoughts: The Road Ahead
Building even a foundational multi-agent system has provided me with an exceptionally clear perspective on how Large Language Models can transcend basic question-answering. By modularizing behaviors into specialized agents, each equipped with distinct tools, I’ve acquired a robust blueprint for developing more scalable and sophisticated AI systems.
Crucially, this journey has reinforced a vital realization: while the promise of Artificial General Intelligence (AGI) may one day alleviate many of our cognitive burdens, current tool-using agents still rely heavily on human ingenuity. It is we who must design their workflows, define their roles, and meticulously debug their logic.
Until the advent of AGI, this fascinating realm remains our playground—a vibrant space for innovation, experimentation, and continuous learning. What challenges will you tackle next in this evolving landscape?
