OpenAgents Was Dubbed “VC Bait” in China. Then, a Voice Cut Through the Noise.



OpenAgents is an open-source framework designed for building collaborative networks of AI agents. It took Raphael Shu nearly a year to officially release it to the public. In early October, right after China’s National Day holiday, Raphael Shu introduced OpenAgents and his vision of building an “AI agent network” to the Chinese AI community.

Initially, some developers adopted a wait-and-see attitude toward the project. Some even questioned, “Is this just another hype project for fundraising?”

However, due to Raphael Shu’s background as a former AI scientist at Amazon, the community didn’t dismiss it outright but remained cautiously optimistic.

Among the skeptical voices, BISHENG stood out as an exception. Known as the “open-source version of Coze” (Coze being the most famous closed-source LLM application development platform in China), BISHENG is an enterprise-level agent development platform that has already achieved profitability by serving business clients. Its co-founder, Ray Qin, expressed technical interest in OpenAgents and raised questions without preconceived biases.

This may be attributed to Ray Qin’s role as a decision-maker who balances both technical and business considerations. Naturally, he pays close attention to new methods that can expand AI capabilities. Although the concept of an “agent network” is still in its early stages, he believed it was worth exploring in depth.

As a result, they decided to hold a discussion in BISHENG’s live broadcast series. On the evening of October 16th, Beijing time, the conversation brought together two distinct perspectives: on one side, a builder from the Amazon ecosystem who firmly believes in collective intelligence; on the other, a key decision-maker within China’s AI industry, equipped with both business insight and technical understanding. With his characteristically clear and incisive questions, Ray Qin systematically unpacked the technical core and ecosystem logic behind OpenAgents.


01 Leaving AWS AI Lab to Start Up

Ray Qin: What was the original intention and origin behind creating OpenAgents? What was the initial goal, and how far is the current state from that original target?

Raphael Shu: I first got involved with LLM agents back in 2022, about half a year before ChatGPT was released. At that time, I was still working on conversational AI within Amazon’s product lines. After LLMs emerged, we started experimenting with using them to solve a problem called Semantic Parsing — essentially, whether we could parse human intent from any given piece of text.

For example, if I tell a Chatbot “I want to book a flight ticket,” that’s straightforward. But if I say “I need to travel a long distance,” there are many possibilities. You might need to book a flight, or you might not, or you might need to book a train ticket. We found that LLMs seemed capable of listing out what the person might want to do. This was actually a very specific point — mapping from a problem statement to intent — but before that, almost no models could do it.

So we believed back then that the emergence of LLMs would lead to breakthrough progress in the agent direction. Consequently, in 2022, we decided to go all in on LLM agents. By August of that same year, we had completed Amazon’s first large model-based agent system, called Dialogue-to-API (Dialog2API), and even published a paper on arXiv. The following year, Amazon AWS decided to train its own large model, initially called Titan, later launching the new Nova model family as its main focus (Titan is still used for some tasks, like embeddings). I led a team of about ten people, training agent capabilities into this large model and collecting a significant amount of data.

The concept of an agent isn’t new. Agents first gained popularity in the 90s, and right from the birth of the agent concept, people were thinking about how to make multiple agents collaborate to form collective intelligence. In 1995, Michael Wooldridge and Nicholas Jennings published a seminal paper titled “Intelligent Agents: Theory and Practice.” They defined several key characteristics of intelligent agents: autonomy, social ability, reactivity, and pro-activeness. These later became the classic definition of agents. Then in 2002, the first conference related to Multi-Agent Systems — AAMAS — was held in Italy.

Since then, agent development has progressed very rapidly. Current research mainly focuses on “how to build a bigger, more powerful agent,” which is definitely the right direction. But I believe future development will return to the 90s idea — once we have very capable individual agents, we need to explore collective intelligence.

In fact, research in both AAMAS and economics has found that collective intelligence can often solve problems using simpler methods and achieve far better results than a single agent in many scenarios.

That means bringing together a very large number of agents — not five or six, but hundreds or thousands — to build an ecosystem. Within this ecosystem, these agents collaborate, exchange resources, and solve more complex problems. I think this will once again become a core research and application direction in both academia and industry in the future. So, I actually started planning the OpenAgents project last year; you could say it’s been a long time in the making.

02 From Communication to Collaboration: What Has the LLM Revolution Changed?

Ray Qin: It sounds like the concept of an “agent” here isn’t limited to just the LLM itself, right? It could potentially be powered by other types of intelligence as well. I think the main focus of our discussion today, beyond the agent itself, leans more towards the collaboration mechanisms between them, their interactive processes, and the collective intelligence emerging from this network of collaboration.

Raphael Shu: Exactly. While it’s undeniable that large language models have brought many groundbreaking changes to multi-agent systems, the core concept of multi-agent systems itself isn’t new. There was already significant exploration and research into it back in the 1990s or even earlier.

For instance, in urban traffic systems, traffic lights represent a classic case of multi-agent collaboration. Each traffic light can be viewed as an autonomous agent. They communicate and coordinate with each other to optimize overall traffic flow. Such applications have actually been used in the industry for a long time. It’s just that today, the emergence of LLMs has endowed these systems with new capabilities and flexibility, allowing us to conduct deeper exploration on this foundation.

Ray Qin: In your opinion, what is the most significant change or impact that LLMs have brought to multi-agent collaboration?

Raphael Shu: The two most fundamental capabilities introduced by LLMs are the comprehension of arbitrary language and the generation of arbitrary language. These two capabilities have profoundly altered the communication methods within multi-agent systems.

Before the advent of large models, almost all multi-agent systems relied on explicit, pre-defined communication protocols to achieve collaboration. For instance, in the 1990s-2000s, common communication protocols included the FIPA standard and the Contract Net Protocol. None of these protocols used natural language for communication.

Let’s take the traffic light example again: if an intersection detects that traffic is starting to back up, it needs to inform the surrounding traffic lights. Without a language model, the system would rely on a pre-defined encoding scheme — for example, using “10101” to mean “I am congested.” The traffic lights would communicate using these fixed codes.

The problem is that such protocols are extremely rigid and have limited expressive power. Suppose this particular congestion is caused by an accident. While there is congestion, the surrounding intersections might not need immediate adjustment. Or, if the congestion is expected to clear within two minutes, this richer contextual information cannot be conveyed. We refer to this as “Semantic Richness” — the system’s ability to flexibly express complex intents.

Once a communication protocol is fixed, it becomes very difficult to extend. To add new meaning, you must redefine the codes, and all agents in the system have to relearn the new encodings. For example, if you add a new code “11100” representing a new state, every agent in the entire system needs to be updated and retrained; otherwise, they won’t understand it. This severely limits the system’s scalability and adaptability.

Consider an even more complex scenario — sometimes traffic lights need to negotiate. For example, one intersection might say, “I have too much traffic here, can you let my side go first?” This kind of multi-layered intent and collaborative logic is very difficult to implement within traditional protocol frameworks. While you could keep extending the protocol, the system complexity would skyrocket, eventually becoming unmaintainable and prone to failure.

After the emergence of LLMs, the situation changed completely. Because large models inherently possess language understanding and generation capabilities, they don’t need to rely on pre-defined codes. Now, an agent can directly express its state in natural language: “I’m a bit congested here, can you wait a moment?” — and other agents can understand the semantics and respond accordingly.

This natural language-based interaction significantly enhances the system’s flexibility and expressive power, making collaboration more natural and closer to human communication.
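The contrast described here, rigid fixed codes versus free-form language, can be sketched in a few lines of Python. This is an illustrative toy only (a keyword check stands in for the actual LLM call); it is not code from any real traffic system:

```python
# Toy contrast: fixed-code protocol vs. natural-language messaging
# between traffic-light agents.

# Fixed-code protocol: every meaning must be enumerated in advance.
FIXED_CODES = {
    "10101": "congested",
    "10110": "clear",
    # Adding a new meaning (e.g. "congested due to accident") requires a new
    # code AND an update to every agent that might receive it.
}

def decode_fixed(message: str) -> str:
    """Rigid decoding: unknown codes are simply not understood."""
    return FIXED_CODES.get(message, "UNKNOWN")

def decode_natural(message: str) -> str:
    """With an LLM, an agent can interpret arbitrary text and richer context.
    A keyword heuristic stands in for the model call here."""
    if "accident" in message:
        return "congested due to accident; no rerouting needed yet"
    if "congest" in message or "backed up" in message:
        return "congested"
    return "clear"

print(decode_fixed("10101"))   # congested
print(decode_fixed("11100"))   # UNKNOWN (new code, agents not retrained)
print(decode_natural("backed up after an accident, should clear in 2 min"))
```

The second call shows the scalability problem Raphael describes: an unrecognized code is meaningless until every agent is updated, whereas the natural-language path degrades gracefully.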

Ray Qin: That’s really interesting; I’ve never thought about it from this perspective before. As you mentioned, the flexibility of this communication method is indeed crucial. When designing your product, were you considering similar issues? Since language is a highly expansive form of expression, have you considered other modes of collaboration besides language?

Raphael Shu: This question actually touches on the future development direction of OpenAgents — multi-agent collaboration, especially within the large-scale AI community we aim to build, which will involve coordination and cooperation among hundreds or even thousands of agents.

However, there’s a practical challenge here: if all agents rely solely on large models for interaction — for instance, if one hundred agents hold a meeting to vote, and each uses an LLM to generate a statement — the reasoning process alone would become very time-consuming and computationally expensive. The inference latency of large models would increase linearly with the number of agents, potentially stretching the system’s response time from milliseconds to tens of seconds, significantly reducing collaboration efficiency.

Therefore, the “interaction methods beyond natural language” you just mentioned are precisely the direction we are currently focusing on. One of OpenAgents’ goals at the infrastructure level is to accelerate the collaboration process between multiple agents as much as possible.

We are exploring whether some intents can be expressed more efficiently. For example, operations like voting, confirmation, and selection are essentially fixed intents. If we can establish a pre-agreed “protocol” or “terminology system,” agents could opt for lighter-weight communication methods.

For instance, if I simply want to express “I agree with this proposal”, I could just send a simple code like “101”, which would take only milliseconds. Of course, if I want to add more semantic information, I could still use natural language for a full explanation. This flexible design allows agents to autonomously choose their communication method based on the scenario without sacrificing understanding.

Through such a mechanism, combined with Prompt training or other optimization techniques, we can improve the coordination efficiency of the entire network from tens of seconds down to milliseconds, achieving truly efficient large-scale collaboration.
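The hybrid design Raphael sketches (compact codes for fixed intents, natural language for richer semantics) might look something like the following. The message shape, codes, and handler are assumptions made for illustration; they are not the actual OpenAgents protocol:

```python
# Sketch of a hybrid messaging envelope: fixed intents travel as compact
# codes on a fast path, while anything richer falls back to an LLM call.
from dataclasses import dataclass

# Pre-agreed "terminology system" for fixed intents (illustrative codes).
INTENT_CODES = {"101": "agree", "100": "disagree", "001": "ack"}

@dataclass
class Message:
    kind: str      # "code" for the fast path, "text" for natural language
    payload: str

def handle(msg: Message) -> str:
    if msg.kind == "code":
        # Millisecond path: a dictionary lookup, no LLM inference.
        return INTENT_CODES.get(msg.payload, "unknown-code")
    # Rich path: hand the text to an LLM for interpretation (stubbed here).
    return f"llm-interpret: {msg.payload}"

print(handle(Message("code", "101")))  # agree
print(handle(Message("text", "I agree, but only if we delay the rollout")))
```

The point of the design is that each agent chooses the channel per message: a vote costs a lookup, while a nuanced objection still gets the full expressive power of language.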

So, to return to your question — will we explore interaction forms beyond natural language? The answer is definitely yes, and this will be one of the key research and development directions for OpenAgents in its subsequent stages.

03 The Positioning of OpenAgents: Not a Framework, but Infrastructure

Ray Qin: How does OpenAgents view other similar multi-agent frameworks? For instance, there are quite a few comparable frameworks on the market already. How do you see your relationship with them — is it competitive or collaborative?

Raphael Shu: That’s a crucial question. In fact, we maintain good communication and compatibility with many mainstream multi-agent frameworks. We are currently advancing cooperative integrations and will progressively release some joint demos and use cases.

Actually, when I first discussed this with friends, everyone would ask: “What you’re building is also a multi-agent framework, so are you competing with us?” But after deeper discussions, we realized it’s not a competitive relationship at all; in fact, it’s complementary.

Think of it this way: a framework like AutoGen is more like helping you assemble the “best basketball team” — it provides the SDK, development framework, role definitions, tool management, etc., to help you organize the players (agents) and get them on the court to play the game.

What OpenAgents does is different. It’s more like helping you build the “best basketball arena.” This arena provides the playing court, rules, and collaboration mechanisms for various different basketball teams, allowing them to interact, cooperate, and compete within a shared environment.

In other words, OpenAgents focuses on the network layer and infrastructure layer between agents, not the internal logic within each agent.

Therefore, our relationship with these multi-agent frameworks is truly complementary: they handle the organization and logic of the agents, while OpenAgents enables different agents to collaborate within the same network. We aim for it to become the “foundation for agent networks,” not just another agent framework.

We also provide some basic capabilities that allow developers to quickly test and form small-scale agent teams. OpenAgents isn’t meant to replace the internal functions of these frameworks, such as memory, planning, or tool management — it’s more appropriate for each framework to handle those parts themselves.

Ray Qin: So, OpenAgents isn’t really positioned as a “multi-agent framework,” but rather as “infrastructure for multi-agents.”

Raphael Shu: Yes, that’s very accurate. OpenAgents is more like an infrastructure platform for agent networks, or you could call it the “operating system” for agent collaboration. Let me give you an example. Suppose you want to form a team to develop a software product. The team has three front-end developers, two back-end developers, and one AI engineer — six people in total. Even with the team assembled, if you don’t use Feishu, Tencent Meeting, GitHub, or Google Docs, how would you collaborate?

This illustrates the importance of “collaboration infrastructure.” No matter how strong the individual capabilities are, without an efficient collaborative environment, the team will struggle to work together smoothly.

What OpenAgents aims to do is provide precisely this kind of “Feishu + GitHub + Google Docs”-style collaborative foundation for agents.

Ray Qin: So, frameworks like AutoGen focus on gathering the people (agents) together, assigning roles, and having them complete tasks by talking face-to-face in a “meeting room.” Whereas OpenAgents provides the tools and environment for their communication and collaboration, like systems such as Feishu, DingTalk, or Google Docs. Besides the communication and collaboration layer, will OpenAgents provide other functionalities?

Raphael Shu: Correct, and this is also why we want to start an open-source project. Because if we just build a “chat room,” we would see a group of agents chatting and exchanging information inside it — which is already meaningful.

But we hope OpenAgents will be more than just a conversational space; we want it to be an agent network infrastructure that can connect to the real world. For instance, you could use plugins to connect the network to different scenarios. As an example, you could integrate a gaming environment where agents collaboratively complete tasks; or you could load plugins like Wikipedia or an event calendar, allowing agents to jointly create and maintain content.

We already provide modules like Wiki, Forum, and Messaging, and we are developing more plugins. For example, one that lets agents maintain an “AI Events List,” collecting AI-related events happening in various cities every day. This way, a user could directly ask: “I’m in Shenzhen today, what AI events are happening this afternoon?” — and the agents within the OpenAgents network could provide an answer based on real-time information.

In the future, we hope to have thousands of such plugins on this network, making it not just a virtual collaborative space but also one that interfaces with real-world services, data, and tools, becoming a truly “capable” agent network that can get things done.

04 Why Should Developers Use OpenAgents?

Ray Qin: Another crucial question: why would people be willing to join? And how can everyone connect their own agents to such a network?

Raphael Shu: First, it’s important to clarify that OpenAgents is not a single, monolithic network. While we have a main network, each agent actually connects to a subnetwork. Some subnetworks can even be fully privately deployed. For example, enterprise users can set up exclusive networks on their local servers without needing to make them public.

Private networks don’t have public network IDs, but other agents can still connect via the host or IP address. The core goal of OpenAgents is to help developers create a large number of networks with different themes; each subnetwork can have customized functionalities and rules. For instance:

  • One subnetwork could be dedicated to playing Minecraft, loading only relevant plugins.
  • Another subnetwork could be used for collaboratively building Wikipedia-like content — for example, several agents working together to maintain information about the game Genshin Impact, covering equipment, quests, and map details.
  • Yet another could be a “Shanghai Entrepreneurs Community,” where entrepreneurs can create their own agent representatives for real-time information sharing.

Suppose I have free time at 3 PM in Yangpu District. I could have my agent ask, “Is there any entrepreneur nearby who wants to grab coffee?” Perhaps within a few hundred milliseconds, it could match with a suitable person. This demonstrates the characteristics of OpenAgents subnetworks: thematic focus, customizability, and definable rules.

Thousands of subnetworks can connect to the main network, each with a unique network ID. A new agent only needs to input this ID to immediately join the network, see its theme and rules (e.g., how many messages need to be posted daily), and view the available plugin modules. After joining, it can start collaborating right away.
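As a rough sketch of this discovery flow, the toy example below models the main network as a registry that maps network IDs to subnetwork metadata, which a joining agent looks up before connecting. The IDs, field names, and `join` function are invented for illustration and are not the real OpenAgents API:

```python
# Toy model of the main network as a discovery layer: ID -> subnetwork info.
MAIN_NETWORK = {
    "net-minecraft-01": {
        "theme": "Collaborative Minecraft play",
        "rules": {"min_daily_messages": 0},
        "plugins": ["game-bridge"],
        "host": "203.0.113.7",
    },
    "net-shanghai-founders": {
        "theme": "Shanghai entrepreneurs community",
        "rules": {"min_daily_messages": 1},
        "plugins": ["messaging", "event-calendar"],
        "host": "198.51.100.4",
    },
}

def join(network_id: str) -> dict:
    """Look up a subnetwork by its ID. The agent then connects to its host
    and collaborates under the subnetwork's own theme, rules, and plugins."""
    meta = MAIN_NETWORK.get(network_id)
    if meta is None:
        raise KeyError(f"unknown network ID: {network_id}")
    print(f"Joined {network_id}: {meta['theme']} (plugins: {meta['plugins']})")
    return meta

info = join("net-shanghai-founders")
```

Once `join` returns, the main network's role is over: everything that happens inside the subnetwork is governed by that subnetwork, which matches the discovery-layer / autonomous-layer split described below.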

Ray Qin: This reminds me of the TCP/IP network protocol or the structure of the internet. Can it be understood this way: a subnetwork is essentially like an independent service or website? I deploy a thematic network on my own server, similar to building an APP or a website, and agents can both access this subnetwork and also connect to other subnetworks or the main network?

Raphael Shu: That’s a very fitting analogy. You can think of it as “building a website” or “hosting a Minecraft server” on your own server. The role of the OpenAgents main network is to help other agents quickly find and connect to your subnetwork using a unified network ID. Once an agent joins a subnetwork, how it collaborates with other agents inside is defined by the subnetwork itself. The main network does not interfere with the internal logic. In other words, the OpenAgents main network is the discovery layer, while each subnetwork is an autonomous layer.

Ray Qin: That sounds quite clear from a technical perspective. But returning to practical matters, why would developers want to connect their agents to such a network? What do they gain from it?

Raphael Shu: That’s an excellent question. It’s true that “collective intelligence” is still in its early stages within the industry, and many developers are still exploring its practical value. We are currently trying out several collaborative projects to validate this model.

For instance, we are running a pilot project with a team focused on AI recruitment. They developed an “AI interviewer agent” that enables candidates to complete fully automated interviews online in approximately 12 to 15 minutes. Companies can configure whether results are delivered instantly or after manual review.

Our goal is to use OpenAgents to create an “AI Recruitment Community” that brings together interviewer agents from different companies. Developers or job seekers can upload their resumes and be questioned and interacted with by multiple interview agents. If an agent identifies a suitable candidate, it can directly invite them into the company’s interview process.

The advantage of this model is significant: for job seekers, the efficiency is extremely high — they could potentially interact with dozens of company agents in a single day. For companies, it provides a low-cost way to reach more suitable candidates without spending excessive time on advertising or headhunter channels.

This is just one example. We believe similar communities will proliferate in the future, targeting various professional fields like programmers, designers, researchers, etc. The role of OpenAgents is to provide unified collaboration and discovery infrastructure for these distributed agent communities.

Ray Qin: After hearing this case study, I’m even more interested in the OpenAgents project. I’ve been trying to understand it from different perspectives and also want to help others grasp its essence better. But if I were to write the code myself and build such a system from scratch, achieving similar functionality, what would be the difference compared to using OpenAgents? How significant is the convenience that OpenAgents provides?

Raphael Shu: It’s true that you could absolutely build a similar system from the ground up, but that would essentially be reinventing the wheel — reimplementing the underlying architecture we’ve already encapsulated within OpenAgents.

First, you’d encounter the issue of communication protocols. Since the job seeker is a human user, they need a graphical interface for natural language interaction, while communication between agents might require different protocols, such as HTTP, gRPC, or custom message protocols. If you start from scratch, you first have to build an underlying infrastructure that supports multiple communication protocols simultaneously.

The second problem is building the basic functionalities. You’d at least need to implement a chatroom mechanism to enable communication between agents. Then you’d have to define the chatroom’s capabilities: whether it supports private messages, if channels can be created, how message permissions are divided.

The third issue is permissions and role management. For example, in a recruitment scenario, the content sent by one candidate shouldn’t be visible to others. This requires defining different agent groups (HR group, candidate group, admin group) and their access permissions.

All of these are readily available in OpenAgents. You only need to write a simple network configuration file, defining the communication protocols the network uses, the groupings of various agents and their joining rules, and the required plugins, such as chatrooms, task collaboration, evaluation panels, etc. You write the configuration in seconds, and your network is live.
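As a purely hypothetical illustration of such a configuration file for the recruitment example, it might read as follows (every field name here is an assumption for the sake of the sketch, not the actual OpenAgents schema):

```yaml
# Hypothetical network configuration (illustrative field names only).
network:
  name: ai-recruitment-community
  protocols: [http, grpc]        # transports the network accepts
groups:
  - name: hr
    join_rule: invite-only
  - name: candidates
    join_rule: open
    visibility: private          # one candidate's messages stay private
  - name: admins
    join_rule: invite-only
plugins:
  - chatroom
  - task-collaboration
  - evaluation-panel
```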

Ray Qin: Very illustrative.

05 Challenges and the Future: Ecosystem Awareness and Computational Optimization

Ray Qin: Throughout your past explorations, or as you continue to advance OpenAgents in the future, what do you perceive as the biggest challenges you might encounter or are currently facing?

Raphael Shu: That’s a crucial question. The challenges we currently face mainly lie in two areas.

The first is awareness and understanding. Collective intelligence isn’t actually a new concept; it has a history spanning decades in machine learning and AI research. I’m not “reinventing” collective intelligence, but rather hoping to help more people understand what kind of applications it can enable in today’s context.

Ultimately, the key word for collective intelligence is “ecosystem.” Open multi-agent collaboration is fundamentally about building an ecosystem, not setting up a closed workflow.

Closed workflows have obvious limitations — whenever a new agent joins or an old one leaves, the system can be disrupted and fail to operate sustainably. Therefore, helping developers, researchers, and enterprises truly grasp the value of an “open ecosystem” is one of our most significant challenges right now.

The second challenge is computational power and token consumption. As the number of agents in the network increases, the number of model inferences will rise sharply, leading to higher corresponding token costs and increased response latency. How to reduce token consumption and improve collaboration efficiency is a key optimization direction for OpenAgents going forward. In the future, we will analyze collaboration patterns within the network, optimize token usage based on different interaction types, and even skip LLM calls entirely in certain scenarios.

For example, some dialogues between agents are simply confirmatory questions — “Did you see that message?” These situations really only require a “yes” or “no” response, not the generation of a long natural language passage. But current large models often generate responses containing tens or even hundreds of words, such as, “Yes, I saw the information posted by a certain agent in the group.” This redundancy not only wastes tokens but also slows down response speed.

Therefore, we plan, through underlying optimizations, to enable the agent network to use more efficient methods of expression for interaction when needed, achieving true millisecond-level communication.

06 Silicon Valley Cafés Are Full of Founders and Investors

Ray Qin: Many people in China are actually quite curious about the startup environment in the US. Based on your experience living there now, how would you describe the entrepreneurial atmosphere to give everyone an intuitive sense of it?

Raphael Shu: I’m based in Seattle, where the pace is relatively more relaxed. At Starbucks, you typically see two or three people having coffee and chatting together. But I travel to Silicon Valley once or twice a month, and the feeling there is always intense. In Palo Alto, walk into any random café, and at the table on your left, there’s a founder pitching an investor; at the table on your right, there’s another one doing the same, each conversation louder than the last. This is the prevalent atmosphere throughout the South Bay.

The vibe in San Francisco is even more exaggerated. Right as you enter the city from the freeway, huge billboards along the road proclaim in large letters: “AI agent is here.” Building signage is also covered in ads for various companies’ AI agents. Step into a café, and the entire row of people by the window have their laptops open, screens predominantly black — command-line interfaces — actively writing code.

As for events, the Meetup culture in the US is indeed very vibrant. On platforms like Luma, there are almost daily offline gatherings of all kinds.