Last verified: May 16, 2026. Vendor pricing and benchmarks refreshed quarterly.

If you searched “AI agent platforms” and got a list of 13 tools with no explanation of what makes them different, that list is not the problem. The problem is that those 13 tools are not the same kind of thing. OpenClaw is a self-hosted autonomous platform. LangGraph is a developer orchestration framework. Salesforce Agentforce is a CRM-native vendor product. MCP is a protocol standard. They cannot be compared to each other any more than a car engine can be compared to a road map.

AI agent platforms fall into four distinct layers: self-hosted autonomous platforms like OpenClaw and Hermes Agent, developer orchestration frameworks like LangGraph and CrewAI, vendor agent products like Claude Code and Agentforce, and the protocol layer anchored by MCP (Model Context Protocol). The most common mistake operators make when evaluating AI agent platforms is treating the LLM choice as the platform choice. The orchestration layer, the deployment model, and the protocol layer are separate decisions.

Before you read the landscape below, make sure you have a working definition of what an AI agent actually is. The landscape assumes you understand the agentic loop: perceive, reason, act, observe, repeat. This piece covers which tools you use to run that loop, at scale, reliably.
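As a rough illustration of that loop, here is a minimal Python sketch. The names `LoopState`, `run_agent_loop`, `toy_reason`, and `toy_act` are hypothetical and do not come from any platform covered below; a real platform would call an LLM where `toy_reason` sits and real tools where `toy_act` sits.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopState:
    goal: str
    observations: List[str] = field(default_factory=list)
    done: bool = False


def run_agent_loop(
    state: LoopState,
    reason: Callable[[LoopState], str],
    act: Callable[[str], str],
    max_steps: int = 5,
) -> LoopState:
    """Run perceive -> reason -> act -> observe until done or out of steps."""
    for _ in range(max_steps):
        # Perceive and reason: the policy reads the goal and prior observations.
        action = reason(state)
        if action == "finish":
            state.done = True
            break
        # Act, then observe: store the result so the next pass can see it.
        state.observations.append(act(action))
    return state


# Toy stand-ins (assumptions for illustration only).
def toy_reason(state: LoopState) -> str:
    return "finish" if len(state.observations) >= 2 else "search"


def toy_act(action: str) -> str:
    return f"result of {action}"


final = run_agent_loop(LoopState(goal="find pricing"), toy_reason, toy_act)
print(final.done, final.observations)
# prints: True ['result of search', 'result of search']
```

The `max_steps` cap is the part that matters at scale: every platform in the landscape has some equivalent way to stop a loop that is not converging.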


Why These Tools Are Not the Same Kind of Thing

The vocabulary problem is real. “Platform,” “framework,” “SDK,” and “agent” get used interchangeably across vendor marketing, GitHub repos, and comparison listicles. A common example: one widely shared listicle puts LangGraph (a developer library you write code against) next to Salesforce Agentforce (an enterprise CRM product with its own proprietary reasoning engine) as if they are options in the same product category. They are not. Choosing the wrong layer wastes weeks.

A solo operator who starts evaluating LangGraph because a blog post listed it first will spend days in Python setup and graph configuration before realizing the tool they actually needed was a vendor agent product that ships ready to use. An engineering team that picks a vendor product when they needed a developer framework will run into walls the moment they need code-level control over failure handling or stateful checkpoint recovery.

Listicles mixing all four layer types are not wrong about the tools they list. They are misleading about what you are being asked to choose. Vendor self-rankings have the same problem.

If you are new to agents, the piece on “what an AI agent actually is” is the right starting point. The four-layer frame below is the map the SERP does not give you.


The 4 Layers (and How to Read Them)

Each layer answers a different question. Layer 1 answers “how do I control the whole stack?” Layer 2 answers “how do I build custom agent logic?” Layer 3 answers “how do I buy working agent functionality?” Layer 4 answers “how do all of these talk to each other?”

Most operators need a Layer 2 or Layer 3 choice. Layers 1 and 4 are specialized: Layer 1 for operators with hard data-control requirements, Layer 4 as infrastructure that runs underneath whatever else you choose.

| Layer | Examples | Who It’s For | Code Required? | Control Level |
| --- | --- | --- | --- | --- |
| 1: Self-hosted autonomous | OpenClaw, Hermes Agent | Operators who want full control over model and data | Yes (setup) | Maximum |
| 2: Developer frameworks | LangGraph, CrewAI, AutoGen, LlamaIndex | Engineering teams building custom agents | Yes (substantial) | High |
| 3: Vendor agent products | Claude Code, Agentforce, Copilot Studio | Operators buying pre-built functionality | No or minimal | Medium |
| 4: Protocol layer | MCP, A2A | Infrastructure builders; anyone using MCP-compatible tools | Indirect | N/A |

The framework choice and the model choice are separate decisions. The LLM explainer covers how the underlying language model fits into the stack if you need a refresher. Short version: the orchestration framework handles coordination; the LLM handles reasoning.


Layer 1: Self-Hosted Autonomous Platforms (OpenClaw and Hermes Agent)

Self-hosted autonomous platforms put the full agent stack on your own infrastructure. You control the compute, the data, and the model backend. The cost is the LLM API calls you pay for. The tradeoff is that you need to be comfortable with self-hosting.

OpenClaw

OpenClaw started life as Clawdbot, published in November 2025 by Austrian developer Peter Steinberger. Anthropic trademark complaints triggered a rename to Moltbot on January 27, 2026, and three days later to OpenClaw. By March 2, 2026, OpenClaw had 247,000 GitHub stars and 47,700 forks. That is the verified figure. The 350K number you may have seen elsewhere is not confirmed.

The architecture of OpenClaw is messaging-platform-centric. OpenClaw connects Telegram, WhatsApp, Slack, and iMessage to any LLM backend, so your agent lives inside your messaging app rather than a separate interface. Tencent and Z.ai have announced OpenClaw-based services. On February 14, 2026, Steinberger announced he was joining OpenAI. Sam Altman confirmed OpenClaw would continue as an open-source project under an independent foundation with OpenAI support.

OpenClaw is not abandoned, but the governance transition is real. New builds that depend on OpenClaw should have a contingency plan. For messaging-based agent deployment with full model portability, OpenClaw remains one of the few serious self-hosted options in Layer 1. Licensed MIT.

Hermes Agent

Nous Research launched Hermes Agent on February 25, 2026. Hermes Agent reached 95,600 GitHub stars in seven weeks, making it the fastest-growing agent framework of 2026. As of May 10, 2026, Hermes Agent tops OpenRouter’s global daily token rankings at 224 billion tokens per day, ahead of OpenClaw at 186 billion.

The feature that distinguishes Hermes Agent from every other self-hosted platform is the self-improving loop. After any task with five or more tool calls, a background process auto-generates a Markdown skill file with YAML frontmatter, capturing the task trajectory as reusable logic. Memory in Hermes Agent spans three layers: a persistent user-agent snapshot, a SQLite FTS5 full-text index of every session, and the procedural skill files themselves. Nous Research’s internal benchmarks show agents with 20 or more self-created skills complete similar future tasks 40% faster than fresh instances. This is an internal benchmark, not third-party reproduced, so treat the number as directional rather than definitive. Skill transfer is also domain-specific, so do not expect the speedup to carry over to tasks outside the domain where the skills were created.
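To make the skill-file idea concrete, here is a minimal Python sketch of the trigger and the output shape described above. It is not Hermes Agent code. The function `maybe_write_skill`, the `ToolCall` type, and the frontmatter keys `name` and `tool_calls` are assumptions for illustration; only the five-call threshold and the "Markdown with YAML frontmatter" format come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional

SKILL_THRESHOLD = 5  # five or more tool calls triggers skill generation


@dataclass
class ToolCall:
    tool: str
    summary: str


def maybe_write_skill(task: str, calls: List[ToolCall]) -> Optional[str]:
    """Return a Markdown skill document with YAML frontmatter, or None."""
    if len(calls) < SKILL_THRESHOLD:
        return None  # short tasks are not worth capturing
    # Hypothetical frontmatter keys; the real schema may differ.
    frontmatter = "\n".join([
        "---",
        f"name: {task.lower().replace(' ', '-')}",
        f"tool_calls: {len(calls)}",
        "---",
    ])
    # The body records the task trajectory as numbered steps.
    steps = "\n".join(
        f"{i}. `{call.tool}`: {call.summary}" for i, call in enumerate(calls, 1)
    )
    return f"{frontmatter}\n\n# {task}\n\n{steps}\n"


calls = [ToolCall("search", f"step {n}") for n in range(5)]
skill = maybe_write_skill("Export weekly report", calls)
print(skill)
```

A file in this shape is plain text, which is what makes it easy to index, diff, and review by hand.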

Hermes Agent received ICLR 2026 Oral acceptance for the companion research on self-evolving agent systems. Licensed Apache 2.0.


Layer 2: Developer Frameworks (LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, LlamaIndex)

Developer frameworks are code libraries. You write code against them to build agent logic. This is the most-evaluated layer for the operators reading this, so each framework gets a tight profile oriented toward decisions, not documentation.

LangGraph

LangGraph is the production runtime within the LangChain ecosystem, and it is what most people mean when they search “what is better than LangChain.” LangGraph 1.0 shipped in October 2025 as the category’s first stable major release. LangGraph runs standalone without the broader LangChain package.

The architectural distinction is graph-based state management. In LangGraph, workflows are nodes (LLM calls, tool calls) connected by edges (routing logic). Agents can pause, pass control to a human reviewer, and resume from any checkpoint. IBM’s 2026 case study documented a 40% latency reduction versus AutoGen in a 100-agent coordination workflow. LangGraph is in production at IBM, Klarna, and Uber.
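The nodes, edges, and checkpoint ideas can be sketched in plain Python. This deliberately does not use the LangGraph library or its API; `NODES`, `route`, and `run` are hypothetical names chosen to show the pattern of pausing before a node and resuming from the saved checkpoint.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]
Checkpoint = Tuple[str, State]  # (node about to run, state so far)


def draft(state: State) -> State:
    return {**state, "draft": f"reply to {state['ticket']}"}


def review(state: State) -> State:
    return {**state, "approved": True}


def send(state: State) -> State:
    return {**state, "sent": True}


# Nodes: the units of work (LLM calls or tool calls in a real system).
NODES: Dict[str, Callable[[State], State]] = {
    "draft": draft, "review": review, "send": send,
}


def route(node: str, state: State) -> Optional[str]:
    """Edges: routing logic that picks the next node from the state."""
    if node == "draft":
        return "review"
    if node == "review":
        return "send" if state.get("approved") else "draft"
    return None  # "send" is terminal


def run(start: str, state: State, checkpoints: List[Checkpoint],
        pause_before: Optional[str] = None) -> State:
    node: Optional[str] = start
    while node is not None:
        # Save a checkpoint before every node so the run can be resumed.
        checkpoints.append((node, dict(state)))
        if node == pause_before:
            return state  # hand control to a human reviewer
        state = NODES[node](state)
        node = route(node, state)
    return state


checkpoints: List[Checkpoint] = []
paused = run("draft", {"ticket": "T-1"}, checkpoints, pause_before="review")
resume_node, saved_state = checkpoints[-1]
resumed = run(resume_node, saved_state, checkpoints)
print(resumed)
```

The design point is that the checkpoint stores both the state and the next node, so a resumed run does not need to replay earlier steps.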

LangGraph is the right tool for engineering teams building stateful, auditable, human-in-the-loop systems. It is harder to start with than CrewAI. The learning curve is real and measurable in days.

CrewAI

CrewAI organizes agents as crew members with roles, goals, and tasks. The role-handoff model is intuitive and gets you to a working multi-agent prototype faster than any other developer framework. Despite a common misconception, CrewAI is not built on LangChain. It is a distinct architecture built from scratch.

The limitation appears in production. Role-based routing handles linear task sequences well, but it struggles with complex conditional logic, branching, and parallel execution. The migration pattern is consistent across teams: prototype in CrewAI, move to LangGraph when the workflow requires checkpoint recovery or conditional routing that does not fit a linear handoff model.

AutoGen (Microsoft Agent Framework)

AutoGen’s design is conversational: agents communicate through structured message loops, suited to debate-and-critique patterns where agents review or challenge each other’s outputs. The Azure-native integration in AutoGen is the strongest in this class. Teams adopting AutoGen for new systems should plan for a naming and architectural transition, as the project is being consolidated into the Microsoft Agent Framework (MAF).

OpenAI Agents SDK

OpenAI released the Agents SDK in March 2025 as the production successor to the experimental Swarm project. Swarm is still available for learning; the Agents SDK is the production path. Do not build on Swarm for new projects. The four core primitives of the OpenAI Agents SDK are Agents, Handoffs, Guardrails, and Tracing. In April 2026, OpenAI added native sandboxing and TypeScript support alongside the original Python implementation. This is the natural choice for teams building on OpenAI models who need handoff-heavy workflows.

LlamaIndex

LlamaIndex started as a RAG (retrieval-augmented generation) toolkit, the strongest in its class for parsing complex document types: tables, scanned PDFs, and slide decks. The agent framework and event-driven Workflows engine were added to address the reality that most enterprise RAG use cases need retrieval coordinated with reasoning and tool use, not retrieval alone. Event-driven means steps respond to events rather than executing in a fixed sequence, which enables branching and parallel execution in document pipelines where LangGraph would require more explicit state definition.
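The difference between event-driven steps and a fixed sequence can be sketched in a few lines of plain Python. This is not the LlamaIndex Workflows API; the `on` decorator, the `HANDLERS` registry, and the event names are assumptions for illustration. Steps run one at a time here, so the sketch shows branching (one event fanning out to two handlers) rather than true parallel execution.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Event = Tuple[str, str]  # (event type, payload)
Handler = Callable[[str], List[Event]]
HANDLERS: Dict[str, List[Handler]] = {}


def on(event_type: str) -> Callable[[Handler], Handler]:
    """Register a step that reacts to one event type."""
    def register(fn: Handler) -> Handler:
        HANDLERS.setdefault(event_type, []).append(fn)
        return fn
    return register


@on("document")
def parse_tables(doc: str) -> List[Event]:
    return [("chunk", f"tables from {doc}")]


@on("document")
def parse_text(doc: str) -> List[Event]:
    return [("chunk", f"text from {doc}")]


@on("chunk")
def index_chunk(chunk: str) -> List[Event]:
    return [("indexed", chunk)]


def run(initial: Event) -> List[str]:
    """Dispatch events until the queue drains; collect indexed payloads."""
    queue: Deque[Event] = deque([initial])
    indexed: List[str] = []
    while queue:
        kind, payload = queue.popleft()
        if kind == "indexed":
            indexed.append(payload)
        # Every handler subscribed to this event type gets the payload.
        for handler in HANDLERS.get(kind, []):
            queue.extend(handler(payload))
    return indexed


print(run(("document", "q3-report.pdf")))
# prints: ['tables from q3-report.pdf', 'text from q3-report.pdf']
```

Adding a third parser for slide decks means registering one more handler for `document`; no central sequence has to be rewritten.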

If the primary question is “how do I let users ask questions across internal documents,” LlamaIndex is the default starting point for this layer. Workflows that grow past retrieval into heavy orchestration logic often pair LlamaIndex with LangGraph.

Choosing between Gemini Enterprise Agent Platform, Copilot Studio, and Agentforce is partly a question of which model powers the agent. A model-level comparison between the major providers is covered in a separate piece.


Layer 3: Vendor Agent Products (Claude Code, Gemini, Agentforce, and the Rest)

Vendor agent products are not developer frameworks. You deploy or buy access to them. You do not build agent logic inside them at the code level. The distinction matters because vendor product documentation sounds like framework documentation, which is one reason the category confusion persists.

One honest note before the product list: most vendor agent products are still in preview or early general availability in 2026. Salesforce Agentforce is the exception. When a vendor says “generally available,” read the limitations section before committing. The gap between marketing and practitioner reality is widest at this layer.

Claude Code (Anthropic)

Claude Code is a terminal-native agentic coding tool that runs on your local machine with full file system access. It ships inside Claude Pro ($20/mo) and Max plans ($100/$200/mo). It is the strongest vendor agent product in this layer for coding tasks inside complex existing codebases with layered file relationships. Claude Code is not a tool for non-technical operators. It requires comfort with the terminal.

ChatGPT Operator (OpenAI)

ChatGPT Operator is a consumer-facing autonomous browser agent for booking, research, form-filling, and purchasing. It is available on the Pro tier ($200/mo), which includes approximately 400 agent task messages per month. As of mid-2026, ChatGPT Operator remains a preview feature with a limited task scope. For non-technical operators who want OpenAI’s autonomous agent capabilities, this is the entry point.

Gemini Enterprise Agent Platform (Google)

Formerly Vertex AI, the Gemini Enterprise Agent Platform became generally available on April 22, 2026. Pricing is usage-based: Gemini 2.5 Flash-Lite costs $0.10 per million input tokens, and Gemini 2.5 Pro costs $1.25 per million input tokens and $10 per million output tokens. Do not confuse it with the Gemini Enterprise app ($30/user/mo), which is a separate product. The Enterprise Agent Platform is the Google Cloud developer product for building agents.
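Usage-based pricing is easier to reason about with a worked number. The sketch below uses only the Gemini 2.5 Pro rates quoted above; the monthly workload of 40 million input tokens and 5 million output tokens is a made-up assumption, and `estimate_cost` is a hypothetical helper, not part of any Google SDK.

```python
# USD per million tokens, as quoted above for Gemini 2.5 Pro.
PRO_INPUT_PER_MILLION = 1.25
PRO_OUTPUT_PER_MILLION = 10.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate spend in USD for a given token volume on Gemini 2.5 Pro."""
    input_cost = input_tokens / 1_000_000 * PRO_INPUT_PER_MILLION
    output_cost = output_tokens / 1_000_000 * PRO_OUTPUT_PER_MILLION
    return round(input_cost + output_cost, 2)


# Hypothetical monthly workload: 40M input tokens, 5M output tokens.
monthly = estimate_cost(input_tokens=40_000_000, output_tokens=5_000_000)
print(monthly)
# prints: 100.0
```

In this example the output side costs as much as the input side despite being one eighth of the volume, which is why agent workloads that generate long responses are priced very differently from retrieval-heavy ones.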

Salesforce Agentforce

Salesforce Agentforce uses the proprietary Atlas Reasoning Engine to run CRM-native agents that act autonomously on Salesforce data. Consumption pricing is $2 per conversation. Agentforce is designed for organizations already operating inside the Salesforce ecosystem. This is the most mature enterprise agent product in the landscape and has been generally available since 2025.

Microsoft Copilot Studio

Microsoft Copilot Studio builds agents that live inside Microsoft 365: Teams, Outlook, SharePoint. Pricing is credit-based at approximately $0.01 per message. Copilot Studio’s design philosophy is human-in-the-loop: agents augment rather than replace human decisions. It has the lowest integration friction for organizations where M365 is the primary workflow layer. The Microsoft Agent Framework (MAF) sits underneath for teams that need custom orchestration.
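The two consumption models are billed in different units, which makes them hard to compare by eye. The sketch below takes the $2 per conversation figure quoted for Agentforce and the approximate $0.01 per message figure quoted for Copilot Studio and applies both to the same made-up workload. The workload numbers and the `monthly_cost` helper are assumptions, and real invoices depend on credit packs, tiers, and what each vendor counts as a conversation or message.

```python
AGENTFORCE_PER_CONVERSATION = 2.00  # USD, as quoted in this article
COPILOT_PER_MESSAGE = 0.01          # USD, approximate, as quoted in this article


def monthly_cost(conversations: int, messages_per_conversation: int) -> dict:
    """Apply both quoted rates to one workload for a rough comparison."""
    total_messages = conversations * messages_per_conversation
    return {
        "agentforce": round(conversations * AGENTFORCE_PER_CONVERSATION, 2),
        "copilot_studio": round(total_messages * COPILOT_PER_MESSAGE, 2),
    }


# Hypothetical workload: 1,000 conversations averaging 8 messages each.
costs = monthly_cost(conversations=1_000, messages_per_conversation=8)
print(costs)
# prints: {'agentforce': 2000.0, 'copilot_studio': 80.0}
```

Treat the output as a unit-conversion exercise rather than a verdict: the products do different work per conversation, so the cheaper line item is not automatically the cheaper deployment.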

Devin AI (Cognition Labs)

Devin AI is an autonomous software engineering agent. Devin 2.0 dropped the entry price from $500/mo to $20/mo on the Core plan. The enterprise deployment option puts Devin inside your own VPC so code never leaves your environment. Devin AI performs best on defined software engineering tasks where success criteria are measurable. It is not a general-purpose autonomous agent.

Amazon Nova Act (AWS)

Amazon Nova Act is a browser-automation agent SDK for building and managing agent fleets that execute UI-based browser workflows. Nova Act reached general availability in December 2025. Production-grade pricing and SLA details were not fully published at launch. If you are evaluating Nova Act, consult current AWS pricing rather than any figure you find in a third-party article.


Layer 4: The Protocol Layer (MCP and ACP)

The protocol layer is infrastructure, not a product you deploy as your primary agent tool. It is the interoperability layer that lets agents built on different frameworks talk to the same services without custom wrappers per provider.

MCP (Model Context Protocol)

MCP is the standard for how AI models connect to external tools, data sources, and services. Anthropic introduced MCP in November 2024. In December 2025, the Linux Foundation announced the Agentic AI Foundation (AAIF), with MCP as the anchor project. Platinum founding members include AWS, Anthropic, Block, Bloomberg, Cloudflare, Google, Microsoft, and OpenAI.

When every major vendor co-signs the same infrastructure standard, that standard is being removed from competitive differentiation. The donation of MCP to the AAIF under the Linux Foundation is the most consequential infrastructure event in the agent space in 2025. By late 2025, MCP had 97 million monthly SDK downloads, more than 10,000 active MCP servers, and first-class client support across Claude, ChatGPT, Cursor, Gemini, Microsoft Copilot, and VS Code.

The practical effect for operators: build your tool integration as an MCP server once and any MCP-compatible model can use it. You write the connector once instead of maintaining separate wrappers per LLM provider. At AIM, the MCP integrations we have built connect Claude Code to external tools and data sources. That is Layer 4 in practice for a small agency.
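To show what "write the connector once" looks like on the wire, here is a simplified Python sketch of the two MCP tool messages, `tools/list` and `tools/call`, which are JSON-RPC 2.0 requests. It is not a conforming MCP server: it skips the initialization handshake and the transport, and in practice you would use an official MCP SDK. The `keyword_volume` tool and the `handle` dispatcher are hypothetical.

```python
import json
from typing import Any, Dict


def keyword_volume(keyword: str) -> str:
    """Hypothetical tool; a real one would query a keyword research API."""
    return f"{keyword}: lookup not implemented in this sketch"


TOOLS: Dict[str, Dict[str, Any]] = {
    "keyword_volume": {
        "fn": keyword_volume,
        "description": "Look up search volume for a keyword.",
        "inputSchema": {
            "type": "object",
            "properties": {"keyword": {"type": "string"}},
            "required": ["keyword"],
        },
    }
}


def handle(request: Dict[str, Any]) -> Dict[str, Any]:
    """Answer a JSON-RPC request for the two tool methods."""
    method = request["method"]
    if method == "tools/list":
        # Clients call this to discover which tools exist and their schemas.
        result: Dict[str, Any] = {"tools": [
            {"name": name, "description": tool["description"],
             "inputSchema": tool["inputSchema"]}
            for name, tool in TOOLS.items()
        ]}
    elif method == "tools/call":
        params = request["params"]
        text = TOOLS[params["name"]]["fn"](**params["arguments"])
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
        "params": {"name": "keyword_volume", "arguments": {"keyword": "mcp"}}}
print(json.dumps(handle(call), indent=2))
```

Because the request shape is the same regardless of which model sits on the client side, the tool definition does not change when you switch LLM providers.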

ACP (Agent Communication Protocol)

IBM’s BeeAI team introduced ACP as an open HTTP-native standard for agent-to-agent communication. As of May 2026, ACP is merging with Google’s A2A (Agent-to-Agent) protocol under the Linux Foundation umbrella. Active development on standalone ACP has wound down. If you are evaluating agent interoperability standards, track A2A, not ACP independently.


Which Layer Matters for Your Work

Match your operator profile to a starting layer. Each profile gets a recommended starting layer, a first tool to try, and a clear avoid.

Solo operator or small business, minimal engineering

Start in Layer 3. Vendor agent products ship ready to use and the engineering overhead of Layer 2 is not worth it for individual or small-team workflows. If you are comfortable in a terminal, Claude Code. If you are M365-native, Microsoft Copilot Studio. Wait until the workflow scales before evaluating frameworks.

Small engineering team prototyping a specific use case

Start in Layer 2 with CrewAI. The role-based design gets a multi-agent prototype running in days. Plan to evaluate LangGraph if the workflow grows complex enough to need checkpoint recovery or conditional routing. Do not jump to LangGraph for a prototype; the learning curve adds weeks to what should be a validation sprint.

Engineering team shipping to production

LangGraph. Its 1.0 release was the first stable major release in its class, and it is built for stateful, auditable, human-in-the-loop workflows. The team will spend more time upfront than with CrewAI, and that time pays back when the workflow needs to survive failures gracefully. Avoid vendor agent products where you need code-level control over failure handling.

Enterprise team committed to an existing platform ecosystem

Start with your vendor’s product. On Microsoft: Copilot Studio plus the Microsoft Agent Framework. On Salesforce: Agentforce. On Google Cloud: Gemini Enterprise Agent Platform. Cross-ecosystem frameworks carry a higher total cost of ownership for organizations where 80% of operating data already lives in one platform. The interoperability cost is real.

Operator with hard data control or model portability requirements

Layer 1. Try Hermes Agent if you want a self-improving loop and have a knowledge-work use case where cross-session learning compounds value. Try OpenClaw if you want messaging-based deployment with full model portability. Vendor agent products in Layer 3 give you infrastructure convenience in exchange for model and data lock-in. If either constraint is hard, you are in Layer 1.
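The five profiles above can be condensed into a small decision function. This is a sketch of the matching logic, not a tool from any vendor: the `Profile` fields and the order in which the checks run (data control first, then ecosystem commitment, then engineering capacity) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

ECOSYSTEM_PRODUCTS = {
    "microsoft": "Copilot Studio plus the Microsoft Agent Framework",
    "salesforce": "Agentforce",
    "google": "Gemini Enterprise Agent Platform",
}


@dataclass
class Profile:
    hard_data_control: bool = False       # data or model portability is non-negotiable
    ecosystem: str = ""                   # "microsoft", "salesforce", "google", or ""
    engineering_team: bool = False        # can write and maintain agent code
    shipping_to_production: bool = False  # past the prototype stage


def recommend(profile: Profile) -> Tuple[int, str]:
    """Return (starting layer, first tool to try) for an operator profile."""
    if profile.hard_data_control:
        return (1, "Hermes Agent or OpenClaw")
    if profile.ecosystem in ECOSYSTEM_PRODUCTS:
        return (3, ECOSYSTEM_PRODUCTS[profile.ecosystem])
    if profile.engineering_team:
        tool = "LangGraph" if profile.shipping_to_production else "CrewAI"
        return (2, tool)
    # Solo operator or small business with minimal engineering.
    return (3, "Claude Code or Microsoft Copilot Studio")


print(recommend(Profile(engineering_team=True)))
# prints: (2, 'CrewAI')
```

If two profiles apply to you, the order of the checks is the part to argue about; a team with both a hard data constraint and a Salesforce commitment has a real conflict that no function resolves.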


What I Actually Use at AIM

At Alameda Internet Marketing, the agent stack is Claude Code, agent-browser, and MCP integrations.

Claude Code is the primary tool for anything that touches a codebase or file system: content pipelines, batch operations, tool building, client work that runs through structured files. When I run a content pipeline across multiple client files, Claude Code manages the whole sequence from the terminal. It handles file reads, writes, sequential stage handoffs, and condition checks without any additional orchestration framework sitting on top of it. For a small agency, that is the right level of control without the overhead of a Layer 2 developer framework.

agent-browser handles browser automation tasks: UI interaction, scraping, and form-fill workflows. The reason I use agent-browser over Playwright MCP is token overhead. Playwright MCP snapshots on complex pages consume 50,000 to 100,000 tokens or more. agent-browser uses about 93% less context for the same task. At agency volume, that difference compounds.
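The arithmetic behind that claim is short. The sketch below applies the 93% reduction to the two ends of the quoted Playwright MCP range; `remaining_tokens` is a hypothetical helper, and the result is an estimate derived from the figures above, not a measurement.

```python
REDUCTION_PERCENT = 93  # "about 93% less context", as quoted above


def remaining_tokens(snapshot_tokens: int) -> int:
    """Tokens left after applying the quoted reduction (integer math)."""
    return snapshot_tokens * (100 - REDUCTION_PERCENT) // 100


for snapshot in (50_000, 100_000):
    print(snapshot, "->", remaining_tokens(snapshot))
# prints:
# 50000 -> 3500
# 100000 -> 7000
```

A page that would cost 50,000 to 100,000 tokens per snapshot drops to roughly 3,500 to 7,000, which is the gap that compounds across many page loads.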

A handful of MCP servers sit between Claude Code and our daily tooling: search APIs, the keyword research stack, internal databases, and the WordPress sites we operate. That is Layer 4 in action for a small agency, and it is where most of the practical productivity comes from once an agent is wired up.

I do not use LangGraph at AIM. The workflow complexity at agency scale does not require it yet. No Agentforce, no Copilot Studio. If you are running a small team, you may not need a Layer 2 framework at all. Start in Layer 3, add MCP integrations for the tool connections, and migrate to a developer framework only when you hit a wall that the vendor product genuinely cannot solve.

That is also the honest version of “I have been through the evaluation.” I did not abandon LangGraph or Agentforce. I looked at what the work actually required and started there.


What to Skip in 2026

AutoGPT and BabyAGI

Both emerged in early 2023 and demonstrated that self-directing task loops were possible. Neither is a current production choice. AutoGPT has gone through multiple rebuilds and has been displaced in production adoption by LangGraph and CrewAI. BabyAGI was always a research demonstration. They are useful as historical context for how the agent category developed; they are not on the evaluation list for 2026 builds.

Swarm (OpenAI)

Deprecated. The production path is the OpenAI Agents SDK. Do not start a new project on Swarm.

ACP as a standalone standard

The merger into A2A is underway under the Linux Foundation. Do not build a new integration against ACP independently.

Observability tools conflated with developer frameworks

Some listicles include LangSmith, MLflow, and Galileo alongside LangGraph and CrewAI as if they are competing options. They are not. Observability tools watch what an agent ran. Orchestration frameworks decide what an agent runs. These are not alternatives to each other. If a “Top 10 Agent Frameworks” list includes monitoring tools, the author does not understand the architecture.

Listicles that mix all four layers without naming them

They will not tell you which layer you need. That is the only question that matters before you evaluate individual tools.

One more honest caveat: most enterprise agent platforms in 2026 are still in preview marketed as GA. Read the limitations documentation before making a vendor commitment. The exception, again, is Salesforce Agentforce. Everything else in Layer 3 is in various stages of “generally available” that require careful reading.

First-time readers should detour through the “what an AI agent actually is” explainer before settling on a tool here. Ready to push further into Layer 2? LangGraph and CrewAI from the profiles above are where most practitioners spend their first week.


Frequently Asked Questions

What is the best AI agent framework in 2026?

There is no single best framework. The right answer depends on your operator profile. For most engineering teams building to production, LangGraph is the default: it handles state management, checkpoint recovery, and audit trails. For the fastest path to a working prototype, CrewAI. For document-heavy pipelines, LlamaIndex. If you are not writing production code, you are in Layer 3 (vendor agent products) territory, not Layer 2 (developer frameworks). Pick your layer first, then evaluate tools within it.

What is the difference between LangChain and CrewAI?

They are distinct architectures, not competing versions of the same thing. LangGraph, the production runtime within the LangChain ecosystem, is graph-based and stateful: it handles complex workflows with branching, rollback, and human-in-the-loop interrupts. CrewAI is role-based and conversational, suited to fast prototyping of multi-agent workflows where routing is linear. CrewAI is also not a wrapper on LangChain. That misconception comes up often. It is built independently. Most teams prototype in CrewAI and migrate to LangGraph when production complexity demands stateful orchestration.

Is Claude Code an AI agent platform?

Yes, in the Layer 3 sense: Claude Code is a vendor-built agent product that operators deploy, not a developer framework they build with. Claude Code is terminal-native and shipping-grade for coding tasks in complex existing codebases. It is included in Claude Pro ($20/mo) and Max plans. It is not the right tool for non-technical operators or for building custom orchestration logic from scratch. If you need that level of control, you are in Layer 2 territory.

What is MCP (Model Context Protocol)?

Think of MCP as a USB-C plug for AI tools: one connection standard that any model can use to reach any data source or external service. Anthropic introduced it in November 2024 and donated it to the Linux Foundation’s Agentic AI Foundation (AAIF) in December 2025, co-signed by OpenAI, Google, Microsoft, and AWS as Platinum members. The practical payoff: write your tool integration once as an MCP server and every MCP-compatible model can use it, instead of maintaining a separate wrapper per LLM provider.

Should I use a developer framework or a vendor agent product?

If you are not writing production code, start with a vendor agent product (Layer 3). If you need code-level control over agent behavior, state management, or failure handling, you need a developer framework (Layer 2). Most operators who think they need a framework actually need a well-chosen vendor product plus MCP integrations. The vendor-first path that outgrows itself is the common pattern. Start where the ceiling is high enough for your current work.

Is OpenClaw still relevant in 2026?

Yes, with one caveat. After Peter Steinberger joined OpenAI in February 2026, the project moved to an independent foundation under OpenAI’s support, so the governance arc is still settling. The ecosystem itself is healthy: 247,000 GitHub stars as of March 2026 and continued production use at Tencent scale. If your use case is hosting your own messaging-bridge agent with full control over the model and the deployment, this is still one of the few viable Layer 1 options in 2026. New builds should plan for the governance question rather than ignore it.