What starts as a personal workflow hack can change the game. João Moura began building AI agents simply to make his own work easier—automating content creation at Clearbit and generating hundreds of inbound leads a day. That experiment evolved into CrewAI, a platform now powering trusted, production-grade autonomous workflows at top enterprises.
In this episode, João breaks down the real challenge of moving AI agents from prototype to deployment. From multi-agent systems and guardrails to enterprise governance and security, he shares what it actually takes to deploy AI at scale. He also reflects on open-source growth, entrepreneurship in the AI gold rush, and why simplicity—rather than over-engineering—wins in the long run.
A must-listen for anyone building AI agents, leading enterprise AI strategy, or thinking about launching an AI startup.
Chapter 1 — 00:00 | Introduction & The Origin Story of Crew AI
João recounts how Crew AI started as a personal open-source project to reduce boilerplate in agent development, gained unexpected traction with enterprises like Oracle, and became a business.
Raja Iqbal: Our guest today is João Moura. João is the founder and CEO of Crew AI. He has been deep in the trenches figuring out how to make multi-agent workflows actually work. Welcome to the show, João.
João Moura: Thank you so much for having me. I’m so excited to be here. It’s funny — agents keep being named the topic of the year, year in, year out. I really appreciate the invitation.
Raja Iqbal: Thank you for being here. So, João, tell us about Crew AI. What are you building? What is it all about?
João Moura: That’s a great question. Let me tell you how Crew AI came to be. It started with me building an open-source library — that is how it all began. I was trying to build some agents myself and was struggling with a few things. I come from an engineering background with 20 years of experience, and there was just a lot of boilerplate code. I was copying and pasting things around, and it just didn’t feel right. And even after I got something working, there was still a gap: How do I deploy this now?
So I took all those learnings, built an open-source library, and put it out there. It skyrocketed. We started getting so many people using it — it was insane. And funny enough, a lot of them were big enterprises. The first company that ever reached out to me was Oracle. They actually brought me to one of their events in San Francisco and pulled me aside: “Hey, we’re actually using Crew AI — can you help us?” That was the moment I realized there was a real business here.
From there, we started building the company. There’s a race right now to make building agents extremely easy — there are so many ways to do it. But where there’s a huge gap is in getting those things deployed, live, and running at scale, with everything that entails: from the more sophisticated features like guardrails, to the more “boring” but critical ones like privacy sanitization, PII detection, and governance. So we spend a lot of our time building across that entire stack — from building, to deploying, to orchestrating, to management.
Chapter 2 — 04:00 | Building the Full Stack: Why Crew AI Had to Do It All
João explains why Crew AI chose to build across the entire agentic stack — from building to deployment to orchestration — instead of relying on a patchwork of third-party tools.
Raja Iqbal: There’s this term “scaffolding” that’s becoming common — context management, memory, vector databases, MCP, reasoning, planning, guardrails, red teaming. Do you offer everything across the board?
João Moura: It’s a great question, because if you think about the stack, it’s enormous. It goes all the way back to data lakes — Snowflake, Redshift, BigQuery — and then there’s an LLM layer, an orchestration layer, an observability layer, a connection layer, and potentially a UI layer as well. The way users experience your agents can vary: sometimes you want a conversational interface, sometimes something else entirely.
Funny enough, when we started Crew AI, some of our investors called me borderline crazy. I said, “We have no other option but to build all of it,” and they said, “You cannot do all of it.” But I was convinced. And I think this was almost a byproduct of our go-to-market strategy — especially when selling to large enterprises.
Go back a year ago and imagine me trying to sell Crew AI to a major bank, or the U.S. Department of Defense, which is one of our customers. They’d say, “I like the platform, I want to buy it. How do I do observability?” I can’t say, “Well, for observability you need to buy something else.” And then, “How do I do memory?” — “For memory, you need another product.” Suddenly they’re thinking they need five products just to use ours. That doesn’t make sense.
So we ended up building a lot of those things into the product. Across the board, you’ll have no-code building, code-based building, MCP servers, prompt checking, PII detection, deployment, metrics, traces — everything is embedded. Now, there are areas where we excel and areas where we’re not the absolute best. For example, dedicated observability companies like Galileo and Arize are fine-tuning their own models to pinpoint hallucinations — that’s incredible, and we don’t go quite that far. But we come with batteries included that allow many large companies to run their entire agentic strategy on Crew AI.
Chapter 3 — 07:00 | Agentic AI vs. Traditional Software Engineering
The two discuss how agentic systems differ from traditional software: no equivalent of TDD, far more post-deployment iteration, and the need for different architectural thinking.
Raja Iqbal: How do you see building agentic AI differently from traditional SaaS software?
João Moura: It’s quite different. These systems are inherently based on LLMs, and LLMs are very non-deterministic. One of the biggest differences is that there’s no real equivalent of TDD or BDD yet for agentic systems. You can’t test them the way you’d test traditional software. You do have evaluations, which help keep things on track, but they’re a different, adjacent tool rather than a direct substitute for testing.
Even after you deploy, I’ve found that agentic systems are far more iterative than regular software development. In traditional software, you can spend a lot of time upfront on a PRD and then build against it. With agentic systems, even if you do all that upfront work, you still find yourself adjusting constantly after deployment — because you notice things you didn’t think through, edge cases emerge, prompts need tweaking. There’s something inherently different about the development cycle.
The architecture itself also matters. I’ve seen many people make early architectural choices they end up regretting. A year or two ago, LLMs weren’t as capable as they are now, so people built agentic systems with all sorts of scaffolding to force things to fit together. But now the models are so good that people are removing that scaffolding and delegating more directly to the model. That’s a really interesting shift in mental model.
Chapter 4 — 10:00 | How Reasoning Models Broke — Then Changed — Everything
João shares how the first wave of reasoning models (like OpenAI’s o1) temporarily broke Crew AI because they resisted using external tools, preferring to reason their way to answers instead.
Raja Iqbal: I like that you mentioned that — we used to have chain-of-thought and a lot of prompt engineering happening outside the model as part of our prompts, but now models are building their reasoning frameworks internally.
João Moura: Exactly. And it’s funny because when that happened, it actually broke Crew AI for a moment. The first wave of reasoning models — when o1 launched and others followed — required somewhat different prompting. They didn’t want to use tools the way we expected. Instead of using a tool to gather information, they would try to reason their way to that information, which threw some of our use cases off a little bit.
Chapter 5 — 11:30 | Setting Realistic Expectations with Enterprises
João describes the maturity curve enterprises follow: starting with efficiency savings, moving to revenue generation, and eventually reaching genuine innovation. Includes the story of a customer with 51 live use cases saving $48M annually.
Raja Iqbal: There’s a lot of excitement around delegating knowledge work to AI agents — from enterprises to individual engineers. When you talk to enterprises, what are the realistic expectations you set? Is there a disconnect between what practitioners know and what’s actually possible?
João Moura: There sometimes is. It’s interesting — go back a year ago, and every lead that came to us would ask, “What should I do? What are my competitors doing?” Now they show up with 50 use cases already mapped out. That’s progress.
In terms of expectations around impact, I think there’s a maturity curve. A lot of customers start thinking about efficiency gains — how they’re going to save money. But as they start building, they see more opportunity and shift to thinking about revenue generation. And eventually, if they stay with it long enough, they start thinking about genuinely novel innovations they hadn’t even considered before.
We have one customer who started with a single use case and now has 51 use cases live and running. They’re already saving around $48 million using Crew AI alone. Their goal for this year is $100 million in savings, and their five-year goal is a billion. They’ve even picked one country within their organization to use as a transformation pilot — to see just how far they can take it. It’s incredible to see large companies experimenting so boldly.
Chapter 6 — 15:00 | Common Misconceptions: Magic vs. Engineering
The biggest misunderstandings in the space: thinking agents are plug-and-play magic, and over-engineering systems when simplicity is actually harder — and more valuable — to achieve.
Raja Iqbal: What are some of the most commonly misunderstood aspects of agentic AI — in terms of limitations, security, compliance?
João Moura: First, if someone isn’t technical enough, they might think it’s borderline magical — that an agent will come in and fix all their problems. That’s not the case. Every engineer appreciates that even though these systems are remarkable once they’re up and running, they still require serious engineering work to get there. For simple things, yes, you can do it with no code. But for any meaningful use case, you’re going to need hands on keyboard, real thought, and these systems are going to grow on top of your existing tech stack — connecting to pipelines, data lakes, all sorts of things.
The other big misunderstanding is that people over-engineer. They make things more complex than they need to be. There’s something that more senior, seasoned engineers understand deeply: doing something simple is actually hard. Complexity doesn’t vanish — you can choose where you put it, but it’s not going away. The idea that you can build something genuinely simple is something a lot of people miss.
Raja Iqbal: In many ways, building agentic AI solutions feels more like good software engineering than it does AI. It’s more software than AI.
João Moura: I agree. It’s a different kind of software, so you need to get reps in and build experience, but yes. And the other thing is — the models are good enough. Honestly, a lot of our use cases run on GPT-4o Mini. I don’t need a top-tier model for many of them. The intelligence piece is, in a sense, solved. Now it’s about the architecture, the process, and the craft of actually getting these things into production.
Chapter 7 — 18:00 | The POC-to-Production Gap: Why Projects Stall
João argues that building an agent has zero value unless it goes live. Most projects stall due to governance failures, data access issues, compliance gaps, and — ultimately — a lack of trust.
Raja Iqbal: Given how rapidly things are changing, is technical debt inevitable when building these systems? Can you build a perfect system from the start?
João Moura: I don’t believe in perfect systems, to start. But yes — this space is moving too fast. That’s just a fact. There are technical side effects and business side effects, and they’re intertwined.
On the technical side: yes, there’s going to be a lot of code thrown away. But the cost of prototyping has dropped dramatically — with Claude Code, Cursor, and all the other tools available, it’s getting cheaper and easier to experiment and discard. If your engineering culture embraces that, you can work around it.
On the business side, companies picking a tech stack are essentially picking a winner in a race where the race isn’t over. A lot of businesses are trying to make progress without putting all their eggs in one basket — whether that’s about what products they buy, whether to use open source, or which frameworks to choose.
Raja Iqbal: What is the biggest barrier companies face after building a POC? That initial excitement tends to fade — why?
João Moura: If you look at the industry, there’s a race to make building agents as easy as possible. Replit, n8n, Crew AI, LangChain, Microsoft Copilot Studio — there are endless tools. Building is becoming easier every day. But after that initial build, especially for larger companies, there’s a massive effort gap. The building piece is a small fraction of the total effort. Getting things live and scaled is where most of the work actually lives, and not enough people are focused on delivering value there.
What happens is: projects get built but never see the light of day. They don’t comply with governance, they’re not accessing the right data, they’re exposing data they shouldn’t, there’s no way to monitor or deploy them reliably, and ultimately — people don’t trust them.
Building agents has zero value unless those agents actually go live. If they don’t, it’s negative ROI — all that time goes to nothing. What I think leaders are starting to understand is that we’re entering a phase of: “We built this. Now I want to see the business outcome.” That’s the hard shift happening in 2026.
Chapter 8 — 21:30 | Trust, Autonomy & Non-Determinism in Production
Raja raises the tension between agent autonomy and unpredictable outputs. João breaks down when agents are even the right tool, and how a layered combination of guardrails gets you to ~99.9% reliability.
Raja Iqbal: You’ve mentioned trust repeatedly. Agentic systems have autonomy — they may spin up more agents, call various tools and MCP servers, reference memory. With that autonomy and the inherent non-determinism, things may not always go as expected. How do you reconcile that? Are enterprise CIOs and technical leaders aware of these challenges?
João Moura: They are — especially technical leaders. They’re seeing this from a mile away. Two things worth noting here.
First: not every use case actually needs agents. Agents are shiny, but there’s no silver bullet. Keep it simple. If your use case is basically “if this, then that,” just do that — it’ll be cheaper, faster, and easier. Sometimes you only need a single LLM call with function calling. Sometimes one agent is enough. Sometimes you genuinely need multiple agents. The question is: does your use case actually benefit from the non-determinism that comes with LLMs?
Second: assuming the answer is yes, how do you ensure those varied outputs fall within a spectrum you’re comfortable with? Honestly, there’s no single proven solution. What you have is a combination of approaches that get you very close to 100%. That includes LLM-as-a-judge — prompting another model to sanity check outputs against certain criteria — coding guardrails that use point-blank code to validate outputs, training agents on enough examples of what “good” looks like, and choosing the right models for each task.
In production, a mix of two or three of those approaches typically handles most problems. You may still see edge cases, but as long as you have tracing and controls to catch and block problematic outputs, you should be fine. We’re not at a “works 100% of the time” solution, but we’re getting to 99.9% — and it’s improving. It requires real work: adding controls, guardrails, and checks.
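As a rough sketch of the layering João describes, the cheap deterministic check can run first, with an LLM-as-a-judge pass only if it succeeds. The judge callable and the PII pattern below are illustrative stand-ins, not CrewAI APIs:

```python
import re
from typing import Callable

def code_guardrail(output: str) -> bool:
    """Deterministic check: block outputs containing a toy SSN pattern."""
    return not re.search(r"\b\d{3}-\d{2}-\d{4}\b", output)

def llm_judge(output: str, criteria: str,
              call_judge_model: Callable[[str], str]) -> bool:
    """LLM-as-a-judge: ask a second model to check output against criteria.
    `call_judge_model` is a placeholder for a real model call."""
    verdict = call_judge_model(
        f"Does this output satisfy '{criteria}'?\n---\n{output}\n"
        "Answer PASS or FAIL."
    )
    return verdict.strip().upper().startswith("PASS")

def run_guardrails(output: str, criteria: str,
                   call_judge_model: Callable[[str], str]) -> bool:
    """Layered check: run the cheap code guardrail first, and spend an
    LLM call only when it passes."""
    if not code_guardrail(output):
        return False
    return llm_judge(output, criteria, call_judge_model)
```

In production the stub would be a real model call, and the deterministic layer would cover whatever invariants plain code can express.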
Chapter 9 — 26:25 | Guardrails: How Stringent Is Stringent Enough?
João explains how guardrails at Crew AI are applied per task and per agent, tuned through real executions, and varied based on use case — there is no universal recipe.
Raja Iqbal: Is it easy to add guardrails? There’s a sweet spot between usability and having proper controls. How do you decide how stringent to be — especially when the same guardrails won’t apply across two different scenarios within the same organization?
João Moura: What we do is apply guardrails at the task or agent level. If you’re reusing an agent, it might come with embedded guardrails; if an agent is doing a specific task, that task might have its own guardrails.
Mostly, I get comfortable with guardrails through execution. We run a lot of crews internally — on the sales side, engineering side, and marketing side — and we just run them enough times to understand and tune what’s needed. This speaks to what I said earlier: agentic systems are far more iterative than regular engineering. You have to put miles on them.
I don’t have a single guardrail I reuse every time. A use case might require me to write actual code guardrails. Another might just need LLM-as-a-judge. We also have hallucination controls — if a hallucination is detected beyond a certain threshold, it can halt the execution or reroute it. With enough options, you can figure out what your use case needs. But especially for complex use cases, it’s very much case by case. There’s nothing magical about it yet.
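The threshold behavior mentioned above, where a detected hallucination can halt or reroute an execution, can be sketched as a simple routing function; the score source and cutoff values are illustrative assumptions, not CrewAI settings:

```python
def route_on_hallucination(score: float,
                           reroute_at: float = 0.5,
                           halt_at: float = 0.8) -> str:
    """Return the action for a hallucination score in [0, 1]:
    pass the output through, reroute the run, or halt it entirely."""
    if score >= halt_at:
        return "halt"
    if score >= reroute_at:
        return "reroute"
    return "pass"
```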
Chapter 10 — 29:36 | Iterating on Guardrails: There’s No Silver Bullet
Crew AI’s enterprise product observes executions and flags improvements automatically. The only real path to well-tuned guardrails is usage, iteration, and time.
Raja Iqbal: So from your experience — there’s no silver bullet, no recipe for setting up proper guardrails. You use it and figure out what’s needed?
João Moura: On our enterprise product, we have a feature that observes your executions and pinpoints things you might want to improve — and depending on how you build it, it can actually make those improvements for you, adjusting your agents and tasks automatically. We try to help you get there by showing you what good looks like. But in the end, every use case has its own specific details. You’ve got to actually use it to improve it.
Chapter 11 — 30:12 | Prompt Injection, MCP Poisoning & Layered Defenses
João recommends combining LLM-as-a-judge with code guardrails for complex systems, and discusses how fine-tuned models help detect prompt injections and PII — with latency as the key trade-off.
Raja Iqbal: One fundamental problem with LLM-based guardrails is that someone can talk their way into deceiving the agent. You can also have classic ML-based guardrails, or regex-based guardrails. What’s your recommendation to engineers implementing agents?
João Moura: I end up using a combination for the most complex systems — LLM guardrails for certain things and code guardrails for others. With code, the sky is the limit: you can bring in regular ML, classification or regression models, or just plain logic. We usually use a combination depending on the use case.
For prompt injection and MCP poisoning specifically — which is increasingly relevant as MCP adoption grows — we add protections on the enterprise side using a combination of fine-tuned models that detect prompt injections and PII. It’s a spectrum: if you don’t want to add latency, something as simple as regex might suffice. If you need to catch everything, you activate the fine-tuned model.
In general, I usually start with LLM-as-a-judge and only move to code guardrails if I actually need to. I’ve been increasingly delegating to the models as much as possible, given how capable they’ve become, and falling back to code only when necessary.
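The latency spectrum described here, a fast regex screen with escalation to a heavier classifier only when strict coverage is required, might look like the sketch below; the patterns and the `classify` hook are hypothetical:

```python
import re

# Toy phrases standing in for a real injection blocklist.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"disregard your system prompt",
]

def quick_screen(text: str) -> bool:
    """Cheap, low-latency check: True if an obvious injection phrase appears."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)

def is_injection(text: str, classify=None, strict: bool = False) -> bool:
    """`classify` stands in for a fine-tuned detector; it is only invoked
    in strict mode, where catching everything justifies extra latency."""
    if quick_screen(text):
        return True
    if strict and classify is not None:
        return classify(text)
    return False
```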
Chapter 12 — 33:07 | Setting Up Evals: Measuring Agent Quality Over Time
Because agents usually automate an existing process, you already have a baseline to test against. João explains how LLM-as-a-judge can detect quality deviation over time, and introduces Crew AI’s model-comparison testing feature.
Raja Iqbal: How do you set up evals for an agent? If a new company comes in, sets up their agent, and then switches to a supposedly better model — how do they know the agent isn’t regressing on queries it previously handled well?
João Moura: The good thing about evals specifically is that you’re rarely creating a use case from thin air — especially early on. You’re almost always automating something that already exists, so you have a sense of what “good” looks like, and you likely have examples you can use to test against. You don’t need to put as many miles on it initially, because you can run those same scenarios and check whether the outputs fall within an acceptable range.
Once you have that, you can use it essentially as a test suite. How you measure quality varies — we have a feature that does it automatically using LLM-as-a-judge, where another model evaluates the output based on the agent’s task and criteria. The key metric isn’t hitting a specific score from 0 to 10; it’s whether performance deviates. If your system was consistently scoring 7 or 8 and suddenly drops to 5 or 6 after a change, something is likely wrong.
We also have a feature called “Crew Testing” that lets you run the same agents across any model — GPT-4o, 4o Mini, Gemini 2.5, Claude Sonnet — and compare quality, latency, and token usage. This is especially useful when new models launch every two weeks and you want to make an informed migration decision.
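The deviation check João describes, applied to LLM-as-a-judge scores on a 0 to 10 scale, fits in a few lines; the 1.5-point drop threshold is an illustrative assumption, not a CrewAI default:

```python
from statistics import mean

def score_deviates(history: list[float], recent: list[float],
                   max_drop: float = 1.5) -> bool:
    """Flag a likely regression if the recent mean judge score drops more
    than `max_drop` points below the historical mean."""
    if not history or not recent:
        return False
    return mean(history) - mean(recent) > max_drop
```

An agent that hovered around 7 to 8 and slips to 5 to 6 after a model swap would trip this check, matching the scenario in the transcript.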
Chapter 13 — 36:28 | Quality Is Subjective: Tolerances by Industry & Task
Quality thresholds vary dramatically — 70% similarity might work for marketing but be unacceptable in a regulated industry. Most companies start with internal use cases before going customer-facing, often with human-in-the-loop as a safety net.
Raja Iqbal: Quality is subjective though, right? It depends on the task, the industry, and your tolerance level. Seventy percent similarity might be fine for marketing but completely unacceptable in a regulated industry.
João Moura: A thousand percent. And what we see in practice is that most companies start with internal use cases — testing things internally first — and only once that works do they move to customer-facing use cases. A lot of the early traction is in back-office or marketing and go-to-market workflows where you still have humans in the loop.
And I’m actually seeing a lot of human-in-the-loop patterns emerging right now — where people either want to be “in the loop” or “on the loop.” An agent might ping them for confirmation or feedback at key points, or they can observe everything the agent does and intervene if necessary. That’s a way to achieve remarkable results while maintaining a very high quality bar.
Chapter 14 — 38:07 | MCP: Here to Stay, but Still Maturing
João believes MCP has achieved enough adoption to become a de facto standard, despite being over-engineered. A key surprise: enterprise customers care more about their internal MCP registries than the thousands of public servers available.
Raja Iqbal: Let’s talk about context — specifically MCP. Anyone dealing with agentic AI knows about MCP servers now. But when companies rush to release MCP servers, many are barely usable. How has your experience been? Is MCP here to stay?
João Moura: I think MCP is definitely not going anywhere. People are already adopting it, and it’s become something of a standard. It has its flaws — I’m not a huge fan of a few things. I think out of the gate it was a bit over-engineered. I don’t think we needed an entirely new protocol. I’m a big fan of HTTPS.
Raja Iqbal: You think it’s basically REST with a new coat of paint?
João Moura: I mean, I think there must be an easier way — something as simple as an agreed-upon JSON standard would have done the job. But in the end, it does bring value. It got accepted. A lot of companies are now using and deploying MCP.
What was interesting is that when we noticed the pattern and built features around it, we went to customers and said, “We’re bringing you 2,000 MCP servers.” That wasn’t what they wanted at all. They said, “That’s great, but we have 20 internal MCPs and we want to use those.” They wanted an internal registry of their own MCPs, not the public ones. I didn’t see that coming.
The standardization does bring real value though — it makes it easier to adopt and migrate. If everyone’s using MCP, you don’t feel like you’re picking a winner. You might use Salesforce today via MCP, and if you migrate to HubSpot later, you just swap that MCP and everything continues working.
Chapter 15 — 41:39 | MCP Security Risks & What’s Coming
João expects more MCP-related security incidents before the standard matures, pointing to the GitHub MCP leak as an early example. Prompt injection, MCP poisoning, and A2A protocols all add new attack surfaces.
Raja Iqbal: Do you expect MCP-related security incidents — like the data breaches we’ve seen with Equifax or Target — to happen?
João Moura: Yes, there’s still a long way to go. I remember the GitHub MCP issue where they launched it and leaked information they weren’t supposed to. There will be more of that until MCP matures. It does create new attack surfaces — prompt injection, MCP poisoning — and now A2A is coming on top of that. We’re not short of challenges in this industry.
Chapter 16 — 42:34 | Role-Based Access Control & OAuth for Agents
RBAC is a must — not just for security, but because giving agents too many actions degrades their performance. João walks through Crew AI’s OAuth-based “Agentic Apps” approach and explores certificate-based two-factor authentication as a further enhancement.
Raja Iqbal: What about role-based access control? Companies typically set up MCP servers with elevated service account permissions. With agents having so much autonomy, how do you ensure they don’t access or act on things they shouldn’t?
João Moura: Role-based access control is absolutely essential. You want to give agents only the necessary scope — and not just for security reasons. Throwing too many actions at an LLM yields diminishing returns. It starts to hurt performance as the context window fills up.
With Crew AI, we have three types of integrations. We have what we call “Agentic Apps,” which you can think of as regular OAuth. OAuth helps address a lot of these problems — it comes with embedded scope. We’re also exploring ways to make it even more secure, like adding certificate-based two-factor authentication on top of OAuth. It’s not just “this agent uses my OAuth credentials” — it also has to be running from a specific Crew AI server that holds the certificate.
As long as you can adjust the OAuth scope at runtime based on who’s triggering the workflow, you can ensure agents get exactly the access they should have — no more, no less. For anything beyond that, you may need to add an extra layer in your own implementation — RBAC, OAuth, or something else depending on the use case.
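Narrowing scope at runtime based on the triggering user amounts to intersecting the scopes an agent requests with what that user's role allows; the role names and scope strings below are invented for illustration:

```python
# Hypothetical role-to-scope mapping; a real system would source this
# from its identity provider rather than a hard-coded dict.
ROLE_SCOPES = {
    "analyst": {"crm.read"},
    "manager": {"crm.read", "crm.write"},
    "admin":   {"crm.read", "crm.write", "crm.admin"},
}

def scopes_for(role: str, requested: set[str]) -> set[str]:
    """Grant the agent only the intersection of what it asks for and
    what the triggering user's role permits; unknown roles get nothing."""
    allowed = ROLE_SCOPES.get(role, set())
    return requested & allowed
```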
Chapter 17 — 45:15 | Advice for First-Time AI Founders
João frames this moment as potentially bigger than the internet, and argues the opportunity cost of not acting is enormous. But founders must enter with eyes open: it’s highly competitive, and an uphill battle for most.
Raja Iqbal: Let’s switch gears to entrepreneurship. You’re a first-time founder. What advice do you have for AI founders at various stages — those still thinking about building, or those who already have prototypes?
João Moura: What convinced me to do this — and maybe this is the advice I’d leave with anyone who’s on the fence — is thinking about opportunity cost. I know everyone says this, but I genuinely believe it: this is as big as, if not bigger than, the internet was. And it only took about 30 years between those two technological shifts. This might be your once-in-a-lifetime opportunity to build in an emerging market of this scale. Technology will keep evolving, but you never know when you’ll get another wave like this. I didn’t want to miss it.
That said, be very mindful that it’s incredibly competitive. Because AI can impact so many industries, there are companies of all sizes going after these opportunities. Go in with the mentality that it’s going to be an uphill battle. If you have the stomach for it and the right mindset, it will be a lot of fun — but it is a grind.
Chapter 18 — 48:09 | The Get-Rich-Quick Myth & What Fundraising Really Looks Like
João dispels the “get rich fast” narrative. Running a company is a grind. Investors are smart and won’t write checks without conviction. The first investment is mostly a bet on the founder — so build a clear, credible case for why you are the right person.
Raja Iqbal: What about the idea of getting rich quickly? A lot of people venture into AI thinking the money will follow fast.
João Moura: That couldn’t be further from the truth. You hear other founders talk about how hard it is, and they’re not kidding. The only parallel I can draw is having a kid — everyone tells you it’s a lot of work, and you think you’ll handle it fine, but when it actually happens, it hits like a truck. Running a company is similar.
Being a founder sounds like you have a lot of control over your destiny, and you do — but it comes with a lot of strings attached. The problems that bubble up all the way to me are the ones no one else could solve. So all my days are spent on the hardest, most broken things in the company. It stops being fun quickly when you’re doing that.
Yes, there’s a strong investment appetite out there. But investors are smart, savvy people who’ve seen waves come and go. They’re not writing checks unless they actually believe in you and the company.
And here’s the most important piece of advice: the first check you’ll get is much more of a bet on you than on the company. The idea needs to be there, the business needs to be there — but ultimately it’s a bet on the founder. So you need to build a very strong case for why you are the right person for this.
In my case, I spent five years at Clearbit as Director of AI Engineering, stayed all the way through the HubSpot acquisition, and watched the company grow from under $10 million to $40–50 million. When I left to start Crew AI, all the founders from Clearbit invested in Crew, and Dharmesh, the CTO of HubSpot, invested as well. That sent a strong signal to anyone who wanted to back Crew. Whatever your equivalent of that story is — figure it out, and make it easy to tell.
Chapter 19 — 52:03 | Open Source Business Models: The Vercel Playbook
Crew AI follows the Vercel/Next.js model — users don’t upgrade away from open source, they bring it into the commercial platform. Open source handles the 10x larger prototyping market; commercial handles production.
Raja Iqbal: You decided to open source Crew AI. What business model does that lead to?
João Moura: There are two classic open-source playbooks. One is the MongoDB-style approach — open source with an enterprise product alongside it, where users hit a ceiling and upgrade. The other is more like Vercel and Next.js — you never stop using the open-source framework, but you bring it into the commercial platform for deployment and scale.
We’re more the latter. You don’t upgrade away from the open source; you bring what you’re doing with the open source into our commercial offering.
The reason we open sourced is that right now, there’s probably 10x more demand for prototyping than for production. We’re still very early as an industry. So we open sourced the build layer — the framework — and the commercial side is for when companies actually want to move those things into production. When you start to get real value from it, you can choose to share some of that value with us, and we’ll bring you more features to support you.
Chapter 20 — 55:12 | What Enterprises Actually Pay For
The four pillars of Crew AI’s commercial value: centralizing agents and tools across an org, enabling both technical and non-technical builders, orchestrating external agents, and checking all the enterprise boxes.
Raja Iqbal: What does the commercial product actually include beyond support?
João Moura: It’s a full platform. Deployment with one click, no-code building, external agent orchestration — ServiceNow, Salesforce, SAP agents — a full integration library, traces, logs, OpenTelemetry routing, PII redaction from logs, role-based access control, enterprise certifications, and deployment on their own hyperscalers with any model.
There are four main things that pull people toward our commercial product. First is centralizing use cases, tools, and agents — for a CIO managing AI sprawl across an organization, having everything in one place is transformational. Second is the ability for both technical and non-technical teams to work from the same building blocks — the commercial product is a full no-code platform that reuses the same agents and tools as the code-based framework. Third is external agent orchestration — being able to coordinate ServiceNow, Salesforce, and SAP agents in one place. And the fourth — which looks least exciting but is probably most important for CIOs — is the enterprise checklist: RBAC, PII handling, certifications, hyperscaler deployment, and model flexibility.
We also do a lot of training — flying out to Korea, India, Mexico, Canada for 3–5 day workshops, teaching customers not just how to use the platform, but how to think about agentic systems, how to organize use cases, and how to structure the whole process.
Chapter 21 — 58:28 | Crew AI vs. LangChain: Competition or Coexistence?
João sees minimal dollar competition with LangChain (their focus is observability; Crew AI’s is orchestration), but real mindshare competition through their open-source libraries. The two have fundamentally different philosophies: graphs vs. events. Notably, LangChain’s CEO Harrison Chase is an investor in Crew AI.
Raja Iqbal: I see a lot of similarities with LangChain. Are you competing head-on, or are you in different domains?
João Moura: Agents is such a huge industry. We have customers across defense, food & beverage, legal tech, financial services — and those verticals each have dozens of horizontal use cases within them. This market is enormous, so everyone is going after it: hyperscalers like Microsoft, AWS, and Google; model providers like OpenAI and Anthropic; incumbents like ServiceNow, Salesforce, SAP, and Workday; and startups like LangChain and others.
In terms of actual dollar competition — competing for the same deal — we almost never come up against LangChain directly. Their revenue is primarily from observability, specifically LangSmith. We go way beyond that. Our focus is on orchestration and management.
We do compete for mindshare, though, because both of us have open-source libraries — LangGraph on their side, Crew AI on ours — that let people build agents. The philosophies are quite different. LangGraph is graph-based, with a heavy emphasis on control and structure. With Crew AI, we offer a spectrum — you can use Flows for control if you want it, or you can opt into full agency and let the LLMs work together.
Personally, I’m not a fan of the graph implementation. Many engineers can have entire careers without ever working with graphs — nodes, edges, that whole mental model. And I’ve found that graph-based agentic systems become hard to maintain and scale over time. That’s why we went with an events-based approach, built on mental models engineers encounter far more often — Redis pub/sub, message queues, that kind of thing.
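[Editor's note: the contrast João draws can be sketched in a few lines of plain Python. This is a generic illustration of the two orchestration styles, not CrewAI's or LangGraph's actual API — all names here are invented for the example. In the graph style, who runs next is hard-wired into an edge table; in the events style, handlers subscribe to event names and emitters never know who is listening.]

```python
from collections import defaultdict

# Graph style: explicit nodes and edges; the runner walks the structure,
# and the edge table alone decides who runs next.
def run_graph(start, nodes, edges, data):
    current = start
    while current is not None:
        data = nodes[current](data)
        current = edges.get(current)  # hard-wired successor (None = stop)
    return data

nodes = {
    "research": lambda steps: steps + ["research"],
    "write": lambda steps: steps + ["write"],
}
edges = {"research": "write"}  # adding an agent means editing this table
graph_result = run_graph("research", nodes, edges, [])

# Events style: handlers subscribe to event names; publishers just emit.
# This is the Redis pub/sub / message-queue mental model.
class EventBus:
    def __init__(self):
        self.handlers = defaultdict(list)

    def on(self, event, handler):
        self.handlers[event].append(handler)

    def emit(self, event, payload):
        for handler in self.handlers[event]:
            handler(payload)

bus = EventBus()
log = []
bus.on("task.done", lambda p: log.append(f"reviewed: {p}"))
bus.on("task.done", lambda p: log.append(f"archived: {p}"))  # new agent = one more on()
bus.emit("task.done", "draft-1")
```

The maintainability argument falls out of the sketch: extending the graph means editing shared wiring, while extending the bus is one more independent `on()` call.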
Interesting footnote: Harrison Chase, the CEO of LangChain, is actually an investor in Crew AI.
Chapter 22 — 01:03:44 | Will Models Eventually Replace All Scaffolding?
Investor Andrew Ng’s thesis: as models improve, everything — including events-based orchestration — gets displaced, leaving only groups of agents working together. João finds it compelling but isn’t fully convinced on the timeline.
Raja Iqbal: You mentioned Andrew Ng’s comment. Most of us agree that models will do more heavy lifting — the debate is really about how much and how fast.
João Moura: Andrew’s take is that as models continue to improve, people will delegate more and more to the LLMs — to the point where even events-based orchestration gets displaced, leaving only Crew AI’s core value of creating groups of agents working together. I’m not quite as confident as he is on that timeline, but it’s a fascinating perspective.
Go back two years, and context window size was the dominant conversation — every model release led with “context is now 128K, 256K, a million tokens.” Now nobody even talks about context. It’s basically solved. I agree the models don’t seem to have hit a ceiling yet. A lot of current models were trained on older GPUs. Now that we have much newer hardware, I’m curious what’s being cooked up. And yes, I think we’ll continue removing scaffolding and delegating more to the models — I just don’t know how fast that will happen.
Chapter 23 — 01:05:55 | From Context Windows to Context Engineering to Cost & Governance
The conversation has shifted from “how big is the context window?” to “how do I organize context effectively?” João predicts the next frontier will be cost, observability, governance, and compliance.
Raja Iqbal: As the focus shifts from context window to context engineering, do you think the next wave of discussions — six to twelve months from now — will center on cost, observability, governance, and security?
João Moura: Absolutely. On cost, there are two dynamics at play. On one hand, there’s a race to the bottom — every model keeps getting cheaper, and something like GPT-4o Mini is now remarkably capable for very little cost. On the other hand, some models won’t drop their prices because the use cases they serve justify a premium. Claude Opus is expensive, but people use it because it’s exceptional at certain things.
Beyond cost — yes. Enterprises are already moving their attention to security, compliance, metrics, and reporting. And I think this will be the dominant conversation going forward.
What I believe we need to see in 2026 is the gap closing between the seven or so S&P 500 stocks that are currently benefiting from AI and the remaining 493. That’s the bridge that needs to be built — real businesses, operating in real industries, seeing real transformation.
Chapter 24 — 01:09:43 | Logo Customers, Mindshare & Adoption Patterns
João reflects on the value of enterprise logos — not just for perception, but because they change adoption behavior. Technical companies tend to build first and buy later; traditional enterprises need platforms from the start.
Raja Iqbal: Having a recognizable logo — is that mindshare value as much as dollar value?
João Moura: There’s definitely perception, but I think it also changes adoption behavior more broadly. Tech-savvy companies — the Ubers, the DoorDashes of the world — are going to use open source and try building first. They might buy observability tooling early on, but because they have highly technical teams, they’ll build before they buy. Two years later, they come around and say, “Alright, now we’re ready to buy a full platform.”
Traditional enterprises operate differently — they need a platform from day one that checks all their boxes. So the adoption pattern is quite different depending on how technical a business is. And yes, closing a customer like PepsiCo or Johnson & Johnson, or AB InBev — which brews one out of every four beers sold globally — is exciting precisely because you can have such an immense impact on businesses that most people don’t even know are using AI.
Chapter 25 — 01:11:07 | Closing Thoughts: What Excites João Most in 2026
João is most excited about seeing real business outcomes materialize — companies saving millions, whole industries transforming. On the technical side: a 5x agent speed improvement from a recent experiment, and upcoming support for long-running agents. His personal north star: every Crew AI agent should be as capable as a Cursor agent, by default.
Raja Iqbal: What are you most excited about — in agentic AI and at Crew AI specifically?
João Moura: Honestly, what I’m most excited about is seeing the business outcomes come to life. Once you build your first agent, it clicks — you immediately see the potential. But between that moment of clarity and actually getting to measurable, real-world results, there’s a long maturation process. Now we’re there. Companies are saving millions of dollars. I’m excited to tell those stories and see the full fruition of what agents are capable of.
On the technical side, there’s still a lot to do. We ran an experiment last week that got all of our agents running five times faster. That kind of discovery still surprises me, and there will be more of them.
And we have something bigger in the works — not released yet — but essentially making it much easier to run custom agents for hours or even days if needed. Right now you can technically do it, but it requires a lot of extra work. We want to make that kind of long-running agent behavior seamless.
My personal north star is: every agent built with Crew AI should be as good as a Cursor agent by default — regardless of what that agent is for. That’s what we’re working toward.
Raja Iqbal: João, thank you so much for being here.
João Moura: Thank you so much for having me. This was a lot of fun, and I hope everyone who tuned in got something valuable out of it.
Raja Iqbal: Thank you.