Emil Eifrem on Neo4j, Graph Databases, Connected Data & Graph-Native AI

Did you know one of the most influential data models in the world was sketched on a plane to Mumbai in 2000? Emil Eifrem, Neo4j’s Co-Founder & CEO, did exactly that. That sketch became the property graph model, and Emil has spent the last two decades building, innovating, and evangelizing graph databases that help organizations see relationships like never before. Long before graph databases became critical to AI systems, Emil Eifrem was a curious technologist obsessed with one simple idea: relationships matter more than raw data. In this episode, he shares the unconventional journey of building Neo4j, why graphs mirror how humans naturally think, and how relationship-driven data is shaping the future of AI. From early skepticism to powering some of the world’s most advanced AI applications, Emil reflects on leadership, scaling deep tech, and why context—not just compute—is the next frontier in AI.

About Speaker

Emil Eifrem is Neo4j’s Co-Founder and CEO. He sketched what today is known as the property graph model on a flight to Mumbai in 2000 and has devoted his professional life to building, innovating, and evangelizing graph databases and graph analytics. He is also co-author of the O’Reilly book Graph Databases. Neo4j today helps more than 75 of the Fortune 100, and a community of over 250,000 open-source data developers, data scientists, and architects, find hidden relationships and patterns across billions of connections deeply, easily, and quickly. Emil plans to change the world with graphs and own Larry’s yacht by the end of the decade.

Transcript

Raja Iqbal: My guest today is Emil Eifrem. Emil is the CEO and co-founder of Neo4j. Welcome, Emil. It is great to have you.

Emil Eifrem: Great to be here.

Raja Iqbal: Okay. So I was looking at your background, and I came across your bio on the Neo4j website, and I will actually read what it says there: “He plans to change the world with graphs and own Larry’s yacht by the end of the decade.” So, first of all, is it Larry Page, or Larry Ellison, or all the rich Larrys in the world? I mean, what is the background?

Emil Eifrem: It’s Larry Ellison, of course. You know, as the symbol of relational database dominance. Right? I figured, you know, in the end, maybe I’ll just own a rowboat, but I’m targeting the yacht. It’s… it’s my feeble attempt to be funny and not take myself too seriously in my bio. I hate these kind of bios that are all these kind of hero descriptions and that kind of stuff, and in reality, I don’t want to own a yacht, but I thought it seemed like a… funny symbolism, because on some level, what we want to do is usher in a new era of working with data that does not replace the relational database, of course, but is a good complement to it. So, that’s kind of the spirit behind the tongue-in-cheek bio.

Raja Iqbal: Yeah, so are you on track for this KPI?

Emil Eifrem: We’ll see. Hey, we’re only halfway through the decade, so, you know, so again, hope, you know, probably I’ll only own a rowboat, you know, at the end of the day, but we’ll see.

Raja Iqbal: And I don’t want to turn this into a more of an exercise in KPIs and metrics, but, I mean, you have some, some leading metrics as well, I mean, so… I mean, clearly, when we talk about graph databases, knowledge graphs, I mean, Neo4j is probably the… the first name that comes to mind, at least for me. I mean, that’s my exposure to Graph databases, so… So I think you… looks like that you are… you’re there, right? So you’re making it.

Emil Eifrem: Yeah, so I think we have the benefit of being kind of the OG graph database, so to speak, right? Like, we coined the term way back in the days, right? And kind of… played, I think, a significant role in defining the property graph model and then evangelizing it. So I think it’s fair to say that we’re the most popular graph database out there today. There were several other kind of implementations of… call it the RDF world, right? And, you know, we can get into the weeds of this, but just at a high level, like, RDF is Resource Description Framework, and this is… Tim Berners-Lee, kind of, second… innovation after the first one being the web. The web being this great system, of course, for human beings to consume through browsers.

And then what he wanted to do in the late 90s, right? So predating this agentic, here we are in 2025 kind of universe, right? But what he wanted to do is add a semantic layer to the more syntactic web, and he defined a format for that called RDF, Resource Description Framework. It was… the purpose of that was to describe resources on the web. And it has, you know, like anything, has pros and cons, but certainly lots of strengths for that kind of stuff. And then… several people tried to take that and turn that into a database format, and I was always of the opinion that that was, you know, some of the RDF technology was really strong, but it was not good as a database, kind of foundation for a database technology, because, like, ultimately it was a format for describing resources on the web. And so I didn’t think that was a good fit. So there were certainly other databases that took a kind of a graph-ish view of the world. But they didn’t call themselves graph databases, and they didn’t use what today is the most popular graph model, which is the property graph model.

Raja Iqbal: And, I read it somewhere that you came up with this sketch of a property graph model on a flight to Mumbai in year 2000. And… around that time, you know, I remember those days, right? So I’m old enough to remember those days. These are the early days of cloud computing and big data, right? So Hadoop was becoming mainstream. And, lots and lots is happening. So, what were you thinking? Because at that time, internet-scale data, that’s large-scale data. It was… I would say around Yahoo and some of those companies, barring those companies, it was not something very mainstream. It was… it was starting to happen. So, can you take us back to how did you even think about that, at that point in time?

Emil Eifrem: Yeah, so let me kind of first tell kind of our story, and then let’s relate that to what was going on in the rest of the industry, and kind of the timeline for that kind of stuff, right? So, our story was that, you know, I joined a small startup in Sweden, didn’t even know what the term startup was at the time, but, you know, I was a developer, and I loved to write code, and pretty quickly became the CTO of that startup, and CTO and VP Engineering kind of combined. So the… the 20-person, you know, engineering team reported to me, and after a while, what we saw was that approximately half of that team, about 10 people, spent the majority of their time fighting with a relational database.

We were building, kind of, these multi-tenant SaaS offerings, but it wasn’t called… it was called something different, like, ASP (application service provider), and so we didn’t have, like, that modern terminology for it, but that’s what we were building: doing, basically, content management, especially around media assets. And it turns out that when you do that, you have a lot of connected data. For example, all the files need to sit in a file system, with folders that sit in folders, which is a tree. But then we supported shortcuts, or in Unix, you would call them symbolic links. And all of a sudden, one file or folder could have multiple parents, and this is, mathematically, when a tree turns into a graph, right? And then we had this very sophisticated security model that we overlaid on top of that. This all became, like, this really deep, complex web of connections that we tried to squeeze into square and static tables.

And it worked. You know, there’s mathematical proof that you can model any data in the relational model. So, you can do it. But it was like pushing a rope uphill. There’s tons of friction. We had to do join tables and foreign keys here and there, and it was just messy, right? A combinatorial explosion that we hadn’t really seen before. Today, I think anyone would look at it and we would have a crisp understanding in the industry. And we would look at that and say, “Dude, there’s a mismatch between the shape of your data and the abstraction that you’ve chosen, the relational database.” It doesn’t mean that the relational database is wrong generally, but for this problem, it doesn’t seem to be the best fit.

But remember, back in the early 2000s, really the only type of database that existed was the relational database, right? And so we banged our head against the proverbial relational wall for long enough that after a while, we realized something is wrong here, right? And we started realizing that, wait, what if we had Oracle, or Postgres, or MySQL, or Informix, or like a robust, high-quality database? But instead of a tabular representation, right, it would have nodes that are connected to other nodes, right? And then key-value pairs (properties) that you can attach to both of them. Then we can model everything. And that’s kind of the thing that when you read the thing, like, we grabbed this cocktail napkin—it was literally a cocktail napkin—on a flight to Mumbai and sketched out this thing.
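A minimal sketch, not from the conversation, of the model described here: nodes connected to other nodes, with key-value properties on both nodes and relationships. The class and method names (`Node`, `Relationship`, `PropertyGraph`, `add_node`, `relate`, `parents`) and the sample data are illustrative assumptions, not Neo4j's API. The example reuses the file-system case from earlier in the answer, where a shortcut gives a file a second parent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node with a set of labels and key-value properties."""
    node_id: str
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed connection between two nodes, with its own properties."""
    start: str
    rel_type: str
    end: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory container, for illustration only."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, *labels, **properties):
        self.nodes[node_id] = Node(node_id, set(labels), dict(properties))

    def relate(self, start, rel_type, end, **properties):
        # Refuse relationships whose endpoints have not been added yet.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must be added before relating them")
        self.relationships.append(
            Relationship(start, rel_type, end, dict(properties))
        )

    def parents(self, node_id):
        """Ids of the nodes that CONTAIN the given node."""
        return [
            r.start
            for r in self.relationships
            if r.rel_type == "CONTAINS" and r.end == node_id
        ]


graph = PropertyGraph()
graph.add_node("media", "Folder", name="Media")
graph.add_node("campaigns", "Folder", name="Campaigns")
graph.add_node("logo", "File", name="logo.png", size_kb=48)

graph.relate("media", "CONTAINS", "logo")
# The shortcut gives the file a second parent, so the tree becomes a graph.
graph.relate("campaigns", "CONTAINS", "logo", via="shortcut")

print(graph.parents("logo"))  # ['media', 'campaigns']
```

In a table-based design the second parent would typically need a join table; here it is just one more relationship, and the `via` property records how it came about.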

Now, of course, as any marketing message, that’s a vastly simplified view. It makes it sound like it’s all me sitting there in guru meditation position and figuring it all out. I had a minor part of it. It was a broad team effort. You know, the reason I flew to Bombay was that I was going to IIT Bombay, so I had, you know, several friends at IIT Bombay, which you would know is a phenomenal school, and they played a huge role in figuring out this model. But ultimately, you know, we had a problem, we realized that the relational database wasn’t the best fit, we were young enough and naive enough to say, “You know what? Let’s just invent something new here. How hard can it be?” Right? And that is what led us to ultimately create what people today call the property graph model.

So that’s kind of our origin story. And now, what was going on, kind of in parallel, because you related it to the rest of the industry—this actually predated Hadoop. Right? So, let’s think back, like, early 2000s, right? Google had just been founded, Yahoo was around, Hadoop didn’t exist yet. Google was starting internally to use MapReduce, but they hadn’t spoken publicly about it. And the discourse in the industry was that the relational database is the end-all, be-all innovation of data. There may be data innovation on top of it, but it’s like a mathematical axiom. And this is a lot of the scar tissue from the mid-90s. There’d been this surge of object databases, right, that was going to take out the relational database, you know, those kind of things, right? But they failed. They went out of business… well, at least they grew to maybe 10, 20, 50 million dollars of total revenue, right? And ended up, like, going into obscurity.

So I think a lot of the discourse was that the relational database is going to be there forever, right? But inside these web companies, they were also running into another version of the problem that we ran into, primarily driven around scale. And this really ended up becoming more of a public thing in 2006, when Amazon published the Dynamo paper. And they said, “Hey, folks! We want to sell books,” because at the time they were primarily, or maybe even exclusively, an online bookstore. “The last thing we want to do is build our own database. But sadly, all the off-the-shelf databases don’t work for us. So we’ve been forced to invent our own, we call it Dynamo, here’s how it works.” And just a few months later, 6 to 9 months later, Google published the Bigtable paper, which said, “Hey, we’re Google, we’re also handling some amount of scale, we also were unable to get there with the off-the-shelf tools from the Oracles and the IBMs and whatever of the world. So we also invented our own database. We call it Bigtable, and here’s how it works.” And those two papers were like… they were like a lightning strike. And all of a sudden, the world… well, a few alpha geeks, really, but started talking about, “Wait, the relational database? May not be the only solution for all of data.”

Raja Iqbal: And I think this dissatisfaction started to happen with relational databases around the time when we started handling internet-scale data, right? So, I was talking to Amr Awadallah a while back on the same podcast, and we were discussing, and he was telling us back during his time at Yahoo, they would—I think he was part of the advertising team—and a single query would take them days to run on Oracle, back then. And then, early proof of concepts of Hadoop and, you know, this NoSQL world, it suddenly drastically cut the query time. So, yeah. So, yeah, that’s… that’s, interesting. Thanks for the background here. And now, a lot of companies—once again, I was reading while preparing for this podcast—that 75 out of the Fortune 100 companies, more than 75 of them, they use Neo4j, right? So what are some typical use cases that you see for Graph databases, knowledge graphs, property graphs?

Emil Eifrem: It’s actually now up to 84 of the 100. Yeah, so we’re moving up in the world. Exactly, exactly, right? And it’s very horizontal, like, across all verticals that I know of, so every single one of the 20 biggest banks in the US are now customers. 9 of the 10 biggest pharma companies in the world. 10 out of 10, the 10 biggest car companies in the world, 8 of the 10 biggest retailers, and I could keep going—insurance companies, 8 out of 10, and so on and so forth, right? So, it’s very, very horizontal technology.

In terms of use cases, I would break it down in two broad buckets. One, the first one I would call intelligent applications. So these are applications where there’s a lot of value in not just retrieving a singular data point, like, you store something and then you retrieve exactly that back. But you actually figure something out in the data, right? Like, for example, there’s a fraud ring here. Right? There’s a lot of individual transactions that seemingly are okay individually. But they’re connected in a fraudulent way, right? Or a customer journey—like analyzing what’s the journey, what’s the path of an individual customer, starting with some advertising campaign that maybe goes to the website, maybe they visit your app, maybe you have an, like, an offline store, like, they go to the store, maybe they talk to your customer service center, right?

How do you trace that entire path, right? So, these are what we call intelligent applications, and they break down into what we talk about, the seven graphs of the enterprise. Which is graphs like the graph of your employees, the graph of your transactions, the graph of your products, the graph of your customers, and there’s a bunch of use cases hanging off each and every one of these. So that’s the first category, intelligent applications.
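A minimal sketch, not from the conversation, of the fraud-ring idea: records that look fine one by one can still form a suspicious cluster once they are connected. It assumes, purely for illustration, that accounts are linked when they share an identifier such as a phone number or device, and it groups them with a breadth-first search over connected components. The function name `find_rings`, the `min_size` threshold, and the sample data are hypothetical, and real fraud detection involves far more than this.

```python
from collections import defaultdict, deque


def find_rings(shares, min_size=3):
    """Group accounts that are connected through shared identifiers.

    shares: iterable of (account, identifier) pairs.
    Returns a list of sorted account lists with at least min_size members.
    """
    adjacency = defaultdict(set)
    for account, identifier in shares:
        a = ("account", account)
        i = ("identifier", identifier)
        adjacency[a].add(i)
        adjacency[i].add(a)

    seen = set()
    rings = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        accounts = sorted(name for kind, name in component if kind == "account")
        if len(accounts) >= min_size:
            rings.append(accounts)
    return rings


shares = [
    ("acct-1", "phone-555"),
    ("acct-2", "phone-555"),
    ("acct-2", "device-A"),
    ("acct-3", "device-A"),
    ("acct-4", "phone-777"),
]
print(find_rings(shares))  # [['acct-1', 'acct-2', 'acct-3']]
```

No single pair in the sample links `acct-1` to `acct-3`; the connection only appears when the shared phone and the shared device are followed as a path.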

The second category is what’s happened over the last couple of years, right? Which is this surge of adoption thanks to the amazing fit of knowledge graphs and AI. And that really doesn’t break down into, like, use cases, per se. It really is anywhere where you have your own organization kind of internal proprietary data, and you want to marry that up with an LLM. And then you use a knowledge graph as a way—an efficient and powerful way—of giving the LLM access to that data. So those would be the two broad buckets with lots of use cases inside of the first one, and very horizontal applicability in the second.

Raja Iqbal: So, I was going to ask you about knowledge graphs and AI later, but since you mentioned… So, it almost sounds like that, you know, you were there when the opportunity knocked at your door, right? So, because having built Agentic AI systems myself, I actually clearly see why a knowledge graph is so helpful. But for anyone who does not understand out there, why do you think knowledge graphs are going to be helpful in building AI systems? I mean, how do they… there’s this, we read about this a lot, that Knowledge Graphs can help minimize hallucination. So, if you can give us a sense, why do you think?

Emil Eifrem: Yeah. So there are 3 core benefits that we see that our customers keep telling us over and over again, right? And most people value all three, but they’re kind of individually weighting of the three. So let’s go through all of them in order. The first one is improved accuracy. And this, of course, is another way of saying reducing hallucination. And if you think about, kind of, the classic agentic RAG architecture, right? You have, ultimately, a human being, not always, but let’s assume, for the sake of simplicity, there’s a human being, right? And maybe let’s do the classic customer service support portal type use case, just to keep it simple, right?

And so you have Raja is asking a support question about a product that he just bought, let’s say a Wi-Fi router, or something like that, right? And then you want your agent to understand that question, even in text, or maybe it’s voice, maybe it’s, like, you know, whatever, but ultimately get to that agent in text, right? And then, the job of that agent is to look at the corpus of 10,000 or maybe 100,000 support articles, find the top K, which is, like, a pretty small number, like 10 or 20, not like 1,000, right? Let’s say it’s 10. Find the top 10 support articles that are most likely to answer your question about your Wi-Fi router, hand it off to the LLM, along with your question, and then say, “Okay, does this answer my question?” And then, of course, in agentic RAG, then there’s, like, a loop. But anyways, but that’s kind of the single path, single tools, kind of simplified view, right?
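A minimal sketch, not from the conversation, of the single-pass retrieval step described here: score every support article against the question and keep the top K. It assumes the question and the articles have already been turned into embedding vectors; the three-dimensional vectors, the article titles, and the function names are made up for illustration, and cosine similarity is one common choice of score rather than anything the speakers specify.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, articles, k):
    """Return the titles of the k articles most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), title)
        for title, vec in articles.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:k]]


# Toy embeddings; a real system would get these from an embedding model.
articles = {
    "Resetting your Wi-Fi router": [0.9, 0.1, 0.0],
    "Changing the Wi-Fi password": [0.7, 0.3, 0.1],
    "Returning a product": [0.1, 0.9, 0.2],
    "Updating billing details": [0.0, 0.2, 0.9],
}
question = [1.0, 0.2, 0.0]

context = top_k(question, articles, k=2)
print(context)
# ['Resetting your Wi-Fi router', 'Changing the Wi-Fi password']
```

The selected articles would then be handed to the LLM together with the original question; the agentic loop mentioned above would repeat this step with refined queries.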

Okay. If you think about that problem, right? It turns out we, as humanity, not Neo4j, humanity, we’ve actually solved that problem before. And you and I, we ended up talking a lot about history here at the top of the podcast, right? You think back to the mid to late 90s, there were actually dozens of search engines out there—Lycos, Excite, Yahoo, AltaVista, a bunch of them, right? And they all did basically the same thing, which is they did some kind of inverted index-type search, maybe BM25, like, some version of that type of text search on individual documents on the web, right?

And then, of course, along comes a small startup, Google, obviously, which says, “We’re gonna do that, what you guys do. But on top of that, we’re gonna rank the search results based on the links on the web, right? We’re gonna use a graph algorithm called eigenvector centrality.” But they modified it for the web, and they, of course, famously called it PageRank, right? Powerful graph algorithm. That results in the top 10 blue links. I was very active on the web in the early-mid 90s, right? The problem was the AltaVista problem. You searched… whatever you searched for, you got too many search results, and the relevant ones were, like, page 77, or something like that, sometimes. Frequently, actually, right? And Google just killed that. And so it’s exactly the same problem. Look at the big corpus of information, find the top 10 most relevant documents. So, a graph turns out to be a great way of doing that, which is why there’s so much research showing that accuracy with GraphRAG (if you use a knowledge graph in combination with vector search) is a much more accurate way—it gives much more accurate responses than vector-only RAG. So that’s the first benefit, improved accuracy. This one is intuitive once you think about it from that perspective.
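A minimal sketch, not from the conversation, of link-based ranking in the spirit of PageRank. This is the simplified textbook formulation computed by power iteration, not Google's production algorithm; the damping factor of 0.85, the iteration count, and the four-page link graph are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by incoming links using power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of page -> score; scores sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(links)
best = max(scores, key=scores.get)
print(best)  # c
```

Page `c` wins because three pages link to it, and page `a` comes second only because the highly ranked `c` links to it: relevance flows along the links rather than coming from the text of any one page.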

But the second one is very counterintuitive to a lot of people. So, we first saw this when one of our first customers, which is a company called Klarna that many have heard of, and they’re one of our first customers for AI. And they had written this application internally on top of a vector database, and then they started exploring Neo4j, and so they ported that application onto Neo4j. And I still remember the Slack message that Sebastian, the CEO of Klarna, sent to me, from one of his engineers, saying, “Holy crap. Just by virtue of porting it from the vector database to Neo4j, we’ve found a bunch of bugs already.”

And why is that? It is because if you think about vector space—and I’m a big fan, we support vector search in Neo4j, it’s very powerful—but vector space is ultimately opaque. If you have two objects, and you ask “How similar are they?” then you’re gonna get… maybe these are not that similar, so, like, 0.4 you’re gonna get back, which is some cosine/Euclidean space calculation type thing, right? If it’s an apple and an orange, it might be 0.6, but you don’t know why. If you marry that up with a knowledge graph, you know that an apple and an orange, they’re related because they’re fruits. An apple and a tennis ball might be very semantically similar in vector space, right? But if you marry that up—vector space plus a graph—then all of a sudden you can say that, “No, no, it’s because they’re round and because they’re green,” for example.

Raja Iqbal: Right?

Emil Eifrem: And so, all of a sudden, graph space is explicit and visible. Vector space is opaque. So when you write your application on top of a knowledge graph, you know, instead of just a vector database, you see your data, and that leads to people being able to build applications faster, find bugs faster, stuff like that, right? So that’s the second benefit.
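A minimal sketch, not from the conversation, of the contrast between an opaque similarity score and an explicit graph. A vector comparison returns only a number; if the same objects also carry explicit relationships, the shared relationships say why they are related. The relationship types (`IS_A`, `HAS_SHAPE`, `HAS_COLOR`), the facts themselves, and the function name `shared_facts` are illustrative assumptions.

```python
# Each object is described by explicit (relationship type, target) pairs.
facts = {
    "apple": {("IS_A", "fruit"), ("HAS_SHAPE", "round"), ("HAS_COLOR", "green")},
    "orange": {("IS_A", "fruit"), ("HAS_SHAPE", "round"), ("HAS_COLOR", "orange")},
    "tennis ball": {
        ("IS_A", "sports equipment"),
        ("HAS_SHAPE", "round"),
        ("HAS_COLOR", "green"),
    },
}


def shared_facts(a, b):
    """Return the relationships two objects have in common, sorted for display."""
    return sorted(facts[a] & facts[b])


print(shared_facts("apple", "orange"))
# [('HAS_SHAPE', 'round'), ('IS_A', 'fruit')]
print(shared_facts("apple", "tennis ball"))
# [('HAS_COLOR', 'green'), ('HAS_SHAPE', 'round')]
```

Both pairs might score as similar in vector space, but only the graph shows that the first pair shares a category while the second shares surface attributes, which is the kind of visibility that helps when debugging retrieval.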

And then the third one is, like, the other side, the flip side of the coin of the fact that graph space is explicit, which is it’s also explainable. Back to the… the… the example of the customer service portal. Like, why did you choose exactly these top 10? Well, it’s because, Raj… well, I guess Raja, in this example, you were the end user, so Emil, the great support engineer Emil, wrote some of these, and he’s highly ranked, right? Many people approve his answers, you know, that’s why I chose the top 10. So that gives explainability, especially if you marry it up with some of, you know, Anthropic has done a lot of fascinating research around interpretability and visibility into the LLM, right? So you have both of those things, you have an explainable AI system, and it’s also auditable.

And the banks, regulatory, whatever, insurance companies, government, like, they love that. So, higher accuracy, improved developer productivity, and then explainability and auditability. Those are the three core benefits of using graph, right, using a knowledge graph on top, or instead of just using vector search.

Raja Iqbal: I was going to actually mention explainability here, right? So, do you foresee, or maybe you already have, some kind of explainability tools built in for when I’m building an agentic AI system? Say I have some regulatory requirement, or maybe my application, the domain, requires my solution to be explainable. Do you have anything that is built in within knowledge graphs, so that if I got this response from my agentic AI system, or if a certain decision was taken, depending upon what kind of agent I have, I’m able to explain it? Do you have any such thing?

Emil Eifrem: Yes, we have a plethora of tools available, and some of them are, like, command line tools, like the equivalent of, like, EXPLAIN in Oracle or Postgres or something like that, where it kind of explains the query paths and stuff like that. But I think maybe the more powerful ones are visual. Right? Where you can go in and you can actually see the query path, you can see how it traverses through the graph, right? And through that, understand why it chose this particular document, and hand it off to the LLM, right? So, it’s those types of tools that people tend to use.

Raja Iqbal: And earlier, I think in the second point that you were mentioning, contrasting it with the semantic relationships, you know, you gave this example, apple and orange. Do you see that it is going to be problem-specific, that certain problems require knowledge graphs more than others? Or is it always complementary: your classic LLM-based approach versus LLM plus knowledge graphs, any day? Let me rephrase the question. Let’s say I’m an agentic AI developer. If I am building an agentic AI system, should I plan my system from the ground up so that it is always going to be knowledge graphs, with entities, properties, and relationships, plus the semantic relationships from LLMs? Or is it going to be on a case-by-case basis: for some problems, do this; for other problems, you know, have both?

Emil Eifrem: Yeah, I don’t think there’s any technology that is the silver bullet that you should apply across the field anytime. What I will say, however, is that there’s a broader trend, I think, starting to become well-recognized. It’s not quite there yet, but people are starting to pick up on that, in that… in order to build truly production-grade agentic AI systems, not the toys, not the proof of concepts, right? But production-grade, in particular in the enterprise, right? Or at startups, if it’s mission critical for that startup, like, so if it’s your main product. For those kind of situations, you better invest up front in your data. Right? And that can be… produce a knowledge graph so that downstream, you get better retrieval, recall, higher accuracy, right? And it’s easier to build your application. All the benefits we just talked about.

But yeah, I do think that sometimes the way to do that is you extract some structure in your ingestion pipeline, and then you do… people in, kind of, vector database land tend to think of it as metadata filtering, but it really is marrying up the semantic, unstructured search with structured data, right? And I think sometimes for the low-end use cases, maybe that’s… that’s enough. Certainly, it’s powerful and easy to get started. When you’re just playing around, most people don’t yet spend enough time up front on prepping the data and on their ingestion pipeline. They usually want to do something simple to throw it in there, and then I think, like, in a vector-only approach is fine to get started.

But for many of the real production stuff, you better invest in that ingestion pipeline, and creating a knowledge graph is one of the most powerful representations that will certainly help you downstream in many scenarios.
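A minimal sketch, not from the conversation, of the lighter-weight pattern mentioned here, which vector database users tend to call metadata filtering: structure extracted at ingestion time narrows the candidate set, and semantic similarity ranks what is left. The document fields (`title`, `embedding`, `metadata`), the two-dimensional toy embeddings, the product names, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_search(query_vec, docs, k, **required):
    """Keep documents whose metadata matches, then rank them by similarity."""
    candidates = [
        doc
        for doc in docs
        if all(doc["metadata"].get(key) == value for key, value in required.items())
    ]
    candidates.sort(key=lambda doc: cosine(query_vec, doc["embedding"]), reverse=True)
    return [doc["title"] for doc in candidates[:k]]


# The "product" field stands in for structure extracted during ingestion.
docs = [
    {"title": "Router X200 setup", "embedding": [0.9, 0.1], "metadata": {"product": "X200"}},
    {"title": "Router X100 setup", "embedding": [0.95, 0.05], "metadata": {"product": "X100"}},
    {"title": "X200 warranty terms", "embedding": [0.2, 0.9], "metadata": {"product": "X200"}},
]
query = [1.0, 0.0]

print(filtered_search(query, docs, k=1))
# ['Router X100 setup']
print(filtered_search(query, docs, k=2, product="X200"))
# ['Router X200 setup', 'X200 warranty terms']
```

Without the filter, the closest vector belongs to the wrong product; with it, the structured field rules that document out before similarity is even considered. A knowledge graph extends the same idea from flat fields to relationships between entities.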

Raja Iqbal: Okay. And using something like a knowledge graph, or property graphs, there is some investment. So if I need a semantic relationship engine, let’s call them my LLMs, I have them available off the shelf. Then keyword search has been around for 25, 30 years now, you know, BM25, TF-IDF, all those. I mean, they have been around for a while. Then, knowledge graphs have been there for a while, right? So, I see something called hybrid search now, so hybrid search is available off the shelf. So, do you have some version of, like, Hybrid Search 2.0, where keywords, semantic, and knowledge graph are all available? Like, at the same time, right? So, and I don’t have…

Emil Eifrem: No, I think…

Raja Iqbal: As a developer, I don’t have to worry about it.

Emil Eifrem: Yes, so we do that, and so that’s available out of the box from Neo4j today. Now, our vector search is very good, but it’s still early days in terms of scalability. Right? And so what we say is, like, for the high-end use cases, we actually recommend that people go with the Neo4j for the knowledge graph and a dedicated vector database. This is if you have, call it, 500 million embeddings, or a billion embeddings, or something like that, right?

For… if you have a million, if you have 10 million, if you have 50 million, it’s easier to just get started with the built-in vector search that we have, which, by the way, is Lucene-based, right? And we’ve had Lucene for our keyword search for 15 years, right? So we have a lot of experience in, kind of, you know, bundling that, and…

Raja Iqbal: And Lucene now supports semantic as well, right?

Emil Eifrem: Exactly.

Raja Iqbal: Yeah.

Emil Eifrem: That’s exactly right.

Raja Iqbal: Okay, so you have something that is best of all worlds, right? So, semantic, graph, keyword, all of them at the same time.

Emil Eifrem: Yes, that’s exactly right.

Raja Iqbal: What about your integration with, you know, mainstream vector databases, or mainstream developer stacks or frameworks, like LangChain, Ollama, and all of that? I mean, do you have that going on?

Emil Eifrem: So we have great integrations to most popular frameworks. You can never cover the entire long tail, but this is one of the benefits of being the most popular graph database, and having a big community, and all that kind of stuff, right? So there’s lots of integration. And the same with many vector databases. There’s patterns, there’s connectors, and, you know, that kind of stuff, right? So… we’re in a pretty good spot there, I think.

Raja Iqbal: That’s… that’s great. So you mentioned, you are up to… your score is 84 out of that Fortune 100 companies, right? So… so let’s say I’m the CIO of, the rest of the, you know, 16 companies.

Emil Eifrem: The 16th, yeah.

Raja Iqbal: Yeah, remaining 16 companies, and they want to stay in the relational world, and at best, I mean, they want to stay in the NoSQL world. How would you help someone who has a technical background—I mean, why should I care?

Emil Eifrem: Yeah, so I don’t think you should care just because the data model is nice, or offers some benefits, or something like that. It has to start with the business problem. Right? And the business problems tend to come in one of those two broad buckets that I said, like intelligent applications, of which there are many, there’s probably hundreds of those use cases, right, out there. You know, in fraud detection, or personalization, or digital twin, or supply chain analytics, or product recommendations, and so on and so forth, and so on and so forth, right? So there’s… there’s a ton over there.

Or, the broader notion of, you know what? In order to do AI properly, there actually is a big trend, and you said CIO here? And so with the CIO, I would speak to it differently than the individual developer. To the CIO, I would say, Look, most successful AI applications today, they are using exactly the same pattern. And you see it under the hood almost everywhere, which is, I’m gonna take… I’m gonna sit on top of several disparate data sources. I’m gonna read them into some intermediary format.

Right? Which is not your core data platform. It’s not Snowflake, it’s not Databricks, it’s not BigQuery, because most agentic applications require low latency, more real-time requirements. Not all, there are some batch agents, but most require that low latency that those core data platforms can’t do. And so, people read it into some kind of an intermediary representation, and then they layer their agents on top of that.

And it turns out that the property graph model is a really powerful way of expressing that information. This is why, if you look at… again, this would be how I would speak to a CIO, right? If you look at the big tech vendors out there, we’re recording this, Thanksgiving week, you know, so end of November. Last week was Ignite. Microsoft launched Fabric IQ, which is their graph-based… they call it the semantic intelligence layer.

We tend to call it a knowledge layer, but it’s the same thing. It’s a graph-based way of sitting on top of your multiple data sources. Get them in this graph-based format, with an ontology, so with a schema, let’s say. We can talk about ontologies later on if you want to. And that is how you build best-in-class agents. That’s the Microsoft story. The ServiceNow story, to pick another tech company, they, a couple of months ago, launched their, big bet on AI called AI Experience.

In their blog post, when they launched that, Amit Zavery, the Chief Product Officer and President of ServiceNow, wrote about the top three differentiators of AI Experience. The number one of them was knowledge graphs. Right?

Salesforce, to pick a third example, is now hiring for a VP of Knowledge Graphs, reporting to the Chief Data Officer. Right? And so, there’s this big trend going on right now. Everyone is independently concluding this, people who don’t have a horse in the graph race like I do. Of course, I’m the OG graph guy. I’m always gonna be graph, graph, graph, right? But lots of people who don’t have a foregone conclusion that graph is the right solution have independently come to the conclusion that, dear CIO, the best way to write enterprise-grade, robust agentic AI applications is to use this graph-shaped layer, semantic layer, or knowledge layer, and that is why you should care about graphs.

Raja Iqbal: And is it done… how do you foresee it, right? So do you think that organizations will have an organizational-level knowledge graph of all the relationships, and then sub-organizations within the company, so there is some kind of steward, as we call it, right? Some kind of top-level administrator of the knowledge graph. Is that how you anticipate they’re going to be using it?

Emil Eifrem: Yeah, so it’s a great question, right? Because it’s about, like, okay, great, if that’s the theory, in practice, how do organizations adopt this, right?

Raja Iqbal: also matters, right? So, no two organizations will have the same knowledge graph, right?

Emil Eifrem: Totally, right? We see two broad patterns here, and they roughly break down per size of company. Right? One is exactly what you said. It’s like, alright, we need, call it, an enterprise knowledge graph, which replicates parts of the data that sits in the core data platform.

Remember, like, the industry has now spent, call it 5 years, moving their data. You mentioned Hadoop before, that was kind of the V1 attempt at this, which ultimately didn’t work out, right? But now, with kind of the data lakes and the lakehouse patterns, right, people have now moved most of their data into these massive modern data platforms. Right? And so then, it’s like, okay, I want to replicate part of that data into a graph form in my enterprise knowledge graph, and then I hang these, you know, concrete, intelligent use cases, or AI-agentic applications on top of that. That’s one pattern.

But that’s usually smaller companies. The much more common one, and the one that I always recommend for, like… I guess your framing of the question was the remaining 16 of the Fortune 100. So, by definition, these are massive, massive companies which have many divisions that, in and of themselves, could be a big company, right? And so there, I always recommend that you start with a business problem solved by an AI application, which might be composed of one or multiple agents, and all that kind of stuff, right?

And then let that application drag the data in. So what that means is, okay, for that application, solving that particular business problem, maybe it needs to sit on top of my sales database, my, kind of, customer success database, and a weather database. For example, right? Because I want to do some forecasting based on weather forecasts and stuff like that, right? Just making something up, right? Okay, then the initial thing is, like, let’s put that in a knowledge graph, like, those three systems, right? And then you incrementally build it out over time as you add more and more applications to it.
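As a rough illustration of "let the application drag the data in", here is a minimal Python sketch. The `KnowledgeGraph` class, the loader functions, and all the sample records are hypothetical and exist only for this example; they are not Neo4j APIs. The point it tries to show is that each source is added only when an application needs it, and that later sources enrich nodes the earlier ones already created.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny in-memory property graph: labeled nodes plus typed relationships."""

    def __init__(self):
        self.nodes = {}                # node id -> {"label": str, "props": dict}
        self.rels = defaultdict(set)   # (source id, relationship type) -> target ids
        self.sources = set()           # upstream systems pulled in so far

    def add_node(self, node_id, label, **props):
        # Create the node on first sight, otherwise merge in the new properties.
        node = self.nodes.setdefault(node_id, {"label": label, "props": {}})
        node["props"].update(props)

    def add_rel(self, src, rel_type, dst):
        self.rels[(src, rel_type)].add(dst)


def load_sales(kg, rows):
    kg.sources.add("sales")
    for r in rows:
        customer, region = f"customer:{r['customer']}", f"region:{r['region']}"
        order = f"order:{r['order']}"
        kg.add_node(customer, "Customer")
        kg.add_node(region, "Region")
        kg.add_node(order, "Order", amount=r["amount"])
        kg.add_rel(customer, "LOCATED_IN", region)
        kg.add_rel(customer, "PLACED", order)


def load_customer_success(kg, rows):
    kg.sources.add("customer_success")
    for r in rows:
        kg.add_node(f"customer:{r['customer']}", "Customer", health=r["health"])


def load_weather(kg, rows):
    kg.sources.add("weather")
    for r in rows:
        kg.add_node(f"region:{r['region']}", "Region", forecast=r["forecast"])


# The first application needs exactly these three systems, so only they are loaded.
kg = KnowledgeGraph()
load_sales(kg, [{"customer": "c1", "region": "north", "order": "o1", "amount": 120}])
load_customer_success(kg, [{"customer": "c1", "health": "green"}])
load_weather(kg, [{"region": "north", "forecast": "storm"}])
```

A second application would add its own loader against the same `kg` object, which is the incremental build-out described above.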

Generally, one of the reasons that Hadoop failed—one of them, there’s plenty more—but, like, one of the key reasons was that the entire notion was, I’m gonna first take all my data, dump it into Hadoop, and then later, I’m going to get business value from it, right? And I think that generally is an anti-pattern.

Raja Iqbal: Yeah, and as you explain this, and thanks for the very clear explanation… I come from a technical background, I build these things, but the kind of insight that I’m getting is that in data science, machine learning, and now agentic AI, domain knowledge is something that is going to be incredibly, incredibly useful, and many times the failure happens not because the system failed; it is just that the domain knowledge was simply not there. So, I see knowledge graphs as more of a central curation of the domain knowledge, right? So all the relationships between entities and concepts and ideas. In case the agentic AI doesn’t know which way to go, it just steers it in the right direction.

Emil Eifrem: Yeah, I think that’s… that’s spot on, right? Like, you have your fragmented data landscape, and that’s true for any non-trivial size organization, and then you can get those individual pieces using some kind of a tool in your agent. You could call it RAG, you can call it whatever you want, right? But if you have a layer that knows that these suppliers are connected in the following way, these products are related, these are the product hierarchies, this is an airplane engine that in turn consists of thousands and thousands of parts, and those parts, in turn, consist of thousands, well, maybe hundreds and hundreds of parts, right? And then you have the supply chain connecting that. If you have that view, the agent can make sense of that fragmented individual data element in a much, much better way. And that ultimately is what’s needed to create real, robust, enterprise-grade agents that solve real problems.
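To make the engine example concrete, here is a minimal Python sketch of the kind of question such a layer can answer: which suppliers sit anywhere underneath a given assembly. The part names, supplier names, and the `CONSISTS_OF` and `SUPPLIED_BY` mappings are invented for illustration; a real deployment would hold this in a graph database rather than in dictionaries.

```python
# Hypothetical part hierarchy: each part maps to the sub-parts it consists of.
CONSISTS_OF = {
    "engine": ["compressor", "turbine"],
    "compressor": ["blade", "shaft"],
    "turbine": ["blade", "nozzle"],
}

# Hypothetical supply chain: leaf part -> supplier.
SUPPLIED_BY = {"blade": "Acme", "shaft": "Borealis", "nozzle": "Acme"}


def all_parts(root):
    """Return every part reachable from root by following CONSISTS_OF."""
    seen, stack = set(), [root]
    while stack:
        part = stack.pop()
        for child in CONSISTS_OF.get(part, []):
            if child not in seen:   # shared parts (e.g. "blade") are visited once
                seen.add(child)
                stack.append(child)
    return seen


def suppliers_for(root):
    """Return the suppliers of any part found anywhere under root."""
    return {SUPPLIED_BY[p] for p in all_parts(root) if p in SUPPLIED_BY}


engine_suppliers = suppliers_for("engine")
```

An agent that only sees one row from one system cannot answer this; the traversal across the hierarchy and the supply chain is what the connected view adds.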

Raja Iqbal: And is there a common misconception that you hear about graph databases, about what they are and when to use them?

Emil Eifrem: Yeah, probably the most common one is around the niche nature of graphs—that graphs are good, but really just for some little kind of corner of the world. And I see that changing rapidly with AI, where people start realizing that, well, whenever I have internal information, there’s a great chance that the best way of expressing that information for, like, towards an LLM is through a knowledge graph, right? But I still think that most people think of it as kind of this smaller niche technology, so that’s probably the number one misconception, I would think.

Raja Iqbal: And the process of creating a knowledge graph for a given organization is still, in a machine learning sense, more supervised. You know, you have to have some domain expert who actually figures out these rules, who understands the rules. Do you see that, with LLMs and agentic AI, maybe with usage (it could be using LLMs themselves, AI itself, or maybe some kind of feedback loop), this can become a semi-supervised or almost automated process, so that as the organization matures, the construction, or maybe the improvement, of knowledge graphs becomes automated? You start with a very basic, primitive knowledge graph, and eventually it becomes a complete knowledge graph. An answer was upvoted, so let’s figure out what the rules were and what we can infer from here. Do you see something like this happening?

Emil Eifrem: Yeah, it’s a great question, and let’s just start with kind of being intellectually honest. One of our core values at Neo4j is intellectual honesty, right? Remember I talked about the three benefits of using knowledge graphs for agents? Accuracy, developer productivity, and then explainability and auditability. The number 2 benefit there? Developer productivity. It’s true, we see it every day, we hear it from customers. But back to the intellectual honesty and rigor, it is only true once you have the knowledge graph. Right? And so you’re exactly right to hone in on this part, like, how do you get it in the first place?

And I think what’s happened here is that this… used to be pretty easy, right? Because all the data that we operated on was structured, right? And still, people were like, okay, how do I do… like, everyone knows how to do ER modeling, we’re taught that in school, and if not, you know, there’s plenty of people around us, like, if you’re any kind of developer is…

Raja Iqbal: If I want to be a contrarian, I mean, there are very few people who know how to do it well, right?

Emil Eifrem: Well, this I completely agree with, right? But you can get your application up and running relatively easily, right, and model it in a relational database, right? And you might need a little bit more help with a graph database. But it’s still structured data, right?

And, you know, one of the things that we’ve spent a lot of time on, it’s a little bit of a kind of sidebar here, is the GQL language, which is the first time that the SQL committee approved a new language for databases in the history of… of software. Right? So, for 40 years, we had only one language for speaking, a standardized language for speaking with databases, SQL. Now we have GQL, a sibling language to SQL, which is governed by the same ISO committee. And it is 95… I don’t know, 98% Cypher, which was, of course, invented by Neo4j. Super proud and love that.

As part of that, one of the things that the team worked on was how do you take data out of a relational database and transform it into the… into a property graph model? These data models, by the way, are all isomorphic. Right? So, which is a fancy way of saying that you can take information in one of them and transform to the other and back without any loss of information. Right? And so there’s this notion called tables for labels. So, if you have a person table, that means you’re going to take that and translate that into nodes with a person label. And every column in that person table will be a property key on a node that has a person label, right? So there’s actually a very simplistic process for going from a relational schema into a property graph model.
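The table-to-label mapping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure from the GQL work; the function names, the node dictionary shape, and the sample rows are assumptions made for the example. It covers just what is stated in the conversation: a table becomes a label, each column becomes a property key, and the mapping can be reversed without losing information.

```python
def table_to_nodes(table_name, rows, primary_key):
    """Map each row of a relational table to a node labeled after the table."""
    label = table_name.capitalize()          # "person" table -> "Person" label
    return [
        {
            "id": f"{table_name}:{row[primary_key]}",
            "label": label,
            "properties": dict(row),         # every column becomes a property key
        }
        for row in rows
    ]


def nodes_to_rows(nodes):
    """Reverse mapping: recover the original rows from the nodes' properties."""
    return [dict(node["properties"]) for node in nodes]


# Hypothetical contents of a "person" table.
person_rows = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Linus", "city": "Helsinki"},
]

person_nodes = table_to_nodes("person", person_rows, primary_key="id")
round_trip = nodes_to_rows(person_nodes)
```

The round trip returning the original rows is a small-scale version of the "no loss of information" point.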

But so why did I say that I thought your question was really, really astute? Because the big thing that has changed, of course, in the last few years, what has opened up for all of us, is operating on unstructured data. Right? That’s ultimately one of the L’s in LLM, like, is language. It’s text, right? And so now, all of a sudden, what the challenge is, is how do I go from unstructured data—let’s just simplify it and say it’s text in PDF documents—and how do I create the graph from that, right?

It turns out LLMs, exactly as per your question, are a great first step towards that. But take the notion of just throwing any random PDF at an LLM, one-shot conversion with no additional information… It will create the knowledge graph for you, and initially, it’ll look good. It’ll be a little bit like if you just dump your data into a vector database.

Raja Iqbal: You don’t care about chunking, you don’t…

Emil Eifrem: It’ll kind of work, and again, for that toy POC, it’ll work. But it won’t really work when you put it in production, and you want real accuracy, and that kind of stuff. And it’s a little bit the same. What we’re seeing, though, is that if you provide just a little bit of hints to the LLM about the domain (back to your comment about how domain expertise is becoming increasingly important, right?), if you tell it that these are not generic PDFs, this is about a healthcare project with the following, kind of, key principles, or this is an oil platform maintenance document, or whatever the domain might be, right? Then, all of a sudden, instead of, call it, a 60% valid knowledge graph, it moves to the 90s. Right? So, all of a sudden, you get a much higher quality knowledge graph out there. And so, that’s a big part of what we’re working with our customers to do in their ingestion pipelines.
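One way to picture "a little bit of hints about the domain" is as an extra line in the extraction prompt. The Python sketch below only builds the prompt text; it does not call any model, and the function name, the prompt wording, and the sample document are all hypothetical. How Neo4j's customers actually structure their ingestion prompts is not described in the conversation.

```python
def build_extraction_prompt(document_text, domain_hint=None):
    """Assemble a knowledge-graph extraction prompt, optionally with domain context."""
    lines = [
        "Extract entities and relationships from the text below "
        "as (subject, relation, object) triples."
    ]
    if domain_hint:
        # The domain hint is the only difference between the two variants.
        lines.append(f"Domain context: {domain_hint}")
    lines.append("Text:")
    lines.append(document_text)
    return "\n".join(lines)


sample_text = "Pump P-101 was inspected and its seal was replaced."

# One-shot conversion with no additional information.
generic_prompt = build_extraction_prompt(sample_text)

# The same request, with a short hint about what kind of document this is.
hinted_prompt = build_extraction_prompt(
    sample_text,
    domain_hint="These are oil platform maintenance documents.",
)
```

The resulting string would then be sent to whichever LLM the pipeline uses; the accuracy figures quoted above are Emil's observations, not something this sketch measures.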

Raja Iqbal: So, data quality and humans are still important, right?

Emil Eifrem: Oh, 100%.

Raja Iqbal: It’s not going away any… anytime soon.

Emil Eifrem: 100%.

Raja Iqbal: So, I could actually continue discussing knowledge graphs and, you know, spend an entire day here, but I had a few other questions unrelated to Neo4j and knowledge graphs. As an entrepreneur, you have built a company from scratch that, according to public information, is currently valued at around $2 billion.

Emil Eifrem: Yep.

Raja Iqbal: So what is one founder challenge as someone who has started from scratch, and now you’re a market leader in one particular area. What is one founder’s challenge that you wish someone had warned you about?

Emil Eifrem: Man, that’s a good question. There’s… there’s lots… obviously, there’s a long list to, to choose from. Maybe what I’ll talk about is… so I think many engineers, you know, I’m obviously kind of a developer by background, and I think we’re wired to be relatively analytical about the company-building process, and maybe you know, other people might be more instinctual, but I’m certainly more kind of on the analytical mind, right? And so I think what that led to was that in the early days, we ended up with what I still today believe is the right strategy on the go-to-market for a company like us.

And that strategy was win the hearts and minds of developers. That is why we are open source. That is why we go to all these conferences and meetups and, like, all of that kind of stuff, right? And we have a free tier of our cloud service, like a free forever tier, right? All of that is to win the hearts and minds of developers everywhere, right? And… very powerful.

What I did in the early days, though, right? You know, back in 2011, when we raised our Series A, you know, then we were, I don’t know, a dozen people, or something like that. It was basically myself, one of my co-founders who was, like, a DevRel-ish type person, and then zero engineers, right? But a year later, after our Series A, we were, like, 50 people, and maybe, like, a dozen engineers, or 15 engineers, or something like that. We built an entire enterprise sales go-to-market organization around a still very embryonic engineering organization, and very early product, right?

And… obviously, part of that was fantastic. That is when we started the march towards the, whatever, 84 out of the 100, started winning, like, some of these big logos, and… and you need enterprise sales to do that. Like, PLG won’t get you there. Not for the real deal sizes, not for the mission-critical stuff, right? Though I love PLG, it’s a good, good complement. But you’ve got to have real salespeople for that kind of stuff, right? And so that was positive.

But… the go-to-market motion starts with winning the hearts and minds of developers. So, for example, when we entered into all these 84 out of the 100, we didn’t go and knock on some door and talk to the CIO. For the first 10 years of the company, I didn’t talk to a single CIO. Now, we have tens of applications, sometimes 50 to 100 applications in production with these massive companies. That’s when I talk to the CIO, and I say, hey, do this or do that, right? Let’s try to rationalize this, you should have a center of excellence, we should have an enterprise consumption agreement, and all of that. I have a strategic conversation with them, right? But initially, we entered in through the developer.

Okay, so what’s… what’s the one kind of mistake? And again, there’s plenty to choose from, but the one that I choose to talk about here that I wish someone would have told me is that I did… I had the right strategy, I still believe, for the company. But there’s a mismatch between the resourcing and the strategy. 10, or 12, or maybe 15 out of the 50 were engineers. And the problem is, if you want to win the hearts and minds of developers, how do you do that? By creating a low-friction product experience. Super easy to install, easy to get started. That’s a product game. That’s developer… that’s DX, right? And, you know, packaging and that kind of stuff, right? And you’re… you know, very strong enterprise salespeople won’t help you with that, right? And so in that, and of course, we course-corrected it, but it took several years for me to understand that the core mistake there was this mismatch between strategy and resourcing. So, that probably would be… one out of many mistakes that I would tell my younger self not to do.

Raja Iqbal: Okay, yeah, that’s great. And there are just way too many things, right? Your GTM plays a role, you know, your sales, enterprise sales, but what if they go and sell and the product has friction in it, right? So, you know, and it doesn’t add value, or maybe it adds value, but it is too difficult to use, right? So, I think it’s a fairly complex game, and a lot of things you can only understand if you’ve been through it, right? So, a lot of…

Emil Eifrem: In hindsight.

Raja Iqbal: In hindsight, right? A lot of this is hindsight. As an entrepreneur myself, I mean, many things that I can relate to now, I’ve… I know someone told me, like, many years ago. But I had to run this course. I had to experience it and fall down, not just once, a few times, until I internalized that this is how it works. That’s…

Emil Eifrem: I think that’s spot on, and I think that’s also how entrepreneurs are wired. We’re not wired to listen to the system. If we were, right? Then we wouldn’t be entrepreneurs in the first place, right? And so… so I think, like, some of it, exactly to your point, you just have to live through yourself, which is why we all have a bunch of scars on the back of our… you know, but that’s part of the game. That’s what makes it fun.

Raja Iqbal: And another thing that is very common among entrepreneurs is near-death experiences, right? So, you know, no one knows, and… you don’t have money to run the next payroll, right? All of that. Any… anything that you can share publicly?

Emil Eifrem: Yeah, we had, like, an early near-death experience, right, before we’d raised any money, and we were, you know, two founders, you know, on the payroll, but not taking in a salary, and then four other people. And, we were kind of funding the company through, like, consulting a little bit, and we’d raised, actually, like, a really small kind of angel round. I want to say, like, 150K to 200K, or something like that, right? So it’s very little, but there’s some kind of investment into the company.

And then we were about to raise our seed, and we got a term sheet from, you know—this is when we were back in Sweden, right?—but one of the best investors in Europe, like, very kind of credentialed. And they gave us a term sheet around Christmas time. And then we started, we signed it, we started this DD process, we had $30,000 in the bank when we started, and you know how this goes, like, the company pays for… kind of lawyers, and there’s kind of accounting… it was, like, the company was, like, nothing, but still some amount of accounting DD, and looked into the IP, and, like, an FTO, freedom to operate, like, a patent thing, and stuff like that.

And so by March, when we were about to sign, we’d spent through, like, all of our cash. We had, like, nothing. $2,000 in the bank account. And that’s when they call me up, just literally the day before we’re gonna… we’re gonna sign, and they say the deal is off. We’re gonna walk from the… from the term sheet. And… and I remember looking at the bank account, it’s $2,000, and it’s 6 days to payroll. Right? It’s March 19th. Excuse me. Yeah, I admit, I’m getting… it’s… you know, I’m getting panic attacks just… just talking about…

Raja Iqbal: Even thinking about…

Emil Eifrem: There you go. Hmm. No, that’s not true. It’s actually one of those… yeah, I remember even back then, thinking to myself, man, this is gonna be a great story one day, if we survive, right?

But obviously we did survive, so what we ended up doing was, you know, I got that (it was, like, a Tuesday, I believe), I got the call in the morning. Instead of getting on a train going up to Stockholm to sign the papers, I called the team, we got back, and it’s like, alright, this is the deal. They just walked from the term sheet, we gotta do something. We ended up… I called all my friends in the industry, we sent people out to consulting, right, to do good… to do consulting, not even on top of Neo4j, just anything, right? Just, like, as, you know, software developers. I took… I convinced the customers that I could send an invoice right away. Right? Ahead of time.

And then I took those invoices, and I sold them to a factoring firm, which is… it’s actually generally, like, not a good deal, like, if you have an invoice for $100, you sell it to them, and then they give you $90 right away, or $80, right? And then, you know, obviously they make the balance, $20, for example, right? And so it’s obviously a shitty deal. But the benefit is to get the money right away. And so we did that in a few days to just get money. We made payroll, kind of, the following week, and then we started kind of funding the company that way.

So, it should have killed the company, but the team was so committed. This is, like, the early team. Like, this is when people talk about founders (yes, founders are so important, of course, right?), but man, like, that early team has so many founder-esque qualities, and they just… I remember them saying, like, “No, this is too good. We can’t let this die because of this,” right? And so, they went out and did, like, boring consulting. That’s not what they joined Neo4j for, to do, like, some consulting thing, right? But they did that just for us to get cash flow, and then, you know, after a while, we took one of them home, two of them home, started building the product, and then subsequently, like, nine months later, something like that, NoSQL had happened. Through the hype of that, we ended up raising a real round, and then we were off to the races.

Raja Iqbal: It is amazing. I’ve talked to many founders, and pretty much all startups, or most startups, actually, they have a very similar story, right? So, you have some early people who believed in the idea, they stay with you, you know, you’re running out of money, some help comes in, consulting, borrowing from friends, whatever it takes, right? But you just have to survive. I mean, you have no other option.

Emil Eifrem: I think you nailed it with whatever it takes, right? You gotta have… you gotta have that mentality, and if not, you’re just not gonna survive.

Raja Iqbal: Yeah. And how important is it for a startup to proactively manage the culture, and invest in people, and make sure that you get the right people? So I think it’s all part of the culture, right? So, at Neo4j, do you actively invest in culture? Mindset, you know, some cultural values.

Emil Eifrem: Yeah. Yeah, this is the… this could be, like, a two-hour talk in and of itself, right? It is always absolutely crucial. But the tactics and the techniques vary significantly by stage, is what I’ve found. Right? Like, so in the early days, it is basically through hiring. Who do you hire and who do you not hire? And then, through what… what behavior do you model? Those are the real ways. Yes, you can write shit down on something, but, like, that’s the real way that you end up shaping your culture, right?

And when you’re 5, 10, 20 people, it just happens through osmosis, because you’re in it, you’re together, and everyone is… it’s what we in the graph world would call a fully connected graph, so every node has a relationship to all the other nodes. It’s a fully connected graph, right? At some point, that breaks down. And it is… actually, a cultural trait is, when does that break down? And when do you start becoming a little bit more hierarchical? And, you know, we have all heard of Dunbar’s number, right, which is around 150, which is how many people you can actively kind of keep in your head at the same time and have an individual relationship with, right? And if you’re geographically distributed, that adds, you know, more complexities, too.

But at some point, that’s gonna switch. Typically tends to happen between 30 and 50, is what I’ve seen, right? And that is also… around 50 is when it happened for us. When you need to, like—again, the tactics and the techniques shift. You still need to role model behavior. You still need, like, hiring is maybe the most important one, or at least very high on the list, right? But that’s when you need to start becoming more explicit around writing things down. And you need to have mantras. Like, one of our, like… you know, we have 6 core values still at Neo4j, but, like, one of the mantras that we had that I talked a lot about was building an American company with a Swedish soul. Right? Which… people tend to love that, but, like, no one knows what it means, so then, like, I have a little bit of a lecture about what it means, which I’m not gonna go into.

Raja Iqbal: Actually, I get it. I absolutely get it. Yeah, I think it’s… love it, actually. I immediately get the references that you have there, so yeah. I get it.

Emil Eifrem: Yeah. And so… so that’s, like, one… one technique, then, of, like, tying your cultural traits and attributes to something when people don’t see you day-to-day anymore. Again, when you’re 5 people, that’s not needed, right? Because they have so much surface area with all of the employees in the company, right? And so on and so forth. And then, like, you know, now we’re about a thousand people, so now there’s plenty other things that you have to layer on top of those different things, right? But to your question, it is always, like, important.

Raja Iqbal: And, has it ever happened that you hired someone, and then you realized, oh, well… you know, we have to protect our culture. For whatever reason, maybe… not consistent with being an American company, or someone who comes in and takes away that Swedish soul out of your… out of Neo4j? Has it ever happened?

Emil Eifrem: Oh, many times, right? Where it’s like, okay, this person has the core skill sets for the job, but the way they interact, the way they show up, the way they treat their colleagues, or treat their customers, or prospects, or partners, is just not consistent with how we want to show up. Right? And then there’s… you… there’s only one way. Like, you have to take care of that. Otherwise, it falls apart.

And, you know, again, we have 6 core values. The first one is kind of first among equals, which is very simple, it’s just three words. And those 3 words are: we value relationships. That’s it. Right? And people think that we have that value because we’re graph people. But it’s actually the other way around. We’ve built the graph database because we are people who value relationships. We value the relationships between our employees. We value the relationship between us and our stakeholders, like investors, the relationship between us and our customers, us and our community members. We also value relationships in data. And people who value relationships in data, if they’re building a database, they will end up building a graph database, right? And so if you show up in a way, if you behave in a way that doesn’t value relationships, for example, between you and your colleagues, you have to take care of that. Otherwise, the culture deteriorates.

Raja Iqbal: And it is… it encompasses a lot of different aspects, right? So, relationship within the company, and of course, you know, customers are all about relationship. Partnerships are all about relationships, right? Managers with their direct reports, but I think it’s… I love this one too, right? So that’s, that’s very spot on. Okay, so, one last question before we close. What are you most excited about, as Neo4j in terms of a leader in knowledge graphs, graph databases? What are you most excited about?

Emil Eifrem: It actually is what we kind of discussed before when you asked me what would be kind of the CIO-level pitch to close the gap from the 84 to 100, right? So, in other words, where we got started, my full scope then was the scope of a single application. And my focus was, how can I make it better for, initially, developers and then ultimately data scientists, in pursuit of solving a particular problem, right? Expressed typically as an application. Right?

And then what’s happened over the last several years has been… many of our big customers, again, they have tens of applications in production, and they start seeing data network effects and use case network effects. So they might have started with, like, a supply chain visibility application. And the team there purely looked at it: “what would be the best database backend, so that I can have visibility if this Suez Canal gets blocked for a week, like it was a few summers ago? What is that going to do to my global supply chain? Or run what-if scenarios, root cause, and that kind of stuff, right?” And they looked at the graphs, they believed that was the best fit for that particular application. Okay, that’s great.

Completely independently, there’s another team doing personalization and recommendations, and they say, to express my product hierarchy, the best way to do that is with graphs, right? Okay, so that all happened kind of independently. But over the last few years, they’ve started talking, and they say, “whoa, wait, what if I connect these two graphs?” And we’re pretty good at connecting stuff, right? Now, all of a sudden, I will only recommend the products that I can actually ship through my supply chain, right? It’s a really relevant and important thing. So that’s a 1 plus 1 equals 3.
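The "connect these two graphs" idea can be sketched in Python as follows. The product names, shipping routes, and the three mappings are made up for the example, and plain dictionaries stand in for what would really be two graphs in a database. The recommendation step consults the supply chain graph before suggesting anything.

```python
# Hypothetical graph owned by the personalization team: product -> related products.
RELATED_TO = {"tent": ["sleeping_bag", "stove", "lantern"]}

# Hypothetical graph owned by the supply chain team: product -> routes it can ship on.
SHIPS_VIA = {
    "sleeping_bag": ["suez"],
    "stove": ["suez", "cape"],
    "lantern": ["panama"],
}

# Current route status; here the Suez route is blocked.
ROUTE_OPEN = {"suez": False, "cape": True, "panama": True}


def shippable(product):
    """A product is shippable if at least one of its routes is currently open."""
    return any(ROUTE_OPEN.get(route, False) for route in SHIPS_VIA.get(product, []))


def recommend(product):
    """Recommend related products, keeping only those that can actually ship."""
    return [p for p in RELATED_TO.get(product, []) if shippable(p)]


recommendations = recommend("tent")
```

With the Suez route marked as blocked, the sleeping bag drops out of the recommendations because it has no other route, while the stove survives via its alternative route.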

And we start seeing this snowball effect, where adding the n plus 1 application, the n plus 1 use case for graphs, is easier than adding the nth one. And the n plus 2 is even easier than the n plus 1, right? And so that is really powerful, and marrying that up with the… this… this momentum around using knowledge graphs for AI, that, I think, is really, really powerful, like a fertile ground to build a truly generational company, which is ultimately what we want to do at Neo4j.

Raja Iqbal: Okay, Emil, it was a pleasure having you. Thank you so much.

Emil Eifrem: Awesome. Really enjoyed the conversation. Thanks.
