Future of Data and AI / Hosted by Data Science Dojo

CEO of LlamaIndex, Jerry Liu on Generative AI, LLMs, LlamaIndex, RAG, Fine-tuning, Entrepreneurship

There are two predominant approaches to context augmentation: RAG and fine-tuning. RAG is easier for most developers to implement and use, allowing them to leverage external data sources without retraining the model itself. Fine-tuning requires more expertise and is better suited for specialized tasks. Hence, RAG will continue to be the more widely adopted approach due to its ease of use.
Jerry Liu
Co-founder and CEO at LlamaIndex



Are LLMs useful for enterprises? Well, what is the use of a large language model trained on trillions of tokens but knows little to nothing about your business? To make LLMs useful for enterprises, they need to retrieve the company’s data effectively. LlamaIndex has been at the forefront of providing such solutions and frameworks to augment LLMs.

In this episode, Jerry Liu, Co-founder and CEO of LlamaIndex, joins Raja Iqbal, CEO and Chief Data Scientist at Data Science Dojo, for a deep dive into the intersection of generative AI, data, and entrepreneurship.

Jerry walks us through the cutting-edge technologies reshaping the generative AI landscape such as LlamaIndex and LangChain. He also explores Retrieval Augmented Generation (RAG) and fine-tuning in detail, discussing their benefits, trade-offs, use cases, and enterprise adoption, making these complex tools and topics easily understandable and fascinating.

Jerry further ventures into the heart of entrepreneurship, sharing valuable lessons and insights learned along his journey, from navigating his corporate career at tech giants like Apple, Quora, Two Sigma, and Uber, to starting as a founder in the data and AI landscape.

Amidst the excitement of innovation, Raja and Jerry also address the potential risks and considerations with generative AI. They raise thought-provoking questions about its impact on society, for instance, whether we’re trading critical thinking for convenience.

Whether you’re a generative AI enthusiast, a seasoned entrepreneur, or simply curious about the future, this podcast promises plenty of knowledge and insights for you.



(AI-generated transcript)


Welcome to Future of Data and AI. I’m your host, Raja Iqbal. My guest in the show today is Jerry Liu. Jerry is the co-founder and CEO of LlamaIndex. Welcome to the show, Jerry. I’m excited to have you here.


Excited to be here. Thanks for having me.

What is LlamaIndex?


So, let’s start. Can you describe it for a layperson? Imagine that I’ve only used ChatGPT and I understand how ChatGPT works. What does LlamaIndex do?


Yeah, I think the mission is pretty simple for a layperson to understand, especially if you use ChatGPT. LlamaIndex takes ChatGPT, which, you know, we all use and all love, and allows you to use it over your own personal source of data, no matter what type of data that is.

Unstructured, semi-structured; whether you are part of a company and you want to point it at, you know, a directory or a drive of documents, or at your own personal device.

That’s the whole point of LlamaIndex. Whatever data is private to you, ChatGPT and LLMs in general are not going to understand. So how do you build the right tooling to connect that data to LLMs? We are developer-facing, in that we want to give developers the right tools to do this connection so that they can build all sorts of different types of applications to unlock knowledge automation.


Okay. And then ChatGPT, at least publicly, as we see the tool, was released in November of 2022, right? And LlamaIndex was released roughly around the same time. At least, that’s what it looked like when I was looking at the chronology of when it was released. So, were you already working on it before ChatGPT was released? I’m just curious.




The APIs did exist, by the way, but as a public-facing tool it came out in November.


Yeah, I would say so. LlamaIndex, or GPT Index as it was called at the time, was a personal project mostly designed around experimenting with the APIs that OpenAI provided, like davinci, which had been out for about a year or so.

And so, I thought I was late to the game. It was more that I wanted to take the opportunity to hack around with these APIs to see what types of applications we could build with large language models. ChatGPT had not come out yet.

It came out about a month or so after, and that whole ChatGPT boom was also the reason ChatGPT and LlamaIndex became very, very popular. I think releasing a public-facing interface for interacting with this AI model just invoked a ton of interest in this entire space from every aspect of the developer community, and that led to the project’s growth and popularity.


Okay, and when did you realize that a framework like this would be needed? You know, was it at one of your previous jobs or when did it actually happen?


Yeah, it’s a good question. I mean, to be totally transparent, when it first started as a project, GPT Index, or LlamaIndex, was very much a side project. I was doing it mostly for fun. I was very interested in startups, I was very interested in building a company one day, and I was also interested in a variety of problems.

LLMs were just one piece of that. I’ve mentioned the story a few times, but one of the things I actually found very interesting back in October or November of 2022 was video search over unstructured data. Obviously there’s SQL for querying a structured database.

And I was very interested in the idea that more and more data will be unstructured in the future. So how do we provide good interfaces for users to search over multimodal unstructured data, like images and videos?

It just seemed like a very, very hard technical problem, and something that could unlock a variety of use cases for different organizations. So I was actually noodling on that idea for a while. And I was just very interested in LLMs.

And it just kind of came about as a thing where I wanted to hack on it over a weekend, just to see what types of challenges and opportunities there were. And as I was hacking on language models, I realized there was this fundamental limitation with language models, which is that the moment you try to feed one a lot of your information, it tends to break, because it has a limited context window.

You can’t feed your entire repository of knowledge into it. And so I started building some basic tooling that I would potentially use myself, to see if I could connect some data from our own enterprise corpus, from my previous startup, into davinci-003, to see if it would make my life a little bit easier, right? To see if I could actually build a Slack bot or chatbot that understood our data.

So, I did that. I tried it. It worked decently, and I decided to take some of those concepts and release them as an open-source library. The open-source library was intentionally not supposed to be super polished at the time. In the very early days of GPT Index, or LlamaIndex, it was this thing called a tree index, which I just made up.

And it was around this idea of: can you develop some sort of abstraction to index a large set of data, and have LLMs be responsible for indexing it and also traversing it? At the time, RAG wasn’t really a thing. That keyword had not really popped up. People had not really established the best practices for using embedding search to index your information and then doing top-k lookup, you know, retrieval, to return relevant information.

And put it into the context window. That whole thing didn’t exist. I mean, it existed as a paper, but it just wasn’t popularized yet.
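To make that concrete: below is a deliberately toy sketch of the embedding-search-plus-top-k-retrieval idea Jerry is describing. The `embed` function here is a stand-in bag-of-words hash, not a real learned embedding model, and none of this reflects LlamaIndex’s actual API; it only shows the shape of index-then-retrieve.

```python
import math

def embed(text, dim=64):
    """Toy deterministic bag-of-words 'embedding'.
    Real systems use a learned embedding model instead."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[sum(ord(ch) for ch in token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def top_k(query, chunks, k=2):
    """Rank chunks by cosine similarity to the query embedding
    and return the k most similar ones."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(c))), c) for c in chunks]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [c for _, c in scored[:k]]
```

The retrieved chunks would then be placed into the LLM’s context window, which is the step the interview turns to next.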

So, I think our project was one of the first that, when it went out on Twitter, triggered interest from the developer community, because they thought, oh, this is an interesting problem. I think a lot of people had been thinking about similar things.

And this was kind of a novel technique toward that idea. It intentionally did not use embeddings, because at the time I wanted to understand just the capabilities of language models themselves to organize, structure, and retrieve information. And it didn’t fully work, because it was a side project. But I did this thing.

It kind of worked, and then I released it on social media. And after I released it, because it attracted a lot of interest from the developer community, I was inspired to keep working on it, just because when something is getting a lot of attention, you feel compelled to continue developing features for it.

And I’ve talked to other open-source library creators who, when their project is starting to take off, just feel a lot of motivation to keep working on it because it’s fun. After about a month, I asked myself a question, because I was still working on it and also trying to noodle on this other video search idea. I thought to myself, it seems like this is still getting a decent amount of attention. I started asking myself seriously whether this could actually be a useful tool for developers, and then potentially a company.

And I basically made a decision about a month in to pivot this thing from something that was very much a side project that worked just okay, to an actual clean set of interfaces for developers to use.

To build, like, any sort of LLM application. And so that pivot led to a few decisions: we tried to build out an entire ecosystem of tooling, from data loaders to indexes to retrieval, all these things that are nowadays necessary to create RAG.

And that decision was one of the factors that made me basically decide to go headfirst into turning the thing into a company.

LlamaIndex vs. LangChain


And now that the discussion is getting somewhat technical, a natural question that comes to anyone’s mind is: when would you use LangChain versus LlamaIndex? This is one of the common questions; I’m sure you get it all the time. So when do you use one or the other? Do they complement each other, or do they compete with each other?


Yeah, there’s a decent number of overlaps between the two, and a lot of the users we’ve talked with do use both in some ways. I would say you can do pretty much whatever you want in either LlamaIndex or LangChain.

LangChain is a very broad framework that does a lot of different things, depending on your preference for the level of abstraction. I would say a lot of the content and resources that we put out have been centered around stuff like RAG and, more broadly, how do you unlock use cases over your data?

So, question answering over your data, agentic reasoning over your data. We have a lot of depth and tutorials there, and we were among the first to basically come up with these advanced techniques over your data.

And a lot of people that are using both, for instance, use us for the RAG and data indexing. And then they use LangChain, if they are using LangChain, for some of the more agentic reasoning, those types of things.

I mean, to be transparent, there are a number of overlaps, and we are also investing a decent amount of effort in making our framework more robust, and more comprehensive in terms of agent support.

But yeah, a lot of people use both.

Is LlamaIndex focused or an end-to-end RAG-based solution?


Okay, and then in a RAG implementation, it is, well, retrieval augmented generation, right? So there is this whole business around retrieval and indexing: fine-tuning your search index, chunk-size optimization, getting the search parameters right, hybrid search, creating custom embeddings. And there’s this whole side around the generation part of it, and then the surrounding ecosystem: you have retrievers and parsers and so on. So do you focus on a very specific area within this whole RAG pipeline, or is it end-to-end?


So, the framework is quite broad. It focuses on giving users both the core abstractions as well as specific modules around every stage of this pipeline.

So, we have modules around data loading, ingestion, transformations, embeddings, indexing, being able to put it into a variety of different storage systems (vector stores, graph stores, document stores), and then doing different types of retrieval.

As well as downstream modules post-retrieval, like reranking. And of course prompt chaining, that type of stuff.

So actually, in a lot of our materials and slides, the tutorials and documentation are structured like this: we cover every stage of this process and tell you, here are the modules that you should use, here are the beginner modules, and here is the stuff you can go into if you want to get more advanced.

And a framework is a little bit different from a Python library for a specific module. The goal of a framework is to be a little bit opinionated, so that you’re giving users a reference on how to build something. In the framework, you need to set up the right set of core abstractions, and you need to strike a balance between customizability and being opinionated, basically.

So we intentionally want all base classes, all modules within LlamaIndex, like the retrievers, indexes, and data loaders, to be very easily customizable. If users have their own custom needs in any one of these components, they’re able to just write logic to implement that. But they’re also able to plug these components in alongside more off-the-shelf components if they don’t want to write everything in this overall RAG pipeline or LLM application pipeline.
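The customizable-base-class pattern Jerry describes can be sketched roughly as follows. This is an editor’s illustration of the pattern only, not LlamaIndex’s real class hierarchy; the `BaseRetriever` and `KeywordRetriever` names here are hypothetical.

```python
from abc import ABC, abstractmethod

class BaseRetriever(ABC):
    """Minimal stand-in for a framework's retriever base class.
    The framework is opinionated about the interface (retrieve),
    but leaves the implementation to the user."""

    @abstractmethod
    def retrieve(self, query: str) -> list[str]:
        ...

class KeywordRetriever(BaseRetriever):
    """A custom retriever: return chunks sharing any word with the query.
    A user with custom needs writes only this logic; the rest of the
    pipeline accepts it wherever a BaseRetriever is expected."""

    def __init__(self, chunks: list[str]):
        self.chunks = chunks

    def retrieve(self, query: str) -> list[str]:
        words = set(query.lower().split())
        return [c for c in self.chunks if words & set(c.lower().split())]
```

A custom subclass like this can then be swapped in anywhere an off-the-shelf retriever would go, which is the customizability-versus-opinionation balance being described.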

RAG vs fine-tuning and their enterprise adoption


Okay. And so, based on your experience, I’m sure you work with enterprises who are trying to adopt these solutions. First of all, do you see a lot of enterprises adopting fine-tuning-based solutions or RAG-based solutions, or do you see both happening?


Yeah, so taking a step back, I think there are maybe two prominent modes of context augmentation right now, and there’s potentially a third. The dominant modes right now are retrieval augmentation and fine-tuning. Context augmentation, by the way, is just: how can I use LLMs on top of my own data? How can I augment them with a source of data that wasn’t originally in the training set? And when we look at companies and developers, most of them are doing the RAG piece. And what is RAG, at a very high level? It means that you’re not actually training the model anymore. You’re just using it out of the box, in an inference-only setting.

So the parameters aren’t changed. But you’re basically creating a data pipeline with the model to feed data from some source into the prompt. And that’s where the retrieval augmentation piece comes in: you do retrieval from a vector database and put the context into the prompt window of a language model.

And this paradigm, retrieval-augmented generation, where you’re using the model in an inference setting, not training it, but just composing pieces around the language model, so the data pipeline as well as any sort of prompt orchestration afterwards, is something that most people are doing today. It is much easier and more accessible to developers than, for instance, knowing how to train the underlying model itself. It’s also easier and faster to get started: you can see a result in a few minutes if you write some code, versus having to wait hours or days for a model to actually train.
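The inference-only composition Jerry describes, retrieve first, then place the context into the prompt window, boils down to a prompt-assembly step. A minimal illustration, with a made-up prompt template:

```python
def build_prompt(retrieved_chunks, question):
    """Compose an inference-only RAG prompt: retrieved context first,
    then the user's question. The model's weights are never touched."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string is what gets sent to the LLM on each call; swapping retrievers, chunkers, or templates changes only this pipeline, never the model.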

The other paradigm is fine-tuning. You can actually fine-tune a model for a variety of different purposes, and we could talk about fine-tuning versus RAG and the benefits and trade-offs; we can get into that discussion.

Sure, there are a lot of different aspects. But fundamentally, when you fine-tune a model on a dataset, you’re changing the parameters of the model.

When you think about how LLMs are trained, it’s the same process: you’re updating the parameters of the model to make it better over some distribution of data that you’re feeding it. Fine-tuning is just taking that process and fitting it more closely to a specific set of data for a given user.

And we’re not seeing as much of that. We’re seeing it more from developers or teams with a lot of ML specialists, or companies where people really understand how to train models, who have worked a lot with NLP before. It typically is a bit more for specialized use cases, and something that takes a little bit longer to materialize.

Will RAG become the paradigm of choice in the future?


And do you see, down the road, RAG becoming more of the paradigm of choice for enterprises, or do you see both being prevalent?


Yeah, I think RAG, versus fine-tuning, will continue to be more widely adopted. The reason is just ease of use and setup, and accessibility to a wider audience. Fundamentally, anything that is easier to use, not necessarily at a higher level of abstraction, but just accessible to more people, will be more widely adopted.

I think the concepts you need to know to fine-tune a model are much more comprehensive than the concepts you need to set up a basic RAG pipeline.

And so that, on its own, is just going to lead to wider adoption. By the way, there is a third paradigm, which is basically not really RAG, though you could make an argument; we can get into this discussion as well. As language models’ context windows get quite big, like a million, 5 million, 10 million tokens, then you don’t really need to do any sort of retrieval augmentation. You just do inference only: on every call you load in your data, and every time you want to use the model you just stuff everything into the prompt.

Right, so there’s no need to set up a data pipeline. All you have to do is take whatever you want to feed the language model, feed it straight in, and not worry about setting up a pipeline at all.
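That “stuff everything in if it fits, otherwise retrieve” decision can be sketched as follows. This is an editor’s toy sketch: the whitespace word count stands in for a real tokenizer, and `retrieve` is whatever retrieval function you would plug in.

```python
def stuff_or_retrieve(docs, question, context_limit, retrieve):
    """If all documents fit the context window, stuff them all into
    the prompt (the long-context paradigm); otherwise fall back to
    retrieval. Word count crudely approximates a token count."""
    total = sum(len(d.split()) for d in docs)
    chosen = docs if total <= context_limit else retrieve(question, docs)
    return "\n\n".join(chosen) + f"\n\nQuestion: {question}"
```

With a generous limit, the whole corpus goes in verbatim; with a small one, only what the plugged-in retriever selects reaches the model.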

LLMs, large context windows, RAG


And are you talking about something more futuristic, or are there any LLMs like this available off the shelf that have a huge context window? Of course not infinite, but maybe a million tokens, or ten million tokens. I mean, do we have any models like that around at the moment?


Yeah, as of today there’s Gemini 1.5 Pro, which is leading to a lot of discussion on Twitter. The white paper also came out; it has the 1 million token context window. And I think the top model, which is not public yet, will have something like a 10 million token context window.

And actually, context windows have gotten almost exponentially bigger in the past year. When we first started the project, context windows were 4,000 tokens. Then of course, you know, Anthropic’s Claude 2 is 200,000, GPT-4 is 128,000. And then there’s Gemini 1.5 Pro now with a million.

And so, you know, we can have that discussion as well. I think it’s a very worthwhile discussion to have, which is: what is the role of RAG in a world of long context windows?

I think there are a ton of trade-offs with either one. You probably will still need some sort of retrieval in any case where you have a large enough data set, just because even a million-token context window will not be able to index a significant amount of data (by significant, I mean a few hundred megabytes or gigabytes) in a way that’s always going to be performant and reliable.

But for a certain class of problems, you can just feed everything into the prompt, and this is something that really appeals to companies, because they just don’t have to worry about maintaining an entire pipeline. It’s similar to how people are using ChatGPT these days. A lot of people just use raw ChatGPT, right? Because you can just dump stuff into the prompt and copy and paste things.

And the longer the context window is, the more stuff you can just dump into the prompt up front, without having to set up a more sophisticated pipeline at the beginning.

Risks of dumping data into ChatGPT


Okay, so two questions emerged from the discussion here. One is “dump everything into ChatGPT,” and the second one is what to do if you have a bigger context window. So let me first expand on the “dump everything into ChatGPT” part. For an average person, what could be the risks in dumping my data into ChatGPT?


I mean, it does not come free of risks, right? So should I do it, or should I not do it? Or if I should do it only conditionally, what would those conditions be?

I think for the average person, it honestly doesn’t matter that much. The people who really have these concerns, and in my mind they’re valid from an expected-value perspective, are companies that stand to lose a significant amount in the event that they don’t explicitly have these constraints around data privacy and security.

And of course there’s an enterprise version of ChatGPT, right, which does come with a bit more guarantees. But honestly, for the average person, I think OpenAI is not training on or using a lot of the data traces for training. I could be wrong on that. But basically, for the average person, it honestly doesn’t matter. I’ve seen people dump everything in, from their school projects to their assignments, right?

To enterprises just uploading arbitrary amounts of documents. I would probably be careful about uploading things to ChatGPT that are explicitly marked as classified within your company, right?

That’s probably the one piece of advice I’d give, but generally speaking, people are just uploading all sorts of stuff. And ChatGPT doesn’t retain your data, right? If you upload a file or something, it’s not like it stores it forever.

Paranoia around banning ChatGPT at enterprises


Okay, and let’s say an average person, in the sense of an enterprise that is not up to speed; they don’t quite understand how generative models, how GPT models, work. There is this paranoia around banning ChatGPT at enterprises. Do you think it is rational, or is it just irrational, to actually block your employees from using ChatGPT at work?


I mean, I can’t speak for all companies. I think in a lot of cases it’s more due to a lack of trust in OpenAI than due to a deeply grounded, valid concern. That said, there are certain industries, like government or banking, certain sectors like healthcare, where it really is very important for that data to just never leave a server, because there’s always a risk.

The moment the data leaves that sort of environment, or leaves something you’ve built a lot of security controls around, there’s always a risk that it might be intercepted, right, decrypted, somehow used in some way. And if it is used in a malicious manner, there are huge consequences for the company.

So there are obviously a lot of valid concerns in certain sectors, where the alternative, instead of using ChatGPT, is to try to use a local model: a model that is co-located with the rest of your data on your on-prem server, so that the data just never leaves the device.

I think that’s certainly a valid concern. Otherwise, using ChatGPT is similar to using any other SaaS service. More and more companies are moving to the cloud, putting your data on Snowflake, or using a Salesforce-like CRM; using ChatGPT is pretty much the same thing. And for most companies, moving to the cloud is basically the trend that everybody is adopting.

Access controls in RAG-based solutions


Yeah, and do you think the right access controls are there? Because with some of the mature software, if I’m using Dropbox, SharePoint, Snowflake, all of these examples, yes, I can use the cloud, but I have proper access controls. Think about access controls in the case of RAG-based solutions: you take the documents, you break them down into chunks, the chunks go somewhere. Say I built a RAG solution that the rest of the team is using, and someone uploaded some data that others should not have access to.

Do you think the state of the art in access control is there for RAG-based solutions to be feasible in a SaaS-like setting?


Yeah, that’s a good question. To take a step back: the fact that you’re doing RAG does give you the ability to implement access controls, versus fine-tuning. That actually is an inherent advantage of doing RAG over trying to train a model on all your data, when certain pieces of that data should only be visible to certain types of users.

If you actually bake all the information into the model’s weights by training on it, it’s almost impossible to do proper access control. I think that’s just a very hard problem that no one really wants to deal with, right? How do you prevent the model from leaking information to someone who wasn’t supposed to see it?

I don’t know. So what RAG actually does enable, and this is something that’s very useful, is that because that data is not part of the model’s training set, and is instead just fed to the model at inference time, you can basically implement some sort of software system, right?

Around that whole data retrieval piece, to make sure that for a given user asking a question through this RAG pipeline, only the data that user has access to is getting fed to the LLM. So the LLM itself doesn’t store that data; the data is coming from some source where you can implement proper access controls. I think there’s definitely a lot of interest in this space right now.
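That gating step, filtering candidate chunks by the user’s permissions before anything reaches the LLM, might look roughly like this. The `acl` metadata field on each chunk is a hypothetical convention for illustration; real systems attach this kind of metadata during ingestion and enforce it in the storage layer.

```python
def allowed_chunks(chunks, user_groups):
    """Gate retrieval candidates by ACL metadata before they are
    fed to the LLM: keep only chunks whose ACL set intersects the
    requesting user's group memberships."""
    return [c["text"] for c in chunks if c["acl"] & user_groups]
```

Because the filter runs at retrieval time, the same index can serve users with different permissions, which is exactly what baked-in model weights cannot do.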

For a lot of users who are implementing access controls, it’s not that different from implementing user auth, basically just authentication and authorization, in pretty much any other live application, because a lot of RAG is just setting up the right IDs and permissions on your storage systems during the data ingestion piece.

That said, especially since the space is still somewhat early, people are still figuring out the best practices. And vector databases themselves are also making improvements on how you actually store data within the storage layer.

Across vector databases and that whole data ingestion pipeline, I think there’s going to be more and more support to make it easier for any user to add access-control information throughout every stage of that data pipeline, up until and including when it hits the storage system.

Once that pipeline is set up, it’s pretty easy for users to use those access controls when building a RAG pipeline: they can just gate the information that a given user is able to use when they do retrieval and feed it to a language model.

Best practices for RAG based solutions: What does the future look like?


That is great. So, in terms of setting up a RAG pipeline: I don’t do a lot of hands-on work myself, but I’m still reasonably technical, even now. From my experience, figuring out how to break documents into chunks, chunk overlap, chunk augmentation, chunk-size optimization, tuning of your search parameters, all of that is quite hard. And we have seen recently some solutions that claim to be, you know, “RAG in a box” type solutions. What is your take on that? Do you think we will get to the point where, somehow, adaptively, based on the content of my documents and the document chunks, there is some magical way of shielding me from all of these details?

Because an LLM application developer might spend an insane amount of time on just these optimizations, and something that works for one particular set of documents may not work for a different set of documents. So do you foresee that this is going to happen eventually, and is LlamaIndex doing something along those lines?


Yeah, that’s basically what we’re doing. And that’s a great question, by the way, because basically what you’re asking is: are users going to have to define very custom logic at every stage of this pipeline, and is that going to continue to be true? Or are best practices going to emerge, so that users can focus on higher-level things, and there are just going to be some higher-level abstractions around the core concepts of RAG that can basically be commoditized, right?

So that everybody can just use them and not have to worry about setting them up. I think that second case is generally true for most things in software. No one’s writing assembly these days, or most people are not; most people are writing application-level code using, you know, Python or TypeScript.

plus a variety of other languages. And most people are using some sort of cloud-based service and are not provisioning and setting up infrastructure for themselves. And I think it's true for RAG too. I think the reason people are doing these very custom things right now is because the best practices are still being worked out,

and also the technology is still advancing to keep up.

A lot of developers these days are experimenting with very, very custom things because they're still in the stage of trying to discover the capabilities of a language model and really trying to understand what things work well and what don't. So first, what are the stages of a RAG pipeline?

You load in some data, you chunk it up, you figure out the right embedding model for each chunk. You put it into a vector storage system, then you do some sort of retrieval from that storage system, and then you take the retrieved chunks and put them into an LLM prompt. I just outlined the basic stages of a naive RAG pipeline. For each of these stages, a developer has a lot of decisions to make. So for instance, for data loading, if you're loading a PDF, you have like 10 to 20 different PDF parsers to choose from.
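The stages Jerry just listed can be sketched end to end in plain Python. This is a toy illustration, not LlamaIndex API: a bag-of-words counter stands in for a real embedding model, a plain list stands in for a vector store, and the final LLM call is omitted. All function names here are hypothetical.

```python
from collections import Counter
from math import sqrt

# Toy embedding: bag-of-words counts (a real pipeline would call an embedding model)
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Stages 1-2: load the data and chunk it, with a configurable overlap
def chunk(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

# Stages 3-4: embed each chunk and put it into an in-memory "vector store"
def index(chunks: list[str]) -> list[tuple[Counter, str]]:
    return [(embed(c), c) for c in chunks]

# Stage 5: retrieve the top-k chunks for a query
def retrieve(store, query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(store, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [c for _, c in ranked[:k]]

# Stage 6: stuff the retrieved chunks into an LLM prompt (the LLM call itself is omitted)
def build_prompt(query: str, context: list[str]) -> str:
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"
```

Each function is one of the decision points he describes: you could swap the parser, the chunker, the embedding model, or the retriever independently of the rest.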

If you're doing transformations and chunking, you have like 5 to 10 different chunking strategies that LlamaIndex potentially offers, for instance, that you can choose from. Then there are the embedding model providers; obviously there's a ton of embedding models out there, and you can choose from any one of those.

And so the whole point here is, right now, when you're in the stage of discovery and experimentation, it's nice to have a lot of flexibility and options, because you want to really understand what things work well and what things don't. And this is especially appealing to people that are used to a lot of experimentation:

the researchers, the data scientists, these types of folks. I do think, though, that if you're an enterprise, or a team, and you want to implement a lot of applications and adopt them quickly, the value prop of having infinite flexibility is less compelling.

And the reason is you want to deliver solutions in a shorter amount of time and not spend as many engineering resources to build something. If every time you're building something new you're completely reinventing the wheel, that's just operationally very inefficient, right?

And so I think as RAG as a framework, and as the underlying technology, develops, inevitably best practices start to develop and standards start to emerge for different types of use cases.

We'll probably start to centralize around, for instance, different types of parsing strategies that work well, and a much smaller set of chunking strategies that work well. And we might not even need chunking at all, right? I can talk about that as well.

But basically, a set of chunking strategies, similarly some embedding models that work well, as well as retrieval-strategy best practices like hybrid search and reranking, that type of stuff. And then we'll just start to see those being abstracted away into services. And yeah, I think there are a variety of companies these days that are developing out-of-the-box RAG solutions. I think the trick, though, is timing. If you try to develop out-of-the-box RAG too early,

you basically run the risk of no one using it once people figure out better practices during the discovery phase. Because it's always going to take someone a lot longer to build a managed, out-of-the-box offering for RAG versus building Python tooling and frameworks that enable the end developer to build that service.

And so if you build it too early, you basically might just try to centralize around a set of abstractions that might change in the future, and then you'll be left behind. If you never build it, though, then some service is going to come along basically saying, hey, stop getting your developers to worry about this,

just use our stuff, and it generally works pretty well. And a lot of companies might just use this, because a lot of developers also don't want to spend the time having to reinvent everything every time they build an application; they would much rather choose a managed service.

And so I think our goal from the LlamaIndex side is actually twofold, and the open source is a crucial part of this that will never go away. The open source will always be used for discovery of new use cases and rapid iteration. I think the framework itself is production grade, and we're going to continue to make it production grade and basically the entry point into the LLM ecosystem.

Models are always evolving. There’s always going to be new use cases that are emerging and the open-source framework is a great way to do discovery of these different use cases, whether it’s like RAG right now or agents or anything coming up later on.

As we start to see more best practices and standard use cases emerge, around, for instance, documents, PDFs, structured data, semi-structured data, and around different use cases like question answering,

workflow automation, document processing and extraction, some sort of simulations, as those best practices emerge, we're probably going to start building a bit more managed services around pieces of that,

that basically centralize some of the core logic and make it easier for enterprises, teams, enterprise developers, to just consume that as a service instead of having to figure out all the boilerplate and set the stuff up in the first place.

Automation in RAG-based solutions


Okay, I see a lot of parallels between all of this optimization around chunk sizes, chunk overlaps, keywords and parameters, and the hyperparameter optimization in your classic machine learning model.

I mean, so I'm adjusting the penalty in a linear model, or the number of trees, or perhaps the number of iterations in a boosted tree or an ensemble. And once we started getting more compute, of course, some of those limitations were overcome.

Now, do you see that we will reach a point where we would be able to automate all of this? You know, just give me the right chunk size, or give me the right chunk overlap, and then what should my hybrid search parameters be?

Which custom embedding model should I be using out of the choices? Right, so there are some parallels, but the search space in this case, in my opinion, is much, much bigger than, you know, tuning 2 or 3 different parameters that have some plausible range. What are your thoughts?


Yes, yes, I mean, I think it will. I mean, that's my hot take, which we'll see if it's true. But generally speaking, these are parameters that you can't differentiate over, so you can't just train them through backprop, but they are hyperparameters that you can do some sort of search over. And yes, the search space is very big, but there are approximate solutions for you to at least get something decent.

I think it's actually not dissimilar from any sort of hyperparameter search in traditional machine learning, as well as neural architecture search in deep learning. Neural architecture search, by the way, for those who aren't familiar, is basically this: if you imagine doing machine learning, or building a neural network, you would go and implement it in PyTorch, right?

So you compose the nodes, the input/output matrices, the weights, and then you define activation layers and you basically create an MLP yourself, right? And you could do convolutional networks, RNNs, transformers, that type of stuff. And neural architecture search is basically this idea that maybe you define a core module as an ML engineer, but then there's some meta-optimization process that tries to create the network for you, right?

So it not only tries to optimize the weights of the network, but actually tries to create the overall structure of the network that best fits the problem. There's a lot of research that has gone into this in the past; I haven't really kept up with the recent research.

But there is something very powerful about this overall idea, which is basically: you can optimize at higher levels of abstraction instead of over a fixed architecture. Because once you have a larger parameter space, there's just going to be a more globally optimal solution.

And that's basically the same here with RAG. You have LLMs themselves, which are already pre-trained on large amounts of data; we optimize parameters on that data. But then you build the LLM as part of an overall software system, like a retrieval augmentation pipeline or an agent.

There are going to be pieces in there that are going to become relatively standardized as a stack. This includes data loading, chunking, embedding, you know, all the things you mentioned, retrieval. And all of these are going to have parameters that aren't easily fed to some backprop algorithm.

And so, yes, either the data scientist or the machine learning engineer is going to have to go in and tune all these parameters, and basically have a terrible time doing it, or there are going to be best practices that emerge.

So just good defaults, I think, are actually pretty important, so that, roughly speaking, if you just set this up with good defaults, you won't really need to tune it. But also basically some button that you can just click that will optimize this a little bit more over your data set, and it'll just work over your data set.

I do think that's going to be a big, big thing in the future, because I don't think people really enjoy this idea of having to tune everything. I think people would much rather spend time writing custom logic that actually relates to their own use case instead of having to fiddle around with all these knobs.


And just like in the traditional machine learning hype cycle, we started seeing these AutoML tools pop up, right? Because hyperparameter tuning was not for everyone; not everyone understood how to tune the hyperparameters.

I think both of us agree in this case that this is where this whole RAG business is going to converge eventually, or at least start moving in that direction.

And do you think that increasing the context window is going to make this problem more tractable and more manageable for someone who's building a RAG pipeline?


Yes. And the reason for that is that it reduces the number of parameters that you have to think about. And, yeah, if we start getting into that discussion: I think as models get bigger, in terms of longer context and greater reasoning capabilities at lower cost,

people have to make fewer decisions in terms of trade-offs between cost, latency, and performance. And they also have to make fewer micro data decisions. For instance, if you have better and better 1-million-token context windows, where you can fit entire books in there, or five to ten 10-K reports in there, you no longer have to think about micro-optimizations at the chunk level.

It's going to be more flexible and more forgiving in terms of the amount of stuff you can put in there, right? So you don't have to tune specific chunk sizes and retrieval parameters, which means you can basically afford to do something more simple and still have it work very well with a large context size.

So I think what long context windows and better reasoning capabilities will allow developers to do is basically reduce the number of parameters in their overall pipeline and not have to worry about certain parameters, which makes it a lot easier to set up a system that works.

Are bigger context windows necessarily better?


And do you think a bigger context window is always better? Because there's no such thing as a free lunch, as we keep saying; we used to say this in traditional machine learning, right? The more complex the model gets, it comes at some sort of cost, and hence we used to have these regularization approaches, right? So do you see... once again, I have spent quite a bit of time in traditional machine learning.

So, when you add more parameters, you add some sort of regularization penalty; you want to discourage complexity. So do you see any parallels, once again, in terms of a bigger context window? All of us are very early in this, so I don't have any practical experience with it.

But, you know, a bigger context window... my question is, does it necessarily mean that it is better? Let's say a 16K context window versus a 64K context window, right? Is 64K always better than 16K?


Yeah, so that's a good question. And I think I'll walk back the comments from my previous response about long-context models

always reducing the set of parameters that users will need to choose from. What I specifically meant was that if you use a long-context model, then you don't have to worry about things as much in terms of chunk size, because you can just fit entire documents into the long-context model, and so certain data decisions will just go away. That said, the decision of whether to use a long-context model does add to the basket of options that a user has to choose from.

So for instance, being able to pick from, say, a 1-million-token context model that costs a dollar per inference call and also takes like 20 seconds to respond if you fill the entire context window, versus a 16K model which is blazing fast, takes sub-1-second to give you back a response, and is also much, much cheaper.

And so what this means is users now have more options to choose from, depending on their use case. The long-context model might actually work really, really well for use cases where they are willing to wait, where user latency is not the most urgent concern,

and they want to basically get holistic, synthesized responses across complex sources of data. And this could be the case in legal settings or, you know, financial settings too.

That could be the best choice for them. In other cases, where users want to build, for instance, user-facing search, you do care about stuff like latency and cost. Maybe you're serving millions of queries per day, and users have an expectation that they get back a response within a certain amount of time. Then you probably do want to optimize for latency and cost, and even if you could stuff an entire 1-million-token context window, you might not always want to.

You might want to just do some sort of retrieval over more fine-grained chunks of information, so that you're able to not consume as many tokens during an inference call. So I do think, in general, you can imagine some sort of curve, right? A scatterplot of cost-slash-latency versus performance.

And generally speaking, as cost and latency go up, performance goes up. That's basically some sort of line, right? And you can draw a bunch of points for different models that fall within this plot. Over time, cost and latency will always go down. So this line will... I don't know how the viewers, the listeners, are visualizing this, but with cost/latency on the Y axis and performance on the X axis, this overall curve will shift down. And so over time, everything will get cheaper and faster.

That said, there will always be more and more use cases where users will still need to make these trade-offs, because in my opinion, using the biggest context window model will not always be the best fit universally for all different types of use cases.


And you emphasized cost and latency a lot. Once again, finding parallels in traditional machine learning: there is this concept of the curse of dimensionality, right? You cannot have too few features,

and you cannot have too many features. We can't go into the details or do a tutorial on the curse of dimensionality, but the general idea is that there is a Goldilocks dilemma: it cannot be too many, it cannot be too few. Do you think the accuracy might also take a hit if your context window is too big? Is a bigger context window a cure for all problems here, in terms of being able to fit everything? Or do you have any skepticism around it?


Yeah, I think with any sort of longer-context-window model... the curse of dimensionality, that whole thing.

It's an interesting thing in machine learning. Basically, what we're seeing these days is that you can throw in more compute and parameters, and problems generally tend to go away, as long as you have the right architecture that allows you to scale compute. And of course,

there are issues with the raw transformer architecture, right? And I'm pretty sure no LLM training company is actually using just the raw architecture; they're all combining a bag of tricks, whether it's mixture of experts, flash attention, and, I have no idea, some secret stuff that's optimizing beyond quadratic attention. Because the raw transformer architecture is n-squared, right? If you increase the context window, the compute scales quadratically.

But I think once you have the right architecture, it turns out for a lot of these models, you just scale up the amount of compute and it just generally gets better. And so what happens is, I think, a lot of the current problems we're seeing, potentially, are with long context windows.

I think some people did an analysis: with GPT-4 128K and also Claude 2's 200K, there were issues with attention in the middle of the context window. That's popularly referred to as the lost-in-the-middle problem: if information is in the middle of an LLM's context window, it tends to get lost, and the LLM isn't able to actually recall the information there in a very precise way.

That problem, I think, tends to fade away as models get bigger and better. Even the latest Gemini Pro, I think, in that paper, attains high accuracy on recall metrics or something; I forget the precise figure.

And with a bigger context window, it actually didn't run into as many lost-in-the-middle problems as some of the earlier models with smaller context windows. And whatever tricks they're using under the hood, maybe it's architecture, maybe it's compute, maybe it's a mixture of both,

I think generally speaking these models will just get bigger and better, and have better recall over long context. So I do think any sort of problem with that today is more of a temporary state of things.

Future of LLMs, ensemble techniques, committee of models


So Jerry, speaking of the possibility, or the general progression, let's call it progression. This is how it has been: storage and compute have been becoming cheaper and better.

So as compute becomes better, and what I call the enabling technologies become better, we can expect the cost of training a model, and the cost of owning or operating a model, to become lower.

Do you see some sort of equivalent of ensemble techniques, some kind of, you know, from classic, traditional machine learning, the boosting and bagging or random forest type approaches, where you have a committee of models taking over as opposed to a single model?


Yeah, I think it's very interesting. I don't know how qualified I am to talk about this on the architecture level. So, mixture of experts, right? I've read the basics, but basically, at the architecture level itself, you can scale up the number of parameters, and then at every layer, every block, you route through a subset of the overall number of connections, because you have a router that selects a few experts given any sort of input.

And so you can scale up the number of parameters with lower compute. It's basically a strategy for selecting a subset of the overall network that makes sense for a given inference call. You can actually do this outside the pure model architecture level too, and we've actually done this on the developer end.
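The routing Jerry describes can be sketched numerically: a router produces logits over the experts, only the top-k experts run, and their outputs are combined with renormalized weights. This is a toy scalar version for illustration only, not a real MoE layer; real implementations gate per token inside each transformer block, and all the values below are made up.

```python
from math import exp

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract max for numerical stability
    es = [exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def moe_layer(x: float, experts, router_logits: list[float], k: int = 2) -> float:
    """Route input x through only the top-k experts, weighted by the
    renormalized router probabilities; the other experts never run."""
    probs = softmax(router_logits)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return sum(probs[i] / norm * experts[i](x) for i in top)
```

The point is the compute saving: with k=2 out of, say, 8 experts, only a quarter of the expert parameters participate in any given forward pass.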

If you just use an out-of-the-box model, you can do some sort of interesting ensembling strategies, like if you're building an agent or even a RAG pipeline. An example of this is if you just combine the outputs of a variety of different RAG pipelines with different parameters, and then you combine the results at the end.

Let's say you're not actually quite sure what chunk size makes sense. So maybe you just have 5 different RAG pipelines, one with each chunk size; you generate 5 different responses, and you aggregate the results at the end.

Right, and the idea of routing itself is actually quite powerful as an abstraction and tool. So, regarding the underlying architecture, if you're composing LLMs with other pieces of software, you can, for instance, have a router that, given a set of 5 or 6 options and a user query, chooses to route that query to the subset of those options that are the most relevant, and then you combine the results. You can do some sort of pipelines of agents reasoning over things, and you can apply it in different settings. Another example, for instance, is for tabular data, and this is pretty technical, by the way.
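At the developer level, that router can be as simple as scoring each candidate pipeline's description against the query. Here's a toy sketch with keyword overlap standing in for the LLM- or embedding-based selection a real router would use; the option names and descriptions are made up.

```python
def route(query: str, options: dict[str, str], top_k: int = 2) -> list[str]:
    """Score each option's description by word overlap with the query
    and return the names of the top-k most relevant options."""
    q = set(query.lower().split())
    scored = sorted(options.items(),
                    key=lambda kv: len(q & set(kv[1].lower().split())),
                    reverse=True)
    return [name for name, _ in scored[:top_k]]
```

The query then only gets sent to the selected pipelines, and their outputs are combined, mirroring at the software level what a mixture-of-experts router does inside the model.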

So just for those listeners who have played around with trying to use LLMs on top of CSVs, tabular data, SQL databases, those types of things: there are typically two ways that you can try to query tabular data with LLMs. One is, for the CSV, you just format it as plain text and dump it into the prompt window of a language model.

You just input the entire CSV, the comma-separated values, into the prompt, and then you ask a question over it. Another is you ask the LLM, given a user query, to infer a set of Pandas operations or a SQL statement to actually run over the table.

So instead of direct prompting, where you're feeding all the values in the table into the prompt, you're actually trying to infer some sub-operations that you can run over the values. And an example of an ensemble you can do is you just do both, and then you combine the results at the end. You try out like 5 different ways of doing text-to-SQL or text-to-Pandas, and you try out 5 different ways of doing direct prompting, and you just do some aggregation of the results. This idea was inspired by a recent research paper; they called it the mix self-consistency technique.
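A rough sketch of that mix self-consistency idea, with the LLM calls stubbed out: one path pretends to answer from the serialized table directly, the other pretends the LLM inferred a program and actually executes it, and a majority vote pools the sampled answers from both. The table and both answer functions are fabricated for illustration.

```python
from collections import Counter

# Toy table standing in for a CSV / SQL table
TABLE = [
    {"city": "Seattle", "revenue": 120},
    {"city": "Austin", "revenue": 95},
    {"city": "Boston", "revenue": 110},
]

def direct_prompt_answer(sample_id: int) -> str:
    # Stand-in for: dump the whole CSV into the prompt and ask the LLM.
    # Answers vary across samples because LLM decoding is stochastic;
    # here we fake one wrong sample out of five.
    return "Austin" if sample_id == 2 else "Seattle"

def program_answer(sample_id: int) -> str:
    # Stand-in for: ask the LLM to infer a SQL/Pandas program, then run it.
    # The "program" here is real Python computing the argmax over the rows.
    return max(TABLE, key=lambda row: row["revenue"])["city"]

def mix_self_consistency(n_samples: int = 5) -> str:
    # Pool answers from both paths and majority-vote across all of them.
    votes = Counter()
    for i in range(n_samples):
        votes[direct_prompt_answer(i)] += 1
        votes[program_answer(i)] += 1
    return votes.most_common(1)[0][0]
```

The vote lets the two paths cover for each other: direct prompting helps when the table is small and messy, program inference helps when the answer requires real computation over many rows.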

The high-level idea is you just try out a bunch of things and you combine the results. So yeah, I think it's definitely a very interesting area, and I think more people will probably find interesting ways of ensembling, whether it's an LLM inference call, a RAG pipeline, or an agent, later on.

Entrepreneurship and technical leadership


Yeah. So let's now shift gears and move towards entrepreneurship. I have a few questions for you on the entrepreneurship side. Clearly you are a very technical CEO, and your co-founder, your CTO, of course, I'm sure he's also technical.

So how do you resist this temptation? Because being a CEO can be a different job sometimes. Do you still contribute code? Let me start with that: are you still contributing code to the project, or have you stopped writing code?


That's a good question. I think I still try to, but definitely not every day. I do a little bit on the weekends, but my daily contributions have gone way down in the past few months. And to some extent, you know, it's a good question.

I don't actually know if I have the right balance between how much to contribute and doing the other things that I imagine a CEO would typically do. In the beginning of this project, absolutely.

I was really coding every day. In the first few months, pretty much all that really mattered was shipping velocity and being able to build this up as quickly as possible.

And so we really needed all hands on deck, just being able to build the tool. We were building something inherently very technical, developer tooling, and so it was important, actually quite important, for both founders to be very attuned to what was going on in the open-source framework, to have a deep empathy for the end user through being able to contribute directly to the abstractions in the library, and to understand and develop an internal sense of what the roadmap should be.

I think over time, as the team has grown, just practically due to time constraints, I've stopped being able to ship a lot of the core abstractions in the library. And to some extent it also probably wasn't the best resource allocation strategy, because the issue is, if I'm doing something that's blocking a bunch of other engineers, and I don't do it, then things won't get done.

And so it's better to allocate to people with full-time capacity to work on, for instance, a deep refactor or any sort of deep engineering feature work. So I still try to... I think it's important for us to continue to remain very engaged in the developer community, but I've started to treat coding more as a way of dogfooding the product, as opposed to core feature work.


And at what point... this is more of a philosophical question, purely from a business standpoint, you know, that Larry Page and Sergey Brin question, right? Both of you are very technical founders, and at some point in time you have to bring in someone from outside, right?

So what are the pros and cons of being a technical CEO, and how long do you think it should continue? Philosophically speaking; I'm not asking when you plan to step down or anything. But in the lifecycle of a company, I'm sure you have done this kind of soul-searching, right? I see value in it. I consider myself to be a technical CEO; I'm fairly technical, and while I'm not involved in day-to-day coding, I still see value in this. So what is your take on that?


I think when you're building a developer tool, or anything that is inherently catering to a technical audience, I would almost say it's a requirement to be a technical founder. The reason is, one, if you are not, you just move way slower in the beginning, because the people doing all the heavy lifting typically are the engineering or technical founders.

And the second is, you just want to have as much empathy as possible for the audience that you're trying to build for. And I think one other thing that contributed to our success was just the insane volume that we shipped in the beginning, given the small team size.

And just being able to actually really, really deeply understand some of the problems at a very detailed level, anticipate where the field was going, and also build at the right level of abstraction. I think all that stuff really mattered, especially since on the open-source side, everything is code, right?

And so your product is the code that you present to the end user for them to use. So I do think for a lot of... at least, I can only speak from experience with infra, dev tooling, AI tools, that type of stuff, it is actually quite important to be pretty technical in the space. But inevitably you have to scale, right?

In the beginning, it's important to probably do things that don't scale, echoing Paul Graham's essay. You want to go through the process of trying to do things yourself to learn,

and gain an intuition for what's actually needed and who you want to hire. So, for instance, we've gone through the process, as founders, of trying to scope out the product roadmap, iterating on it, iterating on the UX of any higher-level product we're building, going through the process of being the recruiters, being the people that hire the initial folks within the company, and doing a lot of user discovery and sales as well.

And so being able to have dozens of good user conversations a week, really refining how you do discovery of pain points and issues, and then how to filter that back into a product roadmap.

And then eventually, once you have a product, being able to engage in the sales process. So we've been doing all of that, and through these experiences we actually realized that, especially for a deeply technical company, what you really need in the beginning is just engineering velocity and talent.

And I'm sure there are other ways of doing this too. But, you know, we're at a point where we do want to look for our first non-technical hire. So far, as of the time of this podcast, we have not made one quite yet, but we are looking for one pretty soon, especially since we have the enterprise offering as well,

and we're trying to scale up a lot of BD efforts and make the whole product process more efficient. But I think engineering is probably by far the most important thing in the beginning: the pace at which you're able to ship, and the quality of the things that you're shipping.


And did I hear you correctly? You mentioned that you're going for your first non-technical hire.

Yeah, yeah, so I think these days we're starting to look for our first non-technical hire, that's right.


Yeah. And keep in mind, a lot of people look for that either as part of the founding team or within the first 5 or 10 hires, so we are definitely looking into it a little later than others. That said, we've talked to a lot of equivalent companies in the industry, data infra and that type of stuff, and I think it's not super out of band.

Marketing at LlamaIndex


Yeah. Okay, and so who does your marketing then? Because I love the LinkedIn posts from LlamaIndex. I actually enjoy them, I read them, I learn things here and there. There are many open questions that I may have, and then I figure out, aha, this is what I was looking for. So who does it? I thought you had at least a social media marketing team.


No, that's me and our head of dev rel. Yeah.


Okay, that is wonderful. I love those posts. They have the technical depth, but at the same time they are easy enough to understand for anyone who is technical enough and wants to learn. You're doing a great job there.


Thanks for that. I think if you're a developer-facing company, the best marketing people will be your engineers and developers, and it has to be founder-led too.

A lot of traditional marketing works well for use cases where you have domain expertise in certain areas. But especially in this area, the most powerful folks, the ones who can really evangelize products, deeply understand them, and get people to actually trust you, are going to be the technical folks on your team.

Jerry at Princeton, corporate career and pivot to entrepreneurship


Right. As you said, if your audience is technical, they expect technical correctness and utility. You cannot post fluff and survive out there for too long.

So, earlier when we started the podcast, you mentioned that this was your weekend project. You worked on it for a few months and then you suddenly decided. And it took you about 7 to 8 years out of college; I was looking at your journey. You were, I think, leading the entrepreneurship club at Princeton.

And at no point in that time did you decide. You were thinking about it, you wanted to do it, but you did not quite have the idea. Tell me about that journey. How was it? You worked for Quora, you worked for Apple, then you worked for Uber, and Two Sigma. All of these are great companies. And in between you did not actually decide to do a start-up. Were you waiting for the right opportunity? What was going on?


Yeah, it's a good question. Sometimes I wonder that myself as well. There are probably multiple paths to starting a company, and this is just one path, right? So I can only talk about my own experience; this isn't really prescriptive on what the exact path to starting a company is.

By the way, in terms of internships it was Apple and Two Sigma. In terms of full time, I worked at Quora for a year, did a research residency, and was a research scientist at Uber ATG, working on research for self-driving for about two and a half years.

And I spent 3 years at my last company, which was Robust Intelligence. So throughout this process, yeah, I've always wanted to do a startup. It was something that just inherently excited me. The reason it excited me was that I tend to have a bigger aversion to purely process-oriented things, and more of a passion for being able to build stuff from scratch, right?

And so that's just something I really enjoyed: the whole zero-to-one phase, and being able to have full ownership of that whole process. In terms of actually doing it, I remember this throughout college and also during the early part of my career.

I've always wanted to do a company. Actually, I wanted to do a company before I was interested in AI. So I was part of the entrepreneurship club at Princeton, and I did all these things where I was meeting other people who were also interested in startups. And I actually only got into machine learning my junior or senior year of college.

So actually I got into it pretty late. I learned about everything from basic machine learning all the way to neural nets, how to train them, and how to do a research project on them, in my senior year. And it was kind of a parallel and oftentimes competing interest at the time, because I knew I was interested in startups, but the people doing startups were mostly building tools like B2B SaaS.

But I was also, of course, very interested in just technical problems. And really, AI was a nice intersection of interests, spanning math and coding.

And so I thought it was just cool on its own and I wanted to learn more about it. The issue, fundamentally, with my career was that I was interested in machine learning, and at the time being interested in machine learning meant that people would go and get a PhD. It was a persistent struggle for me, because I wanted to go deeper on research.

I really wanted to understand what was going on in the machine learning world, but the PhD world seemed very detached from the world of building companies and startups.

And so I never actually ended up going to grad school to get a degree. But if you trace through my path, I was a machine learning engineer at Quora for my first year out of school, and then I did research afterwards at Uber. So I definitely wanted to dabble in research.

But then I was always on the fence about whether I wanted to pursue a full grad degree, because I also felt that eventually it would be really nice to take some of these ideas and try to start a company.

And so to some extent, the fact that I started a company in the LLM space was really a nice intersection, I think. And it was nice timing, because at this point I had built up a decent portfolio and background of actually understanding machine learning at a deeply technical level.

But it was also an opportunity where there was a lot of product innovation to be had on top of LLMs. So it was a nice intersection of these two fields that, up until then, had seemed somewhat disparate, right? To me it was really nice timing, and an opportunity to be in the right place at the right time.

I think before the LLM space there were obviously plenty of startups in the machine learning space. There were a lot of companies in MLOps, which is why my previous company was primarily focused on monitoring, observability, evaluations, that type of stuff.

As well as end-to-end training platforms. But I think there were just huge opportunities for pure disruption in this emerging LLM boom, where there was simultaneously both really interesting research and really interesting companies coming out of it.

So if I had done this completely differently, it's possible I would never have gone into ML at all and just purely done the startup route, and if I had done that, I would have started way earlier in the startup space. But because I had these two interests, it took me a little bit of time to merge the two together. Yeah.


And out of the companies that you worked for, you met Simon at Uber, I'm assuming; I saw that there's an overlap in the period you were both working there. So out of all of these companies, which one did you enjoy the most, and which one was, I would say, maybe a catalyst for what you're doing right now?


Yeah, I think all of them helped in some way. In terms of the things that probably helped the most: okay, what did I learn from Quora?

I think Quora was actually pretty fun. It was my first machine learning job out of college. I grew a lot in terms of understanding the space at a very general level. Keep in mind, I had only gotten into the ML space basically the year before, so it was a nice intro, a springboard into a lot of things.

And it was also a startup-like setting, around 700 people, where I had a reasonable amount of flexibility to lead and own things. So I think it was a nice intro job.

What did I learn from Uber? Because the stuff I'm doing now has nothing to do with self-driving research. I'm sure you pick up a lot of ML concepts, but in terms of the content, it's not really self-driving related. I think what I learned primarily from Uber was, first, general ML foundations. And the second thing is grit. Research is a tough job.

Doing anything related to pure AI research can be very tough, and a lot of PhDs feel this way too. You're basically working your butt off, right? For years, every day, trying to come up with new techniques, competing with other research institutions for novel ideas, and trying to get accepted at these top conferences.

And you're working very, very hard. People are working on the weekends, and around these conference deadlines people are just pulling all-nighters. You really do learn grit and intensity, and the desire to make something succeed and push it through.

And I think that's actually a pretty good mindset to have, especially as you try to start a company. Because it's almost irrational in a sense: if you really like research, you're willing to sacrifice other things to just push it forward.

And then my last job, Robust Intelligence, taught me a bunch. I joined when it was 9 or 10 people and then it grew to 40 or 50, and it gave me a pretty good sense of the stuff you need to grow a company.

And I took a lot of the skills, both management skills and what I observed about how teams formed around me, and carried over some of those lessons.


Yeah, and if you were employee number 9 or 10, the challenges at that early stage are very unique, right? Someone working for Microsoft or Google or Facebook will never understand what it looks like to work under a lot of ambiguity, a lot of constraints, right?

So there are so many things that it can teach you. And it's probably good preparation for doing your own startup if you join that early at a different startup.


I agree. In terms of who we actually look for when hiring, it is a good bonus if you have either done your own startup or worked at a very early-stage company and scaled it, because then you have the right mindset for what it takes to actually be at a startup.


And so you mentioned that you first met Simon at Uber. Were you planning to do a startup right from the beginning? Hey Simon, let's team up and do something. I mean, co-workers do talk like this, right, let's work on this idea. Or did it just happen: your weekend project, and you called Simon up? How did it all unfold?


Yeah, it was more the latter, I think. Simon and I had been friends. I knew a lot of great people at Uber, people I had high respect for; they were very intelligent and worked very hard. And Simon was very strong technically. I knew he was a good mentor, and I knew he had a lot of qualities that I liked.

He's more of a structured thinker, able to think about things from a technical-architecture perspective and really ask the deep questions. I think that's actually a really nice counterbalance to me as a founder, because I tend to rely a bit more on intuition and quick feedback.

And that's probably just a function of the fact that open source itself is very fast-moving. So when I first started this project, I wanted to find a co-founder who was technically stronger than I was. And because this is developer tooling, I thought it was important that the other co-founder be technical.

And for us to be equals, the co-founder would have to be more technical than I am. So I looked at a few people on the list, and then I thought of Simon, basically called him up, and said, do you want to do this? It took a little bit of convincing, but he was very excited, and I'm very excited to have him on board as a co-founder.

Generative AI’s impact on society


That's wonderful. So we are coming toward the end of our discussion here. Let's talk about society a bit now. In your opinion, what is the biggest risk generative AI and large language models can or will pose to us as a society? Any kind of risk that you can think of.


Yeah, I think there are a few things. One top-of-mind thing: I mean, there's this whole existential concern of AGI destroying humans or whatever.

But just practically speaking, there's education. There's economics: what's going to happen to the marketplace of workers, whose jobs are going to be automated away, which job roles will just shrink in supply, right? And also, yeah, probably some security concerns too.

I've probably thought a bit more about the education piece. I mean, at the end of the day, these things are just helping you do things and automating thought that you used to have to do yourself.

And people make arguments about whether or not this is actually good for society. The pro-LLM group typically says, oh, throughout history there have always been devices that helped you do things so you don't have to do them yourself, so you can focus on higher-level tasks.

The issue is that LLMs are starting to get into writing and reading, things that you do conceptually, things that humans can do that basically other animals can't, which is reasoning, right?

And once LLMs got into reasoning, yeah, I'm not actually completely convinced by the argument that this doesn't make humans potentially lazier or dumber over time, because if you're able to just trust this thing to do things, you don't have to think critically yourself.

And I can't actually imagine a world where this doesn't in some way inhibit the development of human thought. So that part I'm still kind of unsure about. Because, to be honest, in schools literally every kid is using ChatGPT right now.

And it basically comes down to whether you're really good at using ChatGPT or whether you can actually write an essay on your own. And then it's a question of: okay, is it actually good that we're becoming really reliant on this technology, that we can use it really well

to just do things without deeply understanding the concepts? Yeah, on that part I'm still not really sure how to answer. It basically means that you don't have to learn as many skills, and somehow that's supposed to be okay. So I'm still thinking about it. As you can see, it's not a fully worked-out answer; I'm still on the fence about whether or not that's actually good for education.


And regarding humans, sorry for the interruption here. Regarding humans becoming dumber: devs use Stack Overflow, and it doesn't make them dumber, right? And everyone uses Google, and it doesn't make them dumb, right?

So it's more about perhaps a new set of skills that people have to learn. It will make them more efficient, but still, not every dev can go and figure out the right answer from Stack Overflow, right? I see those parallels. It's just that maybe it has gone a notch up. But, you know, we could definitely spend a lot of time discussing this.


I think that's true. I think that's true, but also, the need for doing higher-level reasoning itself might actually just go down. I'm just thinking about humans here, and I mean, this is definitely not political.

Do you remember the humans in WALL-E, right? You can basically just exist in a state where you don't really have to do much and things are just automated for you, because that actually is part of the promise: you just kind of sit back and let this thing do stuff for you.

Right. And there's a world where you use that to make yourself more productive so you can do even better things, right? And then there's a world where you just don't actually need to do those better things, and you sit back and let AI satisfy your base human needs, right?

And so I'm not actually sure which of those worlds we're headed toward. By the way, just to make sure people don't think LlamaIndex is just going to make everyone dumber:

that's not the point; we're not trying to automate away all human thought. I like to think we're trying to abstract away some of the boilerplate of getting information in the first place, which I think can totally be automated. But overall, I don't know.

Exciting generative AI applications


Yeah, and let's take the flip side of it, right? What is the area or impact on society that you're most excited about, where you think it is going to be a positive impact?


Yeah, on the flip side, there is so much manual, routine work throughout the enterprise. Every company has work that requires quite a bit of repetition, and it's honestly quite painful to do. In general, I'm pretty excited to see AI automate the things that are very manual and routine, right?

And to that end, take for instance any sort of process automation: say you have some spreadsheet you want to translate into some format, and you want to do things like send an email, those types of things.

That's not really about teaching humans new skills. It's just a task that you have to perform repetitively, over and over again, and it's amenable to some sort of process automation using LLMs.

I think that can actually make companies more efficient and also let humans do higher-level tasks; basically, it makes everybody more efficient at solving the problems that only they can solve.
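As a concrete illustration of the spreadsheet-to-email automation Jerry describes, here is a minimal Python sketch. The function names and CSV columns are illustrative assumptions, and `draft_email` is a deterministic stub standing in for a real LLM call:

```python
import csv
import io

def draft_email(name: str, amount: str) -> str:
    # Stub for an LLM completion; a real pipeline would replace this
    # with a model API request that drafts the reminder message.
    return f"Hi {name}, our records show an outstanding balance of {amount}."

def overdue_emails(spreadsheet_csv: str) -> list[str]:
    """Read rows from a spreadsheet's CSV export and draft one reminder
    email per row whose status column is marked overdue."""
    rows = csv.DictReader(io.StringIO(spreadsheet_csv))
    return [draft_email(row["name"], row["amount"])
            for row in rows if row["status"] == "overdue"]

data = "name,amount,status\nAda,$120,overdue\nBob,$0,paid\n"
print(overdue_emails(data))  # drafts exactly one email, for Ada
```

The repetitive part (scanning rows, applying the same transformation, producing the message) is exactly the kind of task that can be automated, while a human still reviews what gets sent.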

And I think the whole promise of knowledge extraction and automation is that LLMs actually unlock insights over new sources of data that you couldn't unlock before. Now you can all of a sudden connect to your entire collection of enterprise knowledge: your PDFs, your SQL databases.

And you can do it without expending a ton more engineering resources, and in that sense it makes information available at everybody's fingertips in a far better way. So the flip side of AI is that it makes everyone a much more informed knowledge worker, right? Everyone can do any sort of task more efficiently.

Exciting developments at LlamaIndex


So, one last question, Jerry. In terms of LlamaIndex, what is coming up in the next few months that you are really excited about, and that you want me to be excited about as well?


Yeah, we just launched LlamaCloud and LlamaParse on Tuesday, which you might know. We launched them this past week, and for the rest of this year we're going to focus half and half. One half is open source community development.

That part, again, will never go away: staying on top of open source community development and the ecosystem, and building the right tools for every developer to succeed. The other half is the enterprise side. So for those of you who don't know, LlamaCloud is our overall enterprise platform, focused on managed parsing, extraction, and retrieval.

It's currently open in a private beta to a select number of companies. LlamaParse is a specific piece of LlamaCloud that is actually publicly available; it's specifically document parsing for PDFs. We think this is the core first stage of any RAG pipeline: if you don't have well-formatted data from your PDFs, you're not going to get good LLM results from them. So we're making it a public standalone API that anybody can use, and of course we're baking it into the rest of LlamaCloud.

So generally speaking, if you're a company with a lot of complex documents, we're going to be adding more document sources to LlamaParse so that you can get value out of your data sources. And, per our earlier discussion on higher-level abstractions for RAG, we are focused on solving just context augmentation with LlamaCloud,

so that you as a developer can stop worrying about chunk size and all these things. Just set up your data pipeline for the things you want to ingest into an LLM, connect it to your storage system using LlamaCloud, and then focus on building whatever LLM application you want, whether it's question answering with RAG, whether it's agents, those types of things. So we're going to be building out that platform a lot more in the next few months, and hoping to make a more public release of LlamaCloud available in the next few months.
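To make the "chunk size" point concrete: before retrieval, a RAG pipeline splits each document into overlapping chunks, and tuning those numbers well is the kind of detail Jerry says LlamaCloud aims to take off the developer's plate. Here is a minimal, framework-free sketch of that step; the sizes are illustrative assumptions, not LlamaIndex defaults:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap, so a
    sentence cut at one chunk boundary still appears whole in a neighbor."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # advance by chunk_size minus the overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Each chunk would then be embedded and indexed for retrieval.
doc = "x" * 500
print(len(chunk_text(doc)))  # 4 chunks, starting at offsets 0, 150, 300, 450
```

Too-small chunks lose context; too-large chunks dilute the retrieved passage with irrelevant text, which is why a managed service abstracting this choice is attractive.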


Well, Jerry, thank you so much for your time. It was a pleasure having you.


Thanks, Raja. This was a great conversation.
