
Data Science Blog

The Dojo for AI and Data enthusiasts
Huda Mahmood

In the rapidly growing digital world, AI advancement is driving the transformation toward improved automation, better personalization, and smarter devices. In this evolving AI landscape, every country is striving to make the next big breakthrough.

In this blog, we will explore the global progress of artificial intelligence, highlighting the leading countries of AI advancement in 2024.

Top 9 countries leading AI development in 2024


Leaders in AI advancement for 2024


Let’s look at the leading 9 countries that are a hub for AI advancement in 2024, exploring their contribution and efforts to excel in the digital world.

The United States of America

Providing a home to leading tech giants, including OpenAI, Google, and Meta, the United States has been leading the global AI race. The contributions of these companies, in the form of GPT-4, Llama 2, Bard, and other AI-powered tools, have led to transformational changes in the world of generative AI.

The US continues to hold its leading position in AI advancement in 2024 with its high concentration of top-tier AI researchers, fueled by the tech giants operating from Silicon Valley. Moreover, government support and initiatives foster collaboration, promising continued progress of AI in the future.

The Biden administration's recent focus on ethical considerations for AI is another proactive approach by the US to ensure suitable regulation of AI advancement. This emphasis on responsible AI development is a positive step for the future.


Explore the key trends of AI in digital marketing in 2024



China

The next leading player in line is China, powered by companies like Tencent, Huawei, and Baidu. New releases, including Tencent's Hunyuan large language model and Huawei's Pangu, are guiding the country's AI advancements.

Strategic focus on specific research areas in AI, government funding, and a large population providing a massive database are some of the favorable features that promote the technological development of China in 2024.

Moreover, China is known for rapid commercialization, bringing AI products to market quickly. A subsequent benefit is the fast collection of real-world data and user feedback, ensuring further refinement of AI technologies. This positions China to make significant strides in the field of AI in 2024.



The United Kingdom

The UK remains a significant contributor to the global AI race, boasting multiple avenues for AI advancement, including DeepMind, an AI development lab. Moreover, it hosts world-class universities like Oxford, Cambridge, and Imperial College London, which are at the forefront of AI research.

The government also promotes AI advancement through investment and incentives, fostering a startup culture in the UK. This has led to the development of AI companies like Darktrace and BenevolentAI, supported by an ecosystem that provides access to funding, talent, and research infrastructure.

Thus, the government's commitment to responsible AI, along with the country's strong research tradition, promises a growing future for AI advancement.


Canada

With top AI-powered companies like Cohere, Scale AI, and Coveo operating from the country, Canada has emerged as a leading player in the world of AI advancement. The government's focus on initiatives like the Pan-Canadian Artificial Intelligence Strategy has also boosted AI development in the country.

Moreover, the development of research hubs and top AI talent in institutes like the Montreal Institute for Learning Algorithms (MILA) and the Alberta Machine Intelligence Institute (AMII) promotes an environment of development and innovation. It has also led to collaborations between academia and industry to accelerate AI advancement.

Canada is being strategic about its AI development, focusing on sectors where it has existing strengths, including healthcare, natural resource management, and sustainable development. Thus, Canada’s unique combination of strong research capabilities, ethical focus, and collaborative environment positions it as a prominent player in the global AI race.


France

While not at the top like the US or China, France is definitely leading AI research in the European Union. Its strong academic base has led to the development of research institutes like Inria and the 3IA Institutes, which prioritize long-term advancements in the field of AI.

The French government also actively supports research in AI, promoting the growth of innovative AI startups like Criteo (advertising) and Owkin (healthcare). Hence, the country plays a leading role in focusing on fundamental research alongside practical applications, giving France a significant advantage in the long run.


India

India is quietly emerging as a significant player in AI research and technology as the Indian government pours resources into initiatives like 'India AI', fostering a skilled workforce through education programs. This is fueling a vibrant startup landscape where homegrown companies like SigTuple are developing innovative AI solutions.

What truly sets India apart is its focus on social impact, using AI to tackle challenges like healthcare access in rural areas and to improve agricultural productivity. India also recognizes the importance of ethical AI development, addressing potential biases to ensure the responsible use of this powerful technology.

Hence, the focus on talent, social good, and responsible innovation makes India a promising contributor to the world of AI advancement in 2024.

Learn more about the top AI skills and jobs in 2024


Japan

Facing an aging population and strict immigration laws, Japanese companies have become champions of automation. This has resulted in the country developing solutions with real-world AI implementation, making it a leading contributor to the field.

While they are heavily invested in AI that can streamline processes and boost efficiency, their approach goes beyond just getting things done. Japan is also focused on collaboration between research institutions, universities, and businesses, prioritizing safety, with regulations and institutes dedicated to ensuring trustworthy AI.

Moreover, the country is a robotics powerhouse, integrating AI to create next-gen robots that work seamlessly alongside humans. So, while Japan might not be the first with every breakthrough, they are surely leading the way in making AI practical, safe, and collaborative.


Germany

Germany is at the forefront of a new industrial revolution in 2024 with Industry 4.0. Tech giants like Siemens and Bosch are using AI to supercharge factories with intelligent robots, optimized production lines, and smart logistics systems.

The government also promotes AI advancement through funding for collaborations, especially between academia and industry. The focus on AI development has also led to the initiation of startups like Volocopter, Aleph Alpha, DeepL, and Parloa.

However, this development also focuses on the ethical aspects of AI, addressing potential biases in the technology. Thus, Germany's focus on practical applications, responsible development, and Industry 4.0 makes it a true leader in this exciting new era.





Singapore

Singapore has made it onto the global map of AI advancement with its strategic approach to research in the field. The government welcomes international researchers to contribute to its AI development, which has resulted in big names like Google setting up shop there, promoting open collaboration through cutting-edge open-source AI tools.

Some of its notable startups include Biofourmis, Near, Active.Ai, and Osome. Moreover, Singapore leverages AI for applications beyond the tech race. Their ‘Smart Nation’ uses AI for efficient urban planning and improved public services.

In addition, with its attention to social challenges and the ethical use of AI, Singapore has a versatile approach to AI advancement, making the country a promising contender to become a leader in AI development in the years to come.



The future of AI advancement

The versatility of AI tools promises the technology a role across all kinds of fields. From personalizing education to aiding scientific discoveries, we can expect AI to play a crucial part everywhere. Moreover, the leading nations' focus on the ethical impacts of AI ensures an increased aim toward responsible development.

Hence, it is clear that the rise of AI is inevitable. The worldwide focus on AI advancement creates an environment that promotes international collaboration and the democratization of AI tools, leading to greater innovation and better accessibility for all.

April 3
Fiza Fatima

Large language models are growing smarter, transforming how we interact with technology. Yet, they stumble over a significant hurdle: accuracy. Often, they provide unreliable information or guess answers to questions they don't understand, guesses that can be completely wrong.

This issue is a major concern for enterprises looking to leverage LLMs. How do we tackle this problem? Retrieval Augmented Generation (RAG) offers a viable solution, enabling LLMs to access up-to-date, relevant information, and significantly improving their responses.

Tune in to our podcast and dive deep into RAG, fine-tuning, LlamaIndex, and LangChain!


Understanding Retrieval Augmented Generation (RAG)

RAG is a framework that retrieves data from external sources and incorporates it into the LLM’s decision-making process. This allows the model to access real-time information and address knowledge gaps. The retrieved data is synthesized with the LLM’s internal training data to generate a response.

Retrieval Augmented Generation (RAG) Pipeline

Read more: RAG and finetuning: A comprehensive guide to understanding the two approaches

The challenge of bringing RAG based LLM applications to production

Prototyping a RAG application is easy, but making it performant, robust, and scalable to a large knowledge corpus is hard.

There are three important steps in a RAG framework: data ingestion, retrieval, and generation. In this blog, we will dissect the challenges encountered at each stage of the RAG pipeline, specifically from a production perspective, and propose relevant solutions. Let's dig in!

Stage 1: Data Ingestion Pipeline

The ingestion stage is a preparation step for building a RAG pipeline, similar to the data cleaning and preprocessing steps in a machine learning pipeline. Usually, the ingestion stage consists of the following steps:

  • Collect data
  • Chunk data
  • Generate vector embeddings of chunks
  • Store vector embeddings and chunks in a vector database

The efficiency and effectiveness of the data ingestion phase significantly influence the overall performance of the system.
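The four steps above can be sketched as a minimal in-memory pipeline. This is an illustrative sketch, not a production implementation: the hashing-based embedding function and the plain Python list stand in for a real embedding model and a vector database.

```python
import hashlib
import math

def collect_data():
    # Stand-in for loading documents from files, APIs, or databases.
    return ["RAG retrieves external context. It reduces hallucinations.",
            "Vector databases store embeddings for similarity search."]

def chunk_data(doc, chunk_size=40):
    # Naive fixed-size character chunking; real pipelines split by sentences or tokens.
    return [doc[i:i + chunk_size] for i in range(0, len(doc), chunk_size)]

def embed(text, dims=8):
    # Toy deterministic embedding: hash character trigrams into a small vector.
    vec = [0.0] * dims
    for i in range(len(text) - 2):
        bucket = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def ingest(docs):
    # Store (embedding, chunk) pairs; a real system writes these to a vector DB.
    store = []
    for doc in docs:
        for chunk in chunk_data(doc):
            store.append((embed(chunk), chunk))
    return store

store = ingest(collect_data())
print(f"Ingested {len(store)} chunks")
```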

Common Pain Points in the Data Ingestion Pipeline


Challenge 1: Data Extraction:

  • Parsing Complex Data Structures: Extracting data from various types of documents, such as PDFs with embedded tables or images, can be challenging. These complex structures require specialized techniques to extract the relevant information accurately.
  • Handling Unstructured Data: Dealing with unstructured data, such as free-flowing text or natural language, can be difficult.
Proposed Solutions:
  • Better parsing techniques: Enhancing parsing techniques is key to solving the data extraction challenge in RAG-based LLM applications, enabling more accurate and efficient information extraction from complex data structures like PDFs with embedded tables or images. LlamaParse is a tool by LlamaIndex that significantly improves data extraction for RAG systems by adeptly parsing complex documents into structured markdown.
  • Chain-of-table approach: The chain-of-table approach, detailed by Wang et al. (https://arxiv.org/abs/2401.04398), merges table analysis with step-by-step information extraction strategies. This technique aids in dissecting complex tables to pinpoint and extract specific data segments, enhancing tabular question-answering capabilities in RAG systems.
  • Mix-Self-Consistency:
    Large Language Models (LLMs) can analyze tabular data through two primary methods:

    • Direct prompting for textual reasoning.
    • Program synthesis for symbolic reasoning, utilizing languages like Python or SQL.

    According to the study “Rethinking Tabular Data Understanding with Large Language Models” by Liu and colleagues, LlamaIndex introduced the MixSelfConsistencyQueryEngine. This engine combines outcomes from both textual and symbolic analysis using a self-consistency approach, such as majority voting, to attain state-of-the-art (SoTA) results. Below is an example code snippet. For further information, visit LlamaIndex’s complete notebook.
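The core of the approach, aggregating sampled answers from the textual and symbolic reasoning paths by majority vote, can be sketched in plain Python. This is an illustrative sketch of the self-consistency idea only, not the LlamaIndex engine itself; the function name and sample answers are made up for the example:

```python
from collections import Counter

def mix_self_consistency(textual_answers, symbolic_answers):
    """Pool sampled answers from both reasoning paths (direct textual
    reasoning and program synthesis) and take a majority vote."""
    pooled = list(textual_answers) + list(symbolic_answers)
    counts = Counter(answer.strip().lower() for answer in pooled if answer)
    if not counts:
        return None
    # The most frequent normalized answer across both paths wins.
    answer, _ = counts.most_common(1)[0]
    return answer

# Five textual samples and five symbolic (e.g. SQL-derived) samples
# for the same tabular question:
textual = ["42", "42", "41", "42", "43"]
symbolic = ["42", "42", "42", "40", "42"]
print(mix_self_consistency(textual, symbolic))  # → 42
```

Because individual samples disagree, the vote over both paths recovers the consensus answer even when either path alone is noisy.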


Challenge 2: Picking the Right Chunk Size and Chunking Strategy:

  1. Determining the Right Chunk Size: Finding the optimal chunk size for dividing documents into manageable parts is a challenge. Larger chunks may contain more relevant information but can reduce retrieval efficiency and increase processing time; striking the right balance is crucial.
  2. Defining Chunking Strategy: Deciding how to partition the data into chunks requires careful consideration. Depending on the use case, different strategies may be necessary, such as sentence-based or paragraph-based chunking.
Proposed Solutions:
  • Fine Tuning Embedding Models:

Fine-tuning embedding models plays a pivotal role in solving the chunking challenge in RAG pipelines, enhancing both the quality and relevance of contexts retrieved during ingestion.

By incorporating domain-specific knowledge and training on pertinent data, these models excel in preserving context, ensuring chunks maintain their original meaning.

This fine-tuning process aids in identifying the optimal chunk size, striking a balance between comprehensive context capture and efficiency, thus minimizing noise.

Additionally, it significantly curtails hallucinations—erroneous or irrelevant information generation—by honing the model’s ability to accurately identify and extract relevant chunks.

According to experiments conducted by Llama Index, fine-tuning your embedding model can lead to a 5–10% performance increase in retrieval evaluation metrics.

  • Use Case-Dependent Chunking

Use case-dependent chunking tailors the segmentation process to the specific needs and characteristics of the application. Different use cases may require different granularity in data segmentation:

    • Detailed Analysis: Some applications might benefit from very fine-grained chunks to extract detailed information from the data.
    • Broad Overview: Others might need larger chunks that provide a broader context, important for understanding general themes or summaries.
  • Embedding Model-Dependent Chunking

Embedding model-dependent chunking aligns the segmentation strategy with the characteristics of the underlying embedding model used in the RAG framework. Embedding models convert text into numerical representations, and their capacity to capture semantic information varies:

    • Model Capacity: Some models are better at understanding broader contexts, while others excel at capturing specific details. Chunk sizes can be adjusted to match what the model handles best.
    • Semantic Sensitivity: If the embedding model is highly sensitive to semantic nuances, smaller chunks may be beneficial to capture detailed semantics. Conversely, for models that excel at capturing broader contexts, larger chunks might be more appropriate.
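As a rough illustration of use case-dependent chunking, the sketch below implements the two granularities described above, sentence-based for detailed analysis and paragraph-based for a broader overview. The splitting heuristics are simplistic stand-ins for production chunkers:

```python
import re

def sentence_chunks(text, sentences_per_chunk=2):
    # Fine-grained chunking for detailed analysis.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(sentences[i:i + sentences_per_chunk])
            for i in range(0, len(sentences), sentences_per_chunk)]

def paragraph_chunks(text):
    # Coarser chunking that preserves broader context for summaries.
    return [p.strip() for p in text.split("\n\n") if p.strip()]

doc = ("RAG needs context. Chunks feed retrieval. Size matters.\n\n"
       "A second paragraph stands alone.")
print(sentence_chunks(doc))   # two chunks of up to two sentences each
print(paragraph_chunks(doc))  # two paragraph-level chunks
```

Swapping one strategy for the other changes what a single retrieved chunk can carry, which is exactly the trade-off the challenge describes.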

Challenge 3: Creating a Robust and Scalable Pipeline:

One of the critical challenges in implementing RAG is creating a robust and scalable pipeline that can effectively handle a large volume of data and continuously index and store it in a vector database. This challenge is of utmost importance as it directly impacts the system’s ability to accommodate user demands and provide accurate, up-to-date information.

Proposed Solutions:
  • Building a modular and distributed system:

To build a scalable pipeline for managing billions of text embeddings, a modular and distributed system is crucial. This system separates the pipeline into scalable units for targeted optimization and employs distributed processing for parallel operation efficiency. Horizontal scaling allows the system to expand with demand, supported by an optimized data ingestion process and a capable vector database for large-scale data storage and indexing.

This approach ensures scalability and technical robustness in handling vast amounts of text embeddings.

Stage 2: Retrieval

Retrieval in RAG involves the process of accessing and extracting information from authoritative external knowledge sources, such as databases, documents, and knowledge graphs. If the information is retrieved correctly in the right format, then the answers generated will be correct as well. However, you know the catch. Effective retrieval is a pain, and you can encounter several issues during this important stage.

RAG Pain Points and Solutions - Retrieval

Common Pain Points in the Retrieval Stage

Challenge 1: Retrieved Data Not in Context

The RAG system can retrieve data that doesn’t qualify to bring relevant context to generate an accurate response. There can be several reasons for this.

  • Missed Top Rank Documents: The system sometimes doesn’t include essential documents that contain the answer in the top results returned by the system’s retrieval component.
  • Incorrect Specificity: Responses may not provide precise information or adequately address the specific context of the user's query.
  • Losing Relevant Context During Reranking: This occurs when documents containing the answer are retrieved from the database but fail to make it into the context for generating an answer.
Proposed Solutions:
  • Query Augmentation: Query augmentation enables RAG to retrieve information that is in context by enhancing the user queries with additional contextual details or modifying them to maximize relevancy. This involves improving the phrasing, adding company-specific context, and generating sub-questions that help contextualize and generate accurate responses.
    • Rephrasing
    • Hypothetical document embeddings
    • Sub-queries
  • Tweak retrieval strategies: Llama Index offers a range of retrieval strategies, from basic to advanced, to ensure accurate retrieval in RAG pipelines. By exploring these strategies, developers can improve the system’s ability to incorporate relevant information into the context for generating accurate responses.
    • Small-to-big sentence window retrieval,
    • recursive retrieval
    • semantic similarity scoring.
  • Hyperparameter tuning for chunk size and similarity_top_k: This solution involves adjusting the parameters of the retrieval process in RAG models. More specifically, we can tune the parameters related to chunk size and similarity_top_k.
    The chunk_size parameter determines the size of the text chunks used for retrieval, while similarity_top_k controls the number of similar chunks retrieved.
    By experimenting with different values for these parameters, developers can find the optimal balance between computational efficiency and the quality of retrieved information.
  • Reranking: Reranking retrieval results before they are sent to the language model has proven to improve RAG systems’ performance significantly.
    By retrieving more documents and using techniques like CohereRerank, which leverages a reranker to improve the ranking order of the retrieved documents, developers can ensure that the most relevant and accurate documents are considered for generating responses. This reranking process can be implemented by incorporating the reranker as a postprocessor in the RAG pipeline.
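To make the similarity_top_k idea concrete, here is a minimal cosine-similarity retriever in plain Python. The two-dimensional vectors and chunk labels are toy stand-ins for real embeddings; in a framework like LlamaIndex, the equivalent knob is a query-engine parameter rather than a function argument:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a)) or 1.0
    norm_b = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (norm_a * norm_b)

def retrieve(query_vec, store, similarity_top_k=2):
    # Rank every chunk by cosine similarity to the query and keep the top-k,
    # mirroring the similarity_top_k parameter in RAG frameworks.
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:similarity_top_k]]

store = [([1.0, 0.0], "chunk about pricing"),
         ([0.9, 0.1], "chunk about billing"),
         ([0.0, 1.0], "chunk about careers")]
print(retrieve([1.0, 0.0], store, similarity_top_k=2))
```

Raising similarity_top_k admits more, less-similar chunks into the context; lowering it tightens the context at the risk of missing the answer, which is the trade-off being tuned.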

Challenge 2: Task-Based Retrieval

If you deploy a RAG-based service, you should expect all kinds of queries from users; a production RAG application should not be highly performant only for question-answering tasks.

Users can ask a wide variety of questions. Naive RAG stacks can address queries about specific facts, such as details on a company’s Diversity & Inclusion efforts in 2023 or the narrator’s activities at Google.

However, questions may also seek summaries (“Provide a high-level overview of this document”) or comparisons (“Compare X and Y”).

Different retrieval methods may be necessary for these diverse use cases.

Proposed Solutions
  • Query Routing: This technique involves retaining the initial user query while identifying the appropriate subset of tools or sources that pertain to the query. By routing the query to the suitable options, routing ensures that the retrieval process is fine-tuned to the specific tools or sources that are most likely to yield accurate and relevant information.
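A query router can be sketched with simple keyword overlap. Production routers typically use an LLM or embedding similarity to make this decision, and the engine names and trigger keywords below are hypothetical:

```python
def route_query(query, routes):
    # Score each route by keyword overlap with the query and pick the best.
    words = set(query.lower().split())
    return max(routes, key=lambda name: len(words & routes[name]))

# Hypothetical engines, each keyed to the query types it serves best:
routes = {
    "summary_engine": {"summarize", "overview", "high-level"},
    "comparison_engine": {"compare", "versus", "difference"},
    "fact_engine": {"when", "who", "what", "where"},
}
print(route_query("compare x and y", routes))           # → comparison_engine
print(route_query("summarize this document", routes))   # → summary_engine
```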

Challenge 3: Optimize the Vector DB to look for correct documents

The problem at the retrieval stage of RAG is ensuring that the lookup against a vector database effectively retrieves accurate documents that are relevant to the user's query.

Here, we must address the challenge of semantic matching: seeking documents and information that are not just keyword matches but conceptually aligned with the meaning embedded in the user query.

Proposed Solutions:
  • Hybrid Search:

Hybrid search tackles the challenge of optimal document lookup in vector databases. It combines semantic and keyword searches, ensuring retrieval of the most relevant documents.

  • Semantic Search: Goes beyond keywords, considering document meaning and context for accurate results.
  • Keyword Search: Excellent for queries with specific terms like product codes, jargon, or dates.

Hybrid search strikes a balance, offering a comprehensive and optimized retrieval process. Developers can further refine results by adjusting weighting between semantic and keyword search. This empowers vector databases to deliver highly relevant documents, streamlining document lookup.
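The weighting between the two search modes can be sketched as a weighted sum of scores. In this toy version, precomputed semantic scores stand in for an embedding model, keyword relevance is plain term overlap, and the alpha parameter is the tuning knob mentioned above:

```python
def keyword_score(query, doc):
    # Fraction of query terms that appear verbatim in the document.
    terms = set(query.lower().split())
    return len(terms & set(doc.lower().split())) / (len(terms) or 1)

def hybrid_search(query, docs, semantic_scores, alpha=0.5):
    # alpha weights semantic relevance against keyword relevance.
    scored = [(alpha * sem + (1 - alpha) * keyword_score(query, doc), doc)
              for doc, sem in zip(docs, semantic_scores)]
    return [doc for _, doc in sorted(scored, reverse=True)]

docs = ["error code e-42 in billing", "how invoices are generated"]
semantic_scores = [0.2, 0.9]  # stand-in for embedding similarity to the query
query = "error code e-42"
print(hybrid_search(query, docs, semantic_scores, alpha=0.5))  # keyword match wins
print(hybrid_search(query, docs, semantic_scores, alpha=1.0))  # pure semantic ranking
```

Note how the exact-term query "error code e-42" ranks first under the balanced weighting but loses under a purely semantic ranking, which is why hybrid search matters for queries with product codes, jargon, or dates.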

Challenge 4: Chunking Large Datasets

When we put large amounts of data into a RAG-based product, we eventually have to parse and chunk it: at retrieval time we cannot return a whole PDF, only relevant chunks of it.

However, this can present several pain points.

  • Loss of Context: One primary issue is the potential loss of context when breaking down large documents into smaller chunks. When documents are divided into smaller pieces, the nuances and connections between different sections of the document may be lost, leading to incomplete representations of the content.
  • Optimal Chunk Size: Determining the optimal chunk size becomes essential to balance capturing essential information without sacrificing speed. While larger chunks could capture more context, they introduce more noise and require additional processing time and computational costs. On the other hand, smaller chunks have less noise but may not fully capture the necessary context.

Read more: Optimize RAG efficiency with LlamaIndex: The perfect chunk size

Proposed Solutions:
  • Document Hierarchies: This is a pre-processing step where you can organize data in a structured manner to improve information retrieval by locating the most relevant chunks of text.
  • Knowledge Graphs: Representing related data through graphs, enabling easy and quick retrieval of related information and reducing hallucinations in RAG systems.
  • Sub-document Summary: Breaking down documents into smaller chunks and injecting summaries to improve RAG retrieval performance by providing global context awareness.
  • Parent Document Retrieval: Retrieving summaries and parent documents in a recursive manner to improve information retrieval and response generation in RAG systems.
  • RAPTOR: RAPTOR recursively embeds, clusters, and summarizes text chunks to construct a tree structure with varying summarization levels.
  • Recursive Retrieval: Retrieval of summaries and parent documents in multiple iterations to improve performance and provide context-specific information in RAG systems.
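Of the techniques above, parent document retrieval is the most direct to sketch: match the query against small chunks, then hand the full parent document to the LLM so no surrounding context is lost. The documents, chunker, and matching rule below are illustrative stand-ins for embedding-based retrieval:

```python
def build_index(parents, chunker):
    # Map each small chunk back to the parent document it came from.
    chunk_to_parent = {}
    for parent_id, text in parents.items():
        for chunk in chunker(text):
            chunk_to_parent[chunk] = parent_id
    return chunk_to_parent

def parent_retrieve(query, chunk_to_parent, parents):
    # Match the query against small chunks, then return the whole parent
    # document so the LLM sees the complete context.
    for chunk, parent_id in chunk_to_parent.items():
        if query.lower() in chunk.lower():
            return parents[parent_id]
    return None

parents = {"doc1": "Refund policy. Refunds take 5 days. Contact support first.",
           "doc2": "Shipping policy. Orders ship in 2 days."}
chunker = lambda text: [s.strip() for s in text.split(".") if s.strip()]
index = build_index(parents, chunker)
print(parent_retrieve("refunds take", index, parents))
```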

Challenge 5: Retrieving Outdated Content from the Database

Imagine a RAG app working perfectly for 100 documents. But what if a document gets updated? The app might still use the old info (stored as an “embedding”) and give you answers based on that, even though it’s wrong.

Proposed Solutions:
  • Meta-Data Filtering: It’s like a label that tells the app if a document is new or changed. This way, the app can always use the latest and greatest information.
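A minimal sketch of that freshness check: each document carries a last-updated date in its metadata, and any document whose source changed after it was indexed is treated as stale and excluded until re-ingestion. The field names and dates are illustrative:

```python
from datetime import date

def fresh_results(results, last_updated, indexed_on):
    # Drop documents whose source changed after their embeddings were indexed;
    # those embeddings are stale and the document must be re-ingested.
    return [doc for doc in results
            if last_updated[doc] <= indexed_on[doc]]

last_updated = {"handbook": date(2024, 3, 1), "pricing": date(2024, 4, 1)}
indexed_on = {"handbook": date(2024, 3, 15), "pricing": date(2024, 3, 15)}
print(fresh_results(["handbook", "pricing"], last_updated, indexed_on))  # pricing is stale
```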

Stage 3: Generation

While the quality of the response generated largely depends on how good the retrieval of information was, there still are tons of aspects you must consider. After all, the quality of the response and the time it takes to generate the response directly impacts the satisfaction of your user.

RAG Pain Points - Generation Stage

Challenge 1: Optimized Response Time for User

The prompt response to user queries is vital for maintaining user engagement and satisfaction.

Proposed Solutions:
  1. Semantic Caching: Semantic caching addresses the challenge of optimizing response time by implementing a cache system to store and quickly retrieve pre-processed data and responses. It can be implemented at two key points in a RAG system to enhance speed:
    • Retrieval of Information: The first point where semantic caching can be implemented is in retrieving the information needed to construct the enriched prompt. This involves pre-processing and storing relevant data and knowledge sources that are frequently accessed by the RAG system.
    • Calling the LLM: By implementing a semantic cache system, the pre-processed data and responses from previous interactions can be stored. When similar queries are encountered, the system can quickly access these cached responses, leading to faster response generation.
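The second point, reusing a cached response when a new query is semantically close to one already answered, can be sketched as follows. The query embeddings here are toy vectors, and the 0.95 similarity threshold is an assumed tuning knob:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a)) or 1.0
    norm_b = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (norm_a * norm_b)

class SemanticCache:
    """Return a cached response when a new query's embedding is close
    enough to a previously answered one, skipping the LLM call."""
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (query_embedding, response) pairs

    def lookup(self, query_vec):
        for vec, response in self.entries:
            if cosine(query_vec, vec) >= self.threshold:
                return response
        return None  # cache miss: fall through to the LLM

    def store(self, query_vec, response):
        self.entries.append((query_vec, response))

cache = SemanticCache()
cache.store([1.0, 0.0, 0.0], "Refunds take 5 business days.")
print(cache.lookup([0.99, 0.01, 0.0]))  # near-duplicate query: cache hit
print(cache.lookup([0.0, 1.0, 0.0]))    # unrelated query: None
```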

Challenge 2: Inference Costs

The cost of inference for large language models (LLMs) is a major concern, especially when considering enterprise applications.

Some of the factors that contribute to the inference cost of LLMs include context window size, model size, and training data.

Proposed Solutions:

  1. Minimum viable model for your use case: Not all LLMs are created equal. There are models specifically designed for tasks like question answering, code generation, or text summarization. Choosing an LLM with expertise in your desired area can lead to better results and potentially lower inference costs because the model is already optimized for that type of work.
  2. Conservative Use of LLMs in Pipeline: By strategically deploying LLMs only in critical parts of the pipeline where their advanced capabilities are essential, you can minimize unnecessary computational expenditure. This selective use ensures that LLMs contribute value where they’re most needed, optimizing the balance between performance and cost.

Challenge 3: Data Security

The problem of data security in RAG systems refers to the concerns and challenges associated with ensuring the security and integrity of the large language models (LLMs) used in RAG applications. As LLMs become more powerful and widely used, ethical and privacy considerations need to be addressed to protect sensitive information and prevent potential abuses.

These include:

    • Prompt injection
    • Sensitive information disclosure
    • Insecure outputs

Proposed Solutions: 

  1. Multi-tenancy: Multi-tenancy is like having separate, secure rooms for each user or group within a large language model system, ensuring that everyone's data is private and safe. It keeps each user's data apart from others, protecting sensitive information from being seen or accessed by those who shouldn't. By setting up specific permissions, it controls who can see or use certain data, keeping it out of the wrong hands. This setup not only keeps user information private and safe from misuse but also helps the LLM follow strict rules and guidelines about handling and protecting data.
  2. NeMo Guardrails: NeMo Guardrails is an open-source security toolset designed specifically for language models, including large language models. It offers a wide range of programmable guardrails that can be customized to control and guide LLM inputs and outputs, ensuring secure and responsible usage in RAG systems.

Ensuring the Practical Success of the RAG Framework

This article explored key pain points associated with RAG systems, ranging from missing content and incomplete responses to data ingestion scalability and LLM security. For each pain point, we discussed potential solutions, highlighting various techniques and tools that developers can leverage to optimize RAG system performance and ensure accurate, reliable, and secure responses.

By addressing these challenges, RAG systems can unlock their full potential and become a powerful tool for enhancing the accuracy and effectiveness of LLMs across various applications.

March 29
Huda Mahmood

Knowledge graphs and LLMs are the building blocks of the most recent advancements happening in the world of artificial intelligence (AI). Combining knowledge graphs (KGs) and LLMs produces a system that has access to a vast network of factual information and can understand complex language.

The system has the potential to use this accessibility to answer questions, generate textual outputs, and engage with other NLP tasks. This blog aims to explore the potential of integrating knowledge graphs and LLMs, navigating through the promise of revolutionizing AI.

Introducing knowledge graphs and LLMs

Before we understand the impact and methods of integrating KGs and LLMs, let’s visit the definition of the two concepts.

What are knowledge graphs (KGs)?

They are a visual web of information that focuses on connecting factual data in a meaningful manner. Each set of data is represented as a node with edges building connections between them. This representational storage of data allows a computer to recognize information and relationships between the data points.

KGs organize data to highlight connections and new relationships in a dataset. Moreover, they enable improved search results, as knowledge graphs integrate contextual information to provide more relevant results.
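The node-and-edge idea can be sketched as a tiny triple store: each fact is a (subject, relation, object) triple, and traversing a node's edges surfaces everything connected to it. The facts below are real, but the class design is purely illustrative:

```python
class KnowledgeGraph:
    """Minimal triple store: nodes connected by labeled edges."""
    def __init__(self):
        self.triples = set()

    def add(self, subject, relation, obj):
        self.triples.add((subject, relation, obj))

    def neighbors(self, subject):
        # All facts directly connected to a node.
        return {(rel, obj) for subj, rel, obj in self.triples if subj == subject}

kg = KnowledgeGraph()
kg.add("Marie Curie", "won", "Nobel Prize in Physics")
kg.add("Marie Curie", "field", "radioactivity")
kg.add("Nobel Prize in Physics", "awarded_by", "Royal Swedish Academy of Sciences")
print(sorted(kg.neighbors("Marie Curie")))
```

This is the representational storage the text describes: a computer can follow edges from "Marie Curie" to her prize, then from the prize to the awarding body, recognizing relationships between data points.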



What are large language models (LLMs)?

LLMs are a powerful tool within the world of AI using deep learning techniques for general-purpose language generation and other natural language processing (NLP) tasks. They train on massive amounts of textual data to produce human-quality texts.

Large language models have revolutionized human-computer interactions, with the potential for further advancements. However, LLMs are limited in the factual grounding of their results: they can produce high-quality, grammatically accurate outputs that are nonetheless factually inaccurate.


An overview of knowledge graphs and LLMs – Source: arXiv


Combining KGs and LLMs

Within the world of AI and NLP, integrating the concepts of KGs and LLMs has the potential to open up new avenues of exploration. While knowledge graphs cannot understand language, they are good at storing factual data. Unlike KGs, LLMs excel in language understanding but lack factual grounding.

Combining the two entities brings forward a solution that addresses the weaknesses of both. The strengths of KGs and LLMs cover each concept’s limitations, producing more accurate and better-represented results.

Frameworks to combine KGs and LLMs

It is one thing to talk about combining knowledge graphs and large language models; implementing the idea requires planning and research. So far, researchers have explored three different frameworks aiming to integrate KGs and LLMs for enhanced outputs.

In this section, we will explore these three frameworks, which were published in a paper in IEEE Transactions on Knowledge and Data Engineering.


Frameworks for integrating KGs and LLMs
Frameworks for integrating KGs and LLMs – Source: arXiv


KG-enhanced LLMs

This framework focuses on using knowledge graphs to train LLMs. The factual knowledge and relationship links in the KGs become accessible to the LLMs, in addition to the traditional textual data, during the training phase. An LLM can then learn from the information available in KGs.

As a result, LLMs can get a boost in factual accuracy and grounding by incorporating the data from KGs. It will also enable the models to fact-check the outputs and produce more accurate and informative results.
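The framework describes training-time integration, but the grounding idea can also be illustrated at inference time. The sketch below is a hypothetical, minimal example of injecting retrieved KG facts into a prompt; the `kg` dictionary, `build_grounded_prompt` function, and the prompt template are illustrative assumptions, not part of the framework itself:

```python
# Hypothetical sketch: ground an LLM prompt in knowledge-graph facts.
# The KG and prompt template here are illustrative stand-ins, not a real API.
kg = {
    "Eiffel Tower": [("located_in", "Paris"), ("height_m", "330")],
}

def build_grounded_prompt(question, entity):
    # Retrieve the entity's edges from the KG and serialize them as context.
    facts = "; ".join(f"{r}: {o}" for r, o in kg.get(entity, []))
    return (f"Facts about {entity}: {facts}\n"
            f"Question: {question}\n"
            f"Answer using only the facts above.")

prompt = build_grounded_prompt("How tall is the Eiffel Tower?", "Eiffel Tower")
print(prompt)
```

The LLM receiving this prompt can lean on verified KG facts instead of relying solely on its parametric memory, which is the intuition behind KG-enhanced generation.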

LLM-augmented KGs

This design inverts the first framework: instead of KGs enhancing LLMs, it leverages the reasoning power of large language models to improve knowledge graphs. LLMs act as smart assistants that improve the output of KGs and curate their information representation.

Moreover, this framework can leverage LLMs to find problems and inconsistencies in information connections of KGs. The high reasoning of LLMs also enables them to infer new relationships in a knowledge graph, enriching its outputs.

This builds a pathway to create more comprehensive and reliable knowledge graphs, benefiting from the reasoning and inference abilities of LLMs.
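As a rough illustration of this curation loop, the sketch below uses a trivial pattern rule as a stand-in for the LLM's triple extraction (the `extract_triples` function and its regex are purely illustrative, not how an LLM works); the facts it proposes are merged into the KG only when they are genuinely new:

```python
# Sketch of LLM-augmented KG curation. `extract_triples` stands in for an
# LLM call that reads free text and proposes new (s, r, o) triples; a simple
# regex rule is used here so the example stays self-contained.
import re

def extract_triples(text):
    # Stand-in for an LLM: parse "X is the capital of Y" sentences.
    return [(m.group(2), "capital", m.group(1))
            for m in re.finditer(r"(\w+) is the capital of (\w+)", text)]

kg = {("France", "capital", "Paris")}
proposed = extract_triples(
    "Berlin is the capital of Germany. Paris is the capital of France.")
new_facts = [t for t in proposed if t not in kg]  # keep only new links
kg.update(new_facts)
print(sorted(kg))
```

In a real pipeline, the LLM's proposals would also be validated before merging, which is where its ability to spot inconsistencies comes in.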


Explore data visualization – the best way to communicate


Synergized LLMs + KGs

This framework proposes a mutually beneficial relationship between the two AI components. Each entity works to improve the other through a feedback loop. It is designed in the form of a continuous learning cycle between LLMs and KGs.

It can be viewed as a concept that combines the two above-mentioned frameworks into a single design where knowledge graphs enhance language model outputs and LLMs analyze and improve KGs.

It results in a dynamic cycle where KGs and LLMs constantly improve each other. The iterative design of this integration framework leads to a more powerful and intelligent system overall.

While we have looked at three different frameworks for integrating KGs and LLMs, synergized LLMs + KGs is the most advanced approach in the field. It promises to unlock the full potential of both entities, supporting the creation of superior AI systems with enhanced reasoning, knowledge representation, and text generation capabilities.


Explore a hands-on curriculum that helps you build custom LLM applications!


Future of LLM and KG integration

Combining the powers of knowledge graphs and large language models holds immense potential in various fields. Some plausible possibilities are discussed below.

Educational revolution

With access to knowledge graphs, LLMs can generate personalized educational content for students, encompassing a wide range of subjects and topics. The data can be used to generate interactive lessons, provide detailed feedback, and answer questions with factual accuracy.

Enhancing scientific research

The integrated frameworks provide an ability to analyze vast amounts of scientific data, identify patterns, and even suggest new hypotheses. The combination has the potential to accelerate scientific research across various fields.



Intelligent customer service

With useful knowledge representations of KGs, LLMs can generate personalized and more accurate support. It will also enhance their ability to troubleshoot issues and offer improved recommendations, providing an intelligent customer experience to the users of any enterprise.

Thus, the integration of knowledge graphs and LLMs has the potential to boost the development of AI-powered tasks and transform the field of NLP.

March 28
Dua Mahboob

If I were to ask you whether generative AI in education can outperform students in competitive assessments like those at Harvard or Stanford, what would your answer be? Maybe? Let me tell you, the answer is yes.

That’s the exciting world of generative AI, shaking things up everywhere across the globe, be it logical assessments, medical exams, or a thought-provoking essay at the Ivy Leagues.   

Read: Chatbot vs Medical Student Performance on Clinical Reasoning Examinations 

Now, before you imagine robots taking over classrooms, hold on! Generative AI isn’t here to replace humans, it’s more of a super-powered sidekick for education.

From unequal access to education to stressed-out teachers and confused students, the education landscape faces a lot of challenges. Generative AI isn’t here to steal anyone’s job, but maybe, it can help us fix the problems, ushering in a new era of learning and creativity.

Should ChatGPT be banned in schools? 

Role of AI in Education

Here’s how generative AI is reshaping the education landscape: 

Personalized learning

Traditionally, education has relied on a standardized approach. This “one-size-fits-all” method often leaves students behind or bored, failing to cater to their individual learning styles and paces. Generative AI disrupts this model by tailoring the education experience to individual students’ needs.  

With the help of vast amounts of data, it adapts the learning content, pace, and style to suit the strengths, weaknesses, and preferences of each learner, ensuring that no student is left behind.

This personalized approach accommodates different learning styles, such as visual, auditory, reading-writing, or kinesthetic, ensuring that students receive tailored support based on their unique preferences and abilities, while also providing immediate feedback and support. 

AI in Action

For instance, Duolingo leverages generative AI to create personalized learning experiences for young children. The app tailors its content based on a child’s progress, offering interactive activities, games, and even AI-generated stories that reinforce learning. In addition, Khan Academy has launched Khanmigo, an AI tutor that assists young students in various subjects on its platform.

AI in education - within the ed-tech landscape
Popular Generative AI Applications in the EdTech Landscape – Source: Reach Capital

Accessibility and Inclusivity: Breaking Barriers for All

Traditionally, access to quality education has been heavily reliant on individuals’ geographical access and socio-economic background. Generative AI disrupts this norm by delivering high-quality educational resources directly to students, regardless of their backgrounds.

Now, people in remote areas with limited access to knowledge bases, diverse learning environments, and styles, can leverage Generative AI, for personalized tutoring and learning. 

Generative AI further promotes inclusivity and global collaboration by facilitating language learning through the translation of educational content into multiple languages and adapting materials to fit local cultural contexts. It plays a crucial role in developing inclusive and accessible educational content suitable for diverse learner populations. 

Moreover, Generative AI can be personalized to support students with special needs by providing customized learning experiences through assistive functions and communication technologies. This ensures that students with diverse requirements have access to top-quality learning materials.

Curious how generative AI is reshaping the education landscape? Learn what an expert educator has to say!

AI in Action 

For instance, Dreamreader is an AI-powered platform that tailors reading experiences to a student’s reading level and interests. It generates personalized stories with adjustable difficulty, keeping students engaged and motivated to improve their reading skills. 

As technology becomes more accessible, platforms are emerging that enable anyone, even those without coding skills, to create their own “Chat GPT bots,” opening doors of accessibility for all.

Beyond Textbooks: Immersive Learning Adventures

Generative AI has also fostered the emergence of hybrid schools, virtual classrooms, remote learning, and micro-learning, allowing students to access education beyond the confines of a traditional classroom, and opening up a world of limitless learning opportunities. 

Generative AI can transport students to the heart of historical events, conduct virtual experiments in a simulated lab, or even practice a new language with an AI-powered conversation partner. 

AI in Action

Platforms like Historyverse and Hellohistory.AI are prime examples. These AI-powered platforms allow students to step into historical simulations, interacting with virtual characters and environments to gain a deeper understanding of the past. 

Explore the 2024 trends of AI in marketing

Support for Educators: AI as a Partner in Progress

Far from replacing teachers, generative AI is here to empower them. With personalized lesson planning and content creation, AI-assisted evaluation and feedback, intelligent tutoring systems, and virtual teaching assistants, AI can free up valuable teacher time.

This allows educators to focus on what they do best: fostering student engagement, providing personalized instruction, and pursuing professional development. In a future where AI takes the lead in delivering information, it becomes crucial to reconsider our approach to education.

Rather than sticking to traditional classrooms, picture a flipped classroom model, a hybrid learning setup where students can engage in remote self-learning and use physical classrooms for interactive group activities and collaborative learning. It’s all about blending the best of both worlds for a more effective and engaging educational experience. 

Generative AI is reshaping the roles and dynamics of the education system, encouraging educators to evolve from knowledge deliverers to facilitators. They need to become mentors who guide and encourage student agency, fostering a collaborative environment built on co-agency and collective intelligence.



AI in Action

Take a look at GradeScope, a product by Turnitin, a real-world example of generative AI empowering teachers. This platform uses AI to automate the time-consuming task of grading written assignments. Teachers upload student work, and GradeScope utilizes AI to analyze handwriting, identify key concepts, and even provide students with initial grading and personalized feedback.

This frees up valuable teacher time, allowing them to focus on more individualized instruction, like one-on-one conferences or in-depth discussions about student writing. This is the power of generative AI as a partner in education – it empowers teachers to do what they do best: inspire, guide, and unlock the potential in every student.

Here’s what every educator must know!

Shift towards Metacognitive Continuous Learning

Generative AI is ushering in a new era of “metacognitive continuous learning”. This approach to assessment focuses on students’ ability to understand, monitor, and regulate their cognitive and metacognitive processes, making it an integral part of the learning process.

In metacognitive continuous learning, students not only acquire knowledge but also reflect on their learning strategies and adapt them as needed. They actively engage in self-regulation to optimize their learning experience and become aware of their thinking processes.  

AI systems help students recognize their strengths and weaknesses, suggest strategies for improvement, and promote a deeper understanding of the subject matter. By leveraging AI-supported feedback, students develop essential skills for lifelong learning.

This shift represents a move away from traditional tests that measure memory recall or specific skills and towards a more student-centered and flexible approach to learning, making students self-directed learners.

It recognizes that learning is not just about acquiring knowledge but also about understanding how we think and continuously improving our learning strategies and focusing on personal growth.

Read about the game-changing moments in AI during 2023

Critical Skills to Survive and Thrive in an AI-driven World

While generative AI offers a treasure trove of educational content, it’s crucial to remember that information literacy is essential. Students need to develop the ability to critically evaluate AI-generated content, assessing its accuracy, and biases, leveraging AI to augment their own capabilities rather than blindly relying on it.

Here is a range of key skills that learners need to develop to thrive and adapt. These skills include: 

Critical Thinking: Learners must develop the ability to analyze information, evaluate its credibility, and make informed decisions. Critical thinking allows individuals to effectively navigate the vast amount of data and AI-generated content available. 

Problem-solving: AI presents new challenges and complexities. Learners need to be able to identify and define problems, think creatively, and develop innovative solutions. Problem-solving skills enable individuals to leverage AI technology to address real-world issues. 

Adaptability: The rapid pace of technological change requires learners to be adaptable. They must embrace change, learn new tools and technologies quickly, and be willing to continuously evolve their knowledge and skills. 

Data and AI Literacy: With AI generating vast amounts of data, learners need to develop the ability to understand, interpret, and analyze data so that they can make data-driven decisions and leverage AI technologies effectively. They must also possess AI literacy skills to navigate AI-driven platforms, understand the ethical implications of AI, and effectively use digital tools for learning and work.  

The Human Edge: Fostering Creativity, Emotional Intelligence, and Intuition: While AI excels at crunching numbers and following patterns, certain qualities remain uniquely human and will continue to be valuable in the age of AI. AI can generate content, but it takes human imagination to truly push boundaries and come up with groundbreaking ideas.

Our ability to empathize, build relationships, and navigate complex social situations will remain crucial for success in various fields. In addition, the ability to tap into our intuition and make gut decisions can be a valuable asset, even in the age of data-driven decision-making.

Can AI truly replace humans? Let’s find out now

Effectively Leveraging Generative AI for Education: The PAIR Framework

To equip students with critical thinking and problem-solving skills in the age of AI, the PAIR framework is a very useful tool. This four-step approach integrates generative AI tools into assignments, encouraging students to actively engage with the technology. 

  1. Problem Formulation:

The journey begins with students defining the problem or challenge they want to tackle. This initial step fosters critical thinking and sets the stage for their AI-powered exploration. 

  2. AI Tool Selection:

Students become discerning consumers of technology by learning to explore, compare, and evaluate different generative AI tools. Understanding available features allows them to choose the most appropriate tool for their specific problem. 

  3. Interaction:

Armed with their chosen AI tool, students put their problem-solving skills to the test. They experiment with various inputs and outputs, observing how the tool influences their approach and the outcome. 

  4. Reflection:

The final step involves critical reflection. Students assess their experience with the generative AI tool, reporting on its strengths, weaknesses, and overall impact on their learning process. This reflection solidifies their understanding and helps them become more self-aware learners. 

By incorporating the PAIR framework, students develop the skills necessary to navigate the world of AI, becoming not just passive users, but empowered learners who can leverage technology to enhance their problem-solving abilities.

the PAIR framework model
The PAIR framework model – Source: Harvard Business Publishing

The Road Ahead: Challenges, Considerations, and Responsible Implementation

As with any new technology, generative AI comes with its own set of challenges. Ensuring that AI systems are trained on unbiased data sets is crucial to prevent perpetuating stereotypes or misinformation. Additionally, it’s important to remember that the human element remains irreplaceable in education. 

Academic Dishonesty

AI tools can be misused for plagiarism, with students using them to generate essays or complete assignments without truly understanding the content.

Rather than outright banning these tools, educational institutions need to promote ethical and responsible AI usage. This entails establishing transparent guidelines and policies to deter dishonest or unethical practices.

Accuracy and Bias

Generative AI models are trained on vast amounts of data, which can perpetuate biases or inaccuracies present in that data. They are often trained on datasets that may not adequately represent the cultural and contextual diversity of different regions.

This can lead to a lack of relevance and inclusivity in AI-generated content. Uncritical use of AI-generated content could lead students to faulty information.

In addition, localization efforts are needed to ensure that generative AI systems are sensitive to cultural nuances and reflect diverse perspectives. 

Overdependence on Technology

Overreliance on AI tools for learning can hinder critical thinking and problem-solving skills. Students may become accustomed to having solutions generated for them, rather than developing the ability to think independently.

Educating users about AI’s limitations, potential risks, and responsible usage becomes extremely important. It is important to promote AI as a tool designed to augment human capabilities rather than hold them back.


Readiness Disparities

While generative AI offers tremendous potential for improving accessibility and inclusion in education, on some occasions, it can also exacerbate existing disparities.

The integration of generative AI hinges on “technological readiness” – meaning adequate infrastructure, reliable internet access, proper training, and digital literacy.

These factors can vary greatly between regions and countries. Unequal access to these resources could create a situation where generative AI widens, rather than shrinks, the educational gap between developed and developing nations.

These disparities must be addressed to ensure that generative AI reaches all students, regardless of their background, ensuring a more equitable society.  

Way Forward: A Balanced Approach

Market projection of AI in education
Market projection of AI in education – Source: Yahoo Finance

Generative AI undoubtedly holds the potential to reshape the education landscape, by providing personalized learning, improving content, automating tasks, and reducing barriers to education.

To successfully leverage these benefits, a balanced approach is necessary that promotes responsible integration of AI in educational settings, while preserving the human touch. Moreover, it is crucial to empower educators and learners with the relevant skills and competencies to effectively utilize Generative AI while also fostering dialogue and collaboration among stakeholders.

By striking a balance between leveraging its potential benefits and mitigating the associated risks, the equitable integration of Generative AI in education can be achieved, creating a dynamic and adaptive learning environment that empowers students for the future.

March 27
Huda Mahmood

Natural language processing (NLP) and large language models (LLMs) have been revolutionized with the introduction of transformer models. These refer to a type of neural network architecture that excels at tasks involving sequences.

While we have talked about the details of a typical transformer architecture, in this blog we will explore the different types of these models.

How to categorize transformer models?

Transformers ensure the efficiency of LLMs in processing information. Their role is critical to ensure improved accuracy, faster training on data, and wider applicability. Hence, it is important to understand the different model types available to choose the right one for your needs.



However, before we delve into the many types of transformer models, it is important to understand the basis of their classification.

Classification by transformer architecture

The most fundamental categorization of transformer models is done based on their architecture. The variations are designed to perform specific tasks or cater to the limitations of the base architecture. The very common model types under this category include encoder-only, decoder-only, and encoder-decoder transformers.

Categorization based on pre-training approaches

While architecture is a basic component of consideration, the training techniques are equally crucial components for transformers. Pre-training approaches refer to the techniques used to train a transformer on a general dataset before finetuning it to perform specific tasks.

Some common approaches that define classification under this category include Masked Language Models (MLMs), autoregressive models, and conditional transformers.

This presents a general outlook on classifying transformer models. While we now know the types present under each broader category, let’s dig deeper into each transformer model type.


Read in detail about transformer architectures


Architecture-based classification


Architecture of transformer models
The general architecture of transformer models


Encoder-only transformer

As the name suggests, this architectural type uses only the encoder part of the transformer, focusing on encoding the input sequence. For this model type, understanding the input sequence is crucial while generating an output sequence is not required.

Some common applications of an encoder-only transformer include:

Text classification

It focuses on classifying the input data based on defined parameters and is often used in email spam filters to categorize incoming emails. The transformer model can train on known patterns for the effective filtration of unwanted messages.

Sentiment analysis

Sentiment analysis determines the emotional tone behind a piece of text. This makes encoder-only transformers an appropriate choice for social media companies analyzing customer feedback and emotions toward a service or product, providing useful data insights that lead to effective strategies for enhancing customer satisfaction.

Anomaly detection

It is particularly useful for finance companies. The analysis of financial transactions allows the timely detection of anomalies. Hence, possible fraudulent activities can be addressed promptly.

Other uses of an encoder-only transformer include question-answering, speech recognition, and image captioning.

Decoder-only transformer

It is a less common type of transformer model that uses only the decoder component to generate text sequences based on input prompts. The self-attention mechanism allows the model to focus on previously generated outputs in the sequence, enabling it to refine the output and create more contextually aware results.
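This focus on previously generated outputs is implemented with a causal attention mask, which lets each position attend only to itself and earlier positions, never the future. A minimal sketch of that mask:

```python
# Sketch of the causal (look-back-only) attention mask used by decoder-only
# transformers: position i may attend to positions 0..i, never to the future.
def causal_mask(seq_len):
    return [[1 if j <= i else 0 for j in range(seq_len)]
            for i in range(seq_len)]

for row in causal_mask(4):
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```

In a real model this lower-triangular pattern is applied inside the self-attention computation (typically by adding negative infinity to masked positions before the softmax), which is what makes the generation left-to-right.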

Some common uses of decoder-only transformers include:

Text summarization

It can iteratively generate textual summaries of the input, focusing on including the important aspects of information.

Text generation

It builds on a provided prompt to generate relevant textual outputs. The results cover a diverse range of content types, like poems, code, and text snippets, and the model can iterate on the process to create connected and improved responses.


Chatbots

It is useful for handling conversational interactions via chatbots. The decoder can also consider previous turns of the conversation to formulate relevant responses.


Explore the role of attention mechanism in transformers


Encoder-decoder Transformer

This is a classic architectural type of transformer, efficiently handling sequence-to-sequence tasks, where you need to transform one type of sequence (like text) into another (like a translation or summary). An encoder processes the input sequence while a decoder is used to generate an output sequence.

Some common uses of an encoder-decoder transformer include:

Machine translation

Since the sequence is important at both the input and output, it makes this transformer model a useful tool for translation. It also considers contextual references and relationships between words in both languages.

Text summarization

While this use overlaps with that of a decoder-only transformer, summarization with an encoder-decoder model differs in its focus on the input sequence. It enables the creation of summaries that concentrate on the relevant aspects of the text highlighted in an input prompt.


Question answering

It is important to understand a question before providing a relevant answer. An encoder-decoder transformer supports this focus on both ends of the communication, ensuring each question is understood and answered appropriately.

This concludes our exploration of architecture-based transformer models. Let’s explore the classification from the lens of pre-training approaches.

Categorization based on pre-training approaches

While the architectural differences provide a basis for transformer types, the models can be further classified based on their techniques of pre-training.

Let’s explore the various transformer models segregated based on pre-training approaches.

Masked Language Models (MLMs)

Models with this pre-training approach are usually encoder-only in architecture. They are trained to predict a masked word in a sentence based on the contextual information of the surrounding words. The training enables these model types to become efficient in understanding language relationships.
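The masking step itself is simple to sketch. The function below is an illustrative toy (it masks a single token rather than the roughly 15% typically masked in practice) that produces one MLM training example, where the surrounding context is the input and the hidden token is the label:

```python
# Sketch of masked-language-model training data: hide a token and ask the
# model to predict it from the surrounding context.
import random

def make_mlm_example(tokens, mask_token="[MASK]", seed=0):
    random.seed(seed)                     # fixed seed for reproducibility
    i = random.randrange(len(tokens))     # pick one position to hide
    masked = tokens[:i] + [mask_token] + tokens[i + 1:]
    return masked, i, tokens[i]           # (model input, position, target)

masked, pos, label = make_mlm_example(["the", "cat", "sat", "on", "the", "mat"])
print(masked, "-> predict", repr(label), "at position", pos)
```

Training on millions of such examples is what pushes the model to learn the contextual relationships between words.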

Some common MLM applications are:

Boosting downstream NLP tasks

MLMs train on massive datasets, enabling the models to develop a strong understanding of language context and relationships between words. This knowledge enables MLM models to contribute and excel in diverse NLP applications.

General-purpose NLP tool

The enhanced learning, knowledge, and adaptability of MLMs make them a part of multiple NLP applications. Developers leverage this versatility of pre-trained MLMs to build a basis for different NLP tools.

Efficient NLP development

The pre-trained foundation of MLMs reduces the time and resources needed for the deployment of NLP applications. It promotes innovation, faster development, and efficiency.




Autoregressive models

Typically built using a decoder-only architecture, models with this pre-training approach generate sequences iteratively, predicting the next word from the words that came before it. Some common uses of autoregressive models include:

Text generation

The iterative prediction of the model enables it to generate different text formats. From code and poems to musical pieces, it can create them all, iteratively refining the output as it goes.


Chatbots

The model can also be utilized in a conversational environment, creating engaging and contextually relevant responses.

Machine translation

While encoder-decoder models are commonly used for translation tasks, autoregressive models can also support some languages with complex grammatical structures.
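The next-word prediction at the heart of all these uses can be sketched with a toy bigram model. Counting which word follows which in a tiny corpus is a crude stand-in for a trained decoder, but the generation loop is the same iterative idea:

```python
# Toy autoregressive generation: pick the next word from counts of what
# followed the current word in a tiny corpus -- a stand-in for a decoder-only
# model's next-token prediction.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1   # count bigram frequencies

def generate(start, steps):
    out = [start]
    for _ in range(steps):
        candidates = following[out[-1]].most_common(1)
        if not candidates:      # dead end: no word ever followed this one
            break
        out.append(candidates[0][0])
    return out

print(generate("the", 3))
```

A real autoregressive LLM replaces the bigram counts with a learned probability distribution over the whole preceding context, but the generate-one-token-then-feed-it-back loop is identical.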

Conditional transformer

This transformer model incorporates the additional information of a condition along with the main input sequence. It enables the model to generate highly specific outputs based on particular conditions, ensuring more personalized results.
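One common way to implement such conditioning (a simplified sketch, not the only approach) is to prepend a control token encoding the condition, such as the target language, so the model treats it as part of the input sequence; the token names below are illustrative:

```python
# Simplified sketch of conditioning a transformer: prepend a control token
# (here, a hypothetical target-language tag) to the input token sequence.
def add_condition(tokens, condition):
    return [f"<{condition}>"] + tokens

model_input = add_condition(["Good", "morning"], "to_fr")
print(model_input)  # ['<to_fr>', 'Good', 'morning']
```

During training, the model learns to associate each control token with the style or behavior of the outputs it saw paired with that token, which is what makes the conditioned generation possible.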

Some uses of conditional transformers include:

Machine translation with adaptation

The conditional aspect enables the model to set the target language as a condition. It ensures better adjustment of the model to the target language’s style and characteristics.

Summarization with constraints

Additional information allows the model to generate summaries of textual inputs based on particular conditions.

Speech recognition with constraints

With the consideration of additional factors like speaker ID or background noise, the recognition process is enhanced to produce improved results.

Future of transformer model types

While numerous transformer model variations are available, the ongoing research promises their further exploration and growth. Some major points of further development will focus on efficiency, specialization for various tasks, and integration of transformers with other AI techniques.

Transformers can also play a crucial role in the field of human-computer interaction with their enhanced capabilities. The growth of transformers will definitely impact the future of AI. However, it is important to understand the uses of each variation of a transformer model before you choose the one that fits your requirements.

March 23
Ayesha Imran

In the dynamic field of artificial intelligence, Large Language Models (LLMs) are groundbreaking innovations shaping how we interact with digital environments. These sophisticated models, trained on vast collections of text, have the extraordinary ability to comprehend and generate text that mirrors human language, powering a variety of applications from virtual assistants to automated content creation.

The essence of LLMs lies not only in their initial training but significantly in fine-tuning, a crucial step to refine these models for specialized tasks and ensure their outputs align with human expectations.

Introduction to finetuning

Finetuning LLMs involves adjusting pre-trained models to perform specific functions more effectively, enhancing their utility across different applications. This process is essential because, despite the broad knowledge base acquired through initial training, LLMs often require customization to excel in particular domains or tasks.


Explore the concept of finetuning in detail here


For instance, a model trained on a general dataset might need fine-tuning to understand the nuances of medical language or legal jargon, making it more relevant and effective in those contexts.

Enter Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO), two leading methodologies for finetuning LLMs. RLHF utilizes a sophisticated feedback loop, incorporating human evaluations and a reward model to guide the AI’s learning process.

On the other hand, DPO adopts a more straightforward approach, directly applying human preferences to influence the model’s adjustments. Both strategies aim to enhance model performance and ensure the outputs are in tune with user needs, yet they operate on distinct principles and methodologies.



This blog post aims to unfold the layers of RLHF and DPO, drawing a comparative analysis to elucidate their mechanisms, strengths, and optimal use cases.

Understanding these fine-tuning methods paves the path to deploying LLMs that not only boast high performance but also resonate deeply with human intent and preferences, marking a significant step towards achieving more intuitive and effective AI-driven solutions. 

Examples of how fine-tuning improves performance in practical applications

  • Customer Service Chatbots: Fine-tuning an LLM on customer service transcripts can enhance its ability to understand and respond to user queries accurately, improving customer satisfaction. 
  • Legal Document Analysis: By fine-tuning on legal texts, LLMs can become adept at navigating complex legal language, aiding in tasks like contract review or legal research. 
  • Medical Diagnosis Support: LLMs fine-tuned with medical data can assist healthcare professionals by providing more accurate information retrieval and patient interaction, thus enhancing diagnostic processes.
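
As a concrete illustration of the customer-service case, domain examples are commonly prepared as prompt/response records before supervised fine-tuning. The function name, field names, and chat format below are assumptions for illustration, not any particular library's API.

```python
# Illustrative sketch: turn domain examples into chat-style training records.
# The "messages" structure mirrors a common chat-format convention, but the
# exact schema depends on the fine-tuning stack you use.
def to_sft_record(user_query, ideal_answer,
                  system_role="You are a helpful customer support agent."):
    return {
        "messages": [
            {"role": "system", "content": system_role},
            {"role": "user", "content": user_query},
            {"role": "assistant", "content": ideal_answer},
        ]
    }

record = to_sft_record(
    "My order arrived damaged. What should I do?",
    "I'm sorry to hear that. Please share your order number and a photo, "
    "and we'll arrange a replacement or refund.",
)
print(record["messages"][1]["role"])  # the user turn
```

A fine-tuning dataset is then just a list of such records, one per transcript turn worth learning from.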

Delving into reinforcement learning from human feedback (RLHF)

Explanation of RLHF and its components

Reinforcement Learning from Human Feedback (RLHF) is a technique used to fine-tune AI models, particularly language models, to enhance their performance based on human feedback.

The core components of RLHF include the language model being fine-tuned, the reward model that evaluates the language model’s outputs, and the human feedback that informs the reward model. This process ensures that the language model produces outputs more aligned with human preferences.

Theoretical foundations of RLHF

RLHF is grounded in reinforcement learning, where the model learns from actions rather than from a static dataset.

Unlike supervised learning, where models learn from labeled data or unsupervised learning, where models identify patterns in data, reinforcement learning models learn from the consequences of their actions, guided by rewards. In RLHF, the “reward” is determined by human feedback, which signifies the model’s success in generating desirable outputs.


The RLHF process for finetuning LLMs
The RLHF process – Source: AI Changes Everything


Four-step process of RLHF

  1. Pretraining the language model with self-supervision

  • Data Gathering: The process begins by collecting a vast and diverse dataset, typically encompassing a wide range of topics, languages, and writing styles. This dataset serves as the initial training ground for the language model. 
  • Self-Supervised Learning: Using this dataset, the model undergoes self-supervised learning. Here, the model is trained to predict parts of the text given other parts. For instance, it might predict the next word in a sentence based on the previous words. This phase helps the model grasp the basics of language, including grammar, syntax, and some level of contextual understanding. 
  • Foundation Building: The outcome of this stage is a foundational model that has a general understanding of language. It can generate text and understand some context but lacks specialization or fine-tuning for specific tasks or preferences. 
  2. Ranking the model’s outputs based on human feedback

  • Generation and Evaluation: Once pretraining is complete, the model starts generating text outputs, which are then evaluated by humans. This could involve tasks like completing sentences, answering questions, or engaging in dialogue. 
  • Scoring System: Human evaluators use a scoring system to rate each output. They consider factors like how relevant, coherent, or engaging the text is. This feedback is crucial as it introduces the model to human preferences and standards. 
  • Adjustment for Bias and Diversity: Care is taken to ensure the diversity of evaluators and mitigate biases in feedback. This helps in creating a balanced and fair assessment criterion for the model’s outputs. 


Here’s your guide to understanding LLMs


  3. Training a reward model to mimic human ratings

  • Modeling Human Judgment: The scores and feedback from human evaluators are then used to train a separate model, known as the reward model. This model aims to understand and predict the scores human evaluators would give to any piece of text generated by the language model. 
  • Feedback Loop: The reward model effectively creates a feedback loop. It learns to distinguish between high-quality and low-quality outputs based on human ratings, encapsulating the criteria humans use to judge the text. 
  • Iteration for Improvement: This step might involve several iterations of feedback collection and reward model adjustment to accurately capture human preferences. 
  4. Finetuning the language model using feedback from the reward model

  • Integration of Feedback: The insights gained from the reward model are used to fine-tune the language model. This involves adjusting the model’s parameters to increase the likelihood of generating text that aligns with the rewarded behaviors. 
  • Reinforcement Learning Techniques: Techniques such as Proximal Policy Optimization (PPO) are employed to methodically adjust the model. The model is encouraged to “explore” different ways of generating text but is “rewarded” more when it produces outputs that are likely to receive higher scores from the reward model. 
  • Continuous Improvement: This fine-tuning process is iterative and can be repeated with new sets of human feedback and reward model adjustments, continuously improving the language model’s alignment with human preferences. 

The iterative process of RLHF allows for continuous improvement of the language model’s outputs. Through repeated cycles of feedback and adjustment, the model refines its approach to generating text, becoming better at producing outputs that meet human standards of quality and relevance.
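
The reward-model step can be made concrete with a toy example. The sketch below is illustrative only: it fits a linear reward model to invented pairwise preference data using a Bradley-Terry-style objective (a loss commonly used for reward modeling), standing in for the neural reward models and PPO machinery used in practice.

```python
import numpy as np

# Toy sketch of step 3: fit a reward model from pairwise human preferences.
# Feature vectors stand in for model outputs; all data here is invented.
rng = np.random.default_rng(0)

# Each row pairs the features of a human-preferred output with a rejected one.
preferred = rng.normal(1.0, 0.5, size=(200, 4))  # outputs humans ranked higher
rejected = rng.normal(0.0, 0.5, size=(200, 4))   # outputs humans ranked lower

w = np.zeros(4)  # linear reward model: reward(x) = w @ x

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Bradley-Terry objective: maximize log sigmoid(r(preferred) - r(rejected)).
for _ in range(500):
    margin = (preferred - rejected) @ w
    grad = ((sigmoid(margin) - 1.0)[:, None] * (preferred - rejected)).mean(axis=0)
    w -= 0.1 * grad  # gradient descent on the negative log-likelihood

# After training, preferred outputs should score higher than rejected ones.
print((preferred @ w > rejected @ w).mean())
```

In step 4, the fine-tuned policy would be optimized (e.g., with PPO) to produce outputs that this reward model scores highly.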


Using a reward model for finetuning LLMs
Using a reward model for finetuning LLMs – Source: nownextlater.ai


Exploring direct preference optimization (DPO)

Introduction to the concept of DPO as a direct approach

Direct Preference Optimization (DPO) represents a streamlined method for fine-tuning large language models (LLMs) by directly incorporating human preferences into the training process.

This technique simplifies the adaptation of AI systems to better meet user needs, bypassing the complexities associated with constructing and utilizing reward models.

Theoretical foundations of DPO

DPO is predicated on the principle that direct human feedback can effectively guide the development of AI behavior.

By directly using human preferences as a training signal, DPO simplifies the alignment process, framing it as a direct learning task. This method proves to be both efficient and effective, offering advantages over traditional reinforcement learning approaches like RLHF.


Finetuning LLMs using DPO
Finetuning LLMs using DPO – Source: Medium


Steps involved in the DPO process

  1. Training the language model through self-supervision

  • Data Preparation: The model starts with self-supervised learning, where it is exposed to a wide array of text data. This could include everything from books and articles to websites, encompassing a variety of topics, styles, and contexts. 
  • Learning Mechanism: During this phase, the model learns to predict text sequences, essentially filling in blanks or predicting subsequent words based on the preceding context. This method helps the model to grasp the fundamentals of language structure, syntax, and semantics without explicit task-oriented instructions. 
  • Outcome: The result is a baseline language model capable of understanding and generating coherent text, ready for further specialization based on specific human preferences. 
  2. Collecting pairs of examples and obtaining human ratings

  • Generation of Comparative Outputs: The model generates pairs of text outputs, which might vary in tone, style, or content focus. These pairs are then presented to human evaluators in a comparative format, asking which of the two better meets certain criteria such as clarity, relevance, or engagement. 
  • Human Interaction: Evaluators provide their preferences, which are recorded as direct feedback. This step is crucial for capturing nuanced human judgments that might not be apparent from purely quantitative data. 
  • Feedback Incorporation: The preferences gathered from this comparison form the foundational data for the next phase of optimization. This approach ensures that the model’s tuning is directly influenced by human evaluations, making it more aligned with actual user expectations and preferences. 
  3. Training the model using a cross-entropy-based loss function

  • Optimization Technique: Armed with pairs of examples and corresponding human preferences, the model undergoes fine-tuning using a binary cross-entropy loss function. This statistical method compares the model’s output against the preferred outcomes, quantifying how well the model’s predictions match the chosen preferences.




  • Adjustment Process: The model’s parameters are adjusted to minimize the loss function, effectively making the preferred outputs more likely in future generations. This process iteratively improves the model’s alignment with human preferences, refining its ability to generate text that resonates with users. 
  4. Constraining the model to maintain its generativity

  • Balancing Act: While the model is being fine-tuned to align closely with human preferences, it’s vital to ensure that it doesn’t lose its generative diversity. The process involves carefully adjusting the model to incorporate feedback without overfitting to specific examples or restricting its creative capacity. 
  • Ensuring Flexibility: Techniques and safeguards are put in place to ensure the model remains capable of generating a wide range of responses. This includes regular evaluations of the model’s output diversity and implementing mechanisms to prevent the narrowing of its generative abilities. 
  • Outcome: The final model retains its ability to produce varied and innovative text while being significantly more aligned with human preferences, demonstrating an enhanced capability to engage users in a meaningful way. 

DPO eliminates the need for a separate reward model by treating the language model’s adjustment as a direct optimization problem based on human feedback. This simplification reduces the layers of complexity typically involved in model training, making the process more efficient and directly focused on aligning AI outputs with user preferences.
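
The cross-entropy objective described above can be written down concretely. The snippet below is a minimal sketch of the DPO loss, assuming we already have log-probabilities of the preferred and rejected responses under the model being tuned and under a frozen reference model; it is not a full training loop.

```python
import numpy as np

# Illustrative sketch of the DPO objective.
# logp_w / logp_l: log-probabilities of the preferred (winner) and rejected
# (loser) responses under the model being tuned; ref_* are the same
# quantities under a frozen reference model.
def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # Implicit reward of each response: beta * log(pi / pi_ref).
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Binary cross-entropy on the preference: -log sigmoid(margin).
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

# When the tuned model assigns no extra relative mass to the preferred
# answer, the loss is log(2) ~ 0.693; as the preferred answer becomes
# relatively more likely, the loss falls toward zero.
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))  # ~0.693
print(dpo_loss(-2.0, -8.0, -5.0, -5.0))  # smaller
```

Minimizing this loss pushes the tuned model to raise the preferred response's likelihood relative to the rejected one, without ever training a separate reward model.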

Comparative analysis: RLHF vs. DPO

After exploring both Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO), we’re now at a point where we can compare these two key methods used to fine-tune Large Language Models (LLMs). This side-by-side look aims to clarify the differences and help decide which method might be better for certain situations. 

Direct comparison

  • Training Efficiency: RLHF involves several steps, including pre-training, collecting feedback, training a reward model, and then fine-tuning. This process is detailed and requires a lot of computing power and setup time. On the other hand, DPO is simpler and more straightforward because it optimizes the model directly based on what people prefer, often leading to quicker results. 
  • Data Requirements: RLHF uses a variety of feedback, such as scores or written comments, which means it needs a wide range of input to train well. DPO, however, focuses on comparing pairs of options to see which one people like more, making it easier to collect the needed data. 
  • Model Performance: RLHF is very flexible and can be fine-tuned to perform well in complex situations by understanding detailed feedback. DPO is great for making quick adjustments to align with what users want, although it might not handle varied feedback as well as RLHF. 
  • Scalability: RLHF’s detailed process can make it hard to scale up due to its high computational resource needs. DPO’s simpler approach means it can be scaled more easily, which is particularly beneficial for projects with limited resources. 

Pros and cons

  • Advantages of RLHF: Its ability to work with many kinds of feedback gives RLHF an edge in tasks that need detailed customization. This makes it well-suited for projects that require a deep understanding and nuanced adjustments. 
  • Disadvantages of RLHF: The main drawback is its complexity and the need for a reward model, which makes it more demanding in terms of computational resources and setup. Also, the quality and variety of feedback can significantly influence how well the fine-tuning works. 
  • Advantages of DPO: DPO’s more straightforward process means faster adjustments and less demand on computational resources. It integrates human preferences directly, leading to a tight alignment with what users expect. 
  • Disadvantages of DPO: The main issue with DPO is that it might not do as well with tasks needing more nuanced feedback, as it relies on binary choices. Also, gathering a large amount of human-annotated data might be challenging.


Comparing the RLHF and DPO
Comparing the RLHF and DPO – Source: arxiv.org


Scenarios of application

  • Ideal Use Cases for RLHF: RLHF excels in scenarios requiring customized outputs, like developing chatbots or systems that need to understand the context deeply. Its ability to process complex feedback makes it highly effective for these uses. 
  • Ideal Use Cases for DPO: When you need quick AI model adjustments and have limited computational resources, DPO is the way to go. It’s especially useful for tasks like adjusting sentiments in text or decisions that boil down to yes/no choices, where its direct approach to optimization can be fully utilized.
| Feature | RLHF | DPO |
| --- | --- | --- |
| Training Efficiency | Multi-step and computationally intensive due to the iterative nature and involvement of a reward model. | More straightforward and computationally efficient by directly using human preferences, often leading to faster convergence. |
| Data Requirements | Requires diverse feedback, including numerical ratings and textual annotations, necessitating a comprehensive mix of responses. | Generally relies on pairs of examples with human ratings, simplifying the preference learning process with less complex input. |
| Model Performance | Offers adaptability and nuanced influence, potentially leading to superior performance in complex scenarios. | Efficient in quickly aligning model outputs with user preferences but may lack flexibility for varied feedback. |
| Scalability | May face scalability challenges due to computational demands but is robust across diverse tasks. | Easier to scale in terms of computational demands, suitable for projects with limited resources. |
| Advantages | Flexible handling of diverse feedback types; suitable for detailed output shaping and complex tasks. | Simplified and rapid fine-tuning process; directly incorporates human preferences with fewer computational resources. |
| Disadvantages | Complex setup and higher computational costs; quality and diversity of feedback can affect outcomes. | May struggle with complex feedback beyond binary choices; gathering a large amount of annotated data could be challenging. |
| Ideal Use Cases | Best for tasks requiring personalized or tailored outputs, such as conversational agents or context-rich content generation. | Well-suited for projects needing quick adjustments closely aligned with human preferences, like sentiment analysis or binary decision systems. |


Summarizing key insights and applications

As we wrap up our journey through the comparative analysis of Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) for fine-tuning Large Language Models (LLMs), a few key insights stand out.

Both methods offer unique advantages and cater to different needs in the realm of AI development. Here’s a recap and some guidance on choosing the right approach for your project. 

Recap of fundamental takeaways

  • RLHF is a detailed, multi-step process that provides deep customization potential through the use of a reward model. It’s particularly suited for complex tasks where nuanced feedback is crucial. 
  • DPO simplifies the fine-tuning process by directly applying human preferences, offering a quicker and less resource-intensive path to model optimization. 

Choosing the right finetuning method

The decision between RLHF and DPO should be guided by several factors: 

  • Task Complexity: If your project involves complex interactions or requires understanding nuanced human feedback, RLHF might be the better choice. For more straightforward tasks or when quick adjustments are needed, DPO could be more effective. 
  • Available Resources: Consider your computational resources and the availability of human annotators. DPO is generally less demanding in terms of computational power and can be more straightforward in gathering the necessary data. 
  • Desired Control Level: RLHF offers more granular control over the fine-tuning process, while DPO provides a direct route to aligning model outputs with user preferences. Evaluate how much control and precision you need in the fine-tuning process.


Explore a hands-on curriculum that helps you build custom LLM applications!


The future of finetuning LLMs

Looking ahead, the field of LLM fine-tuning is ripe for innovation. We can anticipate advancements that further streamline these processes, reduce computational demands, and enhance the ability to capture and apply complex human feedback.

Additionally, the integration of AI ethics into fine-tuning methods is becoming increasingly important, ensuring that models not only perform well but also operate fairly and without bias. As we continue to push the boundaries of what AI can achieve, the evolution of fine-tuning methods like RLHF and DPO will play a crucial role in making AI more adaptable, efficient, and aligned with human values.

By carefully considering the specific needs of each project and staying informed about advancements in the field, developers can leverage these powerful tools to create AI systems that are not only technologically advanced but also deeply attuned to the complexities of human communication and preferences.

March 22
Data Science Dojo
Ayesha Saleem

Have you ever read a sentence in a book that caught you off guard with its meaning? Maybe it started in one direction and then, suddenly, the meaning changed, making you stumble and re-read it. These are known as garden-path sentences, and they are at the heart of a fascinating study on human cognition—a study that also sheds light on the capabilities of AI, specifically the language model ChatGPT.


Here is a comparison table outlining the key aspects of language processing in ChatGPT versus humans, based on the study:


| Feature | ChatGPT | Humans |
| --- | --- | --- |
| Context Use | Utilizes previous context to predict what comes next. | Uses prior context and background knowledge to anticipate and integrate new information. |
| Predictive Capabilities | Can predict human memory performance in language-based tasks. | Naturally predict and create expectations about upcoming information. |
| Memory Performance | Relatedness ratings by ChatGPT correspond with actual memory performance. | Proven correlation between relatedness and memory retention, especially in the presence of fitting context. |
| Processing Manner | Processes information autoregressively, using the preceding context to anticipate future elements. | Sequentially processes language, constructing and updating mental models based on predictions. |
| Error Handling | Requires updates in case of discrepancies between predictions and actual information. | Creates breakpoints and new mental models in case of prediction errors. |
| Cognitive Faculties | Lacks an actual memory system, but uses relatedness as a proxy for foreseeing memory retention. | Employs cognitive functions to process, comprehend, and remember language-based information. |
| Language Processing | Mimics certain cognitive processes despite not being based on human cognition. | Complex interplay of cognitive mechanisms for language comprehension and memory. |
| Applications | Potential to assist in personalized learning and cognitive enhancement, especially in diverse and elderly groups. | Continuous learning and cognitive abilities that could benefit from AI-powered enhancement strategies. |



This comparison table synthesizes the congruencies and distinctions discussed in the research, providing a broad understanding of how ChatGPT and humans process language and the potential for AI-assisted advancements in cognitive performance.

The Intrigue of Garden-Path Sentences

Garden-path sentences are a unique and useful tool for linguists and psychologists studying human language processing and memory. These sentences are constructed in a way that initially leads the reader to interpret them incorrectly, often causing confusion or a momentary misunderstanding. The term “garden-path” refers to the idiom “to be led down the garden path,” meaning to be deceived or misled.

Usually, the first part of a garden-path sentence sets up an expectation that is violated by the later part, which forces the reader to go back and reinterpret the sentence structure to make sense of it. This reanalysis process is of great interest to researchers because it reveals how people construct meaning from language, how they deal with syntactic ambiguity, and how comprehension and memory interact.

The classic example given,

“The old man the boat,”

relies on the structural ambiguity of the word “man.”

Initially, “The old man” reads like a noun phrase, leading you to expect a verb to follow.

But as you read “the boat,” confusion arises because “the boat” doesn’t function as a verb.

Here’s where the garden-path effect comes into play:

To make sense of the sentence, you must realize “man” is being used as a verb, meaning to operate or staff, and “the old” functions as the subject. The corrected interpretation is that older individuals are the ones operating the boat.

Other examples of garden-path sentences might include:

  • “The horse raced past the barn fell.” At first read, you might think the sentence is complete after “barn,” making “fell” seem out of place. However, the sentence means the horse that was raced past the barn is the one that fell.
  • “The complex houses married and single soldiers and their families.” Initially, “complex” might seem to be an adjective modifying “houses,” but “houses” is in fact a verb, and “the complex” refers to a housing complex.

These sentences demonstrate the cognitive work involved in parsing and understanding language. By examining how people react to and remember such sentences, researchers can gain insights into the psychological processes underlying language comprehension and memory formation.

ChatGPT’s Predictive Capability

Garden-path sentences, with their inherent complexity and potential to mislead readers temporarily, have allowed researchers to observe the processes involved in human language comprehension and memory. The study at the core of this discussion aimed to push boundaries further by exploring whether an AI model, specifically ChatGPT, could predict human memory performance concerning these sentences.

The study presented participants with pairs of sentences, where the second sentence was a challenging garden-path sentence, and the first sentence provided context. This context was either fitting, meaning it was supportive and related to the garden-path sentence, making it easier to comprehend, or unfitting, where the context was not supportive and made comprehension more challenging.

ChatGPT, mirroring human cognitive processes to some extent, was used to assess the relatedness of these two sentences and to predict the memorability of the garden-path sentence.

The participants then participated in a memory task to see how well they recalled the garden-path sentences. The correlation between ChatGPT’s predictions and human performance was significant, suggesting that ChatGPT could indeed forecast how well humans would remember sentences based on the context provided.

For instance, if the first sentence was “Jane gave up on the diet,” followed by the garden-path sentence “Eating carrots sticks to your ribs,” the fitting context (“sticks” relates to adhering to a diet plan) makes it easier for both humans and ChatGPT to judge the sentence as memorable. On the contrary, an unfitting context like “The weather is changing” would offer no clarity, making the garden-path sentence less memorable due to a lack of relatability.

This reveals the role of context and relatability in language processing and memory. Sentences placed in a fitting context were rated as more memorable and, indeed, better remembered in subsequent tests. This alignment between AI assessments and human memory performance underscores ChatGPT’s predictive capability and the importance of cohesive information in language retention.
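
The study used ChatGPT's own relatedness ratings; as a crude stand-in for illustration, relatedness between a context sentence and a garden-path sentence could be approximated with the cosine similarity of sentence embeddings. The vectors below are hand-made toy values, not real embeddings, and the pairing with the study's sentences is our assumption.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity: 1.0 for identical directions, near 0 for unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-made toy "embeddings"; a real system would use a sentence encoder.
garden_path = np.array([0.9, 0.1, 0.2])    # "Eating carrots sticks to your ribs"
fitting_ctx = np.array([0.8, 0.2, 0.1])    # "Jane gave up on the diet"
unfitting_ctx = np.array([0.1, 0.9, 0.3])  # "The weather is changing"

# A fitting context should score as more related than an unfitting one.
print(cosine(garden_path, fitting_ctx) > cosine(garden_path, unfitting_ctx))  # True
```

In the study, this kind of relatedness score is what tracked how well participants later remembered the garden-path sentence.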

Memory Performance in Fitting vs. Unfitting Contexts

In the study under discussion, the experiment involved presenting participants with two types of sentence pairs. Each pair consisted of an initial context-setting sentence (Sentence 1) and a subsequent garden-path sentence (Sentence 2), which is a type of sentence designed to lead the reader to an initial misinterpretation.

In a “fitting” context, the first sentence provided would logically lead into the garden-path sentence, aiding comprehension by setting up the correct framework for interpretation.

For example, if Sentence 1 was “The city has no parks,” and Sentence 2 was “The ducks the children feed are at the lake,” the concept of feed here would fit with the absence of city parks, and the readers can easily understand that “the children feed” is a descriptive action relating to “the ducks.”

Conversely, in an “unfitting” context, the first sentence would not provide a supportive backdrop for the garden-path sentence, making it harder to parse and potentially less memorable.

If Sentence 1 was “John is a skilled carpenter,” and Sentence 2 remained “The ducks the children feed are at the lake,” the relationship between Sentence 1 and Sentence 2 is not clear because carpentry has no apparent connection to feeding ducks or the lake.

Participants in the study were asked to first rate the relatedness of these two sentences on a scale. The study found that participants rated fitting contexts as more related than unfitting ones.

The second part of the task was a surprise memory test where only garden-path sentences were presented, and the participants were required to recall them. It was discovered that the garden-path sentences that had a preceding fitting context were better remembered than those with an unfitting context—this indicated that context plays a critical role in how we process and retain sentences.

ChatGPT, a generative AI system, predicted this outcome. The model also rated garden-path sentences as more memorable when they had a fitting context, similar to human participants, demonstrating its capability to forecast memory performance based on context.

This highlights not only the role of context in human memory but also the potential for AI to predict human cognitive processes.

Stochastic Reasoning: A Potential Cognitive Mechanism

The study in question introduces the notion of stochastic reasoning as a potential cognitive mechanism affecting memory performance. Stochastic reasoning involves a probabilistic approach to understanding the availability of familiar information, also known as retrieval cues, which are instrumental in bolstering memory recall.

The presence of related, coherent information can elevate activation within our cognitive processes, leading to an increased likelihood of recalling that information later on.

Let’s consider an example to elucidate this concept. Imagine you are provided with the following two sentences as part of the study:

“The lawyer argued the case.”
“The evidence was compelling.”

In this case, the two sentences provide a fitting context where the first sentence creates a foundation of understanding related to legal scenarios and the second sentence builds upon that context by introducing “compelling evidence,” which is a familiar concept within the realm of law.

This clear and potent relation between the two sentences forms strong retrieval cues that enhance memory performance, as your brain more easily links “compelling evidence” with “lawyer argued the case,” which aids in later recollection.

Alternatively, if the second sentence was entirely unrelated, such as “The roses in the garden are in full bloom,” the lack of a fitting context would mean weak or absent retrieval cues. As the information related to law does not connect well with the concept of blooming roses, this results in less effective memory performance due to the disjointed nature of the information being processed.

The study found that when sentences are placed within a fitting context that aligns well with our existing knowledge and background, the relationship between the sentences is clear, thus providing stronger cues that streamline the retrieval process and lead to better retention and recall of information.

This reflects the significance of stochastic reasoning and the role of familiarity and coherence in enhancing memory performance.
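
One way to picture stochastic reasoning is as a recall probability that rises with retrieval-cue strength. The toy model below is our illustration, not the study's actual model: it simply passes a relatedness score through a logistic function, with invented parameters.

```python
import math

# Toy illustration: recall probability as a logistic function of how
# related (fitting) the context is to the target sentence. The base and
# weight parameters are arbitrary, chosen only to show the shape.
def recall_probability(relatedness, base=-1.0, weight=3.0):
    return 1.0 / (1.0 + math.exp(-(base + weight * relatedness)))

print(round(recall_probability(0.9), 2))  # fitting context: strong cues
print(round(recall_probability(0.1), 2))  # unfitting context: weak cues
```

Under this picture, a fitting context raises cue strength and therefore the probability that the garden-path sentence is later recalled.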

ChatGPT vs. Human Language Processing

ChatGPT, a language model developed by OpenAI, and humans share an intriguing commonality in how they process language, despite the underlying differences in their “operating systems,” or cognitive architectures. Both rely significantly on the surrounding context to comprehend incoming information and to integrate it coherently with the preceding context.

To illustrate, consider the following example of a garden-path sentence: “The old man the boat.” This sentence is confusing at first because “man” is usually read as a noun, so the reader initially interprets “the old man” as a noun phrase and expects a verb to follow.

The confusion is cleared up when provided with a fitting context, such as “elderly people are in control.” Now, the phrase makes sense—’man’ is understood as a verb meaning ‘to staff,’ and the garden-path sentence is interpreted correctly to mean that elderly people are the ones operating the boat.

However, if the preceding sentence was unrelated, such as “The birds flew to the south,” there is no helpful context to parse “The old man the boat” correctly, and it remains confusing, illustrating an unfitting context. This unfitness affects the recall of the garden-path sentence in the memory task, as it lacks clear, coherent links to preexisting knowledge or context that facilitate understanding and later recall.

The study’s findings depicted that when humans assess two sentences as being more related, which is naturally higher in fitting contexts than in unfitting ones, the memory performance for the ambiguous (garden-path) sentence also improves.

In a compelling parallel, ChatGPT generated similar assessments when given the same sentences, assigning higher relatedness values to fitting contexts over unfitting ones. This correlation suggests a similarity in how ChatGPT and humans use context to parse and remember new information.

Furthermore, the relatedness ratings were not just abstract assessments but tied directly to the actual memorability of the sentences. As with humans, ChatGPT’s predictions of memorability were also higher for sentences in fitting contexts, a phenomenon that may stem from its sophisticated language processing capabilities that crudely mimic cognitive processes involved in human memory.

This similarity in the use of context and its impact on memory retention is remarkable, considering the different mechanisms through which humans and machine learning models operate.

Broader Implications and the Future

The research findings on the predictive capabilities of generative AI like ChatGPT regarding human memory performance in language tasks carry wider ramifications. They suggest that these AI models could have practical applications in several domains, including:


Personalized learning

AI could be used to tailor learning experiences for students with diverse cognitive needs. By understanding how different students retain information, AI applications could guide educators in adjusting teaching materials, pace, and instructional approaches to cater to individual learning styles and abilities.

For example, if a student is struggling with remembering historical dates, the AI might suggest teaching methods or materials that align with their learning patterns to improve retention.


The study indicates that older adults often face a cognitive slowdown, which could lead to more frequent memory problems. AI, once trained on data taking into account individual cognitive differences, could aid in developing personalized cognitive training and therapy plans aimed at enhancing mental functions in the elderly.

For instance, a cognitive enhancement program might be customized for an older adult who has difficulty recalling names or recent events by using strategies found effective through AI analysis.

Impact of AI on human cognition

The implications here go beyond just predicting human behavior; they extend to potentially improving cognitive processes through the intervention of AI.

These potential applications represent a synergistic relationship between AI and human cognitive research, where the insights gained from one field can materially benefit the other.

Furthermore, adaptive AI systems could continually learn and improve their predictions and recommendations based on new data, thereby creating a dynamic and responsive tool for cognitive enhancement and education.

March 21
Moneebah Noman

This is the second blog in the series of RAG and finetuning, highlighting a detailed comparison of the two approaches.


You can read the first blog of the series here – A guide to understanding RAG and finetuning


While we provided a detailed guideline on understanding RAG and finetuning, a comparative analysis of the two provides a deeper insight. Let’s explore and address the RAG vs finetuning debate to determine the best tool to optimize LLM performance.


RAG vs finetuning LLM – A detailed comparison of the techniques

It’s crucial to grasp that these methodologies, while both targeting the enhancement of large language models (LLMs), operate under distinct paradigms. Recognizing their strengths and limitations is essential for effectively leveraging them in various AI applications.

This understanding allows developers and researchers to make informed decisions about which technique to employ based on the specific needs of their projects. Whether it’s adapting to dynamic information, customizing linguistic styles, managing data requirements, or ensuring domain-specific performance, each approach has its unique advantages.

By comprehensively understanding these differences, you’ll be equipped to choose the most suitable method—or a blend of both—to achieve your objectives in developing sophisticated, responsive, and accurate AI models.


Summarizing the RAG vs finetuning comparison


Team RAG or team Fine-Tuning? Tune in to this podcast now to find out their specific benefits, trade-offs, use-cases, enterprise adoption, and more!

Adaptability to dynamic information

RAG shines in environments where information is constantly updated. By design, RAG leverages external data sources to fetch the latest information, making it inherently adaptable to changes.

This quality ensures that responses generated by RAG-powered models remain accurate and relevant, a crucial advantage for applications like real-time news summarization or updating factual content.

Fine-tuning, in contrast, optimizes a model’s performance for specific tasks through targeted training on a curated dataset.

While it significantly enhances the model’s expertise in the chosen domain, its adaptability to new or evolving information is constrained. The model’s knowledge remains as current as its last training session, necessitating regular updates to maintain accuracy in rapidly changing fields.


Customization and linguistic style

RAG‘s primary focus is on enriching responses with accurate, up-to-date information retrieved from external databases.

This process, though excellent for fact-based accuracy, means RAG models might not tailor their linguistic style as closely to specific user preferences or nuanced domain-specific terminologies without integrating additional customization techniques.

Fine-tuning excels in personalizing the model to a high degree, allowing it to mimic specific linguistic styles, adhere to unique domain terminologies, and align with particular content tones.

This is achieved by training the model on a dataset meticulously prepared to reflect the desired characteristics, enabling the fine-tuned model to produce outputs that closely match the specified requirements.


Large language model bootcamp

Data efficiency and requirements

RAG operates by leveraging external datasets for retrieval, thus requiring a sophisticated setup to manage and query these vast data repositories efficiently.

The model’s effectiveness is directly tied to the quality and breadth of its connected databases, demanding rigorous data management but not necessarily a large volume of labeled training data.

Fine-tuning, however, depends on a substantial, well-curated dataset specific to the task at hand.

It requires less external data infrastructure compared to RAG but relies heavily on the availability of high-quality, domain-specific training data. This makes fine-tuning particularly effective in scenarios where detailed, task-specific performance is paramount and suitable training data is accessible.


Efficiency and scalability

RAG is generally considered cost-effective and efficient for a wide range of applications, particularly because it can dynamically access and utilize information from external sources without the need for continuous retraining.

This efficiency makes RAG a scalable solution for applications requiring access to the latest information or coverage across diverse topics.

Fine-tuning demands a significant investment in time and resources for the initial training phase, especially in preparing the domain-specific dataset and computational costs.

However, once fine-tuned, the model can operate with high efficiency within its specialized domain. The scalability of fine-tuning is more nuanced, as extending the model’s expertise to new domains requires additional rounds of fine-tuning with respective datasets.


Explore further how to tune LLMs for optimal performance


Domain-specific performance

RAG demonstrates exceptional versatility in handling queries across a wide range of domains by fetching relevant information from its external databases.

Its performance is notably robust in scenarios where access to wide-ranging or continuously updated information is critical for generating accurate responses.

Fine-tuning is the go-to approach for achieving unparalleled depth and precision within a specific domain.

By intensively training the model on targeted datasets, fine-tuning ensures the model’s outputs are not only accurate but deeply aligned with the domain’s subtleties, making it ideal for specialized applications requiring high expertise.


Hybrid approach: Enhancing LLMs with RAG and finetuning

The concept of a hybrid model that integrates Retrieval-Augmented Generation (RAG) with fine-tuning presents an interesting advancement. This approach allows for the contextual enrichment of LLM responses with up-to-date information while ensuring that outputs are tailored to the nuanced requirements of specific tasks.

Such a model can operate flexibly, serving as either a versatile, all-encompassing system or as an ensemble of specialized models, each optimized for particular use cases.

In practical applications, this could range from customer service chatbots that pull the latest policy details to enrich responses and then tailor these responses to individual user queries, to medical research assistants that retrieve the latest clinical data for accurate information dissemination, adjusted for layman understanding.

The hybrid model thus promises not only improved accuracy by grounding responses in factual, relevant data but also ensures that these responses are closely aligned with specific domain languages and terminologies.

However, this integration introduces complexities in model management, potentially higher computational demands, and the need for effective data strategies to harness the full benefits of both RAG and fine-tuning.

Despite these challenges, the hybrid approach marks a significant step forward in AI, offering models that combine broad knowledge access with deep domain expertise, paving the way for more sophisticated and adaptable AI solutions.


Choosing the best approach: Finetuning, RAG, or hybrid

Choosing between fine-tuning, Retrieval-Augmented Generation (RAG), or a hybrid approach for enhancing a large language model should consider specific project needs, data accessibility, and the desired outcome, alongside computational resources and scalability.

Fine-tuning is best when you have extensive domain-specific data and seek to tailor the LLM’s outputs closely to specific requirements, making it a perfect fit for projects like creating specialized educational content that adapts to curriculum changes. RAG, with its dynamic retrieval capability, suits scenarios where responses must be informed by the latest information, ideal for financial analysis tools that rely on current market data.

A hybrid approach merges these advantages, offering the specificity of fine-tuning with the contextual awareness of RAG, suitable for enterprises needing to keep pace with rapid information changes while maintaining deep domain relevance. As technology evolves, a hybrid model might offer the flexibility to adapt, providing a comprehensive solution that encompasses the strengths of both fine-tuning and RAG.


Evolution and future directions

As the landscape of artificial intelligence continues to evolve, so too do the methodologies and technologies at its core. Among these, Retrieval-Augmented Generation (RAG) and fine-tuning are experiencing significant advancements, propelling them toward new horizons of AI capabilities.


Advanced enhancements in RAG

Enhancing the retrieval-augmented generation pipeline

RAG has undergone significant transformations and advancements in each step of its pipeline. Ongoing research on RAG continues to introduce advanced methods that boost accuracy and relevance at every stage.

Let’s use the same query example from the basic RAG explanation: “What’s the latest breakthrough in renewable energy?”, to better understand these advanced techniques.

  • Pre-retrieval optimizations: Before the system begins to search, it optimizes the query for better outcomes. For our example, Query Transformations and Routing might break down the query into sub-queries like “latest renewable energy breakthroughs” and “new technology in renewable energy.” This ensures the search mechanism is fine-tuned to retrieve the most accurate and relevant information.


  • Enhanced retrieval techniques: During the retrieval phase, Hybrid Search combines keyword and semantic searches, ensuring a comprehensive scan for information related to our query. Moreover, by Chunking and Vectorization, the system breaks down extensive documents into digestible pieces, which are then vectorized. This means our query doesn’t just pull up general information but seeks out the precise segments of texts discussing recent innovations in renewable energy.


  • Post-retrieval refinements: After retrieval, Reranking and Filtering processes evaluate the gathered information chunks. Instead of simply using the top ‘k’ matches, these techniques rigorously assess the relevance of each piece of retrieved data. For our query, this could mean prioritizing a segment discussing a groundbreaking solar panel efficiency breakthrough over a more generic update on solar energy. This step ensures that the information used in generating the response directly answers the query with the most relevant and recent breakthroughs in renewable energy.


Through these advanced RAG enhancements, the system not only finds and utilizes information more effectively but also ensures that the final response to the query about renewable energy breakthroughs is as accurate, relevant, and up-to-date as possible.
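As a rough illustration of the hybrid search and reranking steps above, consider the sketch below. The documents, semantic scores, and blending weight are hypothetical stand-ins; a real system would compute semantic scores with an embedding model and often use a learned reranker instead of a fixed weighted blend.

```python
# Hypothetical documents and scores; purely illustrative.
def keyword_score(query, doc):
    """Fraction of query terms that appear in the document."""
    terms = query.lower().split()
    return sum(t in doc.lower() for t in terms) / len(terms)

def hybrid_search(query, docs, semantic_scores, alpha=0.5):
    """Blend keyword and semantic scores, then rerank best-first."""
    scored = [
        (alpha * keyword_score(query, d) + (1 - alpha) * s, d)
        for d, s in zip(docs, semantic_scores)
    ]
    scored.sort(reverse=True)  # reranking: highest combined score first
    return [d for _, d in scored]

docs = [
    "Perovskite solar cells hit a record efficiency this year.",
    "A general overview of solar power adoption worldwide.",
]
# Hypothetical semantic similarities to the query
results = hybrid_search("latest renewable energy breakthrough",
                        docs, semantic_scores=[0.9, 0.4])
```

Here the semantic score dominates, so the breakthrough-specific document is ranked above the generic overview even though neither contains the exact query keywords.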

Towards multimodal integration

RAG, traditionally focused on enhancing text-based language models by incorporating external data, is now also expanding its horizons towards a multimodal future.

Multimodal RAG integrates various types of data, such as images, audio, and video, alongside text, allowing AI models to generate responses that are not only informed by a vast array of textual information but also enriched by visual and auditory contexts.

This evolution signifies a move towards AI systems capable of understanding and interacting with the world more holistically, mimicking human-like comprehension across different sensory inputs.


Here’s your fundamental introduction to RAG


Advanced enhancements in finetuning

Parameter efficiency and LoRA

In parallel, fine-tuning is evolving toward more parameter-efficient methods. Fine-tuning large language models (LLMs) presents a unique challenge for AI practitioners aiming to adapt these models to specific tasks without the overwhelming computational costs typically involved.

One such innovative technique is Parameter-Efficient Fine-Tuning (PEFT), which offers a cost-effective and efficient way to fine-tune such models.

Techniques like Low-Rank Adaptation (LoRA) are at the forefront of this change, enabling fine-tuning to be accomplished with significantly less computational overhead. LoRA and similar approaches adjust only a small subset of the model’s parameters, making fine-tuning not only more accessible but also more sustainable.

Specifically, LoRA introduces a pair of small low-rank matrices that capture the essence of the downstream task, allowing for fine-tuning with minimal adjustments to the original model’s weights.

This method exemplifies how cutting-edge research is making it feasible to tailor LLMs for specialized applications without the prohibitive computational cost typically associated with full retraining.
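The arithmetic behind LoRA’s savings can be sketched with plain NumPy. The dimensions and rank below are arbitrary toy values, not those of any real model:

```python
import numpy as np

# Illustrative sketch of the LoRA idea: instead of updating a full
# weight matrix W (d x d), train two small matrices A (r x d) and
# B (d x r) with rank r << d, and use W + B @ A at inference time.
d, r = 512, 8
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))          # frozen pretrained weights
A = rng.normal(size=(r, d)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # zero-initialized, so the adapted
                                     # model starts identical to the base
W_adapted = W + B @ A                # effective weights after fine-tuning

# Trainable parameters drop from d*d to 2*d*r
full, lora = d * d, 2 * d * r
print(full, lora)  # 262144 vs 8192, about 3% of the original count
```

Because B starts at zero, the adapted model initially behaves exactly like the base model, and training only ever touches the two small factors.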


The emergence of long-context LLMs


The evolution toward long context LLMs – Source: Google Blog


As we embrace these advancements in RAG and fine-tuning, the recent introduction of Long Context LLMs, like Gemini 1.5 Pro, poses an intriguing question about the future necessity of these technologies. Gemini 1.5 Pro, for instance, showcases a remarkable capability with its 1 million token context window, setting a new standard for AI’s ability to process and utilize extensive amounts of information in one go.

The big deal here is how this changes the game for technologies like RAG and advanced fine-tuning. RAG was a breakthrough because it helped AI models look beyond their training, fetching information from outside when needed, to answer questions more accurately. But now, with Long Context LLMs’ ability to hold so much information in memory, the question arises: do we still need RAG?


Explore a hands-on curriculum that helps you build custom LLM applications!


This doesn’t mean RAG and fine-tuning are becoming obsolete. Instead, it hints at an exciting future where AI can be both deeply knowledgeable, thanks to its vast memory, and incredibly adaptable, using technologies like RAG to fill in any gaps with the most current information.

In essence, Long Context LLMs could make AI more powerful by ensuring it has a broad base of knowledge to draw from, while RAG and fine-tuning techniques ensure that the AI remains up-to-date and precise in its answers. So the emergence of Long Context LLMs like Gemini 1.5 Pro does not diminish the value of RAG and fine-tuning but rather complements it.



Concluding Thoughts

The trajectory of AI, through the advancements in RAG, fine-tuning, and the emergence of long-context LLMs, reveals a future rich with potential. As these technologies mature, their combined interaction will make systems more adaptable, efficient, and capable of understanding and interacting with the world in ways that are increasingly nuanced and human-like.

The evolution of AI is not just a testament to technological advancement but a reflection of our continuous quest to create machines that can truly understand, learn from, and respond to the complex landscape of human knowledge and experience.

March 20
Huda Mahmood - Author
Huda Mahmood

Vector embeddings have revolutionized the representation and processing of data for generative AI applications. The versatility of embedding tools has produced enhanced data analytics for its use cases.

In this blog, we will explore Google’s recent development of specialized embedding tools that particularly focus on promoting research in the fields of dermatology and pathology.

Let’s start our exploration with an overview of vector embedding tools.

What are vector embedding tools?

Vector embeddings are a specific embedding tool that uses vectors for data representation. While the direction of a vector determines its relationship with other data points in space, the length of a vector signifies the importance of the data point it represents.

A vector embedding tool processes input data by analyzing it and identifying key features of interest. The tool then assigns a unique vector to each data point based on its features. Vector embeddings are a powerful tool for representing complex datasets, allowing more efficient and faster data processing.
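A toy sketch can make this concrete. The 3-dimensional vectors below are made up for the example and stand in for real embeddings; cosine similarity measures how aligned their directions are, so related concepts score close to 1:

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity of two vectors based on the angle between them."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical embeddings: "cat" and "kitten" point in similar
# directions, "car" points elsewhere.
cat    = np.array([0.9, 0.8, 0.1])
kitten = np.array([0.85, 0.75, 0.2])
car    = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(cat, kitten))  # close to 1: related concepts
print(cosine_similarity(cat, car))     # much smaller: unrelated
```

Real embedding models work the same way, just with hundreds or thousands of dimensions learned from data rather than three hand-picked ones.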


Large language model bootcamp


General embedding tools process a wide variety of data, capturing general features without focusing on specialized fields of interest. On the contrary, there are specialized embedding tools that enable focused and targeted data handling within a specific field of interest.

Specialized embedding tools are particularly useful in fields like finance and healthcare where unique datasets form the basis of information. Google has shared two specialized vector embedding tools, dealing with the demands of healthcare data processing.

However, before we delve into the details of these tools, it is important to understand their need in the field of medicine.

Why does healthcare need specialized embedding tools?

Embeddings are an important tool that enables ML engineers to develop apps that handle multimodal data efficiently. AI-powered applications using vector embeddings span various industries. While they serve a diverse range of uses, some use cases require differentiated data-processing systems.

Healthcare is one such type of industry where specialized embedding tools can be useful for the efficient processing of data. Let’s explore major reasons for such differentiated use of embedding tools.


Explore the role of vector embeddings in generative AI


Domain-specific features

Medical data, ranging from patient history to imaging results, are crucial for diagnosis. These data sources, particularly in the fields of dermatology and pathology, provide important information to medical personnel.

Slight variations in the information from these sources require specialized knowledge to identify relevant patterns and changes. While general-purpose embedding tools might fail to distinguish normal from abnormal information, specialized tools can be created with proper training and contextual knowledge.

Data scarcity

While data is abundant in different fields and industries, healthcare information is often scarce. Hence, specialized embedding tools are needed to train on the small datasets with focused learning of relevant features, leading to enhanced performance in the field.

Focused and efficient data processing

The AI model must be trained to interpret particular features of interest from a typical medical image. This demands specialized tools that can focus on relevant aspects of a particular disease, assisting doctors in making accurate diagnoses for their patients.

In essence, specialized embedding tools bridge the gap between the vast amount of information within medical images and the need for accurate, interpretable diagnoses specific to each field in healthcare.

A look into Google’s embedding tools for healthcare research

The health-specific embedding tools by Google are focused on enhancing medical image analysis, particularly within the field of dermatology and pathology. This is a step towards addressing the challenge of developing ML models for medical imaging.

The two embedding tools – Derm Foundation and Path Foundation – are available for research use to explore their impact on the field of medicine and study their role in improving medical image analysis. Let’s take a look at their specific uses in the medical world.

Derm Foundation: A step towards redefining dermatology

It is a specialized embedding tool designed by Google, particularly for the field of dermatology within the world of medicine. It specifically focuses on generating embeddings from skin images, capturing the critical skin features that are relevant to diagnosing a skin condition.

The pre-training process of this specialized embedding tool consists of learning from a library of labeled skin images with detailed descriptions, such as diagnoses and clinical notes. The tool learns to identify relevant features for skin condition classification from the provided information, using it on future data to highlight similar features.


Derm Foundation outperforms BiT-M (a standard pre-trained image model) – Source: Google Research Blog


Some common features of interest for Derm Foundation when analyzing a typical skin image include:

  • Skin color variation: to identify any abnormal pigmentation or discoloration of the skin
  • Textural analysis: to identify and differentiate between smooth, rough, or scaly textures, indicative of different skin conditions
  • Pattern recognition: to highlight any moles, rashes, or lesions that can connect to potential abnormalities

Potential use cases of the Derm Foundation

Based on the pre-training dataset and focus on analyzing skin-specific features, Derm Foundation embeddings have the potential to redefine the data-processing and diagnosing practices for dermatology. Researchers can use this tool to develop efficient ML models. Some leading potential use cases for these models include:

Early detection of skin cancer

Efficient identification of skin patterns and textures from images can enable dermatologists to detect skin cancer in patients early. Early detection can lead to better treatments and outcomes overall.

Improved classification of skin diseases

Each skin condition, such as dermatitis, eczema, and psoriasis, shows up differently on a medical image. A specialized embedding tool empowers the models to efficiently detect and differentiate between different skin conditions, leading to accurate diagnoses and treatment plans.

Hence, the Derm Foundation offers enhanced accuracy in dermatological diagnoses, faster deployment of models due to the use of pre-trained embeddings, and focused analysis by dealing with relevant features. It is a step towards a more accurate and efficient diagnosis of skin conditions, ultimately improving patient care.
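To illustrate the intended workflow (not Google’s actual API), the sketch below uses random stand-ins for precomputed Derm Foundation embeddings and fits a lightweight nearest-centroid classifier on top of them. The class names, embedding size, and data are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
emb_dim = 64  # assumed embedding size, for illustration only

# Stand-in embeddings for two hypothetical skin-condition classes;
# a real workflow would obtain these from the embedding tool.
eczema_embs    = rng.normal(loc=0.0, size=(20, emb_dim))
psoriasis_embs = rng.normal(loc=1.0, size=(20, emb_dim))

# Nearest-centroid classifier over the embedding space:
# each class is summarized by the mean of its embeddings.
centroids = {
    "eczema": eczema_embs.mean(axis=0),
    "psoriasis": psoriasis_embs.mean(axis=0),
}

def classify(embedding):
    """Assign the class whose centroid is closest to the embedding."""
    return min(centroids, key=lambda c: np.linalg.norm(embedding - centroids[c]))

new_case = rng.normal(loc=1.0, size=emb_dim)  # embedding of a new image
print(classify(new_case))
```

The point of the design is that the heavy lifting happens once, inside the embedding model; the downstream classifier is small enough to train on the modest labeled datasets typical of healthcare.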


Here’s your guide to choosing the right vector embedding model for your generative AI use case


Path Foundation: Revamping the world of pathology in medical sciences

While the Derm Foundation was specialized to study and analyze skin images, the Path Foundation embedding is designed to focus on images from pathology.


An outlook of SSL training used by Path Foundation – Source: Google Research Blog


It analyzes the visual data of tissue samples, focusing on critical features that can include:

  • Cellular structures: focusing on cell size, shape, or arrangement to identify any possible diseases
  • Tumor classification: differentiating between different types of tumors or assessing their aggressiveness

The pre-training process of the Path Foundation embedding comprises labeled pathology images along with detailed descriptions and relevant diagnoses.


Learn to build LLM applications


Potential use cases of the Path Foundation

Training on this dataset equips the specialized embedding tool for efficient diagnoses in pathology. Some potential use cases for this embedding tool within the field include:

Improved cancer diagnosis

Improved analysis of pathology images can enable timely detection of cancerous tissues, leading to earlier diagnoses and better patient outcomes.

Better pathology workflows

Analysis of pathology images is a time-consuming process that can be expedited with the use of an embedding tool. It will allow doctors to spend more time on complex cases while maintaining an improved workflow for their pathology diagnoses.

Thus, Path Foundation promises the development of pathology processes, supporting medical personnel in improved diagnoses and other medical processes.

Transforming healthcare with vector embedding tools

The use of embedding tools like Derm Foundation and Path Foundation has the potential to redefine data handling for medical processes. Specialized focus on relevant features offers enhanced diagnostic accuracy with efficient processes and workflows.

Moreover, the development of specialized ML models will address data scarcity often faced within healthcare when developing such solutions. It will also promote faster development of useful models and AI-powered solutions.

While the solutions will empower doctors to make faster and more accurate diagnoses, they will also personalize medicine for patients. Hence, embedding tools have the potential to significantly improve healthcare processes and treatments in the days to come.

March 19
Moneebah Noman

This is the first blog in the series of RAG and finetuning, focusing on providing a better understanding of the two approaches.

RAG and finetuning: You’ve likely seen these terms tossed around on social media, hailed as the next big leap in artificial intelligence. But what do they really mean, and why are they so crucial in the evolution of AI? 

To truly understand their significance, it’s essential to recognize the practical challenges faced by current language models, such as ChatGPT, renowned for their ability to mimic human-like text across essays, dialogues, and even poetry.

Yet, despite these impressive capabilities, their limitations became more apparent when tasked with providing up-to-date information on global events or expert knowledge in specialized fields.

Take, for instance, the FIFA World Cup.


Messi’s winning shot at the Fifa World Cup – Source: Economic Times


If you were to ask ChatGPT, “Who won the FIFA World Cup?” expecting details on the most recent tournament, you might receive an outdated response citing France as the champions despite Argentina’s triumphant victory in Qatar 2022.


ChatGPT’s response to an inquiry about the winner of the FIFA World Cup 2022


Moreover, the limitations of AI models extend beyond current events to specialized knowledge domains. Try asking ChatGPT for treatments in neurodegenerative diseases, a highly specialized medical field. The model might offer generic advice based on its training data but lacks depth or specificity – and, most importantly, accuracy.


Symptoms of Parkinson’s disease – Source: Neuro2go


GPT’s response to inquiry about Parkinson’s disease


These scenarios precisely illustrate the problem: a language model might generate text relevant to a past context or data but falls short when current or specialized knowledge is required.


Revisit the best large language models of 2023


Enter RAG and finetuning

RAG revolutionizes the way language models access and use information. Incorporating a retrieval step allows these models to pull in data from external sources in real-time. This means that when you ask a RAG-powered model a question, it doesn’t just rely on what it learned during training; instead, it can consult a vast, constantly updated external database to provide an accurate and relevant answer. This would bridge the gap highlighted by the FIFA World Cup example.

On the other hand, fine-tuning offers a way to specialize a general AI model for specific tasks or knowledge domains. Additional training on a focused dataset sharpens the model’s expertise in a particular area, enabling it to perform with greater precision and understanding.

This process transforms a jack-of-all-trades into a master of one, equipping it with the nuanced understanding required for tasks where generic responses just won’t cut it. This would allow it to perform as a seasoned medical specialist dissecting a complex case rather than a chatbot giving general guidelines to follow.


Curious about the LLM context augmentation approaches like RAG and fine-tuning and their benefits, trade-offs and use-cases? Tune in to this podcast with Co-founder and CEO of LlamaIndex now!

This blog will walk you through RAG and finetuning, unraveling how they work, why they matter, and how they’re applied to solve real-world problems. By the end, you’ll not only grasp the technical nuances of these methodologies but also appreciate their potential to transform AI systems, making them more dynamic, accurate, and context-aware.


Large language model bootcamp


Understanding the RAG LLM duo

What is RAG?

Retrieval-augmented generation (RAG) significantly enhances how AI language models respond by incorporating a wealth of updated and external information into their answers. It can be thought of as a model consulting an extensive digital library for information as needed.

Its essence is in the name: Retrieval, Augmentation, and Generation.


  • Retrieval: The process starts when a user asks a query and the model needs to find information beyond its training data. It searches through a vast database that is loaded with the latest information, looking for data related to the user’s query.

  • Augmentation: Next, the information retrieved is combined, or ‘augmented,’ with the original query. This enriched input provides a broader context, helping the model understand the query in greater depth.

  • Generation: Finally, the language model generates a response based on the augmented prompt. This response is informed by the model’s training and the newly retrieved information, ensuring accuracy and relevance.
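The three steps above can be sketched as a toy pipeline. The knowledge base, keyword retriever, and generation function are hypothetical placeholders, not a real LLM API:

```python
# Toy retrieve-augment-generate loop; everything here is a stand-in.
KNOWLEDGE_BASE = {
    "world cup": "Argentina won the FIFA World Cup in Qatar in 2022.",
    "renewable energy": "A new perovskite cell set an efficiency record.",
}

def retrieve(query):
    """Step 1: fetch entries whose key appears in the query."""
    q = query.lower()
    return [text for key, text in KNOWLEDGE_BASE.items() if key in q]

def augment(query, docs):
    """Step 2: combine retrieved context with the original question."""
    return "Context:\n" + "\n".join(docs) + f"\n\nQuestion: {query}"

def generate(prompt):
    """Step 3: stand-in for the LLM call that produces the answer."""
    return f"[model answers using]: {prompt}"

question = "Who won the World Cup?"
prompt = augment(question, retrieve(question))
answer = generate(prompt)
```

A real system would replace the dictionary lookup with vector search over an external database and the `generate` stub with an actual model call, but the flow of information is the same.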


Why use RAG?

Retrieval-augmented generation (RAG) brings an approach to natural language processing that’s both smart and efficient. It solves many problems faced by current LLMs, which is why it’s among the most talked-about techniques in the NLP space.

Always up-to-date

RAG keeps answers fresh by accessing the latest information. In fields where facts and data change rapidly, this ensures the AI’s responses are current and correct.

Sticks to the facts

Unlike other models that might guess or make up details (the “hallucination” problem), RAG checks facts by referencing real data. This makes it reliable, giving you answers based on actual information.

Flexible and versatile

RAG is adaptable, working well across various settings, from chatbots to educational tools and more. It meets the need for accurate, context-aware responses in a wide range of uses, which is why it is being rapidly adopted across domains.


Explore the power of the RAG LLM duo for enhanced performance


Exploring the RAG pipeline

To understand RAG further, consider what happens when you ask an AI model a question like “What’s the latest breakthrough in renewable energy?” This is when the RAG system springs into action. Let’s walk through the actual process.


A visual representation of a RAG pipeline


Query initiation and vectorization

  • Your query starts as a simple string of text. However, computers, particularly AI models, don’t understand text and its underlying meanings the same way humans do. To bridge this gap, the RAG system converts your question into an embedding, also known as a vector.
  • Why a vector? A vector is essentially a numerical representation of your query, capturing not just the words but the meaning behind them. This allows the system to search for answers based on concepts and ideas, not just matching keywords.
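Production systems use a learned embedding model (sentence transformers, a hosted embeddings API, and so on). As a purely illustrative stand-in, the sketch below hashes each word into a fixed-size vector, just to show how text becomes numbers:

```python
import hashlib

def embed(text, dim=8):
    # Toy embedding: hash each word into one of `dim` buckets and count hits.
    # Real systems use a trained neural model that captures semantics.
    vector = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vector[bucket] += 1.0
    return vector
```

Unlike this hash trick, a learned embedder places semantically similar texts near each other in vector space, which is what makes concept-level search possible.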


Searching the vector database

  • With your query now in vector form, the RAG system seeks answers in an up-to-date vector database. The system looks for the vectors in this database that are closest to your query’s vector—the semantically similar ones, meaning they share the same underlying concepts or topics.


  • But what exactly is a vector database? 
    • Vector databases defined: A vector database stores vast amounts of information from diverse sources, such as the latest research papers, news articles, and scientific discoveries. However, it doesn’t store this information in traditional formats (like tables or text documents). Instead, each piece of data is converted into a vector during the ingestion process.
    • Why vectors?: Converting to vectors allows the database to represent each item’s meaning and context numerically, in a form the computer can reason over beyond surface-level keywords.
    • Indexing: Once information is vectorized, it’s indexed within the database. Indexing organizes the data for rapid retrieval, much like an index in a textbook, enabling you to find the information you need quickly. This process ensures that the system can efficiently locate the most relevant information vectors when it searches for matches to your query vector.


  • The key here is that this information is external and not originally part of the language model’s training data, enabling the AI to access and provide answers based on the latest knowledge.
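In spirit, ingestion and indexing look like the sketch below: each document is embedded once on the way in and stored alongside its text. The embedder here is a word-count stand-in, and the flat list is a stand-in for a real index; production stores (FAISS, Pinecone, and similar) build approximate nearest-neighbor indexes over learned embeddings.

```python
class ToyVectorStore:
    """In-memory sketch of a vector database."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.index = []  # list of (vector, original_text) pairs

    def ingest(self, documents):
        # Ingestion: convert each document to a vector, then index it.
        for doc in documents:
            self.index.append((self.embed_fn(doc), doc))

def toy_embed(text):
    # Stand-in embedder; real stores use a learned embedding model.
    words = text.lower().split()
    return [float(len(words)), float(sum(len(w) for w in words))]

store = ToyVectorStore(toy_embed)
store.ingest(["New perovskite solar cells announced.", "A history of wind farms."])
```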


Selecting the top ‘k’ responses

  • From this search, the system selects the top few matches—let’s say the top 5. These matches are essentially pieces of information that best align with the essence of your question.
  • By concentrating on the top matches, the RAG system ensures that the augmentation enriches your query with the most relevant and informative content, avoiding information overload and maintaining the response’s relevance and clarity.
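"Closest" is typically measured with cosine similarity between vectors. A minimal top-k selection over a toy index of (vector, text) pairs might look like this:

```python
import math

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query_vec, index, k=5):
    # index holds (vector, text) pairs; return the k most similar texts.
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

index = [([1.0, 0.0], "solar cell news"),
         ([0.0, 1.0], "banana facts"),
         ([0.9, 0.1], "solar efficiency record")]
matches = top_k([1.0, 0.0], index, k=2)
```

A real vector database avoids this exhaustive scan by using an approximate index, but the ranking criterion is the same.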


Augmenting the query

  • Next, the information from these top matches is used to augment the original query you asked the LLM. This doesn’t mean the system simply piles on data. Instead, it integrates key insights from these top matches to enrich the context for generating a response. This step is crucial because it ensures the model has a broader, more informed base from which to draw when crafting its answer.
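The augmentation step itself is mostly prompt construction. A minimal sketch (the template wording here is hypothetical, not a standard):

```python
def augment_query(query, passages):
    # Integrate the retrieved passages into the prompt as supporting context.
    context = "\n".join(f"- {p}" for p in passages)
    return ("Use the context below to answer the question.\n"
            f"Context:\n{context}\n"
            f"Question: {query}")

prompt = augment_query("What's the latest breakthrough in renewable energy?",
                       ["Perovskite tandem cells passed 30% efficiency."])
```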


Generating the response

  • Now comes the final step: generating a response. With the augmented query, the model is ready to reply. It doesn’t just output the retrieved information verbatim. Instead, it synthesizes the enriched data into a coherent, natural-language answer. For your renewable energy question, the model might generate a summary highlighting the most recent and impactful breakthrough, perhaps detailing a new solar panel technology that significantly increases power output. This answer is informative, up-to-date, and directly relevant to your query.


Learn to build LLM applications


Understanding fine-tuning

What is fine-tuning?

Fine-tuning could be likened to sculpting, where a model is precisely refined, like shaping marble into a distinct figure. Initially, a model is broadly trained on a diverse dataset to understand general patterns—this is known as pre-training. Think of pre-training as laying a foundation; it equips the model with a wide range of knowledge.

Fine-tuning, then, adjusts this pre-trained model and its weights to excel at a particular task by training it further on a more focused dataset related to that task. Having trained on vast text corpora, pre-trained LLMs such as GPT or BERT arrive with a broad understanding of language.

Fine-tuning adjusts these models to excel in targeted applications, from sentiment analysis to specialized conversational agents.


Why fine-tune?

The breadth of knowledge LLMs acquire through initial training is impressive but often lacks the depth or specificity required for certain tasks. Fine-tuning addresses this by adapting the model to the nuances of a specific domain or function, enhancing its performance significantly on that task without the need to train a new model from scratch.


The fine-tuning process

Fine-tuning involves several key steps, each critical to customizing the model effectively. The process aims to methodically train the model, guiding its weights toward the ideal configuration for executing a specific task with precision.


A look at the fine-tuning process


Selecting a task

Identify the specific task you wish your model to perform better on. The task could range from classifying emails into spam or not spam to generating medical reports from patient notes.


Choosing the right pre-trained model

The foundation of fine-tuning begins with selecting an appropriate pre-trained large language model (LLM) such as GPT or BERT. These models have been extensively trained on large, diverse datasets, giving them a broad understanding of language patterns and general knowledge.

The choice of model is critical because its pre-trained knowledge forms the basis for the subsequent fine-tuning process. For tasks requiring specialized knowledge, like medical diagnostics or legal analysis, choose a model known for its depth and breadth of language comprehension.


Preparing the specialized dataset

For fine-tuning to be effective, the dataset must be closely aligned with the specific task or domain of interest. This dataset should consist of examples representative of the problem you aim to solve. For a medical LLM, this would mean assembling a dataset comprised of medical journals, patient notes, or other relevant medical texts.

The key here is to provide the model with various examples it can learn from. This data must represent the types of inputs and desired outputs you expect once the model is deployed.


Preprocess the data

Before your LLM can start learning from this task-specific data, the data must be processed into a format the model understands. This could involve tokenizing the text, converting categorical labels into numerical format, and normalizing or scaling input features.

At this stage, data quality is crucial; thus, you’ll look out for inconsistencies, duplicates, and outliers, which can skew the learning process, and fix them to ensure cleaner, more reliable data.

After preparing this dataset, you divide it into training, validation, and test sets. This strategic division ensures that your model learns from the training set, tweaks its performance based on the validation set, and is ultimately assessed for its ability to generalize from the test set.
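The split itself is routine; here is a sketch using an illustrative 80/10/10 ratio (the ratio and seed are arbitrary choices, not prescriptions):

```python
import random

def split_dataset(examples, train=0.8, val=0.1, seed=42):
    # Shuffle, then slice into train/validation/test (remainder goes to test).
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(n * train), int(n * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train_set, val_set, test_set = split_dataset(list(range(100)))
```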


Read more about Finetuning LLMs


Adapting the model for the specific task

Once the pre-trained model and dataset are ready, you must better tailor the model to suit your specific task. An LLM comprises multiple neural network layers, each learning different aspects of the data.

During fine-tuning, not every layer is tweaked—some represent foundational knowledge that applies broadly. In contrast, the top or later layers are more plastic and customized to align with the specific nuances of the task. The architecture requires two key adjustments:

  • Layer freezing: To preserve the general knowledge the model has gained during pre-training, freeze most of its layers, especially the lower ones closer to the input. This ensures the model retains its broad understanding while you fine-tune the upper layers to be more adaptable to the new task.
  • Output layer modification: Replace the model’s original output layer with a new one tailored to the number of categories or outputs your task requires. This involves configuring the output layer to classify various medical conditions accurately for a medical diagnostic task.
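In PyTorch, freezing corresponds to setting `requires_grad = False` on the lower layers' parameters and swapping in a new classification head. The framework-free sketch below just models those same two adjustments; the layer count and class count are made-up examples:

```python
class ToyLayer:
    def __init__(self, name):
        self.name = name
        self.trainable = True  # analogous to requires_grad in real frameworks

class ToyModel:
    """Sketch of a pre-trained model: a stack of layers plus an output head."""

    def __init__(self, n_layers, n_outputs):
        self.layers = [ToyLayer(f"layer_{i}") for i in range(n_layers)]
        self.output_head = {"n_outputs": n_outputs}

    def freeze_lower(self, k):
        # Freeze the k layers closest to the input to preserve general knowledge.
        for layer in self.layers[:k]:
            layer.trainable = False

    def replace_head(self, n_outputs):
        # Swap the output layer for one sized to the new task's label set.
        self.output_head = {"n_outputs": n_outputs}

model = ToyModel(n_layers=12, n_outputs=1000)
model.freeze_lower(10)            # keep only the top 2 layers adaptable
model.replace_head(n_outputs=5)   # e.g., 5 medical conditions to classify
```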


Fine-tuning hyperparameters

With the model’s architecture now adjusted, we turn our attention to hyperparameters: the settings and configurations crucial for controlling the training process. They are not learned from the data but are set before training begins, and they significantly impact model performance. Key hyperparameters in fine-tuning include:

  • Learning rate: Perhaps the most critical hyperparameter in fine-tuning. A lower learning rate ensures that the model’s weights are adjusted gradually, preventing it from “forgetting” its pre-trained knowledge.
  • Batch size:  The number of training examples used in one iteration. It affects the model’s learning speed and memory usage.
  • Epochs: The number of times the entire dataset is passed through the model. Enough epochs are necessary for learning, but too many can lead to overfitting.
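As a rough illustration (the values below are hypothetical, not a recommendation), these three hyperparameters together determine how many weight updates the model undergoes:

```python
import math

# Illustrative fine-tuning hyperparameters; values are hypothetical.
config = {
    "learning_rate": 2e-5,  # small, so pre-trained knowledge is not overwritten
    "batch_size": 16,
    "epochs": 3,
}

dataset_size = 1000  # number of task-specific training examples
steps_per_epoch = math.ceil(dataset_size / config["batch_size"])
total_updates = steps_per_epoch * config["epochs"]
```

With 1,000 examples, a batch size of 16 gives 63 steps per epoch, so 3 epochs means 189 weight updates in total.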


Training process

With the dataset prepared, the model adapted, and the hyperparameters set, the model is now ready to be fine-tuned.

The training process involves repeatedly passing your specialized dataset through the model, allowing it to learn from the task-specific examples. This means adjusting the model’s internal parameters, the weights and biases of the unfrozen layers, so that the output predictions get as close to the desired outcomes as possible.

This is done in iterations (epochs), and thanks to the pre-trained nature of the model, it requires fewer epochs than training from scratch. Here is what happens in each iteration:

  • Forward pass: The model processes the input data, making predictions based on its current state.
  • Loss calculation: The difference between the model’s predictions and the actual desired outputs (labels) is calculated using a loss function. This function quantifies how well the model is performing.
  • Backward pass (backpropagation): The gradients of the loss with respect to each parameter (weight) in the model are computed. These gradients indicate how changes to the weights affect the loss.
  • Update weights: Apply an optimization algorithm to update the model’s weights, focusing on those in unfrozen layers. This step is where the model learns from the task-specific data, refining its predictions to become more accurate.
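The four steps above can be seen end to end on a deliberately tiny model: a single weight `w`, fitted to toy data where the desired output is 2x. Real fine-tuning runs the same cycle over millions of weights via backpropagation, but each iteration has exactly this shape.

```python
# Toy training loop: forward pass, loss, gradient, weight update.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, desired output) pairs
w = 0.0
learning_rate = 0.05

for epoch in range(50):
    for x, target in data:
        prediction = w * x                     # forward pass
        loss = (prediction - target) ** 2      # loss calculation
        grad = 2 * (prediction - target) * x   # backward pass (d loss / d w)
        w -= learning_rate * grad              # update weights
```

After a few dozen epochs, `w` converges close to 2.0, the value that minimizes the loss on this data.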

A tight feedback loop, in which you continuously monitor the model’s validation performance, guides you in preventing overfitting and indicates when the model has learned enough, that is, when to stop training.


Evaluation and iteration

After fine-tuning, assess the model’s performance to gauge how well it generalizes to new data. You do this by running the model against the test set, data it has not seen during training.

Here, you look at metrics appropriate to the task, like BLEU and ROUGE for translation or summarization, or even qualitative evaluations by human judges, ensuring the model is ready for real-life application and isn’t just regurgitating memorized examples.
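To give a flavor of what such metrics compute, here is a heavily simplified ROUGE-1-style score: the unigram overlap between a generated text and a reference. Real ROUGE implementations count word multiplicity, handle n-grams and multiple references, so treat this only as an illustration.

```python
def rouge1_f(candidate, reference):
    # Simplified ROUGE-1: F1 over the set of shared unigrams.
    cand, ref = set(candidate.lower().split()), set(reference.lower().split())
    overlap = len(cand & ref)
    if not cand or not ref or not overlap:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

score = rouge1_f("the patient shows mild fever", "patient has a mild fever")
```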

If the model’s performance is not up to par, you may need to revisit the hyperparameters, adjust the training data, or further tweak the model’s architecture.


For medical LLM applications, it is this entire process that enables the model to grasp medical terminologies, understand patient queries, and even assist in diagnosing from text descriptions—tasks that require deep domain knowledge.


You can read the second part of the blog series here – RAG vs finetuning: Which is the best tool?


Key takeaways

This blog provides a comprehensive introduction to RAG and fine-tuning, highlighting their roles in advancing the capabilities of large language models (LLMs). Some key takeaways from this discussion:

  • LLMs struggle with providing up-to-date information and excelling in specialized domains.
  • RAG addresses these limitations by incorporating external information retrieval during response generation, ensuring informative and relevant answers.
  • Fine-tuning refines pre-trained LLMs for specific tasks, enhancing their expertise and performance in those areas.
March 18
Huda Mahmood

Covariant AI has emerged in the news with the introduction of its new model called RFM-1. The development has created a new promising avenue of exploration where humans and robots come together. With its progress and successful integration into real-world applications, it can unlock a new generation of AI advancements.

Explore the potential of generative AI and LLMs for non-profit organizations

In this blog, we take a closer look at the company and its new model.

What is Covariant AI?

The company develops AI-powered robots for warehouses and distribution centers. It was spun off from OpenAI in 2017 by ex-research scientists Peter Chen and Pieter Abbeel. Its robots are powered by a technology called the Covariant Brain, a machine-learning (ML) model that trains and improves the robots’ functionality in real-world applications.

The company has recently launched a new AI model that takes on one of the major challenges in the development of robots with human-like intelligence. Let’s dig deeper into the problem and its proposed solution.

Large language model bootcamp

What was the challenge?

Today’s digital world is heavily reliant on data to progress, and generative AI is no exception. Developing enhanced functionality in robots and training them appropriately requires large volumes of data.

The limited amount of available data poses a great challenge, slowing the pace of progress. It was this very challenge that led OpenAI to disband its robotics team in 2021: the data was insufficient to appropriately train the movements and reasoning of robots.

However, it all changed when Covariant AI introduced its new AI model.


Understanding the Covariant AI model

The company presented the world with RFM-1, its Robotics Foundation Model as a solution and a step ahead in the development of robotics. Integrating the characteristics of large language models (LLMs) with advanced robotic skills, the model is trained on a real-world dataset.

Covariant used its years of data from its AI-powered robots already operational in warehouses. For instance, the item-picking robots working in the warehouses of Crate & Barrel and Bonprix. With these large enough datasets, the challenge of data limitation was addressed, enabling the development of RFM-1.

Since the model leverages real-world data from robots operating in industry, it is well-suited to train the machines efficiently. It brings together the reasoning of LLMs and the physical dexterity of robots, resulting in human-like learning for the robots.


An outlook of the features and benefits of RFM-1


Unique features of RFM-1

The introduction of the new AI model by Covariant AI has definitely impacted the trajectory of future developments in generative AI. While we still have to see how the journey progresses, let’s take a look at some important features of RFM-1.

Multimodal training capabilities

The RFM-1 is designed to deal with five different types of input: text, images, video, robot instructions, and measurements. Hence, it is more diverse in data processing than a typical LLM that is primarily focused on textual data input.

Integration with the physical world

Unlike typical LLMs, this AI model engages with the physical world around it through a robot. Its multimodal data understanding lets it perceive its surrounding environment in addition to processing language input, enabling the robot to interact with the physical world.

Advanced reasoning skills

The advanced AI model not only processes the available information but engages with it critically. Hence, RFM-1 has enhanced reasoning skills that provide the robot with a better understanding of situations and improved prediction skills.


Learn to build LLM applications


Benefits of RFM-1

The benefits of the AI model align with its unique features. Some notable advantages of this development are:

Enhanced performance of robots

The multimodal data enables the robots to develop a deeper understanding of their environments. It results in their improved engagement with the physical world, allowing them to perform tasks more efficiently and accurately. It will directly result in increased productivity and accuracy of business operations where the robots operate.

Improved adaptability

The model’s improved reasoning skills ensure that the robots are equipped to understand, learn, and reason with new data. Hence, the robots become more versatile and adaptable to their changing environments.

Reduced reliance on programming

RFM-1 is built to constantly engage with and learn from its surroundings. Since it enables the robot to comprehend and reason with the changing input data, the reliance on pre-programmed instructions is reduced. The process of development and deployment becomes simpler and faster.

Hence, the multiple new features of RFM-1 empower it to create useful changes in the world of robotic development. Here’s a short video from Covariant AI, explaining and introducing their new AI model.

The future of RFM-1

The future of RFM-1 looks very promising, especially within the world of robotics. It has opened doors to a completely new possibility of developing a range of flexible and reliable robotic systems.

Covariant AI has taken the first step towards empowering commercial robots with an enhanced understanding of their physical world and language. Moreover, it has also introduced new avenues to integrate LLMs within the arena of generative AI applications.

Read about the top 10 industries that can benefit from LLMs

March 15