Learn to build large language model applications: vector databases, langchain, fine tuning and prompt engineering. Learn more

Traditional databases in healthcare struggle to grasp the complex relationships between patients and their clinical histories. This limitation hinders personalized medicine and hampers rapid diagnosis. Vector databases, with their ability to store and query high-dimensional patient data, emerge as a revolutionary solution.

This blog delves into the technical details of how AI in healthcare empowers patient similarity searches and paves the path for precision medicine.

Impact of AI on healthcare

The healthcare landscape is brimming with data such as demographics, medical records, lab results, imaging scans, – the list goes on. While these large datasets hold immense potential for personalized medicine and groundbreaking discoveries, traditional relational databases cannot store such high-dimensional data at a large scale and often fall short.

Their rigid structure struggles to represent the intricate connections and nuances inherent in patient data.

Vector databases are revolutionizing healthcare data management. Unlike traditional, table-like structures, they excel at handling the intricate, multi-dimensional nature of patient information.

Each patient becomes a unique point in a high-dimensional space, defined by their genetic markers, lab values, and medical history. This dense representation unlocks powerful capabilities discussed later.

Working with vector data is tough because regular databases, which usually handle one piece of information at a time, can’t handle the complexity and large amount of this type of data. This makes it hard to find important information and analyze it quickly.

That’s where vector databases come in handy—they are made on purpose to handle this special kind of data. They give you the speed, ability to grow, and flexibility you need to get the most out of your data.


how vector databases work
Understand the functionality of vector databases – Source: kdb.ai


Patient similarity search with vector databases in healthcare

The magic lies in the ability to perform a similarity search. By calculating the distance between patient vectors, we can identify individuals with similar clinical profiles. This opens a large span of possibilities.

Personalized treatment plans

By uncovering patients with comparable profiles and treatment outcomes, doctors can tailor interventions with greater confidence and optimize individual care. It also serves as handy for medical researchers to look for efficient cures or preventions for a disease diagnosed over multiple patients by analyzing their data, particularly for a certain period. 

Here’s how vector databases transform treatment plans:

  • Precise Targeting: By comparing a patient’s vector to those of others who have responded well to specific treatments, doctors can identify the most promising options with laser-like accuracy. This reduces the guesswork and minimizes the risk of ineffective therapies.
  • Predictive Insights: Vector databases enable researchers to analyze the trajectories of similar patients, predicting their potential responses to different treatments. This foresight empowers doctors to tailor interventions, preventing complications and optimizing outcomes proactively.
  • Unlocking Untapped Potential: By uncovering hidden connections between seemingly disparate data points, vector databases can reveal new therapeutic targets and treatment possibilities. This opens doors for personalized medicine breakthroughs that were previously unimaginable.
  • Dynamic Adaptation: As a patient’s health evolves, their vector map shifts and readjusts accordingly. This allows for real-time monitoring and continuous refinement of treatment plans, ensuring the best possible care at every stage of the journey.


Large language model bootcamp


Drug discovery and repurposing

Identifying patients similar to those successfully treated with a specific drug can accelerate clinical trials and uncover unexpected connections for existing medications.

  • Accelerated exploration: They transform complex drug and disease data into dense vectors, allowing for rapid similarity searches and the identification of promising drug candidates. Imagine sifting through millions of molecules at a single glance, pinpointing those with similar properties to known effective drugs.
  • Repurposing potential: Vector databases can unearth hidden connections between existing drugs and potential new applications. By comparing drug vectors to disease vectors, they can reveal unexpected repurposing opportunities, offering a faster and cheaper path to new treatments. 
  • Personalization insights: By weaving genetic and patient data into the drug discovery tapestry, vector databases can inform the development of personalized medications tailored to individual needs and responses. This opens the door to a future where treatments are as unique as the patients themselves. 
  • Predictive power: Analyzing the molecular dance within the vector space can unveil potential side effects and predict drug efficacy before entering clinical trials. This helps navigate the treacherous waters of development, saving time and resources while prioritizing promising candidates. 

Cohort analysis in research

Grouping patients with similar characteristics facilitates targeted research efforts, leading to faster breakthroughs in disease understanding and treatment development.

  • Exploring Disease Mechanisms: Vector databases facilitate the identification of patient clusters that share similar disease progression patterns. This can shed light on underlying disease mechanisms and guide the development of novel diagnostic markers and therapeutic target 
  • Unveiling Hidden Patterns: Vector databases excel at similarity search, enabling researchers to pinpoint patients with similar clinical trajectories, even if they don’t share the same diagnosis or traditional risk factors. This reveals hidden patterns that might have been overlooked in traditional data analysis methods.


Learn to build LLM applications


Technicalities of vector databases

Using a vector database enables the incorporation of advanced functionalities into our artificial intelligence, such as semantic information retrieval and long-term memory. The diagram provided below enhances our comprehension of the significance of vector databases in such applications.


query result using vector healthcare databases
Role of vector databases in information retrieval – Source: pinecone.io


Let’s break down the illustrated process:

  • Initially, we employ the embedding model to generate vector embeddings for the content intended for indexing.
  • The resulting vector embedding is then placed into the vector database, referencing the original content from which the embedding was derived. 
  • Upon receiving a query from the application, we utilize the same embedding model to create embeddings for the query. These query embeddings are subsequently used to search the database for similar vector embeddings. As previously noted, these analogous embeddings are linked to the initial content from which they were created.

In comparison to the working of a traditional database, where data is stored as common data types like string, integer, date, etc. Users query the data by comparison with each row; the result of this query is the rows where the condition of the query is withheld.

In vector databases, this process of querying is more optimized and efficient with the use of a similarity metric for searching the most similar vector to our query. The search involves a combination of various algorithms, like approximate nearest neighbor optimization, which uses hashing, quantization, and graph-based detection.

Here are a few key components of the discussed process described below:

  • Feature engineering: Transforming raw clinical data into meaningful numerical representations suitable for vector space. This may involve techniques like natural language processing for medical records or dimensionality reduction for complex biomolecular data. 
  • Distance metrics: Choosing the appropriate distance metric to calculate the similarity between patient vectors. Popular options include Euclidean distance, cosine similarity, and Manhattan distance, each capturing different aspects of the data relationships.


distance metrics to calculate similarity in vector databases
Distance metrics to calculate similarity – Source: Camelot


    • Cosine Similarity: Calculates the cosine of the angle between two vectors in a vector space. It varies from -1 to 1, with 1 indicating identical vectors, 0 denoting orthogonal vectors, and -1 representing diametrically opposed vectors.
    • Euclidean Distance: Measures the straight-line distance between two vectors in a vector space. It ranges from 0 to infinity, where 0 signifies identical vectors and larger values indicate increasing dissimilarity between vectors.
    • Dot Product: Evaluate the product of the magnitudes of two vectors and the cosine of the angle between them. Its range is from -∞ to ∞, with a positive value indicating vectors pointing in the same direction, 0 representing orthogonal vectors, and a negative value signifying vectors pointing in opposite directions. 
  • Nearest neighbor search algorithms: Efficiently retrieving the closest patient vectors to a given query. Techniques like k-nearest neighbors (kNN) and Annoy trees excel in this area, enabling rapid identification of similar patients.


A general pipeline from storing vectors to querying them is shown in the figure below:


pipeline for vector database
Pipeline for vector database – Source: pinecone.io


  • Indexing: The vector database utilizes algorithms like PQ, LSH, or HNSW (detailed below) to index vectors. This process involves mapping vectors to a data structure that enhances search speed. 
  • Querying: The vector database examines the indexed query vector against the dataset’s indexed vectors, identifying the nearest neighbors based on a similarity metric employed by that specific index. 
  • Post Processing: In certain instances, the vector database retrieves the ultimate nearest neighbors from the dataset and undergoes post-processing to deliver the final results. This step may involve re-evaluating the nearest neighbors using an alternative similarity measure.

Challenges and considerations

While vector databases offer immense potential, challenges remain:

  • Data privacy and security: Safeguarding patient data while harnessing its potential for enhanced healthcare outcomes requires the implementation of robust security protocols and careful consideration of ethical standards.

This involves establishing comprehensive measures to protect sensitive information, ensuring secure storage, and implementing stringent access controls.

Additionally, ethical considerations play a pivotal role, emphasizing the importance of transparent data handling practices, informed consent procedures, and adherence to privacy regulations. As healthcare organizations leverage the power of data to advance patient care, a meticulous approach to security and ethics becomes paramount to fostering trust and upholding the integrity of the healthcare ecosystem. 

  • Explainability and interpretability: Gaining insight into the reasons behind patient similarity is essential for informed clinical decision-making. It is crucial to develop transparent models that not only analyze the “why” behind these similarities but also offer insights into the importance of features within the vector space.

This transparency ensures a comprehensive understanding of the factors influencing patient similarities, contributing to more effective and reasoned clinical decisions. Integration with existing infrastructure: Seamless integration with legacy healthcare systems is essential for the practical adoption of vector database technology.



Revolution of medicine – AI in healthcare

In summary, the integration of vector databases in healthcare is revolutionizing patient care and diagnostics. Overcoming the limitations of traditional systems, these databases enable efficient handling of complex patient data, leading to precise treatment plans, accelerated drug discovery, and enhanced research capabilities.

While the technical aspects showcase the sophistication of these systems, challenges such as data privacy and seamless integration with existing infrastructure need attention. Despite these hurdles, the potential benefits promise a significant impact on personalized medicine and improved healthcare outcomes.

January 30, 2024

Mistral AI, a startup co-founded by individuals with experience at Google’s DeepMind and Meta, made a significant entrance into the world of LLMs with Mistral 7B.

This model can be easily accessed and downloaded from GitHub or via a 13.4-gigabyte torrent, emphasizing accessibility. Mistral 7b, a 7.3 billion parameter model with the sheer size of some of its competitors, Mistral 7b punches well above its weight in terms of capability and efficiency. 

What makes Mistral 7b a great competitor? 

One of the key strengths of Mistral 7b lies in its architecture. Unlike many LLMs relying solely on transformer networks, Mistral 7b incorporates a hybrid approach, leveraging transformers and recurrent neural networks (RNNs). This unique blend allows Mistral 7b to excel at tasks that require both long-term memory and context awareness, such as question answering and code generation. 

Furthermore, Mistral 7b utilizes innovative attention mechanisms like group query attention and sliding window attention. These techniques enable the model to focus on relevant parts of the input data more effectively, improving performance and efficiency. 


Learn in detail about llm evaluation method


Mistral 7b architecture 

Mistral 7B is an architecture based on transformer architecture and introduces several innovative features and parameters. Here’s a gist of the architectural details: 


  1. Sliding window attention: 

Mistral 7B addresses the quadratic complexity of vanilla attention by implementing Sliding Window Attention (SWA). 

SWA allows each token to attend to a maximum of W tokens from the previous layer (here, W = 3). 

Tokens outside the sliding window still influence next-word prediction. 

Information can propagate forward by up to k × W tokens after k attention layers. 

Parameters include dim = 4096, n_layers = 32, head_dim = 128, hidden_dim = 14336, n_heads = 32, n_kv_heads = 8, window_size = 4096, context_len = 8192, and vocab_size = 32000. 



sliding window attention




2. Rolling Buffer Cache: 

This fixed-size cache serves as the “memory” for the sliding window attention. It efficiently stores key-value pairs for recent timesteps, eliminating the need for recomputing that information. A set attention span stays constant, managed by a rolling buffer cache limiting its size. 

Within the cache, each time step’s keys and values are stored at a specific location, determined by i mod W, where W is the fixed cache size. When the position i exceeds W, previous values in the cache get replaced. 

This method slashes cache memory usage by 8 times while maintaining the model’s effectiveness. 



Rolling buffer cache




3. Pre-fill and chunking: 

During sequence generation, the cache is pre-filled with the provided prompt to enhance context. For long prompts, chunking divides them into smaller segments, each treated with both cache and current chunk attention, further optimizing the process.

When creating a sequence, tokens are guessed step by step, with each token relying on the ones that came before it. The starting information, known as the prompt, lets us fill the (key, value) cache beforehand with this prompt.

The chunk size can determine the window size, and the attention mask is used across both the cache and the chunk. This ensures the model gets the necessary information while staying efficient. 


pre fill and chunking




Comparison of performance: Mistral 7B vs Llama2-13B  

The true test of any LLM lies in its performance on real-world tasks. Mistral 7b has been benchmarked against several established models, including Llama 2 (13B parameters) and Llama 1 (34B parameters).

The results are impressive, with Mistral 7b outperforming both models on all tasks tested. It even approaches the performance of CodeLlama 7B (also 7B parameters) on code-related tasks while maintaining strong performance on general language tasks. Performance comparisons were conducted across a wide range of benchmarks, encompassing various aspects.


Large language model bootcamp


1. Performance comparison 

Mistral 7B surpasses Llama2-13B across various benchmarks, excelling in commonsense reasoning, world knowledge, reading comprehension, and mathematical tasks. Its dominance isn’t marginal; it’s a robust demonstration of its capabilities. 


2. Equivalent Model Capacity 

In reasoning, comprehension, and STEM tasks, Mistral 7B functions akin to a Llama2 model over three times its size. This not only highlights its efficiency in memory usage but also its enhanced processing speed. Essentially, it offers immense power within an elegantly streamlined design. 


3. Knowledge-based assessments 

Mistral 7B demonstrates superiority in most assessments and competes equally with Llama2-13B in knowledge-based benchmarks. This parallel performance in knowledge tasks is especially intriguing, given Mistral 7B’s comparatively restrained parameter count. 


mistral 7b assessment 



Beyond benchmarks: Practical applications 

The capabilities of Mistral 7b extend far beyond benchmark scores Mistral 7B isn’t limited to a single skill. It performs exceptionally well across various tasks, spanning code-related fields and English language tasks. Remarkably, it matches CodeLlama-7B’s performance in coding tasks, highlighting its adaptability and wide-ranging abilities.  Some of the common works in each field are mentioned below: 

  • Natural Language Processing (NLP): Machine translation, text summarization, question answering, and sentiment analysis. 
  • Code Generation and Analysis: Generate code snippets, translate natural language to code, and analyze existing code for potential issues. 
  • Creative Writing: Compose poems, scripts, musical pieces, and other creative text formats. 
  • Education and Research: Assist with research tasks, generate educational materials, and personalize learning experiences. 



mistral 7b and llama 



llama 2 and mistral



A cost-effective Solution 

One of the most compelling aspects of Mistral 7b is its cost-effectiveness. Compared to models of similar size, Mistral 7b requires significantly less computational resources to run. This makes it a more accessible option for individuals and organizations with limited budgets. Additionally, Mistral AI offers flexible deployment options, allowing users to run the model on their own infrastructure or through the cloud. 


Versatile deployment 

Mistral 7B stands out due to its Apache 2.0 license, granting broad accessibility for diverse users, including individuals, major corporations, and governmental bodies.

This open-source license not only ensures inclusivity but also permits customization and adaptation to suit specific needs. It empowers users to modify, share, and utilize Mistral 7B for a wide array of applications, fostering innovation and collaboration in the community. 


The decentralization issue vs transparency 

Mistral AI prioritizes transparency and open access, yet safety concerns arise due to the fully decentralized ‘Mistral-7B-v0.1’ model, capable of unmoderated response generation.

Unlike models such as GPT and Llama, it lacks mechanisms to discern appropriate responses, posing potential exploitation risks. However, despite safety concerns, decentralized Language Model Models (LLMs) offer advantages, democratizing AI access and enabling positive applications. 


Are large language models the zero shot reasoners? Read more here



Mistral 7b is a testament to the power of innovation in the LLM domain. Despite its relatively small size, it has established itself as a force to be reckoned with, delivering impressive performance across a wide range of tasks. With its focus on efficiency and cost-effectiveness, Mistral 7b is poised to democratize access to cutting-edge language technology and shape the future of how we interact with machines. 


January 15, 2024

 Large language models (LLMs), such as OpenAI’s GPT-4, are swiftly metamorphosing from mere text generators into autonomous, goal-oriented entities displaying intricate reasoning abilities. This crucial shift carries the potential to revolutionize the manner in which humans connect with AI, ushering us into a new frontier.

This blog will break down the working of these agents, illustrating the impact they impart on what is known as the ‘Lang Chain’. 


Working of the agents 

Our exploration into the realm of LLM agents begins with understanding the key elements of their structure, namely the LLM core, the Prompt Recipe, the Interface and Interaction, and Memory. The LLM core forms the fundamental scaffold of an LLM agent. It is a neural network trained on a large dataset, serving as the primary source of the agent’s abilities in text comprehension and generation. 

The functionality of these agents heavily relies on prompt engineering. Prompt recipes are carefully crafted sets of instructions that shape the agent’s behaviors, knowledge, goals, and persona and embed them in prompts. 


langchain agents



The agent’s interaction with the outer world is dictated by its user interface, which could vary from command-line, graphical, to conversational interfaces. In the case of fully autonomous agents, prompts are programmatically received from other systems or agents.

Another crucial aspect of their structure is the inclusion of memory, which can be categorized into short-term and long-term. While the former helps the agent be aware of recent actions and conversation histories, the latter works in conjunction with an external database to recall information from the past. 


Learn in detail about LangChain


Ingredients involved in agent creation 

Creating robust and capable LLM agents demands integrating the core LLM with additional components for knowledge, memory, interfaces, and tools.



The LLM forms the foundation, while three key elements are required to allow these agents to understand instructions, demonstrate essential skills, and collaborate with humans: the underlying LLM architecture itself, effective prompt engineering, and the agent’s interface. 



Tools are functions that an agent can invoke. There are two important design considerations around tools: 

  • Giving the agent access to the right tools 
  • Describing the tools in a way that is most helpful to the agent 

Without thinking through both, you won’t be able to build a working agent. If you don’t give the agent access to a correct set of tools, it will never be able to accomplish the objectives you give it. If you don’t describe the tools well, the agent won’t know how to use them properly. Some of the vital tools a working agent needs are:


  1. SerpAPI : This page covers how to use the SerpAPI search APIs within Lang Chain. It is broken into two parts: installation and setup, and then references to the specific SerpAPI wrapper. Here are the details for its installation and setup:
  • Install requirements with pip install google-search-results 
  • Get a SerpAPI api key and either set it as an environment variable (SERPAPI_API_KEY) 

You can also easily load this wrapper as a tool (to use with an agent). You can do this with:



2. Math-tool: The llm-math tool wraps an LLM to do math operations. It can be loaded into the agent tools like: 

Python-REPL tool: Allows agents to execute Python code. To load this tool, you can use: 


Blog | Data Science Dojo

Blog | Data Science Dojo




The action of python REPL allows agent to execute the input code and provide the response. 


The impact of agents: 

A noteworthy advantage of LLM agents is their potential to exhibit self-initiated behaviors ranging from purely reactive to highly proactive. This can be harnessed to create versatile AI partners capable of comprehending natural language prompts and collaborating with human oversight. 


Large language model bootcamp


LLM agents leverage LLMs innate linguistic abilities to understand instructions, context, and goals, operate autonomously and semi-autonomously based on human prompts, and harness a suite of tools such as calculators, APIs, and search engines to complete assigned tasks, making logical connections to work towards conclusions and solutions to problems. Here are few of the services that are highly dominated by the use of Lang Chain agents:


Blog | Data Science Dojo



Facilitating language services 

Agents play a critical role in delivering language services such as translation, interpretation, and linguistic analysis. Ultimately, this process steers the actions of the agent through the encoding of personas, instructions, and permissions within meticulously constructed prompts.

Users effectively steer the agent by offering interactive cues following the AI’s responses. Thoughtfully designed prompts facilitate a smooth collaboration between humans and AI. Their expertise ensures accurate and efficient communication across diverse languages. 



Quality assurance and validation 

Ensuring the accuracy and quality of language-related services is a core responsibility. Agents verify translations, validate linguistic data, and maintain high standards to meet user expectations. Agents can manage relatively self-contained workflows with human oversight.

Use internal validation to verify the accuracy and coherence of their generated content. Agents undergo rigorous testing against various datasets and scenarios. These tests validate the agent’s ability to comprehend queries, generate accurate responses, and handle diverse inputs. 


Types of agents 

Agents use an LLM to determine which actions to take and in what order. An action can either be using a tool and observing its output, or returning a response to the user. Here are the agents available in Lang Chain.  

Zero-Shot ReAct: This agent uses the ReAct framework to determine which tool to use based solely on the tool’s description. Any number of tools can be provided. This agent requires that a description is provided for each tool. Below is how we can set up this Agent: 


Blog | Data Science Dojo


Let’s invoke this agent and check if it’s working in chain 

Blog | Data Science Dojo



This will invoke the agent. 

Structured-Input ReAct: The structured tool chat agent is capable of using multi-input tools. Older agents are configured to specify an action input as a single string, but this agent can use a tool’s argument schema to create a structured action input. This is useful for more complex tool usage, like precisely navigating around a browser. Here is how one can setup the React agent:


Blog | Data Science Dojo


The further necessary imports required are:

Blog | Data Science Dojo



Setting up parameters:


Blog | Data Science Dojo

Creating the agent:

Blog | Data Science Dojo



Improving performance of an agent 

Enhancing the capabilities of agents in Large Language Models (LLMs) necessitates a multi-faceted approach. Firstly, it is essential to keep refining the art and science of prompt engineering, which is a key component in directing these systems securely and efficiently. As prompt engineering improves, so does the competencies of LLM agents, allowing them to venture into new spheres of AI assistance.

Secondly, integrating additional components can expand agents’ reasoning and expertise. These components include knowledge banks for updating domain-specific vocabularies, lookup tools for data gathering, and memory enhancement for retaining interactions.

Thus, increasing the autonomous capabilities of agents requires more than just improved prompts; they also need access to knowledge bases, memory, and reasoning tools.

Lastly, it is vital to maintain a clear iterative prompt cycle, which is key to facilitating natural conversations between users and LLM agents. Repeated cycling allows the LLM agent to converge on solutions, reveal deeper insights, and maintain topic focus within an ongoing conversation. 



The advent of large language model agents marks a turning point in the AI domain. With increasing advances in the field, these agents are strengthening their footing as autonomous, proactive entities capable of reasoning and executing tasks effectively.

The application and impact of Large Language Model agents are vast and game-changing, from conversational chatbots to workflow automation. The potential challenges or obstacles include ensuring the consistency and relevance of the information the agent processes, and the caution with which personal or sensitive data should be treated. The promising future outlook of these agents is the potentially increased level of automated and efficient interaction humans can have with AI. 

December 20, 2023