Welcome to Data Science Dojo’s weekly newsletter, “The Data-Driven Dispatch”.
LLMs can sometimes be confusing. On one hand, they can give you responses that would require a human prodigy. But on the other hand, they would fail to do something a 4-year-old child could easily do otherwise.
Let’s take the mayonnaise example.
Why did the model behave this way?
Well, the fact of the matter is that we often treat these language models as intelligent human beings answering our questions on the other side of the screen.
However, that is not the case. Because of how these models are designed, they sometimes behave in a very unpredictable manner.
We’ve compiled all of them for you. Time to dig in!
While it is complex to entirely understand how language models work, you can get a fair idea by learning about the transformers and attention mechanisms. Read: The Brains Behind LLMs: An Introductory Guide to Transformers and Attention Mechanisms
But the idea here is that LLMs do not process information as we humans do. To put it simply, they are like big probabilistic models trying to predict the best responses possible.
And because of how they process information and predict outputs, they often behave weirdly. We have compiled all the interesting ones for you!
Hallucinations in LLMs
LLMs hallucinate, a lot! This means if you make up a situation, something completely made up, they’ll answer you confidently! Time for some play:
Want to know a fun fact? The LLM bootcamp has never happened in Singapore yet. Check our website to learn more about the bootcamp. But the idea here is LLMs hallucinate with confidence on hypothetical stuff.
Read more: AI hallucinations: Risks associated with large language models
Training on Huge Amounts of Data
Did you know how many tokens GPT -4 is trained on? It’s about 1.76 trillion tokens. To put this staggering number into perspective, think of it this way. It would take a person 22,000 years to read through 1 trillion words at normal speed for 8 hours a day.
This reveals that LLMs are developed using vast data collections. However, the nature of this data remains a mystery. What exactly are these models trained on? The specifics of the training materials are something companies typically keep under wraps, marking a peculiar aspect of LLMs.
How Does Huge Training Data Affect Model Performance?
The popular discourse is that the bigger the model the better it works. But in reality, it is a popular myth. And in this newsletter, we love busting myths.
-
LLMs Memorize a lot and Hence They’re Easier to Trick
LLMs are more susceptible to deception due to their tendency to heavily rely on memorization. This leads to a phenomenon known as the ‘Memo Trap,’ where the model generates predictions based on its extensive prior training, rendering it inattentive to the actual question being asked.
It’s almost like when humans think they’re too smart to handle a situation because of their good understanding however they fail to comprehend the issue at hand.
The graph indicates a consistent pattern where, as we use more computing power (measured in FLOPs), the models’ ability to avoid falling into memory traps gets worse. This aligns with the observation that bigger language models tend to rely more on memorization, resulting in lower performance on tasks.
2. Deep Double Descent – More Data Leading to Lower Accuracy
Deep Double Descent implies that as these models become larger and more complex, their performance doesn’t always scale linearly with size. At certain scales, the accuracy of LLMs may drop as they increase in size due to their complexity before potentially improving again if they continue to scale up.
Read: Deep double descent – where data can damage performance
LLMs Change their Mind Under Pressure
Jamie Bernardi noted that ChatGPT often switches from a correct to an incorrect response when questioned with “Are you sure?”.
This tendency could stem from the model’s training, which involves reinforcement learning from human feedback—a method that encourages the model to align closely with user satisfaction.
Read more: Are you sure? Testing ChatGPT’s confidence
LLMs Spread Myths
The researchers behind the TruthfulQA paper discovered that the accuracy, or “truthfulness,” of responses given by large language models (LLMs) tends to decline as the size of the model increases. The study involved a dataset named “TruthfulQA,” which consists of 817 questions across various subjects. These questions were not chosen at random; instead, they were the result of adversarial selection.
Recommended Read: Small Language Models: The Unsung Heroes of AI
Upcoming Live Talks
Here’s a great talk happening soon by Wojtek Kuberski, Founder of NannyML. He’ll be covering why and how to monitor LLMs deployed to production.
The talk will focus on state-of-the-art solutions for detecting hallucinations.
What Will You Learn?
You will build an intuitive understanding of the LLM monitoring methods, their strengths and weaknesses, and learn how to easily set up an LLM monitoring system.
Pop Quiz! Answer the question in the comments at the end of the newsletter.
Finally, let’s end the week with some interesting headlines in the Gen-AI-Verse.
- Vision Pro is Apple’s first major push into AI, which will help drive a 30% stock gain this year, Wedbush says. Read more
- Google unveils mobile diffusion: A leap forward in on-the-go text-to-image generation technology. Read more
- OpenAI revolutionizes ChatGPT conversations with the new feature, GPT mentions. Read more
- Prophetic releases Morpheus 1, the world’s first multi-modal generative transformer designed to induce and stabilize lucid dreams. Read more
- Google Maps employs generative AI to enhance local guides. Read more