
Falcon 180B language model overtakes Meta and Google

Data Science Dojo Staff

September 20

The recently unveiled Falcon 180B large language model, with 180 billion parameters, has surpassed Meta’s LLaMA 2, which has 70 billion parameters.

 


Falcon 180B: A game-changing open-source language model

The artificial intelligence community has a new champion in Falcon 180B, an open-source large language model (LLM) boasting a staggering 180 billion parameters, trained on a colossal dataset. This powerhouse newcomer has outperformed previous open-source LLMs on various fronts.

Falcon AI, particularly Falcon LLM 40B, represents a significant achievement by the UAE’s Technology Innovation Institute (TII). The “40B” designation indicates that this Large Language Model boasts an impressive 40 billion parameters.

Notably, TII has also developed a 7-billion-parameter model, trained on 1,500 billion (1.5 trillion) tokens. In contrast, the Falcon LLM 40B model was trained on a dataset of 1 trillion tokens from RefinedWeb. What sets this LLM apart is its transparency and open-source nature.

 


Falcon operates as an autoregressive decoder-only model and underwent extensive training on the AWS Cloud, spanning two months and employing 384 GPUs. The pretraining data predominantly comprises publicly available data, with some contributions from research papers and social media conversations.

Significance of Falcon AI

The performance of Large Language Models is intrinsically linked to the data they are trained on, making data quality crucial. Falcon’s training data was meticulously crafted, featuring extracts from high-quality websites sourced from the RefinedWeb Dataset. This data underwent rigorous filtering and de-duplication, supplemented by readily accessible data sources. Falcon’s architecture is optimized for inference, enabling it to outshine state-of-the-art models such as those from Google, Anthropic, DeepMind, and Meta’s LLaMA, as evidenced by its ranking on the Open LLM Leaderboard.

Beyond its impressive capabilities, Falcon AI distinguishes itself by being open-source, allowing unrestricted commercial use. Users can fine-tune Falcon with their own data to create bespoke applications harnessing the power of this Large Language Model. Falcon also offers Instruct versions, Falcon-7B-Instruct and Falcon-40B-Instruct, fine-tuned on instruction and conversational data, which make it easy to build chat applications.
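As an illustration, here is a minimal sketch of querying Falcon-7B-Instruct through the Hugging Face transformers pipeline; the prompt and generation settings (max_new_tokens, top_k) are illustrative choices, not prescribed values.

```python
import torch
from transformers import AutoTokenizer, pipeline

model_id = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build a text-generation pipeline for the instruct-tuned Falcon checkpoint
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Ask a chat-style question; sampling settings here are illustrative
output = generator(
    "Write a short, friendly greeting for a chatbot welcome screen.",
    max_new_tokens=50,
    do_sample=True,
    top_k=10,
)
print(output[0]["generated_text"])
```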

Hugging Face Hub Release

Announced through a blog post by the Hugging Face AI community, Falcon 180B is now available on Hugging Face Hub.

The model’s architecture builds upon the earlier Falcon series of open-source LLMs, incorporating innovations such as multi-query attention to scale up to 180 billion parameters, trained on 3.5 trillion tokens.
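Multi-query attention differs from standard multi-head attention in that all query heads share a single key/value head, which shrinks the key/value cache and speeds up inference at scale. Below is a minimal sketch of the mechanism in PyTorch; it illustrates the idea only and is not Falcon’s actual implementation, and all dimensions are arbitrary.

```python
import torch

batch, seq, d_model, n_heads = 2, 16, 512, 8
head_dim = d_model // n_heads

x = torch.randn(batch, seq, d_model)

w_q = torch.nn.Linear(d_model, n_heads * head_dim, bias=False)
w_k = torch.nn.Linear(d_model, head_dim, bias=False)  # one shared K head (vs. n_heads in MHA)
w_v = torch.nn.Linear(d_model, head_dim, bias=False)  # one shared V head (vs. n_heads in MHA)

q = w_q(x).view(batch, seq, n_heads, head_dim).transpose(1, 2)  # (B, H, S, D)
k = w_k(x).unsqueeze(1)                                         # (B, 1, S, D), broadcast to all heads
v = w_v(x).unsqueeze(1)                                         # (B, 1, S, D), broadcast to all heads

scores = q @ k.transpose(-2, -1) / head_dim ** 0.5              # (B, H, S, S)
attn = torch.softmax(scores, dim=-1) @ v                        # (B, H, S, D)
out = attn.transpose(1, 2).reshape(batch, seq, n_heads * head_dim)
```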

Unprecedented Training Effort

Falcon 180B represents a remarkable achievement in the world of open-source models, featuring the longest single-epoch pretraining to date. This milestone was reached using 4,096 GPUs working simultaneously for approximately 7 million GPU hours in total (roughly 1,700 hours, or about 71 days, per GPU), with Amazon SageMaker facilitating the training and refinement process.

Surpassing LLaMA 2 & commercial models

To put Falcon 180B’s size in perspective, it has 2.5 times as many parameters as Meta’s LLaMA 2, previously considered one of the most capable open-source LLMs. Falcon 180B not only surpasses LLaMA 2 in scale but also outperforms other models in benchmark performance across a spectrum of natural language processing (NLP) tasks.

It achieves 68.74 points on the Open LLM Leaderboard and comes close to matching commercial models like Google’s PaLM-2, particularly on evaluations such as the HellaSwag benchmark.

Falcon AI: A strong benchmark performance

Falcon 180B consistently matches or surpasses PaLM-2 Medium on widely used benchmarks, including HellaSwag, LAMBADA, WebQuestions, Winogrande, and more. Its performance is especially noteworthy as an open-source model, competing admirably with solutions developed by industry giants.

Comparison with ChatGPT

Falcon 180B offers stronger capabilities than the free version of ChatGPT but lags slightly behind the paid “Plus” service. It typically falls between GPT-3.5 and GPT-4 on evaluation benchmarks, making it an exciting addition to the AI landscape.

Falcon AI with LangChain

LangChain is a Python library designed to facilitate the creation of applications utilizing Large Language Models (LLMs). It offers a specialized pipeline known as HuggingFacePipeline, tailored for models hosted on HuggingFace. This means that integrating Falcon with LangChain is not only feasible but also practical.

Installing LangChain package

Begin by installing the LangChain package using the following command:
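A typical install command (assuming a Python environment with pip available; the later steps also require the transformers and torch packages):

```python
pip install langchain
```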

This command will fetch and install the latest LangChain package, making it accessible for your use.

Creating a pipeline for Falcon model

Next, let’s create a pipeline for the Falcon model. You can do this by importing the required components and configuring the model parameters:
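A minimal sketch of such a pipeline, assuming the Falcon-7B-Instruct checkpoint and illustrative generation settings:

```python
import torch
from transformers import AutoTokenizer, pipeline
from langchain.llms import HuggingFacePipeline

model_id = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Standard transformers text-generation pipeline for Falcon
falcon_pipeline = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    max_new_tokens=200,
)

# Wrap the pipeline so LangChain can treat it as an LLM; temperature is set
# to 0 to keep responses focused, as described below
llm = HuggingFacePipeline(pipeline=falcon_pipeline, model_kwargs={"temperature": 0})
```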

Here, we’ve utilized the HuggingFacePipeline object, specifying the desired pipeline and model parameters. The ‘temperature’ parameter is set to 0, reducing the model’s inclination to generate imaginative or off-topic responses. The resulting object, named ‘llm,’ stores our Large Language Model configuration.

PromptTemplate and LLMChain

LangChain offers tools like PromptTemplate and LLMChain to enhance the responses generated by the Large Language Model. Let’s integrate these components into our code:
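One possible arrangement, using an illustrative humorous-assistant template with a {query} placeholder:

```python
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Illustrative template: the wording is an example, not a prescribed prompt
template = """You are a witty assistant who answers every question with humor.

Question: {query}
Answer:"""

prompt = PromptTemplate(template=template, input_variables=["query"])

# Combine the prompt with the Falcon LLM configured earlier (the `llm` object)
llm_chain = LLMChain(prompt=prompt, llm=llm)
```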

In this section, we define a template for the PromptTemplate, outlining how our LLM should respond, emphasizing humor in this case. The template includes a question placeholder labeled {query}. This template is then passed to the PromptTemplate method and stored in the ‘prompt’ variable.

To finalize our setup, we combine the Large Language Model and the Prompt using the LLMChain method, creating an integrated model configured to generate humorous responses.

Putting it into action

Now that our model is configured, we can use it to provide humorous answers to user questions. Here’s an example code snippet:
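For example, assuming the `llm_chain` object built in the previous step:

```python
# Ask the chained model a question and print its (hopefully humorous) answer
response = llm_chain.run("How to reach the moon?")
print(response)
```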

In this example, we presented the query “How to reach the moon?” to the model, which generated a humorous response. The Falcon-7B-Instruct model followed the prompt’s instructions and produced an appropriate and amusing answer to the query.

This demonstrates just one of the many possibilities that this new open-source model, Falcon AI, can offer.

A promising future

Falcon 180B’s release marks a significant leap forward in the advancement of large language models. Beyond its immense parameter count, it showcases advanced natural language capabilities from the outset.

With its availability on Hugging Face, the model is poised to receive further enhancements and contributions from the community, promising a bright future for open-source AI.

 

 
