
Essential Data Preparation Toolkit for LLM Application Developers

Agenda

Optimize LLM Development with Advanced Data Preparation Techniques

In AI, conversations often start with models but end with data. As the generative AI landscape evolves, data preparation has become a critical phase in building high-performing large language models (LLMs). The success of an LLM hinges on the quality and quantity of the text and code corpora used to train it. Data preparation cleans, filters, and transforms datasets into a tokenized form suitable for either pre-training or fine-tuning.
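To make the clean, filter, and tokenize stages concrete, here is a minimal sketch in plain Python. It is illustrative only and does not use the DPK API: the function names, the five-word length threshold, the exact-duplicate check, and the toy regex tokenizer are all assumptions made for this example.

```python
import re


def clean(text: str) -> str:
    """Strip markup-like tags and collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def keep(text: str, min_words: int = 5) -> bool:
    """Hypothetical quality filter: drop very short documents."""
    return len(text.split()) >= min_words


def tokenize(text: str) -> list[str]:
    """Toy tokenizer (an assumption): lowercase words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def prepare(docs: list[str]) -> list[list[str]]:
    """Clean each document, filter it, drop exact duplicates, then tokenize."""
    seen = set()
    prepared_docs = []
    for raw in docs:
        text = clean(raw)
        if not keep(text) or text in seen:
            continue
        seen.add(text)
        prepared_docs.append(tokenize(text))
    return prepared_docs


corpus = [
    "<p>Data preparation   shapes model quality.</p>",
    "Data preparation shapes model quality.",  # duplicate once cleaned
    "Too short.",  # removed by the length filter
]
prepared = prepare(corpus)
```

Real pipelines typically replace each stage with something stronger, such as fuzzy deduplication, language and quality classifiers, and a trained subword tokenizer, but the order of the stages stays the same.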

Key Takeaways:

  • Discover how Data Prep Kit (DPK) fosters collaboration within the AI community.
  • Learn how DPK can accelerate your development process and reduce time-to-value.
  • See how DPK has been a driving force behind the IBM open-source Granite models.

Shahrokh Daijavad

Research Scientist, IBM Almaden Research Center

Shahrokh Daijavad, a distinguished Research Scientist in the Watsonx Data Engineering group at IBM Almaden Research Center, has a rich background in Edge Computing and Data Engineering. He earned his B.Eng. and Ph.D. in electrical engineering from McMaster University and spent years at IBM T. J. Watson Research Center. His recent research focuses on AI@Edge and Data Engineering for IBM Watsonx AI offerings.
