
Tokenization in NLP: From Basics to Advanced Techniques

Agenda

Have you ever wondered how machines understand the nuances of human language? It all starts with tokenization, the foundational step in training language models to grasp our complex languages.

Join us for our upcoming live talk with Suman Debnath, Principal Developer Advocate for Machine Learning at Amazon Web Services, as he takes a close look at tokenization. Suman will explain how tokenization lets machines break down and process human language in natural language processing (NLP). The talk covers everything from the initial challenges of segmenting text into manageable pieces to the more sophisticated techniques that enable deeper language understanding, and it is aimed at enthusiasts eager to deepen their knowledge and refine their skills in NLP.

Key Takeaways:

  • Understand tokenization’s impact on language models
  • Learn text splitting for deeper analysis
  • Explore Byte Pair Encoding’s efficiency
  • Discover sliding windows for better training data
  • Learn about converting tokens into vectors

Whether you’re just starting out or looking to brush up on the latest in NLP, this session promises a blend of foundational knowledge and advanced insights, all presented in an accessible and engaging format.

Suman Debnath

Principal Developer Advocate for Machine Learning at Amazon Web Services

Suman Debnath, Principal Developer Advocate for Machine Learning at Amazon Web Services, is passionate about deep learning, natural language understanding, and large-scale distributed systems. He is also an avid fan of Python.

