How to Efficiently Fine-Tune and Serve OSS LLMs

Most teams start with a commercial LLM like GPT-4, and the appeal is clear: instant access to a model with trillions of parameters that performs well out of the box and beats almost all open-source alternatives. The story changes once open-source LLMs are fine-tuned: they can consistently beat GPT-4 by 4-15 points on downstream tasks while being 250x smaller and roughly 100x cheaper at inference time. Arnav Garg, Senior Machine Learning Engineer at Predibase, discusses how to efficiently fine-tune and serve open-source LLMs using open-source packages like Ludwig and LoRAX, focusing on the core fine-tuning and inference ideas that enable faster, cheaper, smaller, and better models than closed-source providers offer. This talk was originally delivered at Arize:Observe 2024 at Shack 15 in San Francisco on July 11, 2024.
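As a rough illustration of the workflow the talk covers, here is a minimal sketch of LoRA fine-tuning with Ludwig's Python API, followed by a sketch of querying a running LoRAX server. The base model, column names, dataset path, server host, and adapter ID are all illustrative placeholders, not details from the talk.

```python
# Minimal sketch: parameter-efficient fine-tuning with Ludwig.
# LoRA trains small adapter matrices instead of all base weights, and
# 4-bit quantization (QLoRA-style) shrinks the memory footprint further.
from ludwig.api import LudwigModel

config = {
    "model_type": "llm",
    "base_model": "mistralai/Mistral-7B-v0.1",  # illustrative base model
    "input_features": [{"name": "prompt", "type": "text"}],
    "output_features": [{"name": "response", "type": "text"}],
    "adapter": {"type": "lora"},
    "quantization": {"bits": 4},
    "trainer": {"type": "finetune", "epochs": 3, "learning_rate": 0.0001},
}

model = LudwigModel(config=config)
model.train(dataset="train.csv")  # expects 'prompt' and 'response' columns
```

Once an adapter is trained, LoRAX can serve many such adapters on a single shared base model, hot-swapping the adapter per request:

```python
# Minimal sketch: querying a LoRAX deployment with the lorax-client package.
# The host and adapter_id below are placeholders.
from lorax import Client

client = Client("http://127.0.0.1:8080")
response = client.generate(
    "Summarize the following support ticket: ...",
    adapter_id="my-org/my-finetuned-adapter",  # loaded on demand per request
    max_new_tokens=128,
)
print(response.generated_text)
```

Serving adapters this way, rather than one dedicated deployment per fine-tuned model, is what drives the large inference-cost savings the abstract refers to.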

Arnav Garg

Predibase

Arnav is a Senior Machine Learning Engineer at Predibase, where he leads the development of LLM fine-tuning capabilities and is a co-maintainer of the popular open-source packages Ludwig (11K stars) and LoRAX (1.5K stars). Prior to Predibase, Arnav worked as a Machine Learning Scientist at Atlassian, where he built large-scale content recommendation systems for Confluence and Trello to power various machine learning features across both products. Arnav holds a bachelor’s degree in Computer Science from UCLA.
