Google introduced an LLM inference engine, a library of reference diffusion models, and TPU optimizations for transformer models at Google Cloud Next ’24.
At Google Cloud Next ’24, Google unveiled three open source projects for building and running generative AI models. The company also added new large language models to MaxText, its collection of LLMs built with JAX.
The new models in MaxText include Gemma, GPT-3, Llama 2, and Mistral, all of which are supported on both Google Cloud TPUs and Nvidia GPUs, the company said.
The newly unveiled open source projects are MaxDiffusion, JetStream, and Optimum-TPU.
MaxDiffusion is a collection of high-performance, scalable reference implementations of diffusion models such as Stable Diffusion. Like the MaxText models, the MaxDiffusion models are built on JAX, a framework for high-performance numerical computing and large-scale machine learning.
JAX, in turn, is integrated with the OpenXLA compiler, which optimizes numerical functions for performance at scale. This lets model builders focus on the math and leave it to the compiler to work out the most efficient implementation.
“We’ve heavily optimized JAX and OpenXLA performance on Cloud TPU and partnered closely with Nvidia to optimize OpenXLA performance on large Cloud GPU clusters,” Google said.
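To make that division of labor concrete, here is a minimal sketch using only public JAX APIs (jax.jit, jax.nn.gelu, and the lowering API on jitted functions). The function name gelu_mlp and the tensor shapes are hypothetical, chosen for illustration; this is not code from MaxText or MaxDiffusion. The model builder writes ordinary array math, and jax.jit hands the traced program to XLA, which compiles it for whatever backend is available.

```python
import jax
import jax.numpy as jnp


def gelu_mlp(x, w1, w2):
    """Hypothetical two-layer MLP: plain array math, no hardware-specific code."""
    h = jax.nn.gelu(x @ w1)
    return h @ w2


# jax.jit traces gelu_mlp for the given input shapes/dtypes and passes the
# traced program to the XLA compiler, which can fuse operations and generate
# code for the available backend (CPU, GPU, or TPU).
fast_mlp = jax.jit(gelu_mlp)

# Random inputs with illustrative (hypothetical) shapes.
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
x = jax.random.normal(k1, (8, 16))
w1 = jax.random.normal(k2, (16, 32))
w2 = jax.random.normal(k3, (32, 4))

# The first call compiles; later calls with the same shapes reuse the compiled program.
out = fast_mlp(x, w1, w2)
print("output shape:", out.shape)
print("backend:", jax.devices()[0].platform)

# Inspect the lowered program handed to the compiler
# (StableHLO text in recent JAX versions, before XLA's own optimizations).
hlo_text = fast_mlp.lower(x, w1, w2).as_text()
print(hlo_text.splitlines()[0])
```

The same Python runs unchanged on CPU, GPU, or TPU; only the program XLA generates differs, which is broadly how JAX code can target both Cloud TPUs and Nvidia GPUs.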
The company also introduced JetStream, an open source, optimized LLM inference engine that supports XLA compilers.
“As customers bring their AI workloads to production, there’s an increasing demand for a cost-efficient inference stack that delivers high performance. JetStream helps with this need and offers support for models trained with both JAX and PyTorch/XLA, and includes optimizations for popular open models such as Llama 2 and Gemma,” Mark Lohmeyer, general manager of compute and ML infrastructure at Google Cloud, said.
Finally, Google’s open source announcements included the launch of Optimum-TPU for PyTorch users in the Hugging Face community. Optimum-TPU brings Google Cloud TPU performance optimizations to both training and inference. It currently supports the Gemma 2B model, with support for Llama and Mistral coming soon, Google said.