Google introduced an LLM inference engine, a library of reference diffusion models, and TPU optimizations for transformer models at Google Cloud Next ’24.

At Google Cloud Next ’24, Google unveiled three open source projects for building and running generative AI models. The company also added new large language models to MaxText, its collection of JAX-built LLMs.

The new models in MaxText include Gemma, GPT-3, Llama 2, and Mistral, all of which are supported on both Google Cloud TPUs and Nvidia GPUs, the company said. The newly unveiled open source projects are MaxDiffusion, JetStream, and Optimum-TPU.

MaxDiffusion is a collection of high-performance, scalable reference implementations of diffusion models such as Stable Diffusion. Like the MaxText models, the MaxDiffusion models are built on JAX, a framework for high-performance numerical computing and large-scale machine learning. JAX in turn is integrated with the OpenXLA compiler, which optimizes numerical functions and delivers high performance at scale, allowing model builders to focus on the math while the software finds the most effective implementation. “We’ve heavily optimized JAX and OpenXLA performance on Cloud TPU and partnered closely with Nvidia to optimize OpenXLA performance on large Cloud GPU clusters,” Google said.

The company also introduced JetStream, an open source, optimized LLM inference engine that supports XLA compilers. “As customers bring their AI workloads to production, there’s an increasing demand for a cost-efficient inference stack that delivers high performance. JetStream helps with this need and offers support for models trained with both JAX and PyTorch/XLA, and includes optimizations for popular open models such as Llama 2 and Gemma,” said Mark Lohmeyer, general manager of compute and ML infrastructure at Google Cloud.
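The JAX-plus-OpenXLA workflow described above can be illustrated with a minimal sketch: the model author writes only plain numerical math, and a one-line `jax.jit` call hands the function to the XLA compiler, which fuses and optimizes it for the target backend. This is an illustrative example, not code from MaxText, MaxDiffusion, or JetStream.

```python
import jax
import jax.numpy as jnp

# A plain numerical function: the author writes only the math.
def mse(w, x, y):
    pred = jnp.dot(x, w)              # linear prediction
    return jnp.mean((pred - y) ** 2)  # mean squared error

# jax.jit routes the function through the XLA compiler, which fuses
# and optimizes the operations for CPU, GPU, or TPU.
mse_fast = jax.jit(mse)

# jax.grad derives the gradient function automatically; it is
# compiled through XLA as well.
grad_mse = jax.jit(jax.grad(mse))

w = jnp.zeros(3)
x = jnp.array([[1.0, 2.0, 3.0]])
y = jnp.array([6.0])
print(float(mse_fast(w, x, y)))   # loss with zero weights: 36.0
print(grad_mse(w, x, y))          # gradient with respect to w
```

The same source runs unchanged across backends; JAX dispatches to whatever accelerator XLA finds, which is the portability the article's TPU/GPU support claims rest on.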
Finally, Google’s open source announcements included the launch of Optimum-TPU for PyTorch users in the Hugging Face community. Optimum-TPU brings Google Cloud TPU performance optimizations to both training and inference. It supports the Gemma 2B model now, with Llama and Mistral to follow, Google said.