Without basic computer architecture best practices, generative AI systems are sluggish. Here are a few tips to optimize complex systems.

I'm often asked whether generative AI systems are always slow. Of course, I reply, "Slow, as compared to what?" The response I always get is funny: "Slower than we thought it would be." And the circle continues.

Performance is often an afterthought in generative AI development and deployment. Most organizations deploying generative AI systems, on the cloud or off, have yet to define what the performance of their generative AI systems should be, take no steps to measure performance, and end up complaining about performance after deployment. Or, more often, the users complain, and then the generative AI designers and developers complain to me.

Challenges of generative AI performance

At their essence, generative AI systems are complex, distributed, data-oriented systems that are challenging to build, deploy, and operate. They are all different, with different moving parts. Most of the parts are distributed everywhere, from the source databases for the training data, to the output data, to the core inference engines that often run on cloud providers. Here is my list of the most common difficulties:

Complex deployment landscapes. Generative AI systems often comprise many components, including data ingestion services, storage, compute, and networking. Architecting these components to work together often leads to overcomplexity, where performance issues, determined by the poorest-performing component, are difficult to isolate. I've seen poorly performing networks and saturated databases. Those things are not directly related to generative AI, but they can cause performance problems nonetheless.

AI model tuning. Performance is not solely a function of infrastructure, although many reach that conclusion. The AI models themselves must be tuned and optimized, which requires deep technical expertise that few have.
Vendors could have done a better job of establishing best practices for performance tuning. Many enterprises worry that tuning may make things worse or introduce issues that cause erroneous outcomes. This can't be ignored; depending on the type of generative AI system you're running in the cloud, you'll need to figure this out by working with your generative AI service providers.

Security concerns. It goes without saying that AI models and their data must be protected against unauthorized access and breaches, especially in cloud environments where multitenancy is common. Many performance issues also raise security risks. In many instances, security mechanisms such as encryption introduce performance overhead that, if not resolved, will worsen as the data grows. Architecture and testing are your friends here. Take some time to understand how security affects generative AI performance.

Regulatory compliance. Related to security is adherence to data governance and compliance standards, which can impose additional layers of performance management complexity. Much like security, we need to figure out how to work within these requirements. Most of the time, we can find a happy medium that provides the compliance we need. As with performance optimization, it just takes some trial and error.

Generative AI best practices

Keep in mind that any best practices I list here are holistic. They don't consider the specific type of generative AI system you're running, and different systems have very different components and platform considerations. You'll have to check with your specific generative AI provider about how these are carried out for your particular use cases. Given that warning, here are a few to consider:

Implement automation for scaling and resource optimization, or autoscaling, which cloud providers offer. This includes using machine learning operations (MLOps) techniques and approaches for operating AI models.

Utilize serverless computing, which abstracts away infrastructure management.
This means you no longer must allocate the resources your generative AI will need; it's done automatically. I'm not always comfortable turning the keys over to an automated process that allocates resources we have to pay for, but given all the other things you need to be concerned with, this is one less thing to worry about.

Conduct regular load testing and performance evaluations to ensure that your generative AI systems can handle peak demands. Most teams skip this and guess what the load will be at the top of the curve. Can you say "outage"?

Employ a continuous learning approach. AI models should be regularly updated with new data and refined to maintain performance and relevance.

Tap into the expertise and support of cloud service providers. Also, monitor the online communities supporting your specific technology stack. You'll find many answers there that $700-an-hour consultants won't be able to provide.

I suspect that generative AI performance will become more of a focus area than it is today. Perhaps it should be, given the amount of resources and cash we're pouring into this exploding space.
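The load-testing advice above can be sketched with a minimal latency benchmark. This is an illustrative sketch only, not a production tool: fake_inference is a hypothetical stand-in for a call to a real generative AI endpoint, and the percentile calculation is deliberately rough.

```python
import random
import statistics
import time


def fake_inference(prompt: str) -> str:
    """Hypothetical stand-in for a real generative AI endpoint call."""
    time.sleep(random.uniform(0.01, 0.03))  # simulate variable latency
    return f"response to: {prompt}"


def load_test(call, requests: int = 50) -> dict:
    """Issue sequential requests and summarize latency in milliseconds."""
    latencies = []
    for i in range(requests):
        start = time.perf_counter()
        call(f"prompt {i}")
        latencies.append((time.perf_counter() - start) * 1000)
    latencies.sort()
    return {
        "p50_ms": statistics.median(latencies),
        # Rough p95: the value below which ~95% of samples fall
        "p95_ms": latencies[int(len(latencies) * 0.95) - 1],
        "max_ms": latencies[-1],
    }


if __name__ == "__main__":
    print(load_test(fake_inference))
```

In a real evaluation you would swap fake_inference for your actual endpoint, run requests concurrently to model peak demand, and track tail latency (p95/p99) rather than averages, since the slowest component determines what users experience.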