Matt Asay
Contributor

Getting through the awkward toddler phase of generative AI

Analysis
08 Jan 2024 | 4 mins
Artificial Intelligence | Emerging Technology | Generative AI

Our unrealistic expectations of genAI are like hoping a two-year-old will calmly act like an adult. Try patiently experimenting with prompts and spending ‘quality time’ with this developing technology.

[Image: a crying, whining baby after a tantrum. Credit: Thinkstock]

We put up with bad software all the time. Anyone who has raged against their enterprise travel booking machine or tried to decipher the interface to their corporate tool for logging employee feedback knows what I’m talking about. Despite these problems, we continue to use (and, let’s be honest, write) bad software.

Yet when it comes to large language models, ChatGPT, and other aspects of our generative AI (genAI) universe, we don’t seem to accord the same level of patience. As developer Simon Willison notes, “While normally you would see people complain about how hard software is to use, in this case [of genAI], people having trouble getting good results instead assume that it’s actually useless and give up on it entirely.”

Are we holding generative AI to an unrealistic standard?

Inflated expectations

The answer is clearly yes, but who is to blame? Pretty much everyone. From the people who fear AI-driven machines will take our jobs, to vendors AI-washing their tired products, to the media looking for interesting content, to [insert demographic here], we’ve collectively come to expect too much from AI, both good and bad.

In the case of genAI, this has led proponents to overlook or soft-pedal some of genAI’s obvious shortcomings. Bill Gates, for example, has an incredibly ambitious vision for where genAI is going that seems divorced from even the most optimistic present-day reality. Such hype helps no one and makes it harder to tackle some of genAI’s core problems.

For starters, as Amelia Wattenberger argues, chat is a strange, unintuitive way to discover genAI’s smarts. As she notes, things like ChatGPT “greet” users with a text box, with no real guidance on what to type into the box and, essentially, no visibility into why it responds in a certain way. She continues, “Of course, users can learn over time what prompts work well and which don’t, but the burden to learn what works still lies with every single user.”

Compounding this problem, researchers Zamfirescu-Pereira, Wong, Hartmann, and Yang claim, “Even for [natural language processing] experts, prompt engineering requires extensive trial and error, iteratively experimenting and assessing the effects of various prompt strategies on concrete input-output pairs before assessing them more systematically on large data sets.” We’re all trying to figure out how to create inputs that yield great output, and we’re mostly failing.
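
To make that trial and error concrete, here is a minimal Python sketch of the loop the researchers describe: a few prompt variants run against a handful of concrete input-output pairs, with a crude check on each result. Everything in it is hypothetical, including the prompts, the test cases, and the call_model stub, which stands in for whichever LLM API you actually use.

```python
# A sketch of the prompt trial-and-error loop: run several prompt variants
# over a small set of concrete input-output pairs before trusting any of
# them on a larger data set.

PROMPT_VARIANTS = [
    "Summarize this support ticket in five words:\n{text}",
    "You are a triage assistant. Name the customer's core problem:\n{text}",
    "Extract the single actionable issue from this ticket:\n{text}",
]

# Concrete input-output pairs: ticket text and the issue we expect to see.
TEST_CASES = [
    ("My invoice shows two charges for March.", "duplicate charge"),
    ("The app crashes whenever I upload a PNG.", "crash"),
]

def call_model(prompt: str) -> str:
    """Hypothetical stand-in; replace with a real LLM API client."""
    return ""  # a real call would return the model's completion

def looks_right(output: str, expected: str) -> bool:
    # Crude substring check; real evaluation usually needs a human or a rubric.
    return expected.lower() in output.lower()

for template in PROMPT_VARIANTS:
    hits = sum(
        looks_right(call_model(template.format(text=text)), expected)
        for text, expected in TEST_CASES
    )
    print(f"{hits}/{len(TEST_CASES)} plausible: {template.splitlines()[0]!r}")
```

Only after a variant survives this kind of spot check does it make sense to assess it more systematically on a large data set, as the researchers suggest.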

It doesn’t help that the industry has been moving so fast, as Benj Edwards points out: “Whatever techniques you develop to use [large language models] well [are] obsolete in three to four months.” Surely vendors like OpenAI could be baking more guardrails into their products, making it easier for non-experts to become productive and eliminating some of these UX issues.

These teething pains with genAI, however, don’t warrant the conclusion that it’s either all hype or that it doesn’t work.

Practical reality

The friction inherent in ChatGPT and other genAI tools is real, as Sebastian Bensusan details in a friction log, but also solvable. And some of that “solving” comes down to user experience. Yes, the tools can and should bake more smarts into the interface, but it’s also true that one key way to get more value from genAI is to keep practicing until we figure out where its sharp edges are.

Few have more experience with this than Willison, who suggests, “To get the most value out of [large language models]—and to avoid the many traps that they set for the unwary user—you need to spend time with them and work to build an accurate mental model of how they work, what they are capable of, and where they are most likely to go wrong.” Yes, the tools need to improve, but this doesn’t eliminate the need for users to get smarter and savvier as well.

For those inclined to dismiss genAI because it’s hard, I’d urge patience and practice, as Willison does. As he concludes, genAI “can be flawed and lying and have all [sorts of] problems … and it can also be a massive productivity boost.”

Matt Asay
Contributor

Matt Asay runs developer relations at MongoDB. Previously, Asay was a principal at Amazon Web Services and head of developer ecosystem for Adobe. Prior to Adobe, Asay held a range of roles at open source companies: VP of business development, marketing, and community at MongoDB; VP of business development at real-time analytics company Nodeable (acquired by Appcelerator); VP of business development and interim CEO at mobile HTML5 start-up Strobe (acquired by Facebook); COO at Canonical, the Ubuntu Linux company; and head of the Americas at Alfresco, a content management startup. Asay is an emeritus board member of the Open Source Initiative (OSI) and holds a J.D. from Stanford, where he focused on open source and other IP licensing issues.
