Matt Asay
Contributor

Getting through the awkward toddler phase of generative AI

Analysis
08 Jan 2024 | 4 mins
Artificial Intelligence | Emerging Technology | Generative AI

Our unrealistic expectations of genAI are like hoping a two-year-old will calmly act like an adult. Try patiently experimenting with prompts and spending ‘quality time’ with this developing technology.

[Image: a crying, whining baby after a tantrum. Credit: Thinkstock]

We put up with bad software all the time. Anyone who has raged against their enterprise travel booking machine or tried to decipher the interface to their corporate tool for logging employee feedback knows what I’m talking about. Despite these problems, we continue to use (and, let’s be honest, write) bad software.

Yet when it comes to large language models, ChatGPT, and other aspects of our generative AI (genAI) universe, we don’t seem to accord the same level of patience. As developer Simon Willison notes, “While normally you would see people complain about how hard software is to use, in this case [of genAI], people having trouble getting good results instead assume that it’s actually useless and give up on it entirely.”

Are we holding generative AI to an unrealistic standard?

Inflated expectations

The answer is clearly yes, but who is to blame? Pretty much everyone. From the people who fear AI-driven machines will take our jobs, to vendors AI-washing their tired products, to the media looking for interesting content, to [insert demographic here], we’ve collectively come to expect too much from AI, both good and bad.

In the case of genAI, this has led proponents to overlook or soft-pedal some of genAI’s obvious shortcomings. Bill Gates, for example, has an incredibly ambitious vision for where genAI is going that seems divorced from even the most optimistic present-day reality. Such hype helps no one and makes it harder to tackle some of genAI’s core problems.

For starters, as Amelia Wattenberger argues, chat is a strange, unintuitive way to discover genAI’s smarts. As she notes, things like ChatGPT “greet” users with a text box, with no real guidance on what to type into the box and, essentially, no visibility into why it responds in a certain way. She continues, “Of course, users can learn over time what prompts work well and which don’t, but the burden to learn what works still lies with every single user.”

Compounding this problem, researchers Zamfirescu-Pereira, Wong, Hartmann, and Yang claim, “Even for [natural language processing] experts, prompt engineering requires extensive trial and error, iteratively experimenting and assessing the effects of various prompt strategies on concrete input-output pairs before assessing them more systematically on large data sets.” We’re all trying to figure out how to create inputs that yield great output, and we’re mostly failing.
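
To make that trial and error concrete, here is a minimal Python sketch of the loop the researchers describe: a few prompt variants run against a handful of concrete input-output pairs, with a crude check on each result. Everything in it is hypothetical, including the prompts, the test cases, and the call_model stub, which stands in for whichever LLM API you actually use.

```python
# A sketch of the prompt trial-and-error loop: run several prompt variants
# over a small set of concrete input-output pairs before trusting any of
# them on a larger data set.

PROMPT_VARIANTS = [
    "Summarize this support ticket in five words:\n{text}",
    "You are a triage assistant. Name the customer's core problem:\n{text}",
    "Extract the single actionable issue from this ticket:\n{text}",
]

# Concrete input-output pairs: ticket text and the issue we expect to see.
TEST_CASES = [
    ("My invoice shows two charges for March.", "duplicate charge"),
    ("The app crashes whenever I upload a PNG.", "crash"),
]

def call_model(prompt: str) -> str:
    """Hypothetical stand-in; replace with a real LLM API client."""
    return ""  # a real call would return the model's completion

def looks_right(output: str, expected: str) -> bool:
    # Crude substring check; real evaluation usually needs a human or a rubric.
    return expected.lower() in output.lower()

for template in PROMPT_VARIANTS:
    hits = sum(
        looks_right(call_model(template.format(text=text)), expected)
        for text, expected in TEST_CASES
    )
    print(f"{hits}/{len(TEST_CASES)} plausible: {template.splitlines()[0]!r}")
```

Only after a variant survives this kind of spot check does it make sense to assess it more systematically on a large data set, as the researchers suggest.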

It doesn’t help that the industry has been moving so fast, as Benj Edwards points out: “Whatever techniques you develop to use [large language models] well [are] obsolete in three to four months.” Surely vendors like OpenAI could be baking more guardrails into their products, making it easier for non-experts to become productive and eliminating some of these UX issues.

These teething pains with genAI, however, don’t warrant the conclusion that it’s either all hype or that it doesn’t work.

Practical reality

The friction inherent in ChatGPT and other genAI tools is real, as Sebastian Bensusan details in a friction log, but also solvable. And some of that “solving” comes down to user experience. Yes, the tools can and should bake more smarts into the interface, but it’s also true that one key way to get more value from genAI is to keep practicing until we figure out where its sharp edges are.

Few have more experience with this than Willison, who suggests, “To get the most value out of [large language models]—and to avoid the many traps that they set for the unwary user—you need to spend time with them and work to build an accurate mental model of how they work, what they are capable of, and where they are most likely to go wrong.” Yes, the tools need to improve, but this doesn’t eliminate the need for users to get smarter and savvier as well.

For those inclined to dismiss genAI because it’s hard, I’d urge patience and practice, as Willison does. As he concludes, genAI “can be flawed and lying and have all [sorts of] problems … and it can also be a massive productivity boost.”

Matt Asay
Contributor

Matt Asay runs developer relations at MongoDB. Previously, Asay was a principal at Amazon Web Services and head of developer ecosystem for Adobe. Prior to Adobe, Asay held a range of roles at open source companies: VP of business development, marketing, and community at MongoDB; VP of business development at real-time analytics company Nodeable (acquired by Appcelerator); VP of business development and interim CEO at mobile HTML5 start-up Strobe (acquired by Facebook); COO at Canonical, the Ubuntu Linux company; and head of the Americas at Alfresco, a content management startup. Asay is an emeritus board member of the Open Source Initiative (OSI) and holds a J.D. from Stanford, where he focused on open source and other IP licensing issues.
