The White Clam Pizza at Frank Pepe Pizzeria Napoletana in New Haven, Conn., is a revelation. The crust, kissed by the intense heat of the coal-fired oven, strikes a perfect balance of crisp and chewy. Topped with freshly shucked clams, garlic, oregano and a dusting of grated cheese, it’s a testament to the magic that simple, high-quality ingredients can conjure up.
Sound like me? It isn’t. The entire paragraph, except for the pizzeria’s name and city, was generated by GPT-4 in response to a simple prompt asking for a restaurant review in the style of Pete Wells.
I do have a few tells, though. I would never call any food a revelation, nor would I describe heat as a kiss. I don’t believe in magic, and I rarely call anything perfect without “almost” or some other hedge. But these lazy descriptors are so common in food writing that I imagine many readers barely notice them. I’m unusually attuned to them because every time I commit a cliché in my copy, my editor gives me an earful.
He wouldn’t be fooled by Fake Pete. Neither would I. But as much as it pains me to admit it, I suspect a lot of people would call it a four-star fake.
The person responsible for Phony Me is Balazs Kovacs, a professor of organizational behavior at the Yale School of Management. In a recent study, he fed a large batch of Yelp reviews to GPT-4, the technology behind ChatGPT, and asked it to mimic them. His subjects—humans—couldn’t tell the difference between genuine reviews and those generated by artificial intelligence. In fact, they were more likely to believe the AI reviews were real. (The phenomenon of computer-generated imitations that are more convincing than the real thing is so well known that there’s a name for it: artificial intelligence hyperrealism.)
Dr. Kovacs’s study is part of a growing body of research suggesting that the latest versions of generative artificial intelligence can pass the Turing test, a scientifically fuzzy but culturally resonant standard. When a computer can fool us into believing that the language it produces was written by a human, we say it has passed the Turing test.
It has long been assumed that artificial intelligence would eventually pass the test, first proposed by the mathematician Alan Turing in 1950. But even some experts are surprised by how quickly the technology is improving. “It’s happening faster than people expected,” Dr. Kovacs said.
The first time Dr. Kovacs asked GPT-4 to imitate Yelp reviews, few people were fooled. The prose was too perfect. That changed when he instructed the program to use colloquial spellings, to emphasize a word or two in all caps and to insert typos, one or two in each review. This time, GPT-4 passed the Turing test.
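For readers curious what that kind of prompt might look like in code, here is a minimal sketch using the OpenAI Python client. It is only an illustration of the technique described above: the model name, the instruction wording and the fake_review helper are assumptions of mine, not materials from Dr. Kovacs’s study.

```python
# Illustrative sketch only; the prompt wording and model name are guesses,
# not the study's actual materials. Requires the openai package (v1.x).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

STYLE_INSTRUCTIONS = (
    "Write a short Yelp-style restaurant review. "
    "Use colloquial spellings, put one or two words in ALL CAPS for "
    "emphasis, and include one or two small typos so it reads like a "
    "hurried human reviewer."
)

def fake_review(restaurant: str, real_examples: list[str]) -> str:
    """Ask the model to mimic the tone of the example reviews."""
    examples = "\n---\n".join(real_examples)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": STYLE_INSTRUCTIONS},
            {
                "role": "user",
                "content": f"Here are real reviews to imitate:\n{examples}\n\n"
                           f"Now write one for {restaurant}.",
            },
        ],
    )
    return response.choices[0].message.content
```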
Beyond its significance for machine learning, the ability of AI to sound exactly like us has the potential to undermine whatever trust we still have in written communications, especially brief ones. Text messages, emails, comment sections, news articles, social media posts and user reviews will be even more suspect than they already are. Who would believe a Yelp post about a croissant pizza, or a glowing OpenTable dispatch about a $400 omakase sushi tasting, knowing that its author might be a machine that can neither chew nor swallow?
“With consumer-generated reviews, it’s always been a big question about who’s behind the screen,” said Phoebe Ng, a restaurant communications strategist in New York. “Now it’s a matter of what’s behind the screen.”
Online opinions are the grease on the wheels of modern commerce. In a 2018 survey by the Pew Research Center, 57 percent of Americans surveyed said they always or almost always read online reviews and ratings before purchasing a product or service for the first time. Another 36 percent said they sometimes did.
For businesses, a few points in a Google or Yelp star rating can mean the difference between making money and going under. “We live on reviews,” the manager of an Enterprise Rent-a-Car location in Brooklyn told me last week as I picked up a car.
A business traveler who needs a ride that won’t break down on the New Jersey Turnpike might be more affected by a negative report than, say, someone just looking for brunch. But for restaurant owners and chefs, Yelp, Google, TripAdvisor and other sites that allow customers to have their say are a source of endless worry and occasional outrage.
A particular cause of frustration is the large number of people who don’t bother to eat at the place they write about. Before an Eater article pointed it out last week, the first New York location of the Taiwan-based dim sum chain Din Tai Fung was being hit with one-star Google reviews that dragged its average rating down to 3.9 out of a possible 5. The restaurant hasn’t even opened yet.
Some ghost reviewers are more sinister: restaurants get flooded with one-star reviews, followed by an email offering to take them down in exchange for gift cards.
To fight back against bad-faith reviewers, some owners enlist their nearest and dearest to flood the zone with positive posts. “One question is, how many aliases do we all have in the restaurant industry?” said Steven Hall, the owner of a New York public relations firm.
A step up from an organized ballot-stuffing campaign, or perhaps a step down, is the practice of trading comped meals or cash for positive write-ups. Beyond that looms the vast and shadowy realm of critics who don’t exist.
To advertise their own businesses or bring their rivals to their knees, companies can hire brokers who have built up small armies of virtual reviewers. According to Kay Dean, a consumer advocate who investigates online review fraud, these accounts are typically given an extensive history of past reviews that acts as camouflage for the pay-to-play posts.
In two recent videos, she pointed to a chain of mental health clinics that had received glowing Yelp reviews, apparently submitted by satisfied patients, whose accounts were padded with restaurant reviews lifted verbatim from TripAdvisor.
“It’s an ocean of fakery, and it’s much worse than people imagine,” Ms. Dean said. “Consumers are cheated, honest businesses are harmed and trust is eroded.”
All of this, so far, is done by humans. But, as Dr. Kovacs writes in his study, “the situation is now fundamentally changing because humans will no longer be required to write authentic-looking fake reviews.”
Ms. Dean said that if AI-generated content infiltrates Yelp, Google and other sites, it will be “even more difficult for consumers to make informed decisions.”
Major sites say they have ways to spot Potemkin accounts and other forms of fakery. Yelp invites users to flag questionable reviews and after an investigation will remove those found to be in violation of its policies. It also hides reviews that its algorithm deems less trustworthy. Last year, according to its most recent Trust and Safety Report, the company boosted its use of artificial intelligence “to even better identify and not recommend less useful and less trustworthy reviews.”
Dr. Kovacs believes websites will now have to work harder to show that they are not routinely publishing the thoughts of bots. They could, for example, adopt something like the “Verified Purchase” label that Amazon attaches to reviews of products bought or streamed through its site. If readers grow even more suspicious of posted restaurant reviews than they already are, it could be an opportunity for OpenTable and Resy, which accept reviews only from customers who show up for their reservations.
One thing that probably won’t work is asking computers to police the language on their own. Dr. Kovacs ran the real and the fake Yelp reviews through programs that claim to be able to identify AI. Like the human test takers, he said, the software “thought the fakes were real.”
That didn’t surprise me. I took Dr. Kovacs’s test myself, confident that I could spot the small, specific details a real diner would mention. After clicking a box to certify that I wasn’t a robot, I quickly found myself lost in a wasteland of second-guessing and frowning. By the end of the test, I was just guessing. I got seven out of 20 reviews right, a result somewhere between flipping a coin and asking a monkey.
What bugged me was that GPT-4 hadn’t constructed its reviews out of thin air. It stitched them together from bits and pieces of Yelpers’ descriptions of their afternoon snacks and Sunday brunches.
“It’s not completely made up in terms of what people value and what they’re interested in,” said Dr. Kovacs. “What’s scary is that it can create an experience that looks and smells like a real experience, but it’s not.”
Incidentally, Dr. Kovacs told me that he ran the first draft of his paper through an AI editor and incorporated many of its suggestions into the final copy.
It probably won’t be long before the idea of purely human writing seems quaint. Bots will be asked to read over our shoulders, alert us when we’ve used the same adjective too many times, nudge us toward a more active verb. Machines will be our teachers, our editors, our collaborators. They will even help us sound human.