SearchGPT hallucinations might make you miss your favorite music festival
On July 26, 2024, OpenAI unveiled SearchGPT, a prototype search engine that melds traditional search functionality with the capabilities of generative AI.
SearchGPT is set to challenge Google’s dominance in the online search market, with OpenAI planning to eventually incorporate it into ChatGPT.
The service, powered by the GPT-4 family of models, will initially be available to only 10,000 test users, according to OpenAI spokesperson Kayla Wood, who shared the information with The Verge.
But even before users have had a chance to test it out, SearchGPT appears to be spreading misinformation.
OopsGPT
In a prerecorded demo video released alongside the announcement, a mock user types “music festivals in Boone, North Carolina in August” into the SearchGPT interface.
The tool generates a list of festivals supposedly taking place in Boone that month, with the first being An Appalachian Summer Festival.
According to SearchGPT, this festival hosts a series of arts events from July 29 to August 16. But anyone in Boone trying to buy tickets for those dates would be out of luck.
The festival actually began on June 29 and concludes with its final concert on July 27. The dates SearchGPT listed, July 29 to August 16, are in fact the period when the box office is officially closed, a detail the festival's box office itself confirmed.
Citing sources
Then there’s source attribution. A 2024 study found that current LLMs, including those with retrieval-augmented generation (RAG) capabilities, often fail to support their responses with accurate and relevant sources. Worse, they tend to make things up, attributing fabricated quotes to real people and misrepresenting the stories they summarize.
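To see why retrieval alone doesn’t fix attribution, here’s a minimal sketch of a RAG pipeline. Every name in it (retrieve, generate, the toy corpus) is a hypothetical stand-in, not a real library API. The key point: retrieved passages only condition the prompt, and the citations in the output are generated tokens like any other, so nothing in the architecture forces them to be accurate.

```python
# Toy sketch of a retrieval-augmented generation (RAG) pipeline.
# All names here are hypothetical stand-ins, not a real library API.

def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Naive keyword-overlap retrieval: return the top-k (source_id, passage) pairs."""
    scored = sorted(
        ((sum(w in passage.lower() for w in query.lower().split()), sid, passage)
         for sid, passage in corpus.items()),
        reverse=True,
    )
    return [(sid, passage) for _, sid, passage in scored[:k]]

def generate(query: str, passages: list[tuple[str, str]]) -> str:
    """Stand-in for the LLM call. The retrieved text goes into the prompt,
    but the output -- including any citations -- is still free-form
    next-token prediction, so nothing forces the citations to be correct."""
    context = " ".join(p for _, p in passages)
    cited = ", ".join(sid for sid, _ in passages)
    return f"[model answer to {query!r}, conditioned on: {context!r}] (cites: {cited})"

corpus = {
    "festival-site": "An Appalachian Summer Festival runs June 29 through July 27.",
    "box-office-page": "The box office is closed from July 29 to August 16.",
}
print(generate("music festivals in Boone in August",
               retrieve("music festivals in Boone in August", corpus)))
```

Even in this toy version, the model could quote the box-office closure dates as festival dates, and the printed citations would still look perfectly legitimate.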
In recent months, several news outlets and media organizations, such as the New York Times, the Chicago Tribune, the Intercept, and various local papers, have initiated legal action against OpenAI. They claim that the company unlawfully trained its AI models on their published content without permission or compensation, thereby profiting from protected material and effectively plagiarizing their work.
The core issue
Despite the enthusiasm surrounding searchbots, nearly every attempt to create an AI-based search engine encounters problems.
Fundamentally, these language models operate by predicting the most likely next word in a sentence. Unlike humans, they lack true comprehension of concepts like dates on a calendar or geographical locations.
As a result, their predictions often contain errors, producing answers with “hallucinations,” or false information. This issue is not merely a glitch to fix but is inherent to the way these prediction-based models work.
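Here is a stripped-down illustration of that mechanism, with invented logits standing in for a real model’s output (these numbers are not from any actual model). The decoder simply emits whichever token scores highest, whether or not it matches reality:

```python
import math

# Minimal sketch of greedy next-token prediction with made-up logits.
# Hypothetical model scores for the word after
# "An Appalachian Summer Festival begins on June ..."
logits = {"29": 2.0, "28": 1.7, "30": 2.2, "31": 1.9}

def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Convert raw scores into a probability distribution over tokens."""
    z = sum(math.exp(v) for v in scores.values())
    return {tok: math.exp(v) / z for tok, v in scores.items()}

probs = softmax(logits)
next_token = max(probs, key=probs.get)  # greedy decoding: pick the most likely token
print(next_token)  # "30" -- plausible-looking, but the festival actually starts
                   # on the 29th; the model never consulted a calendar or the source
print(probs)
```

In this sketch the wrong date wins simply because it scores highest; a plausible continuation beats a true one, which is the hallucination problem in miniature.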
All of this highlights a major tension in tech companies’ predictions about AI transformation: chatbots are expected to revolutionize the internet and, eventually, the physical world. Yet right now, they struggle to accurately copy-paste information from a music festival’s website.
Want a generative AI tool that can accurately cite its sources? Try Seraf.