Now a group of Google researchers has published a proposal for a radical redesign that throws out the ranking approach and replaces it with a single large AI language model, such as BERT or GPT-3, or a future version of them. The idea is that instead of searching for information in vast lists of web pages, users would ask questions and have a language model trained on those pages answer them directly. The approach could change not only how search engines work, but what they do, and how we interact with them.
Search engines have become faster and more accurate, even as the web has exploded in size. AI is now used to rank results, and Google uses BERT to understand search queries better. Yet beneath these tweaks, all mainstream search engines still work the same way they did 20 years ago: web pages are indexed by crawlers (software that reads the web nonstop and keeps a list of everything it finds), results that match a user's query are gathered from this index, and the results are ranked.
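The index-retrieve-then-rank pipeline described above can be sketched in a few lines. This is purely illustrative: the toy corpus, whitespace tokenization, and term-count scoring are stand-ins, not how any real search engine works.

```python
# A toy sketch of the index-retrieve-then-rank pipeline (illustrative only;
# real engines use crawlers, sharded indexes, and learned ranking models).
from collections import defaultdict

docs = {
    "d1": "language models answer questions in natural language",
    "d2": "search engines index web pages with crawlers",
    "d3": "crawlers read the web and index every page they find",
}

# 1. Index: map each term to the documents containing it (an inverted index).
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

# 2. Retrieve: gather every document matching at least one query term.
def retrieve(query):
    return set().union(*(index.get(t, set()) for t in query.split()))

# 3. Rank: order the retrieved documents by how often they contain query terms.
def rank(query, doc_ids):
    terms = query.split()
    score = lambda d: sum(docs[d].split().count(t) for t in terms)
    return sorted(doc_ids, key=score, reverse=True)

results = rank("index web crawlers", retrieve("index web crawlers"))
```

The point of the sketch is the separation of stages: the index is built once, ahead of time, and every query only touches the retrieve and rank steps, which is what makes the blueprint fast at web scale.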
“This index-retrieve-then-rank blueprint has withstood the test of time and has rarely been challenged or seriously rethought,” write Donald Metzler and his colleagues at Google Research.
The problem is that even the best search engines today still respond with a list of documents that include the information asked for, not the information itself. Search engines are also not good at answering queries that require answers drawn from multiple sources. It is as if you asked your doctor for advice and received a list of articles to read instead of a straight answer.
Metzler and his colleagues are interested in a search engine that behaves like a human expert. It should produce answers in natural language, synthesized from more than one document, and back up its answers with references to supporting evidence, as Wikipedia articles aim to do.
Large language models get us part of the way there. Trained on most of the web and hundreds of books, GPT-3 draws on information from multiple sources to answer questions in natural language. The problem is that it does not keep track of those sources and cannot provide evidence for its answers. There is no way to tell whether GPT-3 is parroting trustworthy information or disinformation, or simply spewing nonsense of its own.
Metzler and his colleagues call language models dilettantes: “They are perceived to know a lot, but their knowledge is skin deep.” The solution, they claim, is to build and train future BERTs and GPT-3s to keep records of where their words come from. No such models exist yet, but they are possible in principle, and there is early work in that direction.
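Since models that track provenance internally do not yet exist, one way to approximate the behavior today is to pair each answer with the passage it was drawn from. The following is a minimal sketch under that assumption; the two-entry corpus, the source IDs, and the word-overlap matching are all invented for illustration and are not the method the researchers propose.

```python
# A hedged sketch of answer-with-evidence: rather than a model that records
# where its words come from, we return a supporting passage alongside the
# ID of the source it was taken from.
sources = {
    "site-a": "BERT is a language model used by Google to understand queries.",
    "site-b": "GPT-3 was trained on most of the web and hundreds of books.",
}

def answer_with_citation(question):
    """Return the passage with the largest word overlap, plus its source ID."""
    q_terms = set(question.lower().split())
    best = max(
        sources,
        key=lambda s: len(q_terms & set(sources[s].lower().split())),
    )
    return sources[best], best

text, cite = answer_with_citation("what was GPT-3 trained on")
```

Crude as it is, the sketch captures the contract the researchers want: every answer arrives attached to evidence a user can check, instead of free-floating model output.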
There have been decades of progress on different areas of search, from answering queries to summarizing documents to structuring information, says Ziqi Zhang at the University of Sheffield, UK, who studies information retrieval on the web. But none of these technologies overhauled search, because each addresses a specific problem and none is general-purpose. The exciting premise of this paper is that large language models can do all these things at the same time, he says.
Yet Zhang notes that language models do not perform well with technical or specialist subjects, because there are fewer examples of them in the text they are trained on. “There are hundreds of times more data about e-commerce on the web than data about quantum mechanics,” he says. Today’s language models are also skewed toward English, leaving the non-English parts of the web underserved.
Even so, Zhang welcomes the idea. “This wasn’t possible in the past, because large language models have only recently taken off,” he says. “If it works, it will transform our search experience.”