Google Gets A Second Brain, Changing Everything About Search

In the 1983 sci-fi/comedy flick The Man with Two Brains, Steve Martin played Michael Hfuhruhurr, a neurosurgeon who marries one of his patients but then falls in love with the disembodied brain of another woman, Anne. Michael and Anne share an entirely telepathic relationship, until Michael’s gold-digging wife is murdered, giving him the opportunity to transplant Anne’s brain into her body.
Well, you may not have noticed it yet, but the search engine you use every day—by which I mean Google, of course—is also in the middle of a brain transplant. And, just as Dr. Hfuhruhurr did, you’re probably going to like the new version a lot better.
You can think of Google, in its previous incarnation, as a kind of statistics savant. In addition to indexing hundreds of billions of Web pages by keyword, it had grown talented at tricky tasks like recognizing names, parsing phrases, and correcting misspelled words in users’ queries. But this was all mathematical sleight-of-hand, powered mostly by Google’s vast search logs, which give the company a detailed day-to-day picture of the queries people type and the links they click. There was no real understanding underneath; Google’s algorithms didn’t know that “San Francisco” is a city, for instance, while “San Francisco Giants” is a baseball team.
Now that’s changing. Today, when you enter a search term into Google, the company kicks off two separate but parallel searches. One runs against the traditional keyword-based Web index, bringing back matches that are ranked by statistical relevance—the familiar “ten blue links.” The other search runs against a much newer database of named entities and relationships.
Type in the query “Philadelphia,” and this second search will produce a new “knowledge panel” in the right-hand margin of the results page, complete with a map and other basic facts about the city William Penn founded. (Hedging its bets, however, Google will also include a thumbnail of the movie poster from the 1993 Tom Hanks film Philadelphia.) To use Google’s own description, the new database helps the search engine understand “things, not strings.”
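To make the two-track idea concrete, here's a deliberately simplified sketch of what that flow might look like. The function names and the tiny "database" are my own placeholders for illustration, not Google's internal systems, which are far more elaborate and not publicly documented.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-ins for the two back ends described above.
def keyword_search(query):
    """Return ranked 'blue link' results from a keyword index (stubbed)."""
    return [f"https://example.com/result-for-{query.replace(' ', '-')}"]

def entity_lookup(query):
    """Return a matching entity (if any) to power a knowledge panel."""
    entities = {"philadelphia": {"type": "City", "state": "Pennsylvania",
                                 "founder": "William Penn"}}
    return entities.get(query.lower())

def search(query):
    # Kick off both searches in parallel, then assemble one results page:
    # the familiar links plus, when an entity matches, a knowledge panel.
    with ThreadPoolExecutor() as pool:
        links = pool.submit(keyword_search, query)
        panel = pool.submit(entity_lookup, query)
        return {"blue_links": links.result(), "knowledge_panel": panel.result()}

print(search("Philadelphia"))
```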
This second brain is called the Knowledge Graph. English speakers began to see the newly supercharged searches that it powers back in May, and last week the service was rolled out to speakers of seven other languages. But the story behind the knowledge panels goes back to mid-2010, when Google bought a San Francisco search startup called Metaweb Technologies and decided to use its massive semantic database, called Freebase, as the nucleus for its own project to approximate the way humans understand the world.
Metaweb’s creation doesn’t boil down to a collection of guesses about how documents are related to one another, the way Google’s other databases do. Rather, it’s a human-curated encyclopedia of verified facts about things in the world and relationships between them—more than 570 million things and 3.5 billion relationships, at last count. (Philadelphia is a city that’s part of a state that’s part of a nation; it has a known population, a typical set of weather patterns, et cetera.)
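One way to picture the difference between "strings" and "things" is as a small graph of typed entities connected by explicit relationships. The sketch below is purely illustrative; the identifiers mimic Freebase's style, but the schema, property names, and figures are simplified placeholders rather than the real database.

```python
# A tiny, hand-curated entity graph in the spirit of the model described above:
# named entities with types and properties, linked by explicit relationships.
entities = {
    "/m/philadelphia": {"name": "Philadelphia", "type": "City"},
    "/m/pennsylvania": {"name": "Pennsylvania", "type": "State"},
    "/m/usa":          {"name": "United States", "type": "Country"},
}

# Relationships are (subject, predicate, object) triples between entity IDs.
relationships = [
    ("/m/philadelphia", "part_of", "/m/pennsylvania"),
    ("/m/pennsylvania", "part_of", "/m/usa"),
]

def facts_about(entity_id):
    """Collect everything the graph 'knows' about one entity."""
    facts = dict(entities[entity_id])
    facts["links"] = [(predicate, entities[obj]["name"])
                      for subj, predicate, obj in relationships
                      if subj == entity_id]
    return facts

print(facts_about("/m/philadelphia"))
# {'name': 'Philadelphia', 'type': 'City',
#  'links': [('part_of', 'Pennsylvania')]}
```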
While the knowledge panels are the most visible manifestation of the Knowledge Graph, the new information is helping to order and rationalize almost everything else Google does. The consequences will be sweeping. While true AI is still a long way off, the Knowledge Graph represents a shortcut to a time when software will be better at meeting, or even anticipating, our information needs. In essence, Google’s engineers are building toward a future when the company’s famous “I’m Feeling Lucky” option is all you need, and the search engine returns the right result the first time, every time.
Amit Singhal, Google's top search guru
“This is a baby step toward having an understanding computer,” says Amit Singhal, a senior vice president of engineering at Google and the man with ultimate responsibility for improving Google’s core search algorithms. “Now, when you encounter the letters T-A-J-M-A-H-A-L on any Web page, the computers suddenly start understanding that this document is about the monument, and this one is about the musician, and this one is about a restaurant. That ‘aboutness’ is foundational to building the search of tomorrow.”
In a recent interview with Fortune, Google CEO Larry Page said he’s been pushing for such changes for the last 10 years. “The perfect search engine would really understand whatever your need is,” Page said. “It would understand everything in the world deeply, give you back kind of exactly what you need.”
Of course, Google (NASDAQ: GOOG) isn’t just a search engine—it’s also an advertising marketplace that generated $37 billion in revenue last year, and a media platform (YouTube), and a provider of cloud computing services (Gmail and Google Drive), and the leading maker of browser software (Chrome) and mobile operating systems (Android). Having a search engine that understands “aboutness” at its center will alter this whole empire. There are few hints so far of exactly how, but the changes will likely be at least as far-reaching as previous large-scale overhauls of the company’s core technology.
Principal engineer Shashi Thakur compares the Knowledge Graph project to the introduction of universal search in 2007. That change, which made it possible for the first time for users to search Web pages, videos, maps, images, and books from a single unified interface, resulted in a huge boost to Google’s overall traffic. Ultimately, the Knowledge Graph project could have an even “larger possible strategic impact” than universal search, Thakur says.
Statistical Magic Hits a Limit
In the search business, you couldn’t ask for a better pedigree than Amit Singhal’s. He studied at Cornell with information retrieval pioneer Gerry Salton, who’s been called “the father of digital search” and was himself a student of Howard Aiken, the Harvard professor who designed IBM’s first computer in 1944. After Google recruited Singhal away from AT&T Labs in 2000, his first job was to rewrite Sergey Brin’s original ranking algorithms to go beyond PageRank and take many new types of relevancy signals into account. The improvements were so dramatic that Singhal was later made a Google Fellow and awarded a prize “in the millions of dollars,” according to journalist Steve Levy’s account of Google’s early years, In the Plex.
Still, despite these accomplishments, Singhal says the history of search is basically one big kludge designed to simulate actual human understanding of language.
“The compute power was not there and various other pieces were not there, and the most effective way to search ended up being what today is known as keyword-based search,” Singhal explains. “You give us a query, we find out what is important in that query, and we find out if those important words are also important in a document, using numerous heuristics. This process worked incredibly well—we built the entire field of search on it, including every search company you know of, Google included. But the dream to actually go farther and get closer to human understanding was always there.”
After his initial rewrite of Google’s relevance algorithms, Singhal went on to tackle other problems like morphological analysis: figuring out how to reduce words like “runner” and “running” to their roots (“run,” in this case), in order to perform broader searches, while at the same time learning how to sidestep anomalies (apple and Apple obviously come from the same root, but have very different meanings in the real world). Universal search came next, then autocomplete and Google Instant, which begins to return customized search results even before a user finishes typing a query. (Type “wea,” for example, and you’ll get a local weather forecast.)
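To see why this is trickier than it sounds, here's a toy suffix-stripping stemmer of my own devising. It is not Google's method, just a minimal illustration of the idea, including one crude guard against the apple/Apple problem.

```python
def stem(token: str) -> str:
    """Very rough suffix-stripping stemmer (illustrative only)."""
    # Leave capitalized tokens alone, so "Apple" (the company) is not
    # conflated with "apple" (the fruit) -- a crude stand-in for the
    # kind of anomaly handling Singhal describes.
    if token[0].isupper():
        return token
    for suffix in ("ning", "ner", "ing", "er", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

for word in ["running", "runner", "runs", "apple", "Apple"]:
    print(word, "->", stem(word))
# running -> run, runner -> run, runs -> run, apple -> apple, Apple -> Apple
```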
“But throughout this process, one thing always bothered us,” Singhal says. “It was that we didn’t ever represent the real world properly in the computer. It was still all a lot of statistical magic, built on top of runs of letters. Even though it almost looked like an incredibly intelligent computer, and we did it far better than anyone, the truth was it was still working on strings of letters.”
This frustration wasn’t just a matter of intellectual aesthetics. Singhal says that by 2009 or 2010, Google had run up against a serious barrier. The goal of the company’s search engineers had always been to connect users with the information they need as efficiently as possible. But for a large group of ambiguous search terms, statistical correlations alone couldn’t help Google intuit the user’s intent. Take Singhal’s favorite example, Taj Mahal. Is the user who types that query searching for the famous mausoleum in Uttar Pradesh (Singhal’s home state), the Grammy-winning blues musician, or the Indian restaurant down the street? Google’s engineers realized that using statistics alone, “we would never be able to say that one of those [interpretations] was more important than the other,” Singhal says.
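Once the candidates are real entities rather than strings, the guess can be framed as scoring each "Taj Mahal" against whatever context is available. The sketch below is deliberately simplified, with signals and weights invented purely for illustration; it only shows the shape of the problem, not how Google actually solves it.

```python
# Candidate entities for the ambiguous string "Taj Mahal", each tagged with
# contexts in which it is the likeliest intent (cues are made up here).
CANDIDATES = [
    {"name": "Taj Mahal (monument)",   "cues": {"agra", "india", "history"}},
    {"name": "Taj Mahal (musician)",   "cues": {"blues", "album", "tour"}},
    {"name": "Taj Mahal (restaurant)", "cues": {"menu", "reservation", "delivery"}},
]

def disambiguate(query_terms, user_context):
    """Pick the candidate whose cues best overlap the query and context."""
    context = set(query_terms) | set(user_context)
    scored = [(len(c["cues"] & context), c["name"]) for c in CANDIDATES]
    return max(scored)  # highest overlap wins; ties resolve arbitrarily

print(disambiguate(["taj", "mahal", "tour"], {"blues"}))
# -> (2, 'Taj Mahal (musician)')
```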
“I’m very proud of what we achieved using the statistical method, and we still have huge components of our system that are built upon that,” Singhal says. “But we couldn’t take that to the system that we would all want five years from now. Those statistical matching approaches were starting to hit some fundamental limits.”
What Google needed was a way to know more about all of the world’s Taj Mahals, so that it could get better at guessing which one a user wants based on other contextual clues such as their location. And that’s where Metaweb comes into the story. “They were on this quest to represent real-world things, entities, and what is important, what should be known about them,” says Singhal. When Google came across the startup, it had just 12 million entities in its database, which Singhal calls “a toy” compared to the real world. “But we saw the promise in the representation technology, and the process they had built to scale that to what we really needed to build a representation of the real world.”
The Database of Everything
Metaweb Technologies has a fascinating history of its own. The company was born as a 2005 spinoff of Applied Minds, the Glendale, CA-based consulting firm and invention factory founded five years before by former Disney R&D head Bran Ferren and former Thinking Machines CEO Danny Hillis. John Giannandrea, a director of engineering at Google who was Metaweb’s chief technology officer, says the idea behind the startup was to build “a machine-readable encyclopedia” to help computers mimic human understanding.
“If you and I are having a conversation, we share a vocabulary,” says Giannandrea, who came to Metaweb after CTO roles at Tellme Networks and Netscape/AOL. “If I say ‘fiscal cliff,’ you know what I mean, and the reason is that you have a dictionary in your head about ideas. Computers don’t have that. That’s what we set about doing.”
