How Google produces relevant search results

Google
Google's speed and accuracy have made it the number one search engine on the web

Search engines can seem like magic. You type what you're looking for into a text box, and – ping! – it appears mere milliseconds later. Of course, we all know that Google relies on more than a magic wand to produce relevant search results; but what exactly does it use instead?

Well, any search engine has three distinct phases: spidering the web to find and read web pages, indexing the web pages found and responding to user queries to produce a set of ranked results.

Figure 1: A forward index lists all of the words that are found in a particular document

Here I've named the documents as numbers, and in practice we'd have another table that cross referenced these numbers to the actual URL for the document. The next step is to invert this table; after all, for a search we're going to be given a word, and we have to identify the documents that contain this word.

The word should be the key, and at the moment the table is keyed by document number. What we end up with is a table as shown in Figure 2. The table is keyed by word, and the data for each record in the table is a list of document numbers. This structure is known as an inverted file. Starting from a table of words per document, we end up with a table of documents per word.

Figure 2: This inverted index is a list that shows which documents contain a certain word

TOPICS