The Limits of Searching the Web

Before you start searching, let's consider a few important limits.

  1. No search tool in existence today keeps track of all of the information on the World Wide Web. The most widely cited estimates indicate that the two largest search tools cover about 27% and 25% of the Web, respectively. (These estimates are based on a total of 550 million web pages, which is probably low. A more accurate, but unverifiable, number is probably 700 million.) The search tools frequently miss information on large websites, like most of the federal government sites, and on sites that are updated frequently, like the CNN website.

  2. Not all of the information in the world is available on the World Wide Web. To make this a little more concrete, take a look at the following approximation. If you took the first 25 pages from every book in the US Library of Congress, about 20 million books, you would have roughly 500 million pages, the rough equivalent of the information available on the Web. But those 25 pages are only a very small percentage of the total number of pages in each book. So the entire content of the Web is equivalent to a fraction of the books in the Library of Congress. The Library of Congress also holds an additional 80 million items such as drawings, photographs, and manuscripts. If we added information sources from outside the Library of Congress, including other books, personal pictures, newspapers, and audio and video recordings, the difference between the Web's content and the total body of information would be astronomical.

  3. Searching is not always "real time". Some search databases can be up to a month out of date, depending on when they were last updated.

  4. Information found on the Web can be outdated, and pages can be removed from a website at any time.

  5. Some of the information on the Web is incorrect or even fabricated. At this point in the evolution of the Web, there is no standard for validity. Keep in mind that a lot of the information on the Web is simply someone else's opinion.
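
The arithmetic behind points 1 and 2 can be checked with a short sketch. The function names below are hypothetical and exist only for illustration. The second calculation rests on an assumption that is not in the estimates themselves: that each search tool indexes the same number of pages even if the Web really holds 700 million pages rather than 550 million.

```python
# Rough arithmetic for points 1 and 2. All figures are the estimates quoted
# above; the function names are made up for this illustration.

ESTIMATED_WEB_PAGES = 550_000_000   # estimate the coverage figures are based on
POSSIBLE_WEB_PAGES = 700_000_000    # the larger, unverifiable estimate


def pages_covered(total_pages: int, percent: int) -> int:
    """Number of pages a search tool covers, given a whole-number percentage."""
    return total_pages * percent // 100


def effective_percent(covered_pages: int, total_pages: int) -> float:
    """Share of the Web (in percent) that a fixed number of pages represents."""
    return 100 * covered_pages / total_pages


def sampled_book_pages(books: int, pages_per_book: int) -> int:
    """Total pages if you take the same number of pages from every book."""
    return books * pages_per_book


if __name__ == "__main__":
    for percent in (27, 25):
        covered = pages_covered(ESTIMATED_WEB_PAGES, percent)
        # Assumption: the tool indexes the same pages even if the Web is larger.
        adjusted = effective_percent(covered, POSSIBLE_WEB_PAGES)
        print(f"{percent}% of 550 million = {covered:,} pages "
              f"({adjusted:.1f}% of 700 million)")

    # Point 2: the first 25 pages of about 20 million books.
    print(f"Book sample: {sampled_book_pages(20_000_000, 25):,} pages")
```

Under these figures the two largest tools would cover about 148.5 million and 137.5 million pages, and the book sample comes to 500 million pages, which is close to the 550 million page estimate for the whole Web.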

       It's a good idea to keep these points in the back of your mind, but they should not discourage you from using the Web. The flip side of this situation is that the most current and easiest-to-find contents of the Web include the most popular and useful information. The complete works of Shakespeare and the periodic table are available online, but if you want a copy of the story I wrote in 5th grade, you would have to go look in my parents' attic. Compared to the total mass of human information, the contents of the Web may seem a bit limited. But 550 million web pages is a lot of information, and it is increasing every day. In fact, there is so much information that you need special tools to keep it all organized and to help you find the piece of information you are looking for. Thus, this tutorial!