Going Beyond the Web’s Surface

by Reid Goldsborough

March 15, 2006

The deep Web, also called the invisible Web or the hidden Web, has an aura of secrecy and transcendent mystery about it, conjuring up images of private caches of supremely useful information beyond the reach of mortal Web surfers.

The reality is much more pedestrian.

The deep Web simply consists of information that’s accessible over the Web but that can’t be found through ordinary search tools such as Google and Yahoo!. These search engines can’t find it for two main reasons: It’s stored within databases and can be retrieved only by using a particular site’s search tool, or it resides at sites that require registration or subscription.

How much information resides below the Web’s gleaming surface depends on whom you believe. One figure frequently repeated on the Web is that the deep Web is about 500 times bigger than the surface Web, but this number comes from BrightPlanet Corp. (http://www.brightplanet.com), a company that sells one of several programs for accessing it. Another estimate puts the deep Web at only about twice the size of the surface Web.

Deep Web information is usually narrow and specialized. Nuclear Explosions Database (http://www.ga.gov.au/oracle/nukexp_form.jsp) is typical. A free offering of the Australian government, it lets you search for the location, time, and size of nuclear explosions worldwide since 1945.

Other examples of deep Web information include data found in professional directories and phone books, laws and patents, items for sale at a Web store or Web auction site such as eBay, archived magazine and newspaper articles, job postings, and stock and bond prices.

The best way to get a feel for the deep Web and what it can do for you may be to visit several of the database sites that store much of its information. Unlike regular Web sites, these database sites generate pages on the fly in response to the searches you run, which is why Google and Yahoo! can’t find this information.

CompletePlanet.com (http://www.completeplanet.com), from BrightPlanet, is a directory of more than 70,000 searchable databases. You can’t search through all of the databases simultaneously, but you can search for appropriate databases and then search through them individually.

Though useful, CompletePlanet.com hasn’t been updated since 2004. Much of the information on the Web about the deep Web is even older, with many links no longer working and many sites no longer existing. This is a common problem when using the Web for research. Always look for a “date last updated” notice to help ensure that the page you’re reading doesn’t contain outdated information.

Another frequently recommended site for accessing deep Web sites is InfoMine (http://www.infomine.ucr.edu), an offering from the University of California–Riverside with federal government support. It’s maintained by librarians and is designed for university-level research. Many of the databases it accesses are fee-based compilations of articles in scholarly journals.

Yahoo! is currently testing a tool that lets you quickly get at information stored at multiple pay sites. With Yahoo! Subscriptions (http://www.search.yahoo.com/subscriptions), you can currently search nine subscription sites, including Consumer Reports, The Wall Street Journal, New England Journal of Medicine, and LexisNexis. To fully access a given site’s information, however, you’ll need a paid subscription to that site.

Google has also made strides in helping people access deep Web information. Information stored in PDF files, created by Adobe Acrobat, for instance, used to be considered part of the deep Web. But ever since Google started indexing such documents, this material has migrated from the deep Web to the surface Web.

One of the more intriguing deep Web tools is Turbo10 (http://www.turbo10.com). It lets you search through nearly 1,000 deep Web and other sites by typing in a search query once, just as with Google or Yahoo!. You can optionally create your own sublist of these sites and search only through them, which can be helpful if you repeatedly do similar types of searches. The brains behind this advertising-supported site are Nigel and Megan Hamilton, a brother-and-sister team in London.

Much deep Web information resides in U.S. government databases (the U.S. government is the world’s largest publisher). FirstGov (http://www.firstgov.gov) is a searchable portal to such government data as economic forecasts, industry reports, government regulations, and new legislation.

ScienceGov (http://www.science.gov), a part of FirstGov, is a searchable portal to scientific papers and technical data generated by 17 U.S. government science organizations within 12 different federal agencies.

Depending on your purposes, accessing the deep Web can be an important part of your search strategy.

Reid Goldsborough is a syndicated columnist and author of the book Straight Talk About the Information Superhighway. He can be reached at reidgoldsborough@gmail.com or reidgold.com.
