Humans Do It Better:
Inside the Open Directory Project

Chris Sherman

ONLINE, July 2000
Copyright © 2000 Information Today, Inc.

"They could hear Circe within, singing most beautifully as she worked at her loom, making a web so fine, so soft, and of such dazzling colors as no one but a goddess could weave."

Homer, The Odyssey, Book X

Webs fascinate us. Just as Odysseus' men were dazzled by the allure of Circe's web, so are we powerfully drawn to understand and map the varied and intricate strands of the World Wide Web. Virtually all who come in contact with the Web find themselves to one degree or another creating guides to share its enchantments with others. These guides range from simple lists of links to massive indexes compiled by software-driven search engines; the Web seems to bring out the inner librarian in us all.

Unfortunately, many who take on the daunting task of cataloging the Web find themselves hopelessly ensnared--like Odysseus' men, who succumbed to Circe's enticements only to find themselves transformed into swine. The crew of the Open Directory Project (ODP) has avoided this unfortunate fate, and in the process has seen its efforts gain unprecedented acceptance. Little more than two years old, the ODP is leading a resurgence of human-compiled Web directories and, in the process, toppling spider-compiled search engines from their dominant positions as principal gateways to the Internet.

The ODP has succeeded by coupling a simple, yet elegant, Web directory to the engine of a grassroots marketing campaign that has won the hearts and captured the minds of millions of Web users. The ODP's straightforward Web directory needs little introduction or explanation. Its workman-like interface and competent search capabilities make it an appropriate tool for searchers at nearly all levels of competence. The truly interesting part of the ODP story lies not in its mechanics but in its genesis and evolution, and in the behind-the-scenes goings-on in the inner sanctum of the project.

THE HISTORY OF THE OPEN DIRECTORY PROJECT

Like many early adopters of the Web, ODP co-founders Rich Skrenta and Bob Truel were ardent Yahoo! users during the mid-90s. Yahoo!'s directory was tiny compared to the indexes built by search engine rivals Excite, Lycos, and AltaVista, but its hierarchical structure and well-designed taxonomy made Yahoo! a preferred tool for finding things on the then nascent Web.

By 1997, however, it became increasingly apparent to Skrenta and Truel that Yahoo!'s management was being lured by the siren call of ecommerce and "community." As Yahoo!'s management spent billions buying up companies and expanding their services to include shopping, personal home pages, and local community sites, the core directory languished, at least in the eyes of many veteran users. "Link rot" had set in, resulting in many "not found" messages. Compounding the problem, the Yahoo! directory was simply not keeping pace with the explosive growth of the Web.

Chris Tolles, Director of Marketing for the Open Directory Project, recounts how Skrenta and Truel's frustration with Yahoo! led to the birth of the ODP. "Looking at the consternation of the Web community over the extreme difficulty of being listed in Yahoo! and how crufty their directory was becoming (large amounts of dead links, increasingly irrelevant information, etc.), it was clear that there was an opportunity to make things better," says Tolles.

Skrenta and Truel reasoned that Yahoo!'s problems might simply be the result of inadequate resources being allocated to the directory. After all, with millions of new pages being added to the Web, and existing pages being moved around or deleted by legions of unwitting Webmasters who didn't realize they were breaking directory listings, how could Yahoo!'s relatively small staff of "surfers" be expected to keep up? In early 1998, Skrenta and Truel experienced an epiphany. The pair were working as engineers at Sun Microsystems and had seen first-hand how successful the "Open Source" model had been in bootstrapping a wide range of software projects with little investment or expense. In particular, GNU, the open source UNIX-like platform, had caught on like wildfire, and its success spurred other open source initiatives. Apache, the most popular Web server software, was open source. And Netscape had just taken the bold step of freely licensing the source code of its Communicator browser, hoping to marshal the combined efforts of the open source community to help win its "browser war" with Microsoft.

The idea was born: Instead of standing by in frustrated anger, why not use open source tactics to build a Web directory? "Bob Truel came up with the hook--get the Web community itself to work on the directory," says Tolles. Thus, the pair began work on the open source Web directory. Initially named "Gnuhoo" (a cross between Gnu and Yahoo!), the service went live June 5, 1998. Within a month, it had attracted attention from Wired magazine, Red Herring, and Search Engine Watch.

By June 18, Gnuhoo claimed that over 200 editors were working on the project, having listed 27,000 sites in over 2,000 categories. The project was clearly touching a nerve in the Web community. And its success wasn't going unnoticed by its larger competitors. "We were approached by three major portals, and by October, had been acquired by Netscape--five months after putting the service up," says Tolles.

Gnuhoo's name evolved as quickly as its Web directory. When a firestorm of controversy on the influential news source Slashdot alleged that the Gnuhoo name didn't comply with the terms of the GNU General Public License, the directory changed its name to "Newhoo." By the time it was acquired by Netscape, the service had grown to 100,000 listed sites in 2,500 categories, with a volunteer force of 4,500 editors. Rather than risk the ire of deep-pocketed competitor Yahoo! with its new directory with a similar-sounding name, Netscape scuttled Newhoo in favor of the bland, but egalitarian, Open Directory Project (ODP).

But the whimsical nature of the ODP's initial name was not entirely lost. When Netscape spun its browser code out into the open source community, its official site became Mozilla.org. Mozilla had been the code name for the first Netscape browser, a combination of Mosaic, the browser the Netscape team had authored while they were college students, and Godzilla. In a nod to both Netscape's and Gnuhoo's initial incarnations, the ODP informally became known as the Mozilla Directory and its official site became dmoz.org, which remains its current home on the Web.

ODP EVERYWHERE

Tapping into the Web's vast reservoir of talent to help construct and maintain the directory was just part of the ODP's success equation. Taking yet another lesson from the open source movement, the ODP made its most important strategic move: The directory data would be freely available to all takers. The move was risky. What if a major search service took the directory data and used it to (horrors!) make money by serving ads in the results? Would the directory's idealistic volunteer editors abandon the search service they had worked so hard to build?

Initially, beyond the obligatory presence in Netscape's Netcenter portal, only a few small Web sites integrated ODP data into their search results. Then, on April 16, 1999, Lycos announced that it would boldly feature ODP results, even giving them prominence over results drawn from its own spidered index. HotBot would also use ODP data. At this point, the ODP had grown to more than 8,000 editors, with 430,000 sites and 65,000 categories.

There was surprisingly little protest from the volunteer corps of editors. In fact, the number of volunteer editors began to grow dramatically, so much so that more stringent selection and screening processes were put into place. The reason for the spurt in growth? A brilliant provision in the terms of the ODP license agreement that required all users of directory data to provide an attribution statement on every page. And this attribution statement included links not just to ODP's home page, but also to the "Become an editor" application page.

This master stroke turned the "free" use of data into one of the most effective "viral" marketing campaigns ever unleashed on the Web, attracting thousands of new editors who dramatically increased the size of the directory. In short order, AltaVista, AOL, DirectHit, Dogpile, Euroseek, and dozens of other search engines licensed the use of ODP data. Today, the ODP boasts more than 22,000 editors, with more than 1.5 million sites in over 240,000 categories. With the number of major sites using its data, ODP's reach of potential users now comes close to rivaling Yahoo!'s. And, according to its own internal measurements, the directory contains more links than Yahoo! and is growing at a considerably faster rate. The service expects to have more than two million links by mid-summer 2000.

The ODP is a textbook case of entrepreneurs responding to a perceived failing in an existing service. Skrenta and Truel didn't get mad. They got even.

ODP EDITORS AND QUALITY CONTROL

As impressive as the ODP's growth has been, it's only natural to question how a loosely-knit organization with tens of thousands of contributors can maintain strict quality control measures and avoid the Yahoo!-like problems that initiated the effort in the first place. The two fundamental concerns for any Web directory are the knowledge and skill of the editors who compile the directory, and the quality of the links they create.

In the early days, the ODP exercised little formal quality control. Editors simply chose a category and started populating it with links. Similarly, there were few editorial guidelines other than to pick the "best" links for a category. As the directory grew, this laissez-faire approach generated a moderate amount of chaos. In a press release issued in mid-summer of 1998, Tolles was quoted as saying, "This won't be stable and static. I'm sure there will be pissing contests between editors and so forth. But the whole thing is self-governing. It will even itself out."

In time, as the ODP became more mainstream, stricter quality control measures were put into place. "The core concept of quality control for the Open Directory Project is that of peer review," says Tolles. "We grant a very small slice of initial control to a new editor, and they have to prove themselves before they qualify for additional categories within the directory. Also, there are often multiple editors within the same category cross checking each other's work."

The peer review process is supported by various mechanisms, including subject-based forums that are restricted to ODP editors, and email between editors and the hierarchy. "The editors are pretty damn tough on each other," according to Steven Kassel, editor of more than two dozen ODP categories. "The forums are for questions to and from editors. Just reading a day's worth of posts make it clear that quality is the number one concern and that editors are primarily interested in doing a good job," says Kassel. The forums have had more than 100,000 posts in the year that they have been available to editors, according to Tolles.

Some editors impose strict guidelines on their own selection process. Kassel, who edits finance and tax categories, says, "I am also careful not to add sites for which the owner does not appear to carry a valid license. If someone purports to represent individuals with tax problems, they must be licensed as an EA, CPA, or attorney. When it is clear that the site is not owned by a licensed individual, I exclude it or move it to a more appropriate location."

The selection process for accepting new editors for the directory has also become more rigorous. "The editorial application process is indeed selective, and we are currently accepting less than 20% of the applications we get. With the reach our directory enjoys, we're getting a better and better applicant pool," says Tolles. "Selection criteria include, but are not limited to, the number of editors in the category at the time of application, the ranking in the hierarchy, the qualifications listed, and the quality of the application."

To address the link rot problem that plagued Yahoo!, the ODP has instituted an automated link checking system. "We have a crawler that is run periodically which tells editors that there are sites which seem to be down, and need to be looked at--which is why we are consistently below 1/2 of 1% dead links in our directory," says Tolles.
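
The article does not describe how the ODP crawler works internally. As a minimal sketch of the general idea Tolles describes, the Python below checks each listed URL, groups the unreachable ones by category so they can be handed to that category's editors, and computes the share of dead links. The function names, the sample categories, and the example.com URLs are all hypothetical, and a production checker would also need retries, rate limiting, and a way to tell a temporary outage from a dead site.

```python
import urllib.request


def is_reachable(url, timeout=10):
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # OSError covers HTTP errors, DNS failures, and timeouts;
        # ValueError covers strings that are not valid URLs.
        return False


def flag_dead_links(listings, checker=is_reachable):
    """Group unreachable URLs by category for editors to review.

    `listings` maps a category path to the list of URLs filed there.
    """
    flagged = {}
    for category, urls in listings.items():
        dead = [url for url in urls if not checker(url)]
        if dead:
            flagged[category] = dead
    return flagged


def dead_link_rate(listings, flagged):
    """Return dead links as a fraction of all listed links."""
    total = sum(len(urls) for urls in listings.values())
    dead = sum(len(urls) for urls in flagged.values())
    return dead / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical data; a stub checker stands in for real network calls.
    sample = {
        "Arts/Music": ["http://example.com/a", "http://example.com/gone"],
        "Science/Physics": ["http://example.com/b", "http://example.com/c"],
    }
    known_dead = {"http://example.com/gone"}
    flagged = flag_dead_links(sample, checker=lambda url: url not in known_dead)
    print(flagged)
    print(f"{dead_link_rate(sample, flagged):.2%} of links flagged")
```

Passing the checker in as a parameter keeps the reporting logic testable without touching the network. Note that the sketch only flags links; as in the process Tolles describes, a human editor still decides what to do with each one.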

INEVITABLE PROBLEMS

Naturally, for a service that enjoyed grassroots support and is now an important player in the world of Web search, not everyone is happy with the ODP. At the Search Engine Strategies conference in San Francisco in November 1999, a conference participant claimed that one of her business competitors became an ODP editor and then maliciously deleted all of her company's listings in the directory. Others have raised similar charges in online discussion groups, such as the I-Search mailing list.

But the charges are hard to substantiate. "I have never seen this kind of behavior in the area I edit. ODP has log files and monitoring mechanisms in place to review complaints," says editor Robert Hoffman. Former editor Rick Bier concurs. "If an editor is giving his site preferential treatment and the meta editors became aware of that practice, the editor in question will soon be an ex-editor," says Bier.

Complaints about the ODP can also be found on the Web. Writer Andrew Goodman in "Why the Open Directory Isn't Open" takes an interesting, but controversial, position targeting the policies and motives of the ODP. And criticism of the ODP can also be found in other public Web forums, such as Deja.com, Search Engine Forums, and Alexa's reviews of the ODP.

But many of these criticisms read like they were penned by former editors, or by Webmasters whose pages were rejected by ODP editors. The ODP doesn't appear to be making any official effort to stifle critical comments--indeed, a link to the Alexa reviews can be found on the ODP site with this even-handed annotation: "Some reviews biased by reviewers being rejected as editors or already being editors."

On balance, praise for the directory far outweighs criticism. "We have very few problems, relatively speaking," says Tolles. "The goal of us on the staff side is to improve the system of checks and balances, rather than get in the middle of editorial discussion." Perhaps the greatest testament to the quality of ODP data comes from companies who are building innovative search engines that rely on extending and enhancing its value. Two notable examples are Oingo and Google.

GROWING IN THE COMPANY OF GIANTS

In two short years, the Open Directory Project has grown from a scrappy grassroots movement to a bona fide Web powerhouse. Though the challenges it faced in its formative years were certainly daunting, the ODP must now contend with two new major challenges to continue to survive and thrive. The first challenge will be maintaining its autonomy now that it is part of the huge AOL-Time Warner complex. The preliminary signs are encouraging: AOL has integrated ODP data into its own search service with impressive results. Nonetheless, political, financial, and marketplace pressures may have a disruptive effect on the current ODP structure.

"AOL has been very supportive of the Open Directory running with an independent editorial policy," says Chris Tolles. "We have been very successful with the Open Directory model, and have no plans to change the way things are working. The Open Directory will continue to operate as it does today benefitting AOL, and the rest of the Web, by succeeding in the mission of building the most comprehensive directory of the Web by utilizing a legion of contributors from the Web community," says Tolles.

Noble words, but AOL's legal problems with "volunteers" in the past have led to dramatic changes and a scaling back of the use of unpaid help elsewhere in AOL's vast collection of online properties. And one only needs to look at what happened to Infoseek once it was subsumed into the Go/Disney conglomerate to envision a similar threat to the ODP. Once heralded as an integral part of the Disney empire, Go/Infoseek has recently scaled back, shifting focus from being a general-purpose portal to concentrate instead on the far more limited "entertainment content," presumably more important to the parent company. The ODP enjoys autonomy now, but it's part of a parent company that continues to grow, refine, and, importantly, discard business ideas when they are no longer meaningful to the overall business model.

The other major challenge the ODP faces is in scaling up with the explosive growth of the Web. This is no easy task, even for powerful spider-compiled search engines, let alone a directory that relies on the relatively pokey pace of humans to catalog the Web. Again, Tolles is confident that the ODP is up to the challenge. "Our mission is to build the most comprehensive directory of the Web, and it's a cop out (or an admission of a failed model) to not have your directory useful for the ever increasing scale of the Web," says Tolles. To help accomplish this goal, several infrastructure improvements to the back end of the directory are being made, together with continual improvements for the tools used by editors to maintain the directory. These improvements "will keep quality high, but improve the efficiency of the system, which will help scale the project," says Tolles.

Given the ODP's phenomenal success and its demonstrated ability to overcome obstacles that have tripped up larger, more established rivals, it's probably a safe bet that the ODP will not only weather the challenges to its continued success, but will thrive as it assumes its rightful place with the major players in the world of Web search.


Links

Alexa User Reviews of the ODP
http://reviews.alexa.com/review?type=3&url=dmoz.org:80

Goodman, Andrew. "Why the Open Directory Isn't Open"
http://www.traffick.com/story.asp?StoryID=59

Google Directory
http://www.google.com/directory.html

Oingo
http://www.oingo.com

The Open Directory Project
http://www.dmoz.org

Search Engine Forums
http://www.searchengineforums.com

Sites Using Open Directory Data
http://dmoz.org/Computers/Internet/WWW/Searching_the_Web/Directories/Open_Directory_Project/Sites_Using_ODP_Data/


Innovative Users of ODP Data

Oingo: Meaning Based Search

One of the downsides of ODP data being freely available is that it's a relatively easy job to take the data, slap together a pretty interface, sell a few ads, and go into business as a "new" search service. Dozens of these "new" search engines have popped up and many have no compelling reason to exist other than to generate advertising revenue for their creators. One impressively notable exception is Oingo.

Oingo's difference is immediately obvious: No ads, fancy graphics, newsfeeds, or any other bolted-on "features" that are the hallmarks of most ODP-powered portal-wannabes. Oingo's initial screen is a simple search form with links to the top categories of the ODP directory. Oingo's search engine goes beyond simple keyword matching, attempting to understand the meaning of your query by comparing it to the Oingo Lexicon, a rich database of words, meanings, and their relationships.

Results are presented for both matching categories and individual Web sites. A light bulb symbol indicates a "Meaning Hit"--that the result appears to be conceptually related to your search terms. Oingo also prompts you to narrow your search to specific meanings for each of your primary keywords. As with any search engine, the default is all possible meanings of a term. But Oingo also suggests limiters based on specific definitions of the word, and allows you to simply search for the occurrence of the word, not its meaning. This ability to narrow your query based on semantics and specific word meanings is an extremely powerful feature and adds significant value to the core ODP directory data.

Google: Spotlighting "Important" Sites

Google has also endorsed the value of ODP data by using it in the Google Directory. Google applies its PageRank technology to generate results ranked according to "importance," rather than in alphabetical order or by computed relevance. Importance is calculated in part by the quality and quantity of links pointing to a particular Web site. In essence, Google attempts to identify the most highly regarded pages on the Web for any particular topic, and assigns them the highest PageRank scores (see "Organizing the World's Information: Google Raises the Bar on Search Technology" by Jeff Pemberton, pp. 41-48 in the May/June 2000 issue of ONLINE for more information).

Matching directory categories are presented at the top of Google directory results. Clicking on a major category brings up ODP results displayed in PageRank order. For example, searching on the term "airlines" brings up three matching categories. The top category is for commercial airlines, and results are displayed so that the major global carriers appear at the top of the list. This means you see American, United, Delta, and British Airways at the top of the list, rather than an alphabetical list which leads off with Adria Airways, the national airline of Slovenia, followed by Aer Lingus, Aerocaribe, and Aeroflot.
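
The article gives only a brief description of PageRank, so the following is a simplified textbook-style sketch rather than Google's implementation. It scores pages by repeatedly passing score along the links of a small graph, then orders a category's listings by score instead of alphabetically. The link graph, the site names, the damping factor of 0.85, and the iteration count are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages so that links from high-scoring pages count for more.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            # A page with no outgoing links shares its score with every page.
            receivers = targets if targets else pages
            share = damping * rank[page] / len(receivers)
            for target in receivers:
                new_rank[target] += share
        rank = new_rank
    return rank


def order_listing(sites, rank):
    """Return a category's sites, highest score first, ties broken by name."""
    return sorted(sites, key=lambda site: (-rank.get(site, 0.0), site))


if __name__ == "__main__":
    # Hypothetical link graph: three sites all link to one large carrier.
    toy_links = {
        "adria-example": ["major-carrier-example"],
        "aero-example": ["major-carrier-example"],
        "travel-portal-example": ["major-carrier-example", "aero-example"],
        "major-carrier-example": [],
    }
    scores = pagerank(toy_links)
    category = ["adria-example", "aero-example", "major-carrier-example"]
    print("Alphabetical:", sorted(category))
    print("By score:    ", order_listing(category, scores))
```

In this toy graph the heavily linked site moves to the top of the listing, which mirrors the airline example above: the ordering reflects how many pages, and how well-regarded, point at each site rather than where its name falls in the alphabet.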

You can also restrict your search to specific categories. By default, a search performed from within the directory covers only the pages in the category you are currently viewing and all of its subpages. The Google directory also suggests "related categories" that contain similar, though not directly related, content to the current category.
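
One way to picture the default category restriction is as a prefix match on category paths. The sketch below is an assumption about how such a filter could work, not a description of Google's code: the function name, the directory structure, and the sample titles are hypothetical, with category paths written in the slash-separated style of ODP URLs.

```python
def search_within(directory, current_category, term):
    """Find titles containing `term` in a category and everything below it.

    `directory` maps a category path to the list of site titles filed there.
    An empty `current_category` searches the whole directory.
    """
    root = current_category.strip("/")
    needle = term.lower()
    hits = []
    for category, titles in directory.items():
        # The trailing slash keeps "Arts/Music" from matching "Arts/Musicals".
        in_scope = not root or category == root or category.startswith(root + "/")
        if not in_scope:
            continue
        hits.extend((category, title) for title in titles if needle in title.lower())
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical directory fragment.
    directory = {
        "Arts/Music": ["Jazz Archive"],
        "Arts/Music/Jazz": ["Jazz Clubs Guide", "Blues Notes"],
        "Arts/Musicals": ["Jazz Hands Theatre"],
    }
    for category, title in search_within(directory, "Arts/Music", "jazz"):
        print(f"{category}: {title}")
```

Matching on the path plus a trailing slash, rather than on the bare prefix, is the detail that keeps sibling categories with similar names out of the results.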

"The ODP has very useful information," says Sergey Brin, President of Google. "But it's tedious to browse. So we put our technology on top" to make it far easier to get pinpoint results. If you're a fan of Google's search engine, you'll love the Google Directory.

--Chris Sherman


Chris Sherman (websearch.guide@about.com or http://websearch.about.com) holds an MA from Stanford University in Interactive Educational Technology and has worked in the Internet/multimedia industry for two decades, currently as President of Searchwise.net, a consulting firm specializing in search engine optimization and training.

Comments? Email letters to the Editor at editor@infotoday.com.