Behind the Screen — The Privilege of Ranking: Google Plays Ball

Online

KMWorld

CRM Media, LLC

Streaming Media Inc

Faulkner

Speech Technology

Other ITI Websites

American Library Directory Boardwalk Empire Database Trends and Applications DestinationCRM Enterprise AI World Faulkner Information Services Fulltext Sources Online InfoToday Europe KMWorld Literary Market Place Plexus Publishing Smart Customer Service Speech Technology Streaming Media Streaming Media Europe Unisphere Research

Magazines > Searcher > July / August 2003
Back Index Forward

SUBSCRIBE NOW!

Vol. 11 No. 7 — July/August 2003

COLUMNS
Behind the Screen — The Privilege of Ranking: Google Plays Ball
by Richard Wiggins
Senior Information Technologist, Michigan State University

Here's a riddle: What do undergraduate admissions at the University of Michigan (UM), the college football Bowl Championship Series, and Google have in common?

Lately, I've been springing that riddle on people when we talk about searching. Usually folks scrunch up their faces a bit before attempting to answer. I happened to ask this very question during a presentation at a conference sponsored by the Ann Arbor Computer Society just after the Supreme Court heard arguments on the Michigan admissions case. My audience, mostly hard-core software developers, must have thought a jab from East Lansing was headed their way. (Michigan State University and the University of Michigan have been known to feud, now and again.) On the contrary: It's a riddle for any audience that uses search engines.

For readers of Searcher, this riddle is probably pretty easy — unless you don't happen to pay attention to college football. Here's a hint: Before 1998, the national champion in college football was chosen by polls. The two polls that attracted the attention of sports fans were the Associated Press poll of sportswriters and the USA Today/ESPN poll of leading coaches. The college bowl system was never a tournament; who played in what bowl game depended on prestige and tradition as much as that year's rankings. In many years, the number-one and number-two teams in the nation never met. Some years, the polls named different national champions at the end of it all.

The Bowl Championship Series (BCS) tries to fix all that. The goal of the BCS is to ensure that the best two teams in the nation meet in one bowl game to determine one true champion. The only problem is, unlike NCAA basketball's famous "Road to the Final Four," the BCS still isn't a tournament; in effect, there is a single "round" with one championship game.

So to make this all work, the BCS has to rank the top college teams so that the best two teams in the land meet in one winner-takes-the-championship match. BCS needed a mechanism to pick these two contenders; if a human committee made the selection, someone would always cry foul.

So the BCS came up with a formula — an algorithm — a mathematical scheme — for ranking the best college football teams. It's surprisingly complicated, based on:

• The average of the coaches' and AP media polls.

• An average of the rankings of leading computer-based ranking services, such as The New York Times computer poll and the Sagarin poll.

• Strength of schedule, as measured by the win-loss record of a team's opponents¹.

• Losses: one point demerit for each loss.

• "Quality" wins: bonus points for defeating a top-10 team.

Now, back to our riddle. Figured it out yet?

The Challenge of Ranking

In the innocent days before 9/11, Southwest Airlines used to joke during the safety speech, "In the event of sudden loss of cabin pressure, oxygen masks will appear. Put your own mask on before assisting others. If you are traveling with two small children, please decide now which one you love more."

Sometimes it's hard to make a choice, even among only a handful of alternatives. Anyone who's had trouble deciding what career to pursue or where to go on vacation this summer knows how to do this: You write down all the pros and cons of each alternative on an index card; after all the choices are "scored," you sort among them based upon your evaluation.

The BCS series' challenge is to pick the top two of 117 or so Division I-A teams. At the end of the season, only a few teams contend in most people's minds for the number one spot. Most sports fans, for instance, would not accept a national champion that played a weak schedule or that lost more than one game in the regular season.

So all of the complexity of the BCS formula is brought to bear on what ought to be a very simple problem: Among a handful of contenders, choose which two teams deserve to play in the championship game. The BCS formula attempts to be "scientific." It appears objective, even though human judgment does enter into the formula due to the polling component.

Professional searchers understand that all search engines, Web or otherwise, use — or at least offer to use — relevancy ranking to sort items on a hit list. This is true whether it's a Web search engine such as Google, a legal search engine such as Westlaw, or a full-text scholars' resource such as ProQuest. Each system evaluates each candidate item, assigns each a score based on a series of criteria, and then sorts all the items from the highest- to the lowest-scoring.

Now we're ready to solve our riddle. BCS, UM admissions, and Google all attempt to sort dissimilar items into a list starting with the "best" items on top:

• The BCS selection process picks the top two entries from a list of 117 potential candidates.

• The undergraduate admissions process at the University of Michigan uses a mathematical formula to select about 5,000 students for admission out of a pool of over 20,000 applicants.

• The search engine Google uses a mathematical formula to choose the top 10 or so Web pages to present to users out of a pool of over 3 billion URLs.

In all three cases, some people defend the formula vigorously — especially its owners and stakeholders. In all three cases, other people criticize the formula vociferously — especially those people who don't rank at the top. If you like how you're ranked, the formula works just fine. If you don't like how you're ranked, you try to pick apart the formula to find its flaws.

In all three cases, knowledge of how the formula works causes people to change behavior, trying to "game" their ranking. The formula doesn't just measure the items to be ranked; it leads players to try to "improve" themselves in light of the formula's parameters.

And in all three cases, the owners of the formula periodically "tweak" the algorithm when it turns out that the rank-ordered list doesn't yield the desired results. Sometimes the tweaks yield unintended consequences.

Ranking Michigan Admissions

Jennifer Gratz applied to the University of Michigan in the fall of 1994, her senior year of high school. She had a 3.8 grade point average, ranking her 13th in her high school class of 298. She scored 25 on the ACT. An honor student, tutor, vice president of the student council, homecoming queen, and varsity cheerleader, she thought she presented the kind of credentials that would guarantee admission to UM.

On April 24, 1995, she opened a rejection letter from UM. She immediately asked her father, "Can we sue them?"

If you think it's hard to choose between Lake Louise and Sanibel for your next vacation, imagine trying to pick the best 5,000 applicants out of 20,000 candidates. UM chose a scoring process to ease the burden. Gratz' case shone a bright light on the details of the UM undergraduate admissions process. As the case wound through the courts, UM established a Web site to publish its side of the story [http://www.umich.edu/~urel/admissions], which, among other things, explained its scoring process. UM calls its ranking formula a "selection index," which assigns up to 150 points to each applicant. Available points break down this way:

80 for high school grades
12 for standardized test scores.
10 for strength of high school's academics.
-4 to +8 points for strength of student's course choices.
Up to 40 points for "other factors":
- 10 for being a Michigan resident.
- 6 for being a resident of a county in Michigan that is underrepresent
- 2 for residency in an underrepresented state.
- 4 for alumni relationships.
- 5 for leadership and service.
- 3 for a personal essay.
- 5 for personal achievement.
- 20 points for one of these:
  - Membership in an underrepresented minority group.
  - Socioeconomic disadvantage.
  - Attendance at a predominantly minority high school.
  - "Athletics."
  - "Provost's discretion."

UM argues that its formula accomplishes the university's goals for its undergraduate class. It argues that all admitted students meet minimum qualifications to succeed in coursework at the university; the formula yields a class that is also ethnically and geographically diverse. Some 100 groups filed amicus curiae briefs supporting UM before the Supreme Court.

Jennifer Gratz and her sponsors don't like that 20-point boost given to some applicants. They uncovered evidence that some minority applicants were admitted who would have ranked lower than Gratz without that 20-point bonus.

Our purpose here isn't to weigh the legal arguments or offer a late brief for the Supreme Court. However, we do want to expose how much faith we place in mathematical formulas. In essence, no one is quarrelling with the effectiveness of UM's ranking algorithm. Instead, the quarrel is with its premise.

UM's algorithm literally follows the language of the landmark Bakke decision, in which the Supreme Court ruled that racial quotas were illegal, but that race could be used as a "plus factor" in considering applicants.

Alas, the purity of the premise can be called into question. UM doesn't just seek academic quality and applicant diversity. That 20 point boost for "Athletics" is necessary because Michigan, like Stanford and Duke, is one of the few schools that manages to be strong academically and competitive in major sports. If you're a selective school and also selective about athletes, you need that 20-point fudge factor to insure the presence of talented athletes.

That "Provost's discretion" boost is especially interesting. Press reports indicate that applicants associated with major donors receive that bonus. A recent Wall Street Journal expose showed that marginal applicants to Duke can essentially buy their way into the freshman class.

Critics of the UM algorithm concentrate on the presence of a racial component in the formula, which is the issue now before the Supreme Court. These critics conveniently ignore the other non-academic components. Press reports say that George W. Bush's SAT score was 1206, which probably would not suffice for admission to Yale without his family alumni connections. Critics of the critics argue that if race can't be a factor, then the fact that grandpa was an alumnus, or that mom donated to the development fund, or that Biff is a 350 pound linebacker, shouldn't count, either.

Other critics attack the very idea of a formula. If a formula is used to score applicants, isn't the admissions process, well, formulaic? Some exclusive small schools have stepped forward with this criticism. But none of this points to an inherent flaw in using a scoring algorithm to rank competing candidates. If you've got 20 different people evaluating 20,000 candidates, a scoring algorithm is the only way to ensure a consistent process, certainly for the first pass.

People also tend to evaluate the formula without fully understanding its implications. That boost of 20 points sounds enormous on a scale of 150, raising critics' hackles. However, because a much larger number of non-minorities apply than do minorities, the impact of minority applications is less than people might guess. Former university presidents William Bowen and Derek Bok crunched the numbers and found that eliminating racial preferences would increase the likelihood for a white undergraduate to be admitted to a selective school from 25 percent to 26.5 percent.

Even the choice of scale influences the visceral reaction to the UM formula. Suppose instead of a 20-point boost on a 150 point scale, UM awarded a 2 point boost on a 15 point scale. Mathematically it's the same outcome, but emotionally it sounds different. Or suppose UM deducted 20 points for being white — again, mathematically, there's no difference, but you can just imagine how much more criticism would ensue.

This dispute is about the premise behind the formula — the goals implicit in the formula's components — not about using an algorithm to help make hard choices.

Google, Integrity, and Tweaking

In the UM and BCS examples, a complex formula is used once a year to build a rank-ordered list, and the resulting rankings will greatly affect some people's lives. Over 150 million times a day, Google users invoke a complex formula to choose among 3 billion Web pages — and they expect an answer in less than a second.

By any measure, Google is the overwhelmingly dominant Web search engine. It's pretty uncommon for a single Google search to change someone's life significantly, but Google plays an important role in the information many people rely on for choices in their lives. Cumulatively, the quality of Google's ranking impacts millions of choices made daily.

We know a lot about how Google works in general — and little about the details. Google was conceived when two Stanford graduate students sought an efficient algorithm for ranking the "importance" of millions of Web pages. Larry Page and Sergey Brin observed that existing search engines such as AltaVista had compromised their editorial integrity by silently selling the top of the hit list to advertisers. They sought to build a search engine whose ranking formula exhibited integrity. "Why not rank pages based on how important they are to everyone on the Web?" they thought. So they invented PageRank, a mathematical formula that, to oversimplify a bit, scores each candidate Web page by counting the number of links to it.

Brin and Page described PageRank in detail in a 1998 paper. (The paper is still online at Stanford; see
http://www-db.stanford.edu/~backrub/google.html.) They invented PageRank to use the link structure of the Web — which pages link to which other pages — to decide what makes it to the top of the list. Their original paper proposed the scheme:

We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

Note that the PageRanks form a probability distribution over web pages, so the sum of all Web pages' PageRanks will be one.

Expressed in simple terms, PageRank causes pages that link a lot to bubble to the top of the list. If the pages that link to a candidate page are themselves linked-to frequently, the candidate bubbles even higher.

PageRank is by no means perfect. In November 2002, a teacher in a town on Lake Michigan opened a curriculum guide and gasped when she saw a lesson plan on the migration of whales in that lake. Of course, there are no whales in Lake Michigan. A small company in Utah sells the guide to subscribing teachers. An ill-informed researcher writing a lesson plan on whales did some Google searches and found a hoax site published by a company that gives boat tours on the lake. She swallowed the whale story whole.

This example underscores the limitations of Google's link analysis. A Web page with a large number of links to it is popular for some reason, but popularity does not equate to authenticity, authoritativeness, accuracy, or currency. It doesn't even indicate that a source is believed or trusted — people can link to a source that they distrust or even scorn. In this example, Google's ranking algorithm took a page popular for its comical touch and confused a searcher who assumed that any item ranked high on a Google hit list must be true.

Another limitation that Searcher readers understand implicitly is that Google can only rank what it indexes. Several days after the deadly disease SARS erupted on the world scene, a Google search for "sars" yields the South Africa Revenue Service. You have to go to Google News to find information on the new SARS malady.

Despite all that, Googlenauts not only trust its formula, they leap to defend it. When I observed in a public forum that AlltheWeb.com did a better job than Google if you search for "New York City" because it automatically treated the search as a phrase, several people flamed me. "Google would never modify my search like that!" one wrote. Sorry to break the news, Bubba, but every search engine "modifies" your search with its own spin on what defines relevancy.

Tweaking the Rankings

Sometimes formulas used for ranking can have unintended consequences. The original BCS formula gave credit for the margin of victory in each game. This makes sense, doesn't it? If you just squeak by and beat someone, you're not that much better than they are, right? Conversely, if you clobber them, you must be heaps better.

But this encouraged dominant teams to run up the score against clearly inferior teams, just to improve the BCS scoring. It gave incentive to schedule a game against the College of East Overshoe, just so you could clobber them for points. So in 2002, the BCS removed margin of victory from the formula.

Most fans, coaches, and sports columnists believe that Oregon lost a chance for a national championship in 2001 because it won a number of close games. The human rankers — the coaches' poll and the media poll — placed Oregon number 2. The BCS formula placed it number 4.

Consider the possibility that UM will find it necessary to tweak its admissions formula — if the Supreme Court doesn't tweak it out of existence. Suppose, for instance, that minority test scores and high school grades begin to rise over time. The day could come when one or more minority groups are represented out of proportion to the actual population. Or, suppose scores fall. UM would be under tremendous pressure to take action — even though the university claims its formula isn't a "quota."

Google isn't immune to the need for tweaking, either. For example, go to Google and search for:

google september 11

The top item on the list is http://www.firstmonday.dk/issues/issue6_10/wiggins — an article I happened to write about how Google reacted to 9/11. How did this article make it to the top of the hit list for this particular search? As it happens, the article was cited in a number of bibliographic sites concerning 9/11.

Probably the more important factor is that a number of influential Web logs also cited the article. Web logs, or blogs, represent an important phenomenon in blending traditional media with Web media. Writers such as Dan Bricklin linked to the article; other blogs picked up the reference, and suddenly PageRank ranks this page at the top.

But this is not all good news. Bloggers have learned that they can "game" the system by organizing a campaign in which a large number of blogs mention a URL they wish to boost. This simple technique proved to be an effective method of "index spamming." Google has tweaked its algorithm to reduce its effectiveness, but hasn't solved the problem.

Blogs and other news sources represent a paradox for Google, allowing content about breaking events and news to enter the mix and to influence rankings. In other words, these sources allow Google to grow and adapt to reflect new stories and new sources of Web-based information. But blogs can also magnify a source's relative trustiness merely because other bloggers find the content worth talking about. Apparently, Google sees blogs as an opportunity, not a problem: It recently bought the leading blogging technology company, Pyra Labs.

Actually, Google tweaks its formula frequently. Many searchers don't realize that Google launches an entirely new index once a month. This allows Google to re-sweep the Web in a way that clears away deadwood and discovers new content. It also provides Google with an opportunity to tweak how it ranks pages. It was during one of those monthly updates that Google tweaked to adjust for blog campaigns.

From Brin and Page's days at Stanford until today, when Google overwhelmingly dominates the Web search engines arena, Google has steadfastly maintained that the choice of ranking is editorially pure, driven by what is genuinely most relevant to the customer, not by advertising dollars or any other consideration.

At the same time, Google's revenues from advertising are on the rise, even in a market where online advertising budgets are a fraction of those during the halcyon days of the Internet bubble. Google even applies its ranking techniques to its successful Adwords program: If your ad draws more click-throughs, your ad is more likely to appear the next time someone does a relevant search.

Maybe I'm naïve. I trust Google to keep its hit list pure. But I think it can offer even a better, more intellectually honest form of ranking. I'd like to see a user control panel for Google — a sort of graphic equalizer for the search engine. In addition to a knob for Popularity, there'd be a knob labeled Authenticity and one labeled Level of User Trust. Throw in Accuracy and Cynicism and Humorousness and.... Heck, why not even have a knob labeled Payoff? Let me dial up that knob if I want to see, á là Overture.com, the ads of the highest bidders at the top of the hit list.

But please, let the default setting for that knob be zero.

Footnote
¹Each of the components of the BCS formula is itself complicated. Strength of schedule, for instance, is calculated according to this formula: SOS = 2/3•(OpW)/(OpW+OpL) + 1/3•(OpOpW)/(OpOpW + OpOpL) where:
• OpW is the sum of all opponents' wins minus the number of losses by the team in question minus the number of wins over Div1AA teams
• OpL is the sum of all opponents' losses minus the number of wins by the team in question
• OpOpW is the sum of all opponents' wins as calculated above
• OpOpL is the sum of all opponents losses as calculated above.
Source: http://www.geocities.com/rtell/FAQ.html.

Back to top