Set Your Cites High: The Value of Quality Citation Information

We’ve always thought that when listing the number of articles that cite a particular work, citation databases have actually counted all of those articles and retain a master list. Perhaps that goes along with our being a little retro. Dave wears a Paul McCartney and Wings shirt on Casual Friday. Amy still does Jane Fonda’s Workout on VHS. We can name that Brady Bunch episode in one scene, and if you want to know the ingredients of a Big Mac, we will sing the song. So please find it in your heart to excuse us for being either old-fashioned, naïve, optimistic, or all three for our amazement when we stumbled upon this well-hidden caveat on Google Scholar (scholar.google.com):

Dates and citation counts are estimated and are determined automatically by a computer program.

Wait, what? Call us crazy, but we thought that Google could actually produce a list of articles that cite an article of interest, with the number of articles in that list equal to the given number. If we got 1,759 cited references, we thought that we could page through the list to see all 1,759 if we so chose.

First, this caveat is not readily apparent. A typical Google Scholar search involves typing in your title or author, finding the article, and reviewing the citation count. This approach, however, shows you nothing about how Google Scholar actually conducts the citation count. When you see the total, the knee-jerk assumption is that all of those references exist in Google Scholar. However, if an author clicks on My Citations (the area where authors go to set up their accounts and check on their own citations), the caveat in question appears. It is almost as if Google thinks that only authors will be interested in the exact totals for their own articles, when in reality, researchers need to know this information just as much or more.

This system might not be that alarming if you are just trying to get a sense of the popularity of an article. In that scenario, a ballpark figure works. But for librarians and information professionals working in finance or litigation, the number of cited references can either support or undermine the credibility of a theory or argument. It can make or break a deal or determine the outcome of a case.

BASIC VALUE OF CITATION REFERENCES

A wide variety of circumstances exist in which access to searchable and dependable citation reference information is invaluable. On a basic level, knowing how often particular authors’ works are cited by their peers is helpful (though maybe not determinative) in gauging just how widely those authors’ conclusions, theories, or opinions have worked their way into the works of others.

Bear in mind that the nature of the reference could be positive or negative. A widely cited article could be a benchmark study in a particular field or, equally, a controversial piece repeatedly refuted. But the ability to trace a line from a given article to subsequent references, its “descendants,” gives you some insight into how an author’s ideas are accepted and how these ideas often lay the foundation for future work.

Knowing exactly when, where, and by whom an article is referenced gives you a clear picture of how academic thought can evolve. In many ways, following a citation trail is a little like evolutionary science, with some branches dead-ending and others making a strong impression on future generations of thought.

CITATION CONUNDRUMS

Nancy Herther covered the conundrums associated with citation searching in previous articles for Online Searcher (“Digging Deeper,” March 2015, and “Advanced Citation Searching,” May 2015). In these articles, Herther considers the relevancy of citation searching in a world in which data is valued as “key indicators of anything and everything,” as well as the question of which metrics, if any, are indicators of a gold standard of research and/or researcher.

Ultimately, she determines that cited references are just as important as ever. She makes the strong argument that we have an obligation as information professionals to aggressively lobby database vendors to add cited reference records and search capabilities to their platforms and to encourage the industry at large to standardize across platforms so that data can be “aggregated and easily manipulated for analysis.”

As information professionals working in a litigation setting, we wholeheartedly agree. We use cited references to bolster or debunk theories, arguments, and research all the time. We also use them to qualify practitioners. But because we are in a litigation situation, the stakes for accuracy and foolproof data are extremely high.

After our disconcerting Google Scholar discovery, we decided that if we are going to use citation counts as a measure of credibility and sound science, we need to know exactly how these counts are tabulated by the databases we access. Therefore, we decided to look at the main research databases that offer the cited reference feature to determine how each one tabulates the numbers. It became obvious to us immediately that the citation counts for any given work will vary by service, because the search universe differs for each database. As a result, we did not compare counts by service, as many other articles on this topic have done (and there are many articles on this topic!).

FREE ACADEMIC SEARCH SERVICES

Google Scholar is not the only free academic search service offering citation searching, but it is the most popular. Microsoft Academic is the other, but it has had a rocky history, starting as a research project (though rarely advertised as such) called Microsoft Academic Search that stopped updating in 2012. Microsoft Academic is the more recent iteration, built using semantic search technologies on top of Bing’s web-crawling capabilities. Microsoft Academic Search’s graph visualizations are no longer in Microsoft Academic, but Microsoft is encouraging people to try out its Academic Knowledge API to create their own.

We see Google Scholar as a service rather than a database. With a proprietary algorithm and an untold number of indexing arrangements with publishers across the spectrum of subject fields, Google Scholar is estimated to cover more than 100 million scholarly works. In fact, a 2015 study puts the number closer to 160–165 million documents (Enrique Orduna-Malea, Juan M. Ayllon, Alberto Martin-Martin, Emilio Delgado Lopez-Cozar, “Methods for Estimating the Size of Google Scholar.” Scientometrics, September 2015, Vol. 104, No. 3, pp. 931–949). For sheer scope, Google Scholar’s capacity is impressive. Coverage includes journal articles and books as well as theses, technical papers, conference reports, and “grey literature” that aren’t always indexed in traditional databases.

The citation feature on Google Scholar is very much aimed at the academic author. Entering an article title in the main search bar will provide you with the number of times the work has been cited, according to Scholar’s algorithm, but other valuable metrics can be accessed only after you’ve set up an author profile and have added articles to this profile. At this point, you finally get some additional data, including an annual breakdown of cited references, which is a potentially interesting metric for anyone tracking interest in an article across time. Of course, there’s nothing prohibiting someone from setting up an author profile with any name, thereby accessing this slightly more detailed citation information.

For example, a search for the benchmark economics work, “Consumer’s Surplus Without Apology” by Robert Willig, American Economic Review, 1976, results in 1,809 citations. You can browse that prolific list through the link provided. Because the count is an estimate, it is unclear how many titles the actual list will contain. Google also provides a yearly breakdown, from 1976 onward, of cited references. Using the My Citations option provides you with a little more information than a typical citation search. More importantly, this seems to be the only way to be alerted to the fact that the results are estimated and generated by a proprietary algorithm.
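For readers who want to check an estimate against what they can actually retrieve, the comparison can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name, the record fields, and the sample data are all hypothetical, and the sketch assumes you have already exported or hand-copied the citing records into a list. It does not query Google Scholar or any other service.

```python
from collections import Counter


def audit_citation_count(reported_total, citing_records):
    """Compare a service's reported total with the records actually retrieved.

    `citing_records` is a hypothetical list of dicts, each with a "year" key.
    """
    retrieved = len(citing_records)
    # Tally the retrieved records by publication year (the "yearly breakdown").
    per_year = Counter(rec["year"] for rec in citing_records)
    return {
        "reported": reported_total,
        "retrieved": retrieved,
        "shortfall": reported_total - retrieved,
        "per_year": dict(sorted(per_year.items())),
    }


# Made-up sample: the service reports 5 citations, but only 3 can be listed.
sample = [
    {"title": "Citing paper A", "year": 1978},
    {"title": "Citing paper B", "year": 1978},
    {"title": "Citing paper C", "year": 1981},
]
report = audit_citation_count(5, sample)
print(report)
```

A nonzero shortfall does not tell you which figure is wrong. It only tells you that the reported number and the browsable list disagree, which is exactly the thing worth documenting when the count will be used as evidence.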

The potential issues with using Google Scholar for citation searching are the same issues you have with the service in general. With apologies to Donald Rumsfeld, it’s still a matter of “known unknowns.” Scholar’s algorithm and the index it accesses remain unclear. Google states that “Google Scholar aims to rank documents the way researchers do, weighing the full text of each document, where it was published, who it was written by, as well as how often and how recently it has been cited in other scholarly literature,” but we don’t know how this is specifically achieved. There’s some irony here, as most scholarly authors will tell you that it is important to understand the methodology used in order to determine the value of the results.

Google Scholar claims there are 1,809 articles which cite Willig’s 1976 article, and clicking on the link should show all of them.

Microsoft Academic estimates 1,260 articles that cite Willig’s 1976 article and uses the * to explain that the number is an estimate.

MICROSOFT ACADEMIC

Microsoft Academic (academic.research.microsoft.com) uses an algorithm to access, according to the site, 80 million publications. After searching for author or title, the number of citations appears with an asterisk that directs you to the caveat that the count is an estimate. Refreshingly, Microsoft Academic’s FAQ page clearly states the potential issues:

Due to the noisy nature of large-scale scholarly data available on the Web, a publication’s true citation count is not identical to a simple count of the citing documents indexed by any given scholarly database. Thanks to the huge quantity of publications in the Microsoft Academic Graph, we are able to estimate a more accurate citation count for each publication. The citation count shown per publication reflects this estimation based on a statistical model which takes advantage of both the local statistics of individual publications and the global statistics of the entire academic graph to determine the estimates of citation counts.
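To see why a “simple count of the citing documents” can mislead, consider a toy Python sketch in which the same citing work shows up twice with slightly different formatting. This is not Microsoft’s statistical model, which the FAQ does not describe in detail. It is only our own illustration of one kind of noise (duplicate records), and the function names, matching rule, and sample records are all hypothetical.

```python
import re


def normalize_title(title):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", title.lower())
    return " ".join(cleaned.split())


def count_distinct_citing_works(records):
    """Count records, treating same normalized title plus same year as one work."""
    seen = set()
    for rec in records:
        seen.add((normalize_title(rec["title"]), rec["year"]))
    return len(seen)


# Made-up records: the first two are the same work, formatted differently.
noisy = [
    {"title": "Welfare Measurement: A Review", "year": 1990},
    {"title": "welfare measurement - a review", "year": 1990},
    {"title": "Demand Estimation in Practice", "year": 1995},
]
simple_count = len(noisy)
distinct_count = count_distinct_citing_works(noisy)
print(simple_count, distinct_count)
```

Matching on title and year is deliberately crude. A real service has to cope with preprints, translations, and incomplete metadata, which is presumably part of why it resorts to estimation rather than a straight tally.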

Unlike Google Scholar, Microsoft Academic allows for searches in specific fields of study. Microsoft Academic also provides a listing of all the publishers it has agreements with, though some have yet to go into effect.

Amy Affelt is director, Database Research, Compass Lexecon and author of The Accidental Data Scientist: Big Data Applications and Opportunities for Librarians and Information Professionals (Information Today, 2015).