Ghost Authors, Teachable Moments
By Marydee Ojala
Editor • ONLINE
When is an author not an author? When Google’s search algorithms go awry and identify as an author a phrase that only a computer could think was an author. Péter Jacsó, whose column ran in ONLINE for many years, identified in Google Scholar such “ghost authors” as P Login (for Please Login) and L Password (for Lost Password). Worse, these “ghost authors” can replace the real ones, whom Péter dubs “lost authors.” Google Scholar can also get confused about publication dates, mistaking page numbers, postal codes, and street addresses for dates.
At a breakfast meeting sponsored by ProQuest during Internet Librarian 2009, Jacsó pointed out absurdities he’d found in Google Scholar and showed screenshots of some of the funnier errors. It isn’t only ghost authors and incorrect dates that annoy Péter. He also asserts that Boolean logic doesn’t always work as it mathematically should and that Google Scholar lacks transparency. For example, Google doesn’t reveal its sources or the time frames of its coverage.
I blogged the talk (www.infotodayblog.com/2009/10/27/summoned-to-breakfast) and noted Péter’s upcoming (now published) article in the Nov. 1, 2009, issue of Library Journal (www.libraryjournal.com/article/CA6703850).
Péter’s main point is that these mistakes have serious consequences for bibliometric analysis. Trying to create an accurate citation count when ghost authors crop up to replace actual authors leads to a wildly off-base number. If you want a correct count of articles written on a particular topic in a particular year, you are just as likely to get a wrong answer, since page numbers, postal codes, and street addresses can be mistaken for publication dates.
What impact do Google Scholar’s algorithmic misunderstandings have on actual research? Do they provide teachable moments for librarians? If so, what is to be taught, and will it differ depending upon whether you’re imparting this knowledge to faculty or to students?
My guess is that very few people, even very few librarians, routinely search for Login, Password, or the other absurdities in the Author field. However, a search for, say, a medical condition that retrieves a citation with wildly weird authors should provide information professionals with the opportunity to point out to faculty and students how unlikely these citation elements are.
Faculty could be alerted to watch for such ghost authors in student bibliographies. Finding one is an indication that the student never read the actual paper. Instead of looking at the original article, the student accepted at face value the citation from Google Scholar. That’s shoddy scholarship. Students, in turn, should be alerted to the fact that they can’t always trust Google’s handling of basic citation data.
Let’s not let our premium-content, fee-based vendors completely off the hook while we poke fun at Google Scholar. Various databases and search services for which we pay good money make mistakes as well. You are reading the January/February 2010 issue of ONLINE, yet in some databases, the date will read as Jan. 1, 2010. Some databases take an article written by John Doe with a sidebar by Jane Smith and decide the two are co-authors. Should I contact these database producers and complain? No, I’d rather view it as a teachable moment.
Ojala is the editor of ONLINE.