Google Scholar: Library Partner or Database Competitor?

Information professionals rely on vetted sources of information in order to provide the best quality service to their clients—whether they are students, researchers, or businesses. The internet has profoundly transformed our communication expectations, access to information, product evaluation, and buying behaviors. It has disrupted the business models of retail stores, the travel industry, and music.

For libraries, it has created serious challenges in the provision of quality information services. Even the definition of quality information services is coming into question. With all of the attention paid to “fake news” in the media and questionable research results in scholarly literature, we can see the problems caused by a lack of focus and failure to verify data.

What is the role of Google Scholar (GS) in libraries today? Does it add value to the library or cause librarians to prefer it to more traditional databases? Many see GS as a strong partner in information provision, as reflected in this comment from Jared L. Howland and colleagues in a 2009 issue of College and Research Libraries:

GS has not only become a common fixture in library literature but is also becoming ubiquitous in information-seeking behavior of users. GS was initially met with curiosity and skepticism. This was followed by a period of systematic study. More recently, there has been optimism about GS’ potential to move us toward Kilgour’s goal of 100 percent availability of information. Librarians now find themselves acknowledging users’ preferences for one-stop information shopping by giving GS ever-increasing visibility on their Web pages.
– “How Scholarly Is Google Scholar? A Comparison to Library Databases”; pdfs.semanticscholar.org/7dab/41504f61a8f85fc83c26e6700aad34a251c5.pdf

Since 2009, the reliance on Google products has risen dramatically in academic and research institutions as financial and public support have declined. Today, in fact, you would find few academic library webpage listings of databases that don’t include GS. The very act of including it on these lists of library databases provides a type of validation and support for its role and value. But is that valuation warranted?

Surendran Cherukodan, assistant librarian at India’s Cochin University of Science and Technology, told me he sees clear value from GS. “I think, with some limitations, GS is doing a wonderful task to the academic community. While Scopus and Web of Science (WoS) are paid databases, GS does it for free. It is free and open. Its robots search and find articles published anywhere in the world which have a web presence and inform us of the related articles/citing articles. The Google Metrics is also useful to identify top publications in all fields of science.”

So, what are professionals and researchers themselves saying about the reliance on GS, at any price, to adequately provide—or to even replace—traditional scholarly databases and indexes?

THE COST OF SEARCH

In a very provocative presentation at the 2016 Charleston Conference, Utrecht University librarian Anja Smit talked about the future of local discovery in current economic times. “Rising prices of content is a constant concern,” she explains to Online Searcher, “and we are always looking for shifting investments to create opportunities for new services (e.g., to support open science, or to preserve materials). In that context, to invest less in a local discovery tool is an advantage.”

She continues: “In the Utrecht library, it is expected that WorldCat Discovery Services will replace the Aleph catalog in 2018—and only because users cannot ‘link’ directly from Google (e.g., GS) to request print publications. For e-resources, this is possible, and users do not need a local discovery tool. We will offer our users an international view first, based on the assumption that to ‘discover’ content, it makes no sense to restrict oneself to the collection of one library. For a printed known item search, it may be of use to restrict a search to the local collection (but even then …). The local catalog also becomes less important because it could never provide access to all open access and freely available digitized materials. And the volume of that category is increasing.”

However, Smit sees a clear, ongoing need for quality scholarly databases with sophisticated search options. “My point is about catalogs and possibly general databases, not necessarily about discipline-focused bibliographies. Those would provide added value for users, although as research becomes more interdisciplinary, that would decrease their value.”

GS—THE IMPERFECT GIANT

A team led by Enrique Orduna-Malea attempted to discover how large Google Scholar really is (“Methods for Estimating the Size of Google Scholar,” Scientometrics, Vol. 104, No. 3, May 2016, pp. 931–949; DOI 10.1007/s11192-015-1614-6). They proposed and tested different methods, including a nonsense search. They found that “despite providing disparate values, [we] place the estimated size of GS at around 160–165 million documents.” The authors did note that “all the methods show considerable limitations and uncertainties due to inconsistencies in the GS search functionalities.” No method exists to examine or validate issues with GS as they arise.

Writing in Quartz on Sept. 16, 2016, Dave Gershgorn noted that the most cited scientist on GS is not an actual person, but “et. al.,” clocking in with more than 2.6 million citations when he checked (qz.com/787301/a-glitch-in-google-scholars-algorithm-is-messing-up-citation-counts). The European Union’s Project ACUMEN (webometrics.info/en/node/58) keeps a listing of the most highly cited authors (real authors). However, this points to the limitations that exist in a tool created as a byproduct of a commercial product intended for other purposes and whose structure and algorithms remain a secret.

WHEN IS GOOGLE ENOUGH?

Writing in Information Research (“‘Just Google It’—The Scope of Freely Available Information Sources for Doctoral Thesis Writing”; informationr.net/ir/22-1/paper738.html), Vincas Grigas, Simona Juzeniene, and Jone Velickaite found, using bibliographies from completed dissertations, that free or open access resource tools today can cover at least 40% of research materials needed without library intervention. Does this mean we are reaching a tipping point for libraries’ traditional role? Locating copies of research is a very different process than scanning the environment for potentially key sources for information.

From the medical library perspective, Dean Giustini, reference librarian at Vancouver General Hospital, and Maged N. Kamel Boulos, British health informatician, concluded:

GS’ constantly-changing content, algorithms and database structure make it a poor choice for systematic reviews. Looking for papers when you know their titles is a far different issue from discovering them initially. Further research is needed to determine when and how (and for what purposes) GS can be used alone. Google should provide details about GS’ database coverage and improve its interface (e.g., with semantic search filters, stored searching, etc.). Perhaps then it will be an appropriate choice for systematic reviews.

– “Google Scholar Is Not Enough to Be Used Alone for Systematic Reviews,” Online Journal of Public Health Informatics, Vol. 5, No. 2, 2013; journals.uic.edu/ojs/index.php/ojphi/article/view/4623

Giustini further advises readers:

Google is adequate for some types of searching in the scholarly sense. However, if you have an undergraduate student in the upper levels of their degree programs, GS can’t really be used on its own in any subject if the goal is to do a robust literature review. If a student or faculty simply want to locate a few relevant papers, then yes GS is fine. I have spoken to the engineers at Google Scholar, made recommendations to improve GS, but none of those recommendations were ever seriously considered.

“There is also the issue of trust,” Hamid Jamali of Charles Sturt University’s School of Information explained to me. “You cannot rely on preprint or a version of an article that you cannot verify sometimes. Not all researchers are able to judge an article from its content, they rely on other criteria such as reputation of journal, publisher etc. The rate of free access is lower for newly published articles, as a lot of journals have an embargo. Although they eventually make their articles available to the public or allow authors to self-archive, they do so after a year, etc. Access to new articles is critical.” Along with this, in order to make GS a reliable source, we need to develop some system that allows authors to clearly indicate versioning in their posted research materials—is something a draft, a submitted manuscript, the final copy?

“GS needs to become more rigorous and open and usable,” University of Kent Business School’s John Mingers advises. “Scopus/WoS need to dramatically improve their coverage of subject and type of output … no transition will happen unless Google invests in the product and gives it a proper user interface and makes its search mechanisms more transparent and puts effort into cleaning up the data. Colleagues who have tried to get help from Google have met with little response.”

Yutao Sun, Marie Curie Research Fellow of Chinese Studies at University of Nottingham, sees GS as “a very powerful tool for the spreading of scientific knowledge, but it’s imperfect … [lacking] a proper peer-reviewing procedure, materials of a lower quality could also appear in the search results. In that case,” Sun says, “researchers need to rely on their own evaluations of the materials, whereas with the traditional peer-reviewed databases, the evaluations are largely done by the peer-reviewing procedure.”

GS is “a tool, like other databases,” notes Queensland University of Technology’s Jamie Trapp. “Every tool has its strengths and weaknesses; one of the strengths of GS of course is that it is free and therefore more widely available (e.g., many institutions, particularly from the developing world, don’t have access to many journals or databases and GS provides at least some of this).” Conversely, he points out, the paid databases do enable a bit more refinement in searches (subject area etc.), while GS doesn’t seem to have quite so much control of search results. He concludes, “My opinion is that any researcher should be using multiple tools if they have them, recognising the limitations of each.”

Serious researchers depend upon complete and sure information. Peer review, publication in quality journals, and proper indexing don’t guarantee that the information is true, but together they make a far better system than Google’s black box approach to GS.

IS THE BEST YET TO COME?

Smit sees new roles and possibilities for information professionals ahead. “Libraries are transforming from ‘gateways to knowledge’ to increasingly take on the ‘consultant role.’ Our competencies around how to handle ‘knowledge’ support the idea of enabler of a huge cultural change within research to ‘open research education.’ In my university, the library is asked to initiate and lead a university-wide open science program because of our relevant competencies.”

A June 2017 article by Maxim Grechkin, Hoifung Poon, and Bill Howe (“Wide-Open: Accelerating Public Data Release by Automating Detection of Overdue Datasets”; journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2002477) published in PLoS Biology introduced an open access tool called Wide-Open (github.com/wideopen/datawatch), “a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private.”

Nicola Jones, writing in Nature, quotes Oren Etzioni, CEO of the Allen Institute for Artificial Intelligence (AI2; allenai.org). Its new search engine, Semantic Scholar, “aims to index all of PubMed and expand to all the medical sciences.” Not to be overlooked is Microsoft Academic, which Anne-Wil Harzing thinks “is getting close to combining the advantages of GS’ massive scope with the more-structured results of subscription bibliometric databases such as Scopus and the Web of Science” (nature.com/news/ai-cience-search-engines-expand-their-reach-1.20964).

Perhaps the best is yet to come. In the meantime, there’s no value in denying either the essential role of scholarly databases or the overwhelming popularity of GS, despite its many flaws. We have a professional commitment to see that our clients are clearly aware of the costs and benefits of each and to advocate for better discovery systems. We abdicate our responsibility as trusted information professionals if we do not do this. As many of us have learned, there is no free lunch.

Nancy K. Herther is a research consultant and writer who recently retired from a 30-year career in academic libraries.