Knowledge Graphs: The Next Revolution in Scientific Publishing
by Dave Davis
Open access (OA) has had a dramatic impact on publishing practices over the past dozen years, and there is now a growing expectation that the underlying data be made accessible along with the article. PLOS, for example, already requires all of its authors to make all data necessary to replicate their study’s findings publicly available. In years to come, this data requirement may become as weighty as the demand for an OA provision.
In addition, data wonks like me are compelled to learn progressively more about data applications for scientific research. These range from manuscript submission on the front end to text and data mining of aggregated articles on the back end. Increasingly, such research outputs require the inclusion of the underlying data, as well as representations of that data in new and more informative ways.
As researchers confront ever greater volumes of data, knowledge graphs are emerging as a powerful explanatory tool. Together with OA, knowledge graphs are part of a larger trend toward linked data and enhanced data visualization that will ease the process of peer review, facilitate reproducibility studies, and benefit the scientific communication process overall.
KNOWLEDGE GRAPHS 101
Before we can get into how knowledge graphs will revolutionize scientific publishing, we must first understand what exactly they are. With vastly more data being collected and analyzed, knowledge graphs are a tool for representing insights in an easy-to-comprehend manner. Specifically, they enable a user to identify complex, interconnected relationships in and among the data in visual or graphical form. In a nutshell, knowledge graphs are created by processing large volumes of data from diverse sources and information types and can produce 2D and 3D visual representations of those data in all of their complexity and interconnectedness.
Visual data representation in the form of knowledge graphs differs from other kinds of data displays, such as spreadsheets. A spreadsheet is simply a table that is arranged for ease of entry and optimal storage and is programmed to enable certain preset means of manipulation. While it can represent relationships in an orderly way, a spreadsheet is not optimized to retrieve answers efficiently from a high volume of data, especially as datasets become larger and more interconnected. Knowledge graphs, however, unify data, creating a flexible data layer “on top” of multiple data sources.
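To make the idea of a layer “on top” of multiple sources a little more concrete, here is a minimal sketch in Python. It assumes two hypothetical sources, a journal's submission table and an institutional directory; every name and row in it is invented for illustration. Each row is flattened into subject-predicate-object statements, so facts from both sources end up in one structure that a single query can span.

```python
# Hypothetical rows from two separate sources; all names are invented.
submissions = [
    {"article": "article-1", "author": "author-a", "journal": "journal-x"},
    {"article": "article-2", "author": "author-b", "journal": "journal-x"},
]
directory = [
    {"person": "author-a", "institution": "institute-1"},
    {"person": "author-b", "institution": "institute-2"},
]


def build_triples(submissions, directory):
    """Flatten both tables into one set of (subject, predicate, object) triples."""
    triples = set()
    for row in submissions:
        triples.add((row["author"], "wrote", row["article"]))
        triples.add((row["article"], "published_in", row["journal"]))
    for row in directory:
        triples.add((row["person"], "affiliated_with", row["institution"]))
    return triples


triples = build_triples(submissions, directory)

# One query now spans both sources: everything recorded about author-a.
facts_about_a = sorted(t for t in triples if t[0] == "author-a")
print(facts_about_a)
```

A production knowledge graph would sit in a dedicated graph store rather than a Python set, but the principle of merging sources into one set of explicit statements is the same.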
In other words, graph databases make relationships between entities explicit, compared to the more implicit connections made in other data stores. This allows researchers to traverse these relationships in much more efficient ways; they can more easily find patterns among the data and ultimately make sense of it. Additionally, the act of building a knowledge graph can have benefits well beyond the visualization it produces. For example, search improvements or advanced analytics can be constructed based on the learning that goes into building the graph.
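What it means to "traverse" explicit relationships can be sketched in a few lines of Python. The example below is an illustration under stated assumptions, not a description of any particular graph database: the edges are invented, and a plain breadth-first search stands in for whatever query engine a real system would use. It finds the chain of relationships linking two authors who have never written together.

```python
from collections import defaultdict, deque

# Hypothetical edges; each one is an explicit (subject, predicate, object) relationship.
edges = [
    ("author-a", "wrote", "article-1"),
    ("author-b", "wrote", "article-1"),
    ("author-b", "wrote", "article-2"),
    ("author-c", "wrote", "article-2"),
    ("article-2", "published_in", "journal-x"),
]


def build_adjacency(edges):
    """Index edges in both directions so relationships can be followed either way."""
    adjacency = defaultdict(set)
    for subject, _predicate, obj in edges:
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the nodes from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sort neighbors so the result is deterministic.
        for neighbor in sorted(adjacency[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


adjacency = build_adjacency(edges)
path = shortest_path(adjacency, "author-a", "author-c")
print(path)
```

In a spreadsheet, answering the same question would mean repeatedly looking up and joining rows; here the connection is followed edge by edge.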
The benefit that knowledge graphs offer researchers is easy to understand, but what about the benefit they provide for the scholarly journals in which these researchers publish? Given the current challenges in peer review, one application that data experts have explored recently is a knowledge graph that shows relationships among researchers who are writing about COVID-19. Editors working in this space are responsible for identifying and selecting high-potential peer reviewers for articles that have been submitted to their journals.
By using a knowledge graph that includes research articles on COVID-19, SARS, and MERS (as well as additional articles from open research datasets such as CORD-19 and curated hubs such as LitCovid), editors can get a clear and accurate visual overview of all of the authors who have published relevant articles and the publications in which those articles appeared. Other relevant data points, such as co-authorship and relationships among authors, institutions, and publications, can also be revealed. Ultimately, this data can help facilitate peer review for scientific authors and editors.
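One plausible way an editor's tool might use such a graph is sketched below in Python. This is an assumption made for illustration, not a method described by any of the projects named above: the article records are invented, and the rule of setting aside a submitting author's past co-authors as too close to review is my own simplification. The sketch ranks the remaining authors by how many relevant articles they have published.

```python
from collections import defaultdict

# Hypothetical records; a real graph might draw on datasets such as CORD-19.
articles = [
    {"id": "article-1", "topic": "COVID-19", "authors": ["author-a", "author-b"]},
    {"id": "article-2", "topic": "SARS", "authors": ["author-c"]},
    {"id": "article-3", "topic": "COVID-19", "authors": ["author-c", "author-d"]},
    {"id": "article-4", "topic": "MERS", "authors": ["author-b", "author-e"]},
]


def coauthor_map(articles):
    """Map each author to the set of people they have published with."""
    coauthors = defaultdict(set)
    for article in articles:
        for author in article["authors"]:
            coauthors[author].update(a for a in article["authors"] if a != author)
    return coauthors


def candidate_reviewers(articles, submitting_authors, topics):
    """Rank authors on the given topics, skipping submitters and their co-authors."""
    coauthors = coauthor_map(articles)
    excluded = set(submitting_authors)
    for author in submitting_authors:
        excluded |= coauthors[author]  # assumed conflict-of-interest rule

    counts = defaultdict(int)
    for article in articles:
        if article["topic"] in topics:
            for author in article["authors"]:
                if author not in excluded:
                    counts[author] += 1
    # Most relevant publications first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranked = candidate_reviewers(articles, ["author-a"], {"COVID-19", "SARS", "MERS"})
print(ranked)
```

A real tool would weigh many more signals, such as institutions, recency, and citation links, but even this small version shows how co-authorship edges turn directly into an editorial shortlist.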
The main point of illuminating datasets through a knowledge graph is that it provides the user with immediate access to new and powerful perspectives on the underlying data. For researchers, these graphs make it possible to find context and make sense of the data; for publishers, knowledge graphs move the peer-review and reproducibility processes along with greater efficiency. If researchers and publishers don’t leverage the enhanced visualization techniques knowledge graphs offer, it is my belief that the volume of data confronting the scientific publishing community will simply become too overwhelming.