ONLINE SEARCHER: Information Discovery, Technology, Strategies

Data Sources in Altmetric Indicators
By Elaine Lasda
March/April 2020 Issue

I discussed the differences in data sources between the main citation indexes (Scopus, Web of Science, and Google Scholar) in my January/February 2020 column. Using citation-based metrics for evaluating research and demonstrating impact tells only part of the story. Altmetrics tells another part by measuring the impact of scholarly objects/artifacts using sources other than peer-reviewed journal article (PRJA)-based cited references.

It has been a decade since Jason Priem, Dario Taraborelli, Paul Groth, and Cameron Neylon published “Altmetrics: A Manifesto,” which gave us a working definition of altmetrics and raised the level of awareness about this way to evaluate and understand research (altmetrics.org/manifesto). Nonetheless, most still consider altmetrics as the new kid on the metrics block. A growing body of studies on the strengths, limitations, implications, and meanings of altmetrics sheds some light on how various altmetric indicators work. However, the field needs more research on these themes. Raising awareness of what is currently understood about altmetrics and how they do (or do not) demonstrate impact is also critical to their appropriate use.

Even more than for citation-based indicators, it is of paramount importance to understand and follow the data sources used in creating altmetrics. There are currently three major purveyors of altmetric indicators: Altmetric, PlumX, and Impactstory. Data sources for all three evolve: Some social media platforms locked down their data, while new sources of impact information have arisen. Experts continually increase understanding of where scholarly research is mentioned and cited beyond references in PRJAs.

Overview of Altmetric Tools

Altmetric: Altmetric is a Digital Science company. Euan Adie founded Altmetric in 2011 with the intention of collecting metadata about the online “attention” that publications receive across the web. Adie and William Roe offered key points about this approach in their 2013 article, “Altmetric: Enriching Scholarly Content With Article-Level Discussion and Metrics” (Learned Publishing, January 2013, 26(1):11–17; doi.org/10.1087/20130103).

Altmetric’s philosophy hinges on three approaches: comprehensive data collection, tool scalability, and what it calls traceable data sources, meaning the user can identify the original piece of content that relates to the research object/artifact in question. For example, the user can click through to the actual blog post or tweet that mentions a given article. Altmetric uses a multicolored “donut” visualization and Altmetric Attention Score to quantify impact, but like the other altmetrics sources, the real strength is the traceability of the original mention or reference.

Originally, Altmetric focused on research articles, but it now captures any research object/artifact that has an identifier assigned to it. The most common identifier is the DOI, but Altmetric also accepts PubMed IDs, ISBNs, Handles, arXiv IDs, ADS IDs, SSRN IDs, RePEc IDs, URNs, ClinicalTrials.gov records, and certain URLs (help.altmetric.com/support/solutions/articles/6000134562-what-scholarly-identifiers-are-supported-by-altmetric-). This mix of identifiers facilitates the tracking of books/chapters, presentations, dissertations and theses, datasets, grey literature, clinical trials, and other research objects/artifacts in addition to PRJAs (termed “research outputs” by Altmetric).
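Readers who want to see what Altmetric holds for a single identifier can query its free public details API. The short Python sketch below assumes the commonly documented endpoint pattern api.altmetric.com/v1/doi/<DOI> and a handful of response fields (title, score, details_url); the DOI is the Adie and Roe article cited above, and everything else should be checked against Altmetric's current API documentation before reuse.

# Minimal sketch: fetch Altmetric attention data for one DOI via the free
# public details API (endpoint and field names assumed from Altmetric's
# public documentation; verify before relying on them).
import json
import urllib.error
import urllib.request

DOI = "10.1087/20130103"  # the Adie and Roe article cited above

url = f"https://api.altmetric.com/v1/doi/{DOI}"
try:
    with urllib.request.urlopen(url) as response:
        record = json.load(response)
    print("Title:", record.get("title"))
    print("Attention score:", record.get("score"))
    print("Details page:", record.get("details_url"))
except urllib.error.HTTPError as err:
    # The API returns 404 when no attention has been tracked for the identifier.
    print("No Altmetric record found (HTTP", str(err.code) + ")")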

Plum Analytics: Michael Buschman and Andrea Michalek founded Plum Analytics in 2013; EBSCO acquired it in 2014 and sold it to Elsevier in 2017. The tool could be considered the most comprehensive indicator on the altmetrics playing field; it is certainly more complex (although its documentation seems less transparent) than Altmetric. Unlike the Altmetric Attention Score and Altmetric donut, the PlumX tool and the Plum Print visualization do not attempt to meld different types of altmetric sources of impact. Instead, PlumX Metrics are based on five categories of impact: Citations, Usage, Captures, Mentions, and Social Media (plumanalytics.com/learn/about-metrics). The Plum Print, with its flower-like appearance, gives the user a quick assessment of which types of impact are manifest for a given object/artifact. Note that Plum Prints still appear in EBSCO databases, and EBSCO usage data is a component of the Plum Print scores. As with Altmetric, the real strength is in the traceability of the original sources.

When they began, Buschman and Michalek were concerned with filling the gap in measuring impact in research disciplines that do not focus on PRJAs. Like Adie and Altmetric, the Plum Analytics founders valued scalability in terms of data collection. Buschman and Michalek emphasized a technique called identity resolution to bring together impact measurements for objects/artifacts that have multiple points of access, such as an artifact that appears in a repository, on a website, and in an aggregator database. The PlumX creators were also looking ahead to aggregating artifact/object-level metrics into author-level profiles and network analysis (“Are Alternative Metrics Still Alternative?” Bulletin of the American Society for Information Science and Technology, April/May 2013, 39(4):35–39; doi.org/10.1002/bult.2013.1720390411).

Plum Analytics aspires to provide altmetric indicators for “any research output that is available online.” Currently, Plum Analytics identifies 67 types of research objects/artifacts, ranging from abstracts to grants, multimedia, syllabi, technical documents, visual arts, and various other web resources (for the full list of object formats, see plumanalytics.com/learn/about-artifacts). To accomplish the task of identity resolution, PlumX uses what it calls a “seed identifier” that algorithmically maps to other identifiers of the same content that may be at a different digital location. Plum Analytics’ web presence is somewhat cagey about what it uses as identifiers; it does not appear to offer a complete list online. One presumes that most align with the artifact/object identifiers used by Altmetric. Plum Analytics does, however, provide some examples that differ, such as YouTube and Vimeo IDs, SlideShare ID, and GitHub Repository ID (plumanalytics.com/identifiers-types-research-output).
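Plum Analytics does not spell out how its identity resolution works, so the following Python sketch illustrates only the general technique: pick a seed identifier, map every known alias of the artifact (a repository handle, an aggregator URL, and so on) to it, and merge the counts collected at each location. All identifiers and numbers here are invented for illustration.

# Conceptual sketch of identity resolution (hypothetical data and mapping;
# not PlumX's actual algorithm): fold the counts gathered under every alias
# of an artifact into a single record keyed by its seed identifier.
from collections import defaultdict

# Hypothetical alias table: known identifier -> seed identifier.
ALIASES = {
    "hdl:1234/5678": "doi:10.1002/bult.2013.1720390411",  # repository copy
    "url:https://aggregator.example/record/42": "doi:10.1002/bult.2013.1720390411",
}

# Hypothetical per-location observations: (identifier, metric, count).
OBSERVATIONS = [
    ("doi:10.1002/bult.2013.1720390411", "abstract_views", 120),
    ("hdl:1234/5678", "downloads", 45),
    ("url:https://aggregator.example/record/42", "abstract_views", 30),
]

def merge_counts(observations, aliases):
    """Aggregate each observation onto the seed identifier for its alias."""
    merged = defaultdict(lambda: defaultdict(int))
    for identifier, metric, count in observations:
        seed = aliases.get(identifier, identifier)
        merged[seed][metric] += count
    return merged

for seed, metrics in merge_counts(OBSERVATIONS, ALIASES).items():
    print(seed, dict(metrics))
# One record remains: abstract_views 150 and downloads 45 under the seed DOI.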

Impactstory: This tool has a different approach toward alternative metrics. Focusing on author profiles, it uses a kind of badging method to identify author reach and impact. Impactstory was created at a hackathon in 2011 by Heather Piwowar and Jason Priem. Since that time, it has been a grant-funded, nonprofit endeavor, unlike Altmetric and Plum Analytics, which are proprietary, for-profit products. In July 2019, the company changed its name to Our Research (ourresearch.org), although the tool name remains Impactstory. Its approach focuses on open content, open source code, and transparency (profiles.impactstory.org/about).

Because Impactstory focuses on authors and not artifacts/objects, it has slightly different metrics, known as achievements. These include Buzz (volume of discussion related to a research object/artifact), Engagement (how that object/artifact is being discussed), Openness (its ease of access), and the interesting, but limited-use Fun category. There are a number of badges for each category of achievement, and there are also what could be termed “macro” levels of achievement for its pool of researchers: Gold (top 10%), Silver (top 25%), and Bronze (top 50%) (profiles.impactstory.org/about/achievements).
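Those tier cutoffs can be read as simple percentile thresholds. The small Python sketch below is only an illustration of that reading (Impactstory's own ranking computation is not documented here); it assumes that an author who clears the 10% cutoff is shown the highest tier reached rather than all three.

# Illustration only: map an author's top-percentile rank to the highest
# achievement tier cleared, using the cutoffs quoted above. How Impactstory
# actually computes and displays ranks is an assumption, not documented here.
TIERS = [("Gold", 10), ("Silver", 25), ("Bronze", 50)]  # tier, top-N-percent cutoff

def highest_tier(percentile: float) -> str | None:
    """Return the best tier for an author ranked in the top `percentile` percent."""
    for name, cutoff in TIERS:
        if percentile <= cutoff:
            return name
    return None

print(highest_tier(8))   # Gold
print(highest_tier(40))  # Bronze
print(highest_tier(75))  # None (below the top 50%)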

Where Does Altmetric Data Come From?

Knowing what various altmetrics providers track and their overall approaches sheds some light on how each might be useful in differing circumstances. To get a more complete picture, we need to look at where the source data for the metrics come from, as well as how various datapoints are counted in each of the metrics. Which platforms and content sources do the various altmetrics purveyors use to track the impact of research artifacts/objects?

Altmetric: Altmetric harvests its sources through both algorithmic and manual data collection processes. Sources include global public policy documents, more than 2,000 mainstream media outlets, Wikipedia, the Open Syllabus Project, citations found in Altmetric’s sister citation indexing tool Dimensions, and more than 9,000 blogs of various ilks. Research recommendations from the academic milieu include the peer-review forums Publons and PubPeer and the network F1000Prime. Altmetric tracks social media, including public Facebook pages and Twitter, but also has legacy data from LinkedIn, Google+, and other platforms. Interesting miscellaneous sources include IFI Claims patent data, YouTube, Reddit, and Stack Overflow Q&A.

Subscribers to Clarivate’s Web of Science (WoS) have access to WoS citation data in Altmetric. Interestingly, Altmetric captures Mendeley statistics and displays them in the source data, but they are not included in the Altmetric Attention Score. Presumably, this is because Elsevier owns both Mendeley and Altmetric competitor PlumX. Not all sources are covered from the inception of Altmetric, and the list continues to evolve. Luckily, Altmetric provides date ranges for its source tracking (help.altmetric.com/support/solutions/articles/6000136884-when-did-altmetric-start-tracking-attention-to-each-attention-source-).

Plum Analytics: Each of PlumX’s five metric categories (Captures, Citations, Mentions, Usage, and Social Media) has its own list of source data. These sources sometimes appear in more than one category, which could raise questions about double-counting and thereby create an inflated perception of impact relative to Altmetric.

Sources related to its Captures metric feature a wide variety of platforms, including SlideShare, SoundCloud, YouTube, GitHub, Mendeley, SSRN, EBSCO, and Vimeo. Tracked on these platforms are bookmarks, favorites, followers, readers, saves and exports, subscribers, and so forth.

The Citations metric tracks citation indexes including the Airiti Academic Citation Index, CrossRef, various components of PubMed, RePEc, SciELO, Scopus, SSRN, the U.S. Patent and Trademark Office, DynaMed Plus, National Institute for Health and Care Excellence (NICE) clinical guidelines, and a manually curated policy document source list. Tracked on all of these platforms is citation count. The documentation does not expressly indicate whether each count is taken separately or if, for example, the counts are aggregated to remove duplicates.
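Since the documentation leaves open whether counts from these indexes are merged, the tiny Python sketch below shows what deduplication would mean in practice: two hypothetical indexes report overlapping lists of citing DOIs, and the union is smaller than the sum. It is an illustration of the question, not a description of what PlumX actually does.

# Hypothetical illustration: citation counts summed per index versus
# deduplicated by citing-document DOI. Not PlumX's actual processing.
citing_dois_index_a = {"10.1000/a1", "10.1000/a2", "10.1000/a3"}
citing_dois_index_b = {"10.1000/a2", "10.1000/a3", "10.1000/a4"}

counted_separately = len(citing_dois_index_a) + len(citing_dois_index_b)
deduplicated = len(citing_dois_index_a | citing_dois_index_b)

print("Counted per index:", counted_separately)  # 6
print("Deduplicated union:", deduplicated)       # 4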

Mentions are sourced from manually curated blog and news lists, Reddit, SlideShare, Vimeo, YouTube, GitHub, Stack Exchange, Wikipedia, Amazon, Goodreads, and SourceForge. These sources are tracked for posts, articles, forum topics, comments and mentions, references, and reviews.

The Usage metric is sourced from Airiti products; various repositories, including those from bepress and DSpace; EBSCO; ePrints; RePEc; SciELO; SSRN; bit.ly; GitHub; Dryad; Figshare; WorldCat; Vimeo; YouTube; SoundCloud; and SlideShare. Usage includes such things as object/artifact views (abstract, code, full text, etc.), URL clicks, downloads, library holdings, times played, and so forth.

Finally, Social Media includes Vimeo, YouTube, Facebook, Amazon, Goodreads, SourceForge, Figshare, Reddit, and Twitter. Tracked on these platforms is activity such as likes, shares, comments, ratings, recommendations, scores, and tweet count.

As with Altmetric, the sources included have evolved over time as platforms have come and gone and the ability to track new sources has improved. Plum Analytics also keeps a list of changes to its source data mix (plumanalytics.com/learn/resources/plum-analytics-metrics-audit-log).

PlumX metrics draw distinctions between different types of activity on a single platform: on YouTube alone, for example, favorites, subscribers, comments, and number of plays are each counted as separate metrics on the Plum Print. The distinction between types of interaction on a given platform is an interesting one to consider and evaluate, but it is not readily evident that breaking out activity in this manner is a more granular or clearer approach to understanding impact. A single YouTube user could conceivably perform each of these activities on a single video, and by counting each action separately, the actual number of users interacting with the video becomes blurred.
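To make that concern concrete, the toy Python example below counts the same invented YouTube activity two ways: once per action, as separate metrics would, and once per unique user. The event data is made up for illustration.

# Toy illustration (invented events): separate per-action counts versus the
# number of unique users behind those actions on a single video.
from collections import Counter

events = [  # (user, action) pairs for one video
    ("user_a", "favorite"), ("user_a", "subscribe"),
    ("user_a", "comment"),  ("user_a", "play"),
    ("user_b", "play"),
]

per_action = Counter(action for _, action in events)
unique_users = {user for user, _ in events}

print("Per-action counts:", dict(per_action))               # four metrics incremented
print("Total actions counted:", sum(per_action.values()))   # 5
print("Unique users interacting:", len(unique_users))       # 2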

Impactstory: Because Impactstory has an author-based model for measuring impact, the way its data sources are used differs greatly from what is described above. In fact, Impactstory uses Altmetric data as just one source for tracking impact. Impactstory uses BASE, Mendeley, CrossRef, ORCID, and Twitter in various ways to round out the author profiles and facilitate open access to tracked publications (profiles.impactstory.org/about/data). One drawback of this system is that it is incumbent upon the author to create and keep an ORCID profile up-to-date, increasing the possibility of incomplete information and contributing to the relatively small pool of authors with profiles.
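Since the profile can only be as complete as the underlying ORCID record, it is worth checking what that record actually lists. The Python sketch below assumes ORCID's public API at pub.orcid.org and its v3.0 works endpoint returning JSON; the ORCID iD is a placeholder, and the field names should be confirmed against ORCID's current documentation.

# Sketch (assumed endpoint and field names; verify against ORCID's docs):
# list the works recorded on a public ORCID profile.
import json
import urllib.request

ORCID_ID = "0000-0002-1825-0097"  # placeholder ORCID iD for illustration

url = f"https://pub.orcid.org/v3.0/{ORCID_ID}/works"
request = urllib.request.Request(url, headers={"Accept": "application/json"})
with urllib.request.urlopen(request) as response:
    data = json.load(response)

# Each "group" bundles records of the same work contributed by different sources.
for group in data.get("group", []):
    summary = group["work-summary"][0]
    title = summary["title"]["title"]["value"]
    pub_date = summary.get("publication-date") or {}
    year = (pub_date.get("year") or {}).get("value")
    print(year, title)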



Elaine Lasda is coordinator for scholarly communication and associate librarian for research impact and social welfare, University at Albany, SUNY.


Comments? Contact the editors at editors@onlinesearcher.net
