The Power of the Semantic Web
by Barbara Brynko
Researchers and information professionals are accustomed to information overload these days. While the internet may offer more than 110 billion sites, finding relevant data at the right time has turned into the modern-day quest for the Holy Grail. However, the emergence of the semantic web holds the promise of bringing search to an all-new level with technologies that can import and channel data at the most granular level, providing a new frontier in data linking.
As the seven-institution VIVO team continues its work on the National Network of Scientists, funded by the National Institutes of Health, each institution’s team is adding data on its researchers and faculty into its VIVO repository. Ontologies are being created and refined to provide a framework for the categories, terms, concepts, and relationships that can be linked for greater relevance.
“I think it’s been more work than we originally anticipated,” says Brian Lowe, programmer/developer at Cornell University who is helping to transform VIVO from a local people-finding service into a semantic web application. “But I think we’ve been making very good headway.” The first version of VIVO that rolled out in January included ontologies that were quite different from those Cornell originally had in place. Lowe says the process of using “sandboxes” (common testing environments where everyone can input sample data, raise questions about it, and examine the results) has triggered good interaction among the players. The team had struggled with ways to discern how an institution is actually organized, and the collaboration through the VIVO project has helped them gain insights across the board. This, in turn, results in new ways of finding a commonality for the seven institutions. “And if it works across our seven,” he says, “then our hope is that this is a pretty general framework that will work across many different institutions.”
Working with ontologies is very different from working with a traditional relational database schema or an XML metadata schema, says Lowe. The process goes beyond creating a specific spot to fill in specific data. “With VIVO, the institutions can put in all kinds of different data,” he says. “We’re working on creating a path where that information can get mapped up into something that can be queried in a reliable fashion with a more abstract layer on top for better linking and associations.”
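The mapping Lowe describes can be sketched in miniature: each institution's records use local field names, and a per-institution mapping lifts them into one shared vocabulary that can be queried uniformly. This is only an illustrative sketch; the field names, property names, and data below are invented, not drawn from VIVO.

```python
# Two institutions describe the same kind of data with different local fields.
# (All names here are hypothetical, for illustration only.)
records_a = [{"fac_name": "A. Smith", "dept": "Entomology"}]
records_b = [{"fullName": "B. Jones", "unit": "Plant Biology"}]

# Each institution supplies a mapping from its local fields to shared terms,
# playing the role of the "more abstract layer on top."
mapping_a = {"fac_name": "name", "dept": "organization"}
mapping_b = {"fullName": "name", "unit": "organization"}

def to_shared(records, mapping):
    """Rewrite each record's local field names into the shared vocabulary."""
    return [{mapping[k]: v for k, v in rec.items()} for rec in records]

combined = to_shared(records_a, mapping_a) + to_shared(records_b, mapping_b)

# One query now works reliably across both sources.
names = [r["name"] for r in combined]
print(names)  # ['A. Smith', 'B. Jones']
```

The point of the abstraction is in the last line: the query never needs to know which institution a record came from.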
Ontologies are the key to navigating the semantic web effectively. “An ontology is similar to a database schema but is a more embedded structure where you can define all kinds of different relationships and a hierarchy of classification,” says Ying Ding, an assistant professor at the School of Library and Information Science at Indiana University. The result is a framework with more functionality and more flexible modeling potential than with a relational database. “Based on these ontologies,” says Ding, “additional data modeling and connections can be created from inference, so this is an attractive element of semantic web technology.”
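Ding's point about inference can be made concrete with a toy example: once an ontology asserts a class hierarchy, new facts follow automatically from it. The class names below (FacultyMember, Person, Agent) are invented stand-ins, not VIVO's actual ontology, and real systems use OWL reasoners rather than this hand-rolled loop.

```python
# Subclass axioms: each class maps to its direct superclass.
# (Hypothetical hierarchy, for illustration only.)
subclass_of = {
    "FacultyMember": "Person",
    "Person": "Agent",
    "Organization": "Agent",
}

# Asserted facts: each instance is directly stated to belong to one class.
asserted_type = {"brian": "FacultyMember", "cornell": "Organization"}

def inferred_types(instance):
    """Walk up the subclass hierarchy to infer every class an instance belongs to."""
    types = []
    cls = asserted_type.get(instance)
    while cls is not None:
        types.append(cls)
        cls = subclass_of.get(cls)
    return types

print(inferred_types("brian"))  # ['FacultyMember', 'Person', 'Agent']
```

A relational schema would store only the asserted row; here, stating that "brian" is a FacultyMember is enough to conclude he is also a Person and an Agent, which is the "additional data modeling and connections from inference" Ding describes.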
Ding has been researching other projects dealing with ontologies, including the eagle-i Consortium, which she describes as a project similar to VIVO that focuses on research instrumentation and lab resources instead of researchers. For data to be integrated between VIVO and eagle-i, Ding is investigating ways that the two projects’ classifications and ontologies can be aligned and bridged. She also cites other projects in Europe using this scale of modeling, especially the U.K.’s e-government portals. So the VIVO team is navigating in some relatively uncharted waters.
“We’re bringing organizations, people and their activities, and even their publications into VIVO, which seemed to be somewhat peripheral to scientific research ontologies at first,” says Jon Corson-Rikert, VIVO creator and head of information technology services at Cornell University’s Mann Library, “but this turned out to be the important glue layer that ties together the more academic and research ontologies.”
Corson-Rikert says that VIVO is concentrating more on the organizational and networking aspects by finding connections between people. However, he considers this a complementary effort. Approaching VIVO with the idea that the seven local sites can extend their individual ontologies is also relatively new in terms of a broad application of this kind of model. “And if we can make that work successfully, I think that will be a major contribution of the project,” says Corson-Rikert. There will always be special cases to work with, “and there’s often the rub of ‘I like 90% of what you do, but I have 10% that I have to do differently.’” The goal is to find the commonality that multiple disciplines and organizations can work with and build upon.
He points to the mashup trend of the past 2 years. People are getting clever with Google Maps and tools that essentially grab part of one webpage and combine it with information from another website. The semantically linked data now available to the community can do all of that too, but the quality of the structured data is much more refined, with labeled text and defined terminology. “That additional level of structural information will enable the whole world of mashup technology to scale to an entirely new level,” he says.
He doesn’t see the semantic web as nirvana since there are still issues of data redundancy. Plus, there are vocabularies being used differently, and there’s no single global vocabulary of content to model the world. Take fish, for example, says Corson-Rikert. Think about the whole notion of fish as biological organisms, as food, as an industry, as part of the restaurant industry, in terms of nutrition and health issues, and even the marketing and selling of fish. The semantic web won’t clear up all of these problems of context and clarity across these domains, “but we have better technology in which to get data into a common framework, exchange it, and ask machines to translate and connect data,” he says.
Commercial search engines, such as Sig.ma and Noesis, are starting to use semantic search technology. For the past 15 years, using metatags in webpages has been a primitive way to add annotations. RDF [the Resource Description Framework] is a way to take that data even further, Corson-Rikert says. RDF provides more structured tags for information within webpages that Google and other major search engines are beginning to pick up, and they now have the technology to read that content to do more intelligent linking, he says.
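The extra structure RDF supplies comes from its subject-predicate-object triple model, which machines can query by pattern. The sketch below uses plain strings in place of full URIs and a tiny pattern match standing in for a real SPARQL query; the data and prefixes are invented for illustration.

```python
# A toy triple store: every fact is a (subject, predicate, object) statement.
# (Shortened "ex:" names are hypothetical, not real URIs.)
triples = [
    ("ex:smith", "rdf:type", "ex:Researcher"),
    ("ex:smith", "ex:worksAt", "ex:CornellU"),
    ("ex:paper1", "ex:author", "ex:smith"),
]

def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "What do we know about ex:smith?" -- the kind of question a search engine
# can answer once pages expose their data as labeled triples.
for triple in match(s="ex:smith"):
    print(triple)
```

Because every statement carries an explicit, named relationship, a crawler that finds these triples can link "ex:smith" to an organization and a publication without guessing from surrounding prose.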
“We’re not sure at what point the average consumer will know that the semantic web is more effective,” says Corson-Rikert. Search will become more relevant and the connections more targeted. Google, Yahoo!, Bing, and other search engines are all researching this area. They may gain more traction in the semantic search industry by doing better natural language processing and using these somewhat more structured sources of information to provide better responses.
“I think the significance of what we’re doing is providing more coherence in how data is provided to the national network,” says Corson-Rikert. “We’re trying to use our structure both to deliver better quality data in terms of what we know people are giving us while giving them the flexibility to let them do localized specialization.”