The New Business Intelligence
by Hugh McKellar
Every year, when you walk through the aisles of the KMWorld conference, you hear
a bevy of buzz phrases bandied about: "value proposition," "win-win situation," "pain
points." Then there are the acronyms. You've heard them all: CM, DM, EIP, ERP,
CRM. The list goes on and on, but let's not forget these: EAI, AI. Oh.
This year's acronym was UDM. It's not techno-jive hip-hop for "you
da man," but "unstructured data management." You'll be hearing
that term a lot over the next 12 months, as companies previously
associated with search, taxonomy development, and categorization
software bring analytic capabilities to their solutions.
For example, in a somewhat complicated deal involving technology
assets and human intellectual capital, Intelliseek and Inxight
are incorporating technology from the former WhizBang! Labs into
their respective solutions. For Intelliseek, the deal means its
deep search analysis and other robust technology will be enhanced
by extracting information from multiple, disparate data sources
in the form of what it calls "facts." Intelliseek
will produce ASP solutions that create analytic tools and structured reports
from vast amounts of unstructured data, such as Web pages,
chat rooms, Microsoft Office documents, and e-mail. In its Smart
Discovery software, Inxight will use WhizBang's fact-extraction
technology, which crawls even dynamically generated Web pages,
classifies them, extracts the entities, and links the related
entities together in a database record.
With its emphasis on ontologies and enhanced metadata, Semagix
takes a different approach to UDM. It aggregates information
from any internal or external source: Web site, content repository,
or relational database. With the help of human experts and
trusted sources, it builds a domain-specific ontology, which
it calls a "superset" of a taxonomy with classes, attributes,
relationships, and the like, all connected through a semantic
network. The software then enhances the content with inferred
metadata from the ontology. Powerful stuff.
ClearForest takes the auto-tagging approach to mining the riches
of unstructured data. The company's ClearTags software semantically,
structurally, and statistically tags content, greatly increasing
the number and value of the tags. The process is automatic and
results in the discovery of relevant and related information,
both inside individual documents and between documents in large
document repositories. These richly tagged XML files are well
suited to being repurposed or repackaged for use in other applications.
They can be stored in a ClearForest knowledgebase, where they
can be further leveraged by other ClearForest analytic software
such as ClearResearch, ClearCharts, and ClearLab.
Stratify automates the process of organizing unstructured information
by using the structure implicit in documents to construct a taxonomy
customized for a business. The company says that when users employ
custom industry standards or third-party taxonomies, or organize
their information using a file server or Web server, the
system can directly import that existing work and automatically
extend it. The system uses multiple classification technologies
that operate in parallel to classify documents more accurately
than systems that depend on a single technology. Stratify adds
that its technology compares and combines the results from each
classifier to produce the best possible results.
Unstructured data management is arguably the most interesting
technology to watch these days. UDM's value proposition is a
win-win situation guaranteed to eliminate any organization's
pain points. Hey, man, you know, just keeping it real. It's all good.
Hugh McKellar is executive editor of KMWorld magazine. His e-mail address