Searcher, Vol. 12 No. 9, October 2004
FEATURE
Infoviz for Info Pros: Information Visualization Software Tools
by Judith Gelernter, Consultant

Search tools swamp us with relevant results, but we usually only glance at the first few. Why do we tolerate information gaps? Perhaps we should challenge our partial results and not leave to a relevance algorithm the judgments we might make better for ourselves. Information visualization (or infoviz) software groups a wider range of results according to visual cues such as color, size, and shape in order to help us analyze results at a glance and judge what to overlook and what to examine.

Not everyone values data pictured graphically. Psychologists explain that preference for graphics stems from an individual's makeup. A strong left hemisphere of the brain may reflect superior language skills and logical thinking, whereas advanced visual processes are attributed to a strong right hemisphere. Ornstein puts it succinctly in The Right Mind, 1997, when he links left hemisphere dominance to text and right hemisphere dominance to context. Neurologists tend to view hemispheric asymmetry as an oversimplification of human intelligence, yet the model is accepted widely enough to extract some truth from these generalizations1.

Info pros, regardless of their preference for text or context, should be aware of the growing availability and acceptance of context-type infoviz tools. To ease the transition from old to new models of display, most current infoviz applications include text alongside data. The infoviz concept of grouping large data sets graphically even appears in text-only tools such as Vivísimo and TripleHop's MatchPoint with results grouped in categories by subject. However, word categories do not show data trends as easily as visual clusters.

Infoviz is not mainstream. These days, likely as not, the terms "infoviz" and "visualization" are used incorrectly. For example, the title of a June 2003 Economist article ("Grokking the Infoviz") seems to imply that infoviz is a sort of visual representation of cyberspace, which of course it is not. As another example, a February 2004 searchenginewatch.com article described search engines Vivísimo and Zapmeta as visual, presumably because these two allow data visualization in ways other than via the standard list2. In fact, neither Vivísimo nor Zapmeta exemplifies information visualization. Will infoviz ever go mainstream? The impression of readers such as yourselves may help determine its fate.

A Young Field

Visualization, according to Inxight CTO Ramana Rao, is "an ingredient technology, not an application in itself." Infoviz comprises 2-D visualization (including GIS programs such as MapQuest), 3-D visualization (such as the Visible Human Head Browser from the University of Michigan), multidimensional data (such as the HomeFinder application developed at the University of Maryland), hierarchical data (as demonstrated by Inxight), and temporal data (as displayed by Vision)3. This article concentrates on 2-D visual interfaces to the sort of information sources that constitute the daily diet of the info pro.

It has been said that information visualization branched off from scientific visualization only 10 or 15 years ago. Infoviz may depict either physical or abstract data, whereas visualization in science mostly depicts physical data (see http://www.infovis.net/E-zine/2003/num_112.htm). One could maintain, however, that infoviz is more than a decade old and that the present generation of interfaces derives from spreadsheet software. A visible calculator, VisiCalc, emerged in the late 1970s with data organized in columns and rows. The point of the spreadsheet was to offer an overview of the data. It was so successful in the '70s that it became the incentive for many to purchase a personal computer. In the late 1980s, Excel was one of the first programs released for the Windows operating system4. Only this spring in a telephone interview (dated 3/9/04), anacubis product manager Paul Stefan named Excel as one of his product's greatest competitors.

A trusted voice in the interdisciplinary infoviz arena is that of Yale computer science professor emeritus Edward Tufte. Professor Tufte discusses strategies to reconcile display complexity and clarity in his books such as The Visual Display of Quantitative Information (now in its second edition). Creators of computer interfaces have adapted these strategies. (See http://www.edwardtufte.com/tufte/index for information on single-day courses on the design of information.)

A more software-oriented approach appears in the writings and graduate courses of University of Maryland Computer Science Professor Ben Shneiderman. Professor Shneiderman is advisory editor for the journal Information Visualization, established in 2002. Along with Ben Bederson, he edited The Craft of Information Visualization: Readings and Reflections, 2003, a collection of papers by scholars in the field.

As Professor Shneiderman put it in the introduction to The Craft of Information Visualization, the point of infoviz is to "improve the experience of people using computers, and make that experience more effective and enjoyable." The economics of selling these software packages leans on "effective." Infoviz companies aim to add value by showing a large amount of data on an information map that can be assimilated at a glance.

The challenge is to develop easy-to-learn systems. Experimental and mathematical psychologist James Wise has studied human color perception and determined that hue alone does not make nearly as effective an impression as does hue along with brightness and saturation of color against a background. And color is only one component of visual language. No language is self-evident, and invariably learning a language involves dealing with some cultural component5.

The testing ground for standardizing visual options may be the desktop. The standard WIMP system (Windows, Icons, Menus, and Pointers) has seen no significant changes since the 1980s. The next version of Windows scheduled for release in 2006 will employ an animated desktop. If infoviz integrates toolbars with browsers on the desktop, it might help conquer the learning curve and make a market splash. A product which adopts the visual language of Microsoft, such as the lowercase "e" of Internet Explorer to symbolize a Web site, will move ahead of the curve. Even so, the eventual dominance of one infoviz product or another will depend upon the effectiveness of its marketing as well as the product's usefulness and how easy it is to learn.

A January 12, 2004, Wall Street Journal article reported the expansion of the market for visualization software. The article speculates that this might be due to more powerful PCs that in turn allow more powerful programs. Processing power was the limiting feature on visualization as recently as 5 years ago. But in today's tight economy, some executives low on staff turn to infoviz software for overall corporate pictures or sales analysis6. The Wall Street Journal prediction is echoed by those in the information industry. A January 2004 article in Information Today quotes projections for 2004 from Clare Hart of Factiva and Allen Paschal, formerly of Gale, that predict the increased use and popularity of visualization tools (Information Today, vol. 21, no. 1, January 2004, pp. 1, 13, 21-26, 29).

Information professionals have had mixed reactions to infoviz. Our own Barbara Quint, editor of Searcher, mentioned this spring that she hesitates to adopt infoviz tools because of her preference for verbal over visual presentation. Other left-brainers with similar preferences should rest assured that text options accompany visuals in the present generation of infoviz products. In an EContent article from last year, Mary Ellen Bates wrote that she values visualization tools to answer the more general questions she receives on market trend analysis. (See "Search Show Offs," EContent, vol. 26, no. 6, June 2003, p. 27.) A skeptical Stephen Arnold wrote for Searcher ("In Search of . . . the Good Search: The Invisible Elephant," Searcher, vol. 11, no. 3, March 2003, pp. 40-51) that he believes such tools "won't do much more than turn off-point hits into an interesting picture." He emphasized that we really need better focus for off-point hits, preferably the result of cleaning up data and improving metadata. His remarks magnify the ideas of Donald Beagle, who declared in his article "Visualization of Metadata" (Information Technology and Libraries, vol. 18, no. 4, December 1999, http://www.lita.org/cfapps/archive.cfm?path=ital/1804_beagle.html) that "visualization research has advanced further and faster on the interface side than on the content side" and that results will improve when metadata processing improves.

But what kind of results do we seek? Do info pros use search tools as quick reference agents to find specific answers to pointed questions or for research to see what we can find about a topic? Ron Miller remarked in an April 2004 article on visual search: "...[S]trengths lie in the research tool market as opposed to pure play, search-and-find tools."7 Infoviz advertising pitches products at research rather than quick reference, playing up the "information discovery" qualities of "decision-making software." A few industries have begun to see the utility of infoviz for very large data sets. Karl Fast of the University of Western Ontario, who gave an infoviz presentation at an ASIST conference earlier this year, may turn out to be right in his optimism for the future of this industry. (For a transcript of "Information Visualization: Failed Experiment or Future Revolution?," presented at the 5th Annual ASIST Information Architecture Summit, February 27-29, 2004, see http://www.livingskies.com/writings/2004/ia-summit/.)

Infoviz in Action

Infoviz products can display results quickly when words match words. That is, a text query is matched against an object's metadata rather than run through the full text of a document or compared with the object itself. While this speeds the search, it places a burden on the accuracy of the metadata. Images generally include metadata in text form to speed up the matching. Tools are not yet sophisticated enough to match the term "apple" to a red (or green or yellow) roundish form. An engine is currently in development to search three-dimensional objects from a sketch the user draws of what he seeks8.
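To make the metadata-matching idea concrete, here is a minimal sketch in Python. The record layout, the `keywords` field, and the sample catalog are hypothetical illustrations and are not drawn from any product discussed in this article. The sketch shows why the approach is fast (only short text fields are compared) and why it depends on accurate metadata (an untagged picture of an apple is never found).

```python
from dataclasses import dataclass, field


@dataclass
class MediaObject:
    """A searchable object (for example an image) plus its text metadata."""
    name: str
    keywords: set[str] = field(default_factory=set)  # hypothetical metadata field


def match_metadata(query: str, objects: list[MediaObject]) -> list[MediaObject]:
    """Return objects whose metadata keywords contain every query term.

    Only the metadata is consulted; the object's content is never inspected.
    """
    terms = {t.lower() for t in query.split()}
    return [o for o in objects if terms <= {k.lower() for k in o.keywords}]


catalog = [
    MediaObject("photo_001.jpg", {"apple", "fruit", "red"}),
    MediaObject("photo_002.jpg", {"pear", "fruit"}),
    MediaObject("photo_003.jpg", set()),  # a picture of an apple with no metadata
]

hits = match_metadata("apple", catalog)
```

In this toy catalog only `photo_001.jpg` is returned for "apple"; the untagged third photo is missed, which is the burden on metadata accuracy described above.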

The newness of the software adds to the complexity of the industry. A single company may produce a range of related products with versions of the flagship product continually updated and improved. Young infoviz companies share technologies among themselves. Furthermore, while some infoviz tools include their own search mechanisms, others work alongside search engines to uncover results and then fit the results to their own categories and show these categories visually. Such complexities, collaborations, and partnerships should simplify over time.

Graphic environments differ with different infoviz tools. KartOO and anacubis use a maplike space. Mooter is very basic in its use of line. The software of Touchgraph Technology LLC, the basis for applications such as Inxight, TheBrain, and ThinkMap, and the similarly linear-looking Spotfire appear more like charts9. Grokker and Fractal:PC rely on abstract shape and decorative color. Panopticon uses a type of heat map in which colors represent data records in rectangular "countries." Antarctica uses color to show overlapping subjects. Autonomy and Omniviz offer a selection of graphic environments.

Function varies as widely as graphic display. Spotfire is used in academia and also by industrial chemists and biologists. Fractal:Edge provides visual interfaces to information providers, financial services, manufacturers, utilities, and telecommunications. Enterprise products such as Inxight, Nstein, Panopticon, Autonomy, and Antarctica play to a larger corporate market.

I have limited this article's examples to KartOO, Grokker, and anacubis, products that have small-business or desktop versions and whose company representatives made themselves available for interviewing. Though similar in function, the three are not equivalent. KartOO v. 4 is dedicated to Web search; Grokker will look out to the Web or into a local hard drive. anacubis will search the Web or a local information system or drive and can deftly combine different information sources into a single view. To help readers compare products, I submitted the same search term to all three and set Grokker and anacubis to overlay Google (not an option available for KartOO). These conditions allow a comparison of categories and Web sites retrieved and show how relevancy works. Please remember that despite the equivalent search, we are in a functional sense comparing apples to oranges in that each product has its own strongest applications and best recipes for success.

KartOO

The French company KartOO, founded in 2001, specializes in information retrieval, knowledge management, site monitoring, and visual interfaces. The name suggests cartography and art and echoes the Internet double "oo" of Yahoo and Google. The metasearch engine is open to a general audience, while the genie that flashes as the engine works its magic should appeal especially to children. The imagery changes regularly according to the graphic artist's ideas. Past Kartoons are on view in an online gallery at http://www.kartoo.net/a/en/visuels03.html. KartOO v. 4 was released in November 2003.

In a personal communication, marketing manager Alexandre Dos Santos explained a major company goal:

For years now, WWW surfers have been disappointed by the quality of search engine results. One day we asked ourselves the question: what does a relevant result mean? It all depends on the question and context. For example, if a user types the word "car," what kind of site do we have to offer? Pages about the car manufacturers, rental agencies, and car collectors. A traditional search engine returns all that in a linear list of numbered sites, without asking more of the user. But if I'm looking for rental car, I will be very disappointed to find a list of manufacturers and collectors. We need to alter the experience in two ways. First, [it must] yield results by search themes and if possible to position them in relation to each other, with links to the sites with those themes. This is where the idea of the map comes from. Second, we must help the user refine her search, always remembering that she is not a Boolean mathematician and that she shouldn't have to learn all the advanced syntax.

The metasearch engine KartOO can query Voila, AlltheWeb, AltaVista, MSN, Yahoo!, WiseNut, HotBot, Lycos, Nomade, Toile de Québec, Exalead, Dmoz, Teoma, and LookSmart. In default automatic search mode, KartOO bases its choice of search engine on the language and syntax of the query term(s). Otherwise, in manual mode, the user chooses which search engine(s) KartOO should query. KartOO retrieves results from the first page of an engine's display list and from additional pages if necessary to meet a minimum of 50 results for a map. KartOO then determines the number of sites to be displayed based on score and relevance. The graphics of the map are taken into memory to speed display. Users may consult a history of searches, print a map, send a link for a map to an e-mail address, or download a KarTOOlbar to give easy access to the main functions.
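The retrieval steps just described (read result pages until at least 50 results are pooled, then choose which sites to show by score) can be sketched roughly as follows. This is an illustration only, not KartOO's actual code: the result shape, the `fake_engine` stand-in for a real search engine, the scoring values, and the display cutoff of 12 sites are all assumptions made for the example.

```python
from typing import Iterator

Result = dict  # {"url": str, "score": float}; a hypothetical result shape

MIN_RESULTS = 50  # the minimum pool size the article gives for one map


def gather_results(pages: Iterator[list[Result]], minimum: int = MIN_RESULTS) -> list[Result]:
    """Read result pages in order until at least `minimum` results are pooled."""
    pool: list[Result] = []
    for page in pages:
        pool.extend(page)
        if len(pool) >= minimum:
            break
    return pool


def select_for_map(pool: list[Result], max_sites: int = 12) -> list[Result]:
    """Keep the highest-scoring sites for display (the cutoff is an assumption)."""
    return sorted(pool, key=lambda r: r["score"], reverse=True)[:max_sites]


def fake_engine(page_size: int = 20, total_pages: int = 5) -> Iterator[list[Result]]:
    """Stand-in for a real engine: yields pages of results with made-up scores."""
    for p in range(total_pages):
        yield [
            {
                "url": f"http://example.com/{p * page_size + i}",
                "score": ((p * page_size + i) * 37 % 101) / 100,
            }
            for i in range(page_size)
        ]


pool = gather_results(fake_engine())
shown = select_for_map(pool)
```

With 20 results per page, the sketch stops after the third page, since 60 results satisfy the 50-result minimum, and passes only the top-scoring sites on to the map.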

Figure 1 below shows a map for a search on "infoviz." The larger the icon, the more relevant the site. Here the icons are Web sites and the smaller adjoining icons are pages from the same Web site. If a site has a word in common with the query, the map may not contain that word in order to save space. KartOO divides the first 12-14 results into categories including (information) visualization, graphics, topics, and projects. Mouse over the category "visualization" to see how it relates to other categories and search results (see Figure 2 on page 56).

Should you wish to delve into related search categories, choose "—> next map" on screen at the bottom right to yield a new set of categories and related sites (see Figure 3 on page 57).

Alternatively, change the result display from graphic map to text list. Or keep the map and turn your attention to the categories listed in the left-most column and site addresses in the right-most column (Figure 3).

Grokker

Grokker's name comes from the invented verb "to grok": to understand profoundly through intuition. The term was coined by Robert Heinlein in his 1961 science fiction novel, Stranger in a Strange Land. The name of the company, Groxis, is an elision of "grokking systems."

Grokker, just like KartOO, has been developed for a general audience. According to CTO Jean-Michel Decombe, "My goal is to make a beautiful system so that people feel happy about using it" (personal communication). To its credit, Grokker is not just another pretty interface. Its colors organize document clusters on any general subject.

The technology for information retrieval is built in four layers, with the foundation layer holding data graphed with nodes and links. Above is the acquisition layer with plug-ins to allow the program to work with different information sources. Above that is the augmentation layer that adds categories to the data to enable clustering. On top is the transformation and visualization layer that uses metadata to filter results and then display those results in a map.
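The four layers can be pictured as a small pipeline. The sketch below is a loose illustration under stated assumptions, not Grokker's design: the `Graph` class, the function names, the title-substring clustering rule, and the sample records are all invented for the example.

```python
from collections import defaultdict


# Layer 1 (foundation): data held as a graph of nodes and links.
class Graph:
    def __init__(self):
        self.nodes = {}   # node id -> attribute dict
        self.links = []   # list of (category id, document id) pairs

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_link(self, parent, child):
        self.links.append((parent, child))


# Layer 2 (acquisition): a plug-in turns one information source into nodes.
def acquire(graph, source_records):
    for rec in source_records:
        graph.add_node(rec["url"], title=rec["title"], kind="document")


# Layer 3 (augmentation): add category nodes so documents can be clustered.
# Matching a category name against the title is a stand-in clustering rule.
def augment(graph, categories):
    for url, attrs in list(graph.nodes.items()):
        if attrs.get("kind") != "document":
            continue
        for cat in categories:
            if cat.lower() in attrs["title"].lower():
                graph.add_node(cat, kind="category")
                graph.add_link(cat, url)


# Layer 4 (transformation and visualization): filter, then group for display.
def to_map(graph, title_filter=""):
    clusters = defaultdict(list)
    for cat, url in graph.links:
        if title_filter.lower() in graph.nodes[url]["title"].lower():
            clusters[cat].append(url)
    return dict(clusters)


records = [
    {"url": "http://example.org/a", "title": "Information Visualization Basics"},
    {"url": "http://example.org/b", "title": "Graphics for Maps"},
    {"url": "http://example.org/c", "title": "Visualization and Graphics"},
]
g = Graph()
acquire(g, records)
augment(g, ["Visualization", "Graphics"])
grok_map = to_map(g)
```

The point of the layering is that each step only depends on the one below it: a new information source needs only a new acquisition plug-in, and a different display needs only a different top layer.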

Grokker 2.1 can retrieve results from several engines at once (Yahoo!, MSN, AltaVista, FAST, and WiseNut) and includes a plug-in for Google. The company plans to start releasing plug-ins regularly to allow users to query specialized databases such as LexisNexis. It will also launch a beta version of a software development kit that offers Grokker's APIs to those who wish to build custom connections to other information sources. Future developments aim to speed grokking. The long-term plan is for the software to use sounds and sensations to draw the user's attention to relevant documents.

Grokker may be used to search an information source or local drive. In the example below, it overlays Google. It takes only a few trial clicks to determine what the control buttons do, so first-time users are unlikely to need the "Learning the Basics" screen presented when the program opens. The user chooses whether the map should occupy half, all, or none of the screen. Results display in a standard text list when users select the option for the map to occupy none of the screen. The visually inclined user sets text, range, color, and site-type filters before entering a term into the search box and clicking "grok." The grok might take a few seconds if the search is in the computer's cache or up to a minute for a phrase not attempted before. As the results come in, the quivering colored balls that represent categories appear one by one, jiggling and jostling each other as they make way for themselves within the spherical boundary. Larger-sized balls indicate either more items in a category or the greater relevance of a category to the search term.

A search on "infoviz" over Google with Grokker yielded Figure 4 (see page 58).

Sixteen first-level categories come up for the 460 items retrieved; the whole subject hierarchy requires 415 categories. Visualization is the largest category: the most relevant with the most results. Selecting the Visualization category shows the action of zooming in on that sphere (see Figure 5 on page 58).

This map thus enlarges the Visualization sphere in the previous map, retaining the outline of the exterior Infoviz sphere and the pea-green color in the scheme as shown in the previous map. Each sphere represents a category open for further mining. Squares represent sites. Mouse over a site square to pull up an overview of the site with name, description, location, domain, source, and rank to help judge whether to jump to that site. See, for example, Figure 6, "Understanding Information Collections with Maps and Visualizations," on page 59.

Maps may be shared or saved in .gxml format that allows viewing only in Grokker. Version 2.2 works with Windows, Mac OS, and Linux. A 30-day free download for the PC is available on the Web with support available via e-mail.

anacubis

The i2 Group released investigative analysis software 13 years ago to aid national and international law enforcement and intelligence organizations. Now anacubis, a privately held subsidiary of the i2 Group, has a related product designed specifically for applications in law and business: mergers and acquisitions, risk management, competitive intelligence, patent analysis, and other forms of market research. The name anacubis derives from Analytical Cubism, a painting style developed by Picasso and Braque in which the different parts of the image are deconstructed into their components and given equal ground.

anacubis Desktop 2.0, a visual research and analysis product, was released this spring, priced from $1,950 per seat. Plug-ins offer analysis for specific applications such as searching intellectual property and patents and are sold at $750 per subscription. A second product is a Web-based version that has similar visual capabilities but less analysis functionality; it is the solution used for displaying search results from Google and the other information sources mentioned below.

Both products match user criteria to metadata from commercial information providers, Web sites, or in-house data, consolidating data with filters and linguistic analysis and clustering results using preset taxonomies. Results display on an XML map in peacock, hierarchy, or group form animated with a Java applet. Selected information services from D&B, Hoover's, LexisNexis, Questel•Orbit, and Google work with the anacubis interface.

Using the Web-based version, the user sets the number of sites for display using anacubis' Google-Enabled Visual Search at http://www.anacubis.com/googledemo/google/index.asp, a free demonstration released by anacubis to showcase its visualization technology. Figure 7 (see page 60) shows a peacock form result screen for "infoviz." Retrieved objects in anacubis are termed entities and returned as unexpanded icons. Here the user chose to expand the golden "e" Web site for "Information Visualization at Pacific Northwest National Laboratories." The URL displays when the cursor hovers over the site, as Figure 7 shows.

Green lines indicate linked sites. The "expand linked sites" command brings up sites linked to the site selected.

Alternatively, the user may bring up sites that are similar but not necessarily linked to the sites selected. Red lines indicate similarity, and the "expand similar sites" command yields the Figure 8 screen (see page 60). The double colon or vertical bars that appear in the site title are an anomaly of the publicly available API provided by Google and integrated by anacubis into the visual search. The punctuation results from the page title on the Web site concerned, which may use characters that the API does not understand.

The anacubis system does not organize results in intermediate categories as do KartOO and Grokker. Instead, it offers a "Find text" feature to search within a map. This helps draw the eye to relevant results in maps with a large number of entities. Enter "visualization" into the "Find text" dialogue box.

As you can see in Figure 9 on page 61, the Find search can direct the gaze to sites with "visualization" in the title.
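A search-within-a-map feature of this kind can be approximated in a few lines. The sketch below is an assumption-laden illustration rather than the anacubis implementation: the `Entity` fields, the case-insensitive title match, and the sample entities are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """One icon on the map; the fields are illustrative assumptions."""
    title: str
    url: str
    highlighted: bool = False


def find_text(entities: list[Entity], text: str) -> list[Entity]:
    """Highlight every entity whose title contains `text`, ignoring case."""
    needle = text.casefold()
    for e in entities:
        e.highlighted = bool(needle) and needle in e.title.casefold()
    return [e for e in entities if e.highlighted]


site_map = [
    Entity("Information Visualization at a National Laboratory", "http://example.org/lab"),
    Entity("Infoviz Resources", "http://example.org/resources"),
    Entity("Visualization Toolkit Notes", "http://example.org/toolkit"),
]

found = find_text(site_map, "visualization")
```

Here the first and third entities are flagged, which is the effect the feature aims for: on a crowded map, the eye is drawn to the few entities whose titles match.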

A left click on an entity produces another map with sites on more specific topics; a right click produces the option of visiting the Web page. The user may zoom into or pan out of a view, add or delete links, move nodes around on a map, or, as demonstrated above, search within a map for specific entities or link labels. Internal information can be dragged onto anacubis Desktop.

The anacubis Desktop is available for a 10-day trial at http://www.anacubis.com/products/desktop. A free download called View Manager can reveal charts created in the Desktop. The company recently announced a new search demonstration based on Hoover's data at http://www.anacubis.com/hoovers. Both the Google and Hoover's searches are also integrated into the anacubis Desktop as free information sources.

Conclusion

Infoviz software uses quantitative data to reveal trends probably undetectable in raw textual or numerical output. Content retrieval research lags behind visualization to the extent that, at this point, queries and not graphics limit the infoviz software's power.

Of the three products surveyed here, KartOO and Grokker use generic taxonomies for general documents. Grokker allows users to create their own categories to apply the tool for more specific data sets. The KartOO genie and Grokker animated spheres endear them to the young. The anacubis Desktop, with its nuanced choices and specialized taxonomy, is geared to those in business and finance.

Infoviz tools tend to be developed for a specific application or audience and then expanded to a more general application or wider audience. In the marketplace this pattern is considered slow diffusion rather than a killer app. The i2 Group, for example, began within law enforcement and expanded into legal and business applications with anacubis.

When looking to widen the market, developers must also consider the colorblind, the graphically challenged, and most of us who instinctively resist a new look and prefer the familiar. But with an animated Windows desktop due for release in 2006, visual language might be borrowed from Microsoft and the way paved for the public to encounter graphical interfaces commonly. Today's infoviz market picture is encouraging, even while it remains blurry.

How does infoviz add value?

• Comprehensive. Displays very large data sets and affords a concise overall picture.

• Context. Patterns and trends are displayed that would not be otherwise discernable.

• Colorful. Pleasing to the eye.

 

 


Endnotes

1 Robert E. Ornstein, The Right Mind: Making Sense of the Hemispheres, New York, 1997. Corballis, P.M., "Visuospatial processing and the right-hemisphere interpreter." Brain and Cognition, vol. 53, no. 2, November, 2003, pp. 171-6.

2 "Grokking the Infoviz," Economist, vol. 367, no. 8329, June 19, 2003. http://economist.com/science/tq/displayStory.cfm?story_id=1841120.

Chris Sherman, "ZapMeta: A Promising New Meta Search Engine," dated February 26, 2004, at http://searchenginewatch.com.

3 See "Making Information More Accessible: A Survey of Information Visualization Applications and Techniques" by Gary Geisler, last updated January 31, 1998, at http://www.ils.unc.edu/~geisg/info/infovis/paper.html and a list of information visualization software from the University of Maryland at http://www.cs.umd.edu/hcil/pubs/products.shtml.

4 D. J. Power, "A Brief History of Spreadsheets," DSSResources.COM, World Wide Web http://dssresources.com/history/sshistory.html, version 3.5, October 4, 2003.

5 On color, see James A. Wise, "The Ecology of Colour," Inf@Vis!, No. 129, September 15, 2003, at http://www.infovis.net/E-zine/2003/num_129.htm. On visual language, see Juan C. Dürsteler, "Visual Language," Inf@Vis!, No. 120, May 5, 2003, http://www.infovis.net/E-zine/2003/num_120.htm. Also see the dissertation by Yuri Engelhardt, The Language of Graphics, 2002.

6 Jeanette Borzo, "Get the Picture: In the Age of Information Overload, Visualization Software Promises to Cut through the Clutter," Wall Street Journal, January 12, 2004, p. R4.

7 Ron Miller, "Get the Picture: Visualizing the Future of Search," EContent, vol. 27, no. 4, April 2004, p. 35.

8 Brian Bergstein, "Researchers Develop 3D Search Engine," ExtremeTech, April 16, 2004, http://www.extremetech.com/article2/0,1558,1569245,00.asp.

9 http://touchgraph.sourceforge.net/index.html. TouchGraph has developed a Java browser for Google at http://www.touchgraph.com/TGGoogleBrowser.html.

