On the Net
Greg Notess
Reference Librarian
Montana State University

Searching Beyond Text: Issues with Multimedia Searching

ONLINE, September 2000
Copyright © 2000 Information Today, Inc.


As the Internet grows ever larger, the sheer quantity of textual information continually increases. Basic text, in ASCII, HTML, or PDF, probably makes up the bulk of the information that we all handle on a daily basis. But the image, video, audio, and multimedia capabilities of the Web create information opportunities beyond text.

With today's high-powered desktops, fast Net connections, and the proper browser supplements, all manner of image, audio, and video files can be viewed and heard. There are plenty of Web sites with large collections of such files. But searching for the information content contained within them is a more difficult matter, due in large part to the nature of such files. Plenty of search sites have some multimedia offerings, but the issues of searching within that content are different from those of regular Web searching.


When searching for Web pages, the general Web search engines have the advantage of a database of textual information. Computers have been able to process textual information for decades. To find a Web page containing a specific word, just enter that term into a search engine's search box and it will try to find pages that contain the particular word.

Text searching is made easy since most textual information is presented on the computer in text characters. To find the word "search" in a computer file, the search engine simply needs to find the characters s, e, a, r, c, h right next to each other and in that order.

That is a bit of an oversimplification, and there are many variations and extra features available to make text searching far more complex. Yet when compared to the problems of searching non-textual information, the character matching of text searching is elegant in its simplicity.
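The character matching described above can be sketched in a few lines of Python. This is only a minimal illustration of the idea, not how any particular search engine works; the function name `find_word` and the sample sentence are invented for the example.

```python
def find_word(text: str, word: str) -> list[int]:
    """Return every position where `word` occurs in `text`, ignoring case."""
    haystack = text.lower()
    needle = word.lower()
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        # Continue looking just past the start of the previous match.
        start = haystack.find(needle, start + 1)
    return positions


# Hypothetical sample text, made up for this illustration.
page = "Search engines match characters. A search is simple; researching is not."
print(find_word(page, "search"))
```

Note that plain character matching also finds "search" inside "researching", which is one reason real text searching needs the variations and extra features mentioned above.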


Many Web pages demonstrate the innate difficulties of matching information content within images to the words used for searching for that content. Take a typical Web site for the imaginary company Widgets, Incorporated.

Somewhere near the top of the main Web page, there is a graphic image of the company's logo. That graphic image might contain the Widgets, Inc. company name along with a slogan such as "We Wrangle Widgets". If that Web page doesn't contain that slogan in a text format, and if the image does not include it in the alternate text portion of the image tag, the search engines will not make a text match for that page when a searcher enters the slogan. Although we humans can see the words within the graphic, the computers cannot.

Thus, even when a graphic is a picture of a word, that word is not searched. So then, what can be searched for images? We come back to descriptive cataloging. An image will at a minimum have a file name. The height and width of the image should be embedded within the HTML coding, but if not, those measurements can be automatically determined. The picture can be given an alternate text tag, which could be interpreted as a title or description.
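As a rough sketch of what a spider has to work with, the following Python uses the standard library's `html.parser` to pull the file name, alternate text, and dimensions out of image tags. The class and function names and the sample markup for the imaginary Widgets, Inc. page are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import posixpath


class ImageTagCollector(HTMLParser):
    """Collect the only text a spider can see for each image on a page."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        self.images.append({
            "file_name": posixpath.basename(urlparse(src).path),
            "alt": attributes.get("alt") or "",
            "width": attributes.get("width"),
            "height": attributes.get("height"),
        })


def describe_images(html: str) -> list[dict]:
    collector = ImageTagCollector()
    collector.feed(html)
    collector.close()
    return collector.images


# Hypothetical markup for the imaginary Widgets, Inc. logo.
without_alt = '<img src="/images/logo.gif" width="200" height="80">'
with_alt = ('<img src="/images/logo.gif" width="200" height="80" '
            'alt="Widgets, Inc. We Wrangle Widgets">')

for markup in (without_alt, with_alt):
    for image in describe_images(markup):
        searchable = "wrangle" in image["alt"].lower()
        print(image["file_name"], "| slogan searchable:", searchable)
```

In the first case only the file name and dimensions are available, so a search on the slogan has nothing to match.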

A database of cataloged images can go further, with real cataloging that records additional information. For example, the records could list the artist, the date of creation, the style, the theme, the colors, the reproduction technique, etc. Yet cataloging the information content within images is not a simple task.

Think of a picture of two people walking a dog along the shore of a body of water. All sorts of descriptive information could go into a cataloging record: the people's names, ages, occupations, and relationship; the dog's breed, grooming, and colors; descriptions of the activities; the location; the name of the body of water. And yet someone searching for such an image might actually want it because to that person the picture implies serenity. Or they might want it to illustrate a Web site for a pet store, a realtor, or a retirement community. How can all possible or even likely uses for a particular image be cataloged? They can't.
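A cataloging record of this kind, and its limits, might be sketched as follows. The field names follow the attributes listed above, but the record layout, the `search_catalog` helper, and every sample value are hypothetical and not drawn from any real image database.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class ImageRecord:
    file_name: str
    artist: str = ""
    date_created: str = ""
    style: str = ""
    theme: str = ""
    colors: list[str] = field(default_factory=list)
    technique: str = ""


def search_catalog(records: list[ImageRecord], term: str) -> list[ImageRecord]:
    """Return records where any cataloged field contains the term."""
    term = term.lower()
    matches = []
    for record in records:
        values = []
        for value in asdict(record).values():
            # Flatten list fields such as colors into individual values.
            values.extend(value if isinstance(value, list) else [value])
        if any(term in str(value).lower() for value in values):
            matches.append(record)
    return matches


# Hypothetical record; all values are invented for illustration.
catalog = [
    ImageRecord(
        file_name="shore_walk.jpg",
        artist="Example Artist",
        date_created="1999",
        style="photograph",
        theme="two people walking a dog along the shore",
        colors=["blue", "grey"],
        technique="35mm film",
    )
]

print([r.file_name for r in search_catalog(catalog, "dog")])
print([r.file_name for r in search_catalog(catalog, "serenity")])
```

The search on "dog" finds the record because the cataloger wrote that word down; the search on "serenity" finds nothing, since no one anticipated that use.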

Even so, providing additional descriptive information about images provides more opportunities for retrieval. Some of the searchable image databases on the Web have cataloged images while others just search pictures from Web sites. As with so many searches, understanding the scope of the database can help make searching far more effective. Yet, as with so many Web search sites, the scope of the underlying databases is rarely clearly identified, and it is then left up to the searcher to try to determine those boundaries.


The same issues apply to audio files. While the MP3 format is now the most common audio file format, there are still plenty of sound bites available in Real, .wav, and .au formats as well. Regardless of the file format, these sound files are like image files in that the content of the file is disguised in a non-searchable code. For example, a short .wav file viewed in a text editor includes a character string such as ƒƒ, ƒ,,___ . Image files viewed the same way show similarly nonsensical characters: BåÍOÏœ7nÁ'ÏqK'X0_x#ü<LÁ.
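To see why such a file looks like nonsense, the sketch below builds a tiny .wav file in memory with Python's standard `wave` module and then reads its bytes back as if they were text. The tone, the function name `make_tone_wav`, and the parameter values are arbitrary choices for the illustration.

```python
import io
import math
import wave


def make_tone_wav(num_frames: int = 400, rate: int = 8000, hertz: float = 440.0) -> bytes:
    """Build a tiny 8-bit mono .wav file in memory and return its raw bytes."""
    # Each sample is one unsigned byte describing the waveform, not a character.
    frames = bytes(
        int(127 + 127 * math.sin(2 * math.pi * hertz * n / rate))
        for n in range(num_frames)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(1)
        wav_file.setframerate(rate)
        wav_file.writeframes(frames)
    return buffer.getvalue()


data = make_tone_wav()

# Decoding as Latin-1 mimics opening the file in a text editor.
as_text = data.decode("latin-1")
print(repr(as_text[:64]))

# A character match against the raw bytes has nothing meaningful to find.
print(b"search" in data)
```

Apart from a few labels in the file header, the bytes are sound samples, so character matching of the kind used for Web pages has no words to work with.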

Within all this binary coding, many audio files indeed contain words. On the information side of audio files, talk shows and news reports consist primarily of words. But there is not yet an easy way to search for words within the text of all the files, although more options are becoming available for some.

Like the images, the sound files can be indexed by their names. Unfortunately, if it is simply an embedded or linked audio file on a Web page, there may be no additional information about it. The Real audio files may have some descriptive information included, such as the source. Other metadata could be included in audio files, but that requires more effort on the part of the content creator.

To fully index the content of audio files generally requires having a transcript of the session in a computer-readable text format. That way the words can be indexed. And that is the primary way that audio files are made searchable.
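Once a transcript exists as text, indexing it is the same problem as indexing a Web page. A minimal sketch, assuming transcripts are already available as plain strings keyed by audio file name (the file names and transcript text here are invented):

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


def build_index(transcripts: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of audio files whose transcript contains it."""
    index = defaultdict(set)
    for file_name, text in transcripts.items():
        for word in WORD.findall(text.lower()):
            index[word].add(file_name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the files whose transcripts contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical transcripts, invented for this example.
transcripts = {
    "morning_news.ra": "The city council voted on the new library budget today.",
    "talk_show.mp3": "Our guest explains how libraries catalog photographs.",
}

index = build_index(transcripts)
print(search(index, "library budget"))
print(search(index, "catalog"))
```

The audio files themselves are never examined; only the accompanying text is searched, which is why a file without a transcript stays invisible.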

There are some exceptions. With voice recognition software, some automated indexing of audio files is possible. SpeechBot at http://speechbot.research.compaq.com/ is one of the better known experiments using speech recognition technology to index popular U.S. radio shows. It has indexed over 4,000 hours of shows. And the site does well in documenting its technological approach and the limitations of that approach.

At SpeechBot, the speech recognition software is used to create a transcript of the show. However, as its help file notes, the transcript "rarely matches what was spoken exactly." Yet even for all the inaccuracies, it can provide access to content that may not otherwise be searchable.

Other than this kind of automated approach, the creation of transcripts is a time-consuming and labor-intensive process. For that reason, only certain audio files will have searchable transcripts. Until the automated technology matures, searchers will have to determine the information content of audio files by their surrounding clues.


Combine the difficulties of searching for images with the transcript difficulties of audio files to gain a sense of the exacerbated problems of searching for content within a video file. Like audio, video comes in a variety of file formats including AVI, MPEG, QuickTime, Windows Media, and Real. Not only must you have an appropriate player, but with the frequent upgrades to the video-playing software, some files will only work with the most current version.

At this point, most of the videos on the Net are short, small, and rather shaky. And once again, the content is not easily indexed. Like the audio files, the video files typically are either linked or embedded on a Web page, which means the general Web spiders can index their file names and possibly the anchor text that links to the files. However, the content of the files is inaccessible to the search engine unless a transcript is available as well.
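What a general Web spider can gather about a linked audio or video file amounts to roughly the following. This is an illustrative sketch using Python's standard `html.parser`; the extension list, class name, and sample markup are assumptions, and real spiders will differ.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import posixpath

# Illustrative, not exhaustive, list of audio and video file extensions.
MEDIA_EXTENSIONS = {".mp3", ".wav", ".au", ".ra", ".ram",
                    ".avi", ".mpg", ".mpeg", ".mov", ".asf"}


class MediaLinkCollector(HTMLParser):
    """Record the file name and anchor text of each link to a media file."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        name = posixpath.basename(urlparse(href).path)
        if posixpath.splitext(name)[1].lower() in MEDIA_EXTENSIONS:
            self._current = {"file_name": name, "anchor_text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["anchor_text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse runs of whitespace in the collected anchor text.
            text = " ".join(self._current["anchor_text"].split())
            self._current["anchor_text"] = text
            self.links.append(self._current)
            self._current = None


# Hypothetical page fragment, invented for this example.
html = ('<p><a href="/media/launch.mov">Watch the widget launch</a> or read '
        '<a href="about.html">about us</a>.</p>')

collector = MediaLinkCollector()
collector.feed(html)
collector.close()
print(collector.links)
```

The file name and the anchor text are all that reach the index; nothing said or shown in the video itself does.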

For information content, news videos have the most to offer. Fortunately, many news broadcasts do have a transcript service available. In those cases, the news site may indeed have searchable full text of the broadcast. However, do not assume that a search option means that the broadcast content can be searched. One quick way to check is to view the video for a few minutes and then try searching one of the words or phrases you heard.


For both audio and video files that do have searchable transcripts associated with them, synchronization is an issue. Especially when the files are lengthy, it can be difficult to find the exact spot in the multimedia file where the search terms occur. In text sources, such as Web pages and even text PDF files, internal search functions make it possible to pinpoint the use of the search terms on the page.

Within an audio or video file, there are only rarely internal search functions. Instead, the more common practice, especially for the news-related audio and video files, is to break the long broadcasts into segments. For example, the National Public Radio site (http://www.npr.org) makes its news radio program "All Things Considered" available in audio format. Rather than presenting it as one long 90-minute file, it is broken out in segments by story, each of which might be about three to five minutes.

This segmentation helps, but it does not pinpoint the search term in the way that is possible with text files. That is not to say that such an approach cannot be taken with audio and video files. When the Starr Report came out, AltaVista made the Clinton testimony available in a fully searchable video file, with the search terms synchronized directly with the matching part of the video. Yet while that example shows what can be done, at this point it is a time-consuming process and not yet readily available for most multimedia files. In the meantime, the segmenting approach seems to be the best we can expect.
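One way to picture synchronization is a transcript whose passages each carry a start time, so that a text match can be turned into a position in the recording. The sketch below is only an illustration of that idea under invented data; it does not describe how AltaVista or any other site implemented it, and the `Cue` structure and sample lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start_seconds: float  # where this passage begins in the recording
    text: str             # what is said in this passage


def find_offsets(cues: list[Cue], term: str) -> list[float]:
    """Return the start time of every passage that contains the term."""
    term = term.lower()
    return [cue.start_seconds for cue in cues if term in cue.text.lower()]


def format_offset(seconds: float) -> str:
    """Format a time offset as minutes:seconds for display."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


# Hypothetical time-stamped transcript, invented for this example.
cues = [
    Cue(0.0, "Good evening, here are tonight's headlines."),
    Cue(42.0, "The widget factory announced a new product line."),
    Cue(125.0, "In sports, the local team won again."),
    Cue(301.5, "Back to the widget story with reaction from workers."),
]

for offset in find_offsets(cues, "widget"):
    print("Jump to", format_offset(offset))
```

Segmenting by story is the coarse version of the same idea: each segment is one long passage, so a match identifies the segment but not the moment within it.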

These are all issues to consider when searching for information contained in non-text sources. The tools are still in the early stages of their technological development, and all are likely to see frequent change over the next few years.

At the 2000 Search Engine Meeting in Boston, Bill Bliss of MSN Search mentioned trying to move MSN Search beyond the "tyranny of the text match." While such an intent is to get beyond simple text matching of search terms with the occurrence of those terms on a text page, it provides an interesting counterpoint to searching for the information content of images, audio, video, and other non-textual information sources. In the beyond-text world, simple text-matching capabilities would be a great stride forward.

Greg R. Notess (greg@notess.com; http://www.notess.com/) is a Reference Librarian at Montana State University.

Comments? Email letters to the Editor at editor@infotoday.com.
