Looking for Good Art, Part 2: Image Retrieval


Searcher, Vol. 12 No. 9, October 2004

FEATURE
Looking for Good Art
Part 2: Image Retrieval
by David Mattison
Access Archivist, British Columbia Archives, Royal BC Museum Corporation
[Part 1] • [Part 3]

The infrastructure behind art image databases forms the core of Part 2 of this series of articles. The metadata problem, meaning both the lack and the inaccuracy of information describing image content, poses a significant challenge for both searchers and image database developers. Without detailed image content metadata, search options may be limited to the kinds of "advanced" search options you see on Google Images or AltaVista: text in the file name, file size, file format (limited in Google to JPEG, GIF, or PNG), and whether the image is color, grayscale, or black and white. Google's Image Search FAQ indicates that the search engine places more weight on the image's context within a Web page, especially textual information in the image hyperlink (the SRC attribute of the IMG tag).
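To make the limitation concrete, here is a minimal Python sketch of a search restricted to the file-level properties listed above. The `ImageRecord` fields, the `advanced_search` function, and the sample file names are all hypothetical illustrations; this is not how Google Images or AltaVista is implemented.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    """File-level metadata only; nothing here describes what the picture shows."""
    file_name: str
    size_bytes: int
    file_format: str  # e.g. "JPEG", "GIF", "PNG"
    coloration: str   # "color", "grayscale", or "bw"


def advanced_search(records, name_contains=None, max_bytes=None,
                    file_format=None, coloration=None):
    """Return the records that satisfy every filter that was supplied."""
    hits = []
    for rec in records:
        if name_contains and name_contains.lower() not in rec.file_name.lower():
            continue
        if max_bytes is not None and rec.size_bytes > max_bytes:
            continue
        if file_format and rec.file_format.upper() != file_format.upper():
            continue
        if coloration and rec.coloration != coloration:
            continue
        hits.append(rec)
    return hits


# Hypothetical catalog: the third file could show a sunflower painting,
# but nothing in its file-level metadata says so.
catalog = [
    ImageRecord("sunflowers_vangogh.jpg", 48_000, "JPEG", "color"),
    ImageRecord("sunflower_field.png", 120_000, "PNG", "color"),
    ImageRecord("img_0042.jpg", 52_000, "JPEG", "color"),
]

matches = advanced_search(catalog, name_contains="sunflower", file_format="JPEG")
print([rec.file_name for rec in matches])
```

In this toy catalog a file named img_0042.jpg can never match a query for "sunflower," whatever it depicts, which is the metadata problem in miniature.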

Image Search Engines

Image search engines fall into two categories. General search engines such as Google and AltaVista index image file format and property data, that is, metadata about the image file itself, but not the image content directly. The content connections come from the language surrounding where the search engine found the image — in other words, content-in-context indexing. Specialized search engines attempt to analyze image content by applying various techniques such as edge detection (object outline); color, shape, and texture comparisons; and other basic values and properties. Most content-based image search engines work on a search principle called Query by Example. The user selects from an existing set of images and the search engine attempts to produce matches from its image collection. In their review of content-based image search engines, Gevers and Smeulders point out that "As data sets grow big and the processing power matches that growth, the opportunity arises to learn from experience. Rather than designing, implementing, and testing an algorithm to detect the visual characteristics for each different semantic term, the aim is to learn from the appearance of objects directly." Many of the specialized search engines depend on the same techniques used in identifying objects in other fields, e.g., satellite detection, fingerprint matching, etc.
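The Query by Example principle can be sketched with one of the simplest content features, a coarse color histogram. The Python below is an illustrative assumption, not the method of any engine named in this article: images are stand-in lists of RGB tuples, and the function names, the histogram-intersection score, and the toy "paintings" are all hypothetical.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram of RGB pixels over a coarse color grid."""
    if not pixels:
        raise ValueError("an image needs at least one pixel")
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Clamp so that 255 always falls in the last bin.
        rb = min(r // step, bins_per_channel - 1)
        gb = min(g // step, bins_per_channel - 1)
        bb = min(b // step, bins_per_channel - 1)
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


def histogram_intersection(h1, h2):
    """Similarity between 0 and 1; 1.0 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def query_by_example(example_pixels, collection):
    """Rank a collection (name -> pixel list) by similarity to the example."""
    query_hist = color_histogram(example_pixels)
    scored = [
        (histogram_intersection(query_hist, color_histogram(pixels)), name)
        for name, pixels in collection.items()
    ]
    return sorted(scored, reverse=True)


# Toy ten-pixel "images" built from three flat colors (hypothetical data).
YELLOW, GREEN, BLUE = (230, 200, 30), (30, 160, 40), (20, 40, 200)
example = [YELLOW] * 8 + [GREEN] * 2
collection = {
    "sunflower_study": [YELLOW] * 7 + [GREEN] * 3,
    "seascape": [BLUE] * 9 + [YELLOW] * 1,
    "meadow": [GREEN] * 8 + [YELLOW] * 2,
}

for score, name in query_by_example(example, collection):
    print(f"{name}: {score:.2f}")
```

A color-only comparison like this ranks by palette, not by subject: any mostly yellow picture would score as well as the sunflower study, which is why the systems surveyed here combine color with shape, texture, and edge features.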

As far as I could tell, most of the sites I listed in Part 1's Table 1 (AlltheWeb Pictures, AltaVista, Ditto Images, Google Image Search, Freenet.de Bildersuche, Ithaki Multimedia Meta Search Engine, Ixquick Metasearch, Lycos Multimedia Search, Picsearch, Yahoo! Image Search) do not access any academic or art museum databases. AltaVista, however, does let you directly query Corbis, the commercial image agency started by Bill Gates in 1989 that includes a large body of historic art.

Nicolas G. Tomaiuolo provided a thorough, general overview of image search engines in "When Image Is Everything" (Searcher, January 2002; https://www.infotoday.com/searcher/jan02/tomaiuolo.htm). He updated his article for a chapter on the same topic in his book The Web Library (Information Today, Inc., 2004); the Web site for the book includes additional sites for the two chapters that cover image searching and art images [http://www.ccsu.edu/library/tomaiuolon/theweblibrary.htm].

Content-Based Image Retrieval: The Holy Grail of Information Retrieval

Content-Based Image Retrieval (CBIR) is a large field, taking in other domains such as computer vision and pattern recognition. For a sense of how the content-based image retrieval field evolved, you might want to look at these four surveys of early and current CBIR systems and technological issues:

• Content-Based Image Retrieval: An Overview by Theo Gevers and Arnold W. M. Smeulders (Faculty of Science, University of Amsterdam, June 2003) [http://carol.science.uva.nl/~gevers/pub/overview.pdf].

• Content-Based Multimedia Information Handling: Should We Stick to Metadata? by Paul Lewis, David Dupplaw, and Kirk Martinez (Cultivate Interactive, February 2002) [http://www.cultivate-int.org/issue6/retrieval/].

• Content-Based Image Retrieval Systems: A Survey by Remco C. Veltkamp and Mirela Tanase (Utrecht University, March 8, 2001) [http://www.aa-lab.cs.uu.nl/cbirsurvey/cbir-survey/cbir-survey.html], lists all known CBIR systems as of late 2000.

• Content-Based Image Retrieval: A Report to the JISC Technology Applications Programme by John P. Eakins and Margaret E. Graham (Institute for Image Data Research, Northumbria University, Newcastle, January 1999) [http://www.unn.ac.uk/iidr/report.html].

For more background information on the field, try academic and private research institutes involved in CBIR via their publications pages, site-search engines, e-print servers, institutional repositories, or through new kinds of academic search services such as OAIster (now indexed by Yahoo! Search). SFUjake [http://mercury.lib.sfu.ca/~tholbroo/sfujake-mason/search.html] can identify specific journals devoted to image recognition and retrieval, where they're indexed, and their availability in full text. The Association for Computing Machinery (ACM) Portal [http://portal.acm.org] also offers a fruitful resource for current and past research in CBIR.

Some of the European, British, and U.S. CBIR research centers include:

• The Institut National de Recherche en Informatique et en Automatique (INRIA, France) IMEDIA Project [http://www-rocq.inria.fr/imedia/index_UK.html] and its IKONA software. One of the demonstration IKONA databases contains art images [http://www-rocq.inria.fr/cgi-bin/imedia/ikona].

• Germany's Center for Computing Technologies (TZI) [http://www.tzi.de] offers a demonstration of its PictureFinder application [http://www.tzi.de/bv/pfdemo] for large image databases. It was designed for the "automatic indexing and annotation of images in a specific domain" [http://www-agki.tzi.de/bv/projects/index.html?project=picturefinder&site=short&lang=en].

• Greece's Informatics and Telematics Institute [http://www.iti.gr] has developed a number of CBIR applications, including the SCHEMA Network of Excellence in Content-Based Semantic Scene Analysis and Information Retrieval [http://www.schema-ist.org/SCHEMA/] and ISTORAMA: Content Based Image Retrieval over the Internet [http://uranus.ee.auth.gr/Istorama/]. Although the two SCHEMA demonstration systems [http://media.iti.gr/site/Schema/schema.php and http://media.iti.gr/SchemaRS/systems/xm/index.html] utilize photographs of natural objects, the SchemaRS system, based on MPEG-7, an audio-visual content description standard [http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm], contains photographs of art objects contributed by Fratelli Alinari.

• From Italy's Istituto Trentino di Cultura comes COMPASS (Computer-Aided Search System) [http://compass.itc.it], "a distributed application for content-based image retrieval using remote databases." Among the downloadable demonstration databases is one of fine art images (Windows, Linux, and Sun Solaris versions).

• Among the achievements of the Intelligent Sensory Information Systems Research Group, University of Amsterdam [http://www.science.uva.nl/research/isis/isisNS.html] in the realm of Web searches for images are PicToSeek and PicToVision. You can try these out through the Java-based demonstration ZOMAX site [http://zomax.wins.uva.nl:5345/zomax/ or http://www.science.uva.nl/research/isis/zomax/].

• The Intelligence, Agents, Multimedia Group (IAM) [http://wwwcosm.ecs.soton.ac.uk/], Electronics and Computer Science Department, University of Southampton, U.K., continues to generate a wealth of e-prints on CBIR, and links to various projects.

• The Institute for Image Data Research [http://www.unn.ac.uk/iidr], Northumbria University, Newcastle, U.K., worked on a visual search CBIR tool from 2000 to 2002 for the AHDS Visual Arts image collections.

• Along with several other content-based multimedia retrieval projects, Columbia University's Digital Video/Multimedia Laboratory [http://www.ee.columbia.edu/dvmm] created the demonstration WebSEEK: A Content-Based Image and Video Search and Catalog Tool for the Web [http://www.ctr.columbia.edu/WebSEEk/ and http://www.ee.columbia.edu/dvmm/researchProjects/MultimediaIndexing/WebSEEK/WebSEEK.htm]. The lab also collaborated with the university's teachers' college to create a visual arts teaching tool called EdSearch that incorporates user sketches as part of the database query.

• The University of California at Berkeley's Digital Library Project, working with Berkeley's Computer Vision Group, developed an unnamed, Java-based demonstration image-content browser [http://elib.cs.berkeley.edu/kobus/famsf/model_2/text_and_blobs/bbox.html] using artworks from the Fine Arts Museums of San Francisco.

• The RIEMANN Project (Research on Intelligent Media Annotation) [http://wang.ist.psu.edu/IMAGE/], also described as "automatic linguistic indexing of pictures," contains a sample database of art images and features past and current CBIR work by professors James Z. Wang and Jia Li at Pennsylvania State University. Dr. Wang "developed an art image retrieval system for the Stanford University Libraries ... [and] later worked for the IBM QBIC project." Professor Wang's most recent investigation into applying machine learning techniques to image retrieval, Advancing Digital Imagery Technologies for Asian Art and Cultural Heritages [http://art.ist.psu.edu], started in August 2002. The project site also brings together some of his collaborators' work and includes a version of SIMPLIcity (Semantics-sensitive Integrated Matching for Picture Libraries) [http://wang.ist.psu.edu/~jwang/amico], with a demonstration database that uses thumbnails from the AMICO (Art Museum Image Consortium) collection.

Experimenting with Free Image Retrieval Software

If you'd like to try content-based image retrieval and have the right computer components, imgSeek [http://imgseek.sourceforge.net], an open source application, contains some of the functionality used in IBM's QBIC technology. The related imgSeekNet project [http://imgseek.sourceforge.net/net/], available in prototype form on client-server architecture, hopes to create "a distributed content-based image search engine or peer to peer network." The GIFT software (GNU Image Finding Tool) [http://www.gnu.org/software/gift/ or http://savannah.gnu.org/projects/gift] from the University of Geneva is another open source CBIR system based on the "query by example" model. These products appear to work best with representational photographic images, rather than fine art images. While not a content-based image retrieval tool and designed specifically for digital photographs, Eamonn Coleman's Windows-based free PixVue image management software [http://www.pixvue.com] deserves mention because of its support for JPEG and TIFF metadata, along with PixVue's integration into Windows Explorer.

More Image Database Search Tools

As of July 2, 2004, the University of Michigan's OAIster [http://oaister.umdl.umich.edu/o/oaister/] service contained more than 3.3 million records from 307 institutions gathered through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [http://www.openarchives.org]. Besides the ability to limit a search to a specific media type such as an image, the home page promises that "instead of just the catalog records of a slide collection of van Gogh's works, users will be able to view images of the actual works." With only keyword queries available, except for the media type, you should keep your subject words general and few. A test search on July 23, 2004, for the resource type "image" and the subject "art" produced 4,791 records, not all of them relevant.
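For readers curious about what a harvest looks like, the following Python sketch builds an OAI-PMH ListRecords request and filters a response for image records on a subject, roughly mirroring the test search above. The `ListRecords` verb, the `oai_dc` metadata prefix, and the XML namespaces come from the OAI-PMH and Dublin Core specifications; the endpoint URL, the sample records, and the function names are hypothetical, and nothing here describes how OAIster itself is built.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "https://repository.example.org/oai"  # hypothetical endpoint

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the request URL a harvester would fetch."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hand-written sample response with two made-up records; no network access.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sunflowers (slide)</dc:title>
          <dc:type>image</dc:type>
          <dc:subject>art</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Notes on color theory</dc:title>
          <dc:type>text</dc:type>
          <dc:subject>art</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def image_titles(xml_text, subject_keyword):
    """Titles of records typed 'image' whose subject contains the keyword."""
    root = ET.fromstring(xml_text)
    keyword = subject_keyword.lower()
    titles = []
    for record in root.iterfind(".//oai:record", NS):
        types = [el.text.lower() for el in record.iterfind(".//dc:type", NS) if el.text]
        subjects = [el.text.lower() for el in record.iterfind(".//dc:subject", NS) if el.text]
        if "image" in types and any(keyword in s for s in subjects):
            titles.append(record.findtext(".//dc:title", default="(untitled)", namespaces=NS))
    return titles


print(list_records_url(BASE_URL, set_spec="art"))
print(image_titles(SAMPLE_RESPONSE, "art"))
```

A real harvester would also follow the protocol's resumptionToken to page through large result sets, which this sketch omits. The filter works only as well as the type and subject fields each contributing institution supplied, one reason a broad keyword search returns records that are not all relevant.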

Institutional and learning object repositories, most of them at academic facilities and some searchable through OAIster, provide yet another resource category worth investigating. Cultural institutions such as art museums with staff engaged in active research may adopt the institutional repository model for archiving research. [For an excellent overview, see Miriam A. Drake's "Institutional Repositories: Hidden Treasures," Searcher, May 2004 at https://www.infotoday.com/searcher/may04/drake.shtml.] For example, you can explore art-related materials at MERLOT (Multimedia Educational Resource for Learning and Online Teaching) [http://www.merlot.org], GEM (Gateway to Educational Materials) [http://www.thegateway.org/], and collections such as Blue Web'n and Filamentality that form part of the SBC's Knowledge Network Explorer [http://www.kn.pacbell.com/].

Imaging and Image Retrieval Conferences and Software Vendors

To stay on the cutting edge of visual arts digitization research, go to EVA Conferences International (Electronic Imaging & the Visual Arts) [http://www.eva-conferences.com]. This umbrella site for "a cross-sectoral, multi-disciplinary, local & global set of events for people interested in new technologies in the cultural sector" dates back to 1990, when the European Commission established its VASARI project (Visual Arts System for Archiving & Retrieval of Images), aimed at developing a system of "very high quality digital imaging direct from paintings for conservation purposes." The EVA site, developed by the VASARI organization (not the project), offers access to conference papers back to 1998.

Two other international conferences in this field are the International Conference on Image and Video Retrieval [http://www.civr.org] and the International Conference on Pattern Recognition [http://www.ee.surrey.ac.uk/icpr2004/]. I also recommend DigiCULT [http://www.digicult.info] for the European perspective on art image digitization initiatives within the art museum and gallery community. In North America, D-LIB Magazine [http://www.dlib.org] performs a similar service in identifying new art image databases and conferences.

Here are some commercial CBIR systems:

• LTU Technologies [http://www.ltutech.com], an Anglo-French-American company, partnered with Corbis on a demonstration database for its Image-Seeker product [http://corbis.ltutech.com/].

• Idée Inc. [http://www.ideeinc.com], a Canadian company, markets Espion, a visual search and image management product whose image retrieval capabilities, according to a product sheet, encompass "collections of visual images that may contain photographs, video, graphics, sketches, illustrations, drawings, etc."

Digital library collections management systems are well worth exploring for art image collections. These digital content management systems go by a variety of names: digital library or archival management systems, institutional repository software, digital asset management systems (DAMS), and museum collections management systems or software. A large proportion of the publicly accessible digital library collections contain digitized photographs and art images. One company to watch is contentDM [http://contentdm.com], connected to OCLC, which features a Customer Collections page with a category devoted to Art and Drama. The Brigham Young University BYU Museum of Art Collection [http://www.lib.byu.edu/hbll/moa/], for example, contains nearly 9,000 images. With most contentDM collections you can easily switch from one collection to another. Art information systems designed for commercial galleries, such as the collections management suite sold by Artsystems Ltd. [http://www.artsystems.com], may also yield some interesting finds.

The Measure of All Art: Metadata and Catalog Systems

To explore some of the practical issues surrounding art cataloging, e.g., those issues raised in the interviews with UCAI's project team, read Sherman Clarke's Art Cataloging [http://artcataloging.net]. Clarke's site mainly covers art-related name authority issues and links to other art cataloging resources.

Besides associations and organizations such as the Getty Research Institute [http://www.getty.edu/research/], other groups maintain a proprietary interest in the art image cataloging and metadata standards process:

• The small but internationally influential Committee on Cataloging: Description and Access of the American Library Association [http://www.libraries.psu.edu/tas/jca/ccda/ and http://www.ala.org/ala/alctscontent/catalogingsection/catcommittees/ccda/ccda.htm], which develops the ALA's position on changes to the Anglo-American Cataloguing Rules, Second Edition, 2002 Revision (AACR2R). If you're trying to keep ahead of the AACR2R curve on handling digital images, this is the place to begin.

• The Visual Resources Association's new initiative, Cataloguing Cultural Objects: A Guide to Describing Cultural Works and Their Images (Draft, May 2004) [http://www.vraweb.org/CCOweb/index.html], a data content standard like AACR2R, also covers 2-D and 3-D artwork.

• The Society of American Archivists' Visual Materials Section [http://www.lib.lsu.edu/SAA/VMhome.html] maintains some links to archival image cataloging resources.

• The RSLP Collection Description project [http://www.ukoln.ac.uk/metadata/rslp/], based at the University of Bath's UKOLN (U.K. Office for Library and Information Networking), established a metadata standard and software for use by U.K. research libraries.

• Although the U.K.'s Museum Documentation Association (MDA) [http://www.mda.org.uk/], like the distributors of the Anglo-American Cataloguing Rules, sells its primary museum documentation standard, Spectrum, in print and electronic versions, you'll find free access to some thesauri and vocabularies developed for U.K. museums and a large array of links to other international documentation standards on the wordHoard page [http://www.mda.org.uk/wrdhrd1.htm].

Conclusion

Despite the allure of content-based image retrieval, accurate, valid, standardized, and detailed metadata is the key to precision and recall when searching for online art images. As pointed out by Christine L. Sundt and the Union Catalog of Art Images team, whether the source is a general image search engine such as Google Images or a large-scale image database such as ARTstor or the UCAI project, it all comes down to the depth and quality of the metadata. In large image databases, I can see that the ability to begin a query or to filter search results by specifying image content attributes (for example, show me only pictures that contain a round yellow object that resembles a flower) will narrow the possibilities down. But no amount of statistical inference and machine learning, without prior or concurrent human intervention in the description (cataloging), search, or retrieval processes, can reliably distinguish an amateur painting of a sunflower from the masterpiece by Vincent van Gogh.

Of course, art and artistic images represent only a small subset of the overall problem of searching for online images. Still, I believe we're a long way from the kind of artificial intelligence that would permit a machine to consistently and reliably distinguish a Rembrandt from a Renoir. Nevertheless, the remarkable achievements of researchers such as Professor James Z. Wang and his many colleagues around the world in the fields of computer vision, pattern recognition, machine learning, and content-based image retrieval all help redefine the possible when it comes to searching for good art.

The author's opinions do not necessarily reflect those of his employer.
