Oversimplifying Search

“SEARCH IS TOO HARD.” I hear this lament from librarians trying to teach students the intricacies of searching the many databases mounted on a plethora of platforms to which their libraries subscribe. I also hear it from enterprise search professionals charged with putting in place an internal search system to meet all the information needs of the employees of their organizations. I rarely hear it from people embarking on a Google search.

“Why can’t all search be just like Google?” is the other refrain that frequently follows the complaint about search being too hard. Just give me that single search box. Forget all those facets to refine my search strategy. Boolean search operators just get in the way. Using synonyms, understanding controlled vocabulary, and limiting by date? Fiddlesticks. That’s too complicated. We need to simplify search.

It sounds wonderful, doesn’t it? One search box, one set of protocols, one intuitive interpretation of what you’re searching for. You can practically see Henry David Thoreau over there at Walden Pond nodding in agreement, with his signature plea to “Simplify, simplify, simplify.”

There’s only one problem. It won’t work. A single search box to search across all possible databases, webpages, and information resources is impractical. Content is not linear in design. Information architecture flows from a variety of factors, not the least of which is internal structure. One search box to rule them all cannot account for the vagaries of newspaper reporting versus scholarly articles versus legal documents versus statistical tables versus … well, I could go on, but you get the picture. And that’s structured information. How would one search box also handle unstructured data such as that found in social media, emails, and text messages?

Technology is stumbling its way toward solutions. Machine learning and deep text analysis are two AI technologies making impressive inroads on the single search box mentality. But they are encountering criticism as well. Bias in training sets often goes completely unnoticed by those choosing them, who are predominantly white and male. When the data used to train AI systems is flawed, the end result is flawed as well.

Machine learning tends to favor the popular. Although deep text analysis in humanities and social science research is producing valuable new insights from old information, in other aspects of library reference and information services work, it is the “long tail” information, hidden and not widely searched, that has value.

Certainly, some of the complexity of search can be alleviated by machine learning. Given millions of searches and an analysis of what searchers clicked on, the ability of a search engine to identify synonyms and related terms is enhanced to the point that searchers no longer need to create long OR strings. Disambiguation based on personalization signals vastly increases relevancy.

Will we ever have a single interface to all knowledge? I doubt it very much. I’d prefer that information professionals help their clients realize that not everything has an easy answer, that complexity is intrinsic to research, and that human curation and evaluation can’t be achieved by a single search box.