Creating Search Gizmos to Simplify Web Searching
by Tara Calishain
In April, I applied for a Bellingcat Tech Fellowship, an opportunity to build open source search tools useful to journalists and OSINT (open source intelligence) investigators. As I prepared my application and CV, it became clear I had a glaring deficiency: I knew a lot about searching and had made some search tools in Google Sheets, but I didn’t know anything about making tools that could be put on a webpage.
It just sort of happened.
But it was a way to make search tools! So I started taking a SkillShare class in May. A couple of weeks later and about two-thirds of the way through, I knew enough to try making my first tool. So I did. And I didn’t stop. Over the summer, I made more than 20 search tools, which I’m calling my ResearchBuzz Search Gizmos (searchgizmos.com), and I intend to continue building more search tools.
In this article, I’ll show you how to use four of my Gizmos in researching historical figures. Instead of just dropping a name into Google, I’ll use two tools that draw on Wikipedia to guide the search toward meaningful content, then use two more tools to do some nook-and-cranny web searching. My search example will be Aretha Franklin, exemplary singer and Queen of Soul.
I’ll start with a tool that uses Wikipedia to intersect famous people and news: Gossip Machine.
Wikipedia is a great source for finding out about a famous person. But it’s also a great source for finding news about a famous person. That’s because Wikipedia has been tracking pageviews since 2016, and spikes in pageviews mean interest in the page’s topic. Gossip Machine (searchgizmos.com/gossip-machine) uses pageview counts as fossilized attention to determine when people were interested in people, places, and things. It has search boxes for Wikipedia Page Topic, year, and degree of newsworthiness. It works best with pages that have at least 7,000 views per day.
Gossip Machine does this by reviewing Wikipedia pageview counts a year at a time and flagging days that have much higher than usual views. Those days are then turned into clickable Google News and Google Web searches.
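To make the idea concrete, here is a minimal Python sketch of that kind of spike detection. It is not Gossip Machine's actual code. The median baseline, the 3x multiplier, the function names, and the sample counts are all assumptions made for illustration. The search link relies on Google's before: and after: date operators, and the tbm=nws parameter is an undocumented value seen in Google News result URLs, so treat both as likely to change.

```python
from datetime import date, timedelta
from statistics import median
from urllib.parse import urlencode


def flag_spike_days(daily_views, multiplier=3.0):
    """Return the dates whose views exceed multiplier x the median for the period.

    The median baseline and the default multiplier are illustrative choices,
    not the rule Gossip Machine itself applies.
    """
    baseline = median(daily_views.values())
    return sorted(d for d, v in daily_views.items() if v > multiplier * baseline)


def news_search_url(topic, day):
    """Build a Google News search for the topic, bounded around one day."""
    after = day - timedelta(days=1)
    before = day + timedelta(days=1)
    query = f'"{topic}" after:{after.isoformat()} before:{before.isoformat()}'
    # tbm=nws is an assumption based on observed Google News result URLs.
    return "https://www.google.com/search?" + urlencode({"q": query, "tbm": "nws"})


# Invented sample counts for illustration only, not real Wikipedia data.
counts = [9000, 9500, 8800, 60000, 75000, 9100, 400000]
sample = {date(2018, 8, 10) + timedelta(days=i): v for i, v in enumerate(counts)}

spikes = flag_spike_days(sample)
for day in spikes:
    print(day.isoformat(), news_search_url("Aretha Franklin", day))
```

A median baseline is used here because a handful of huge days would drag a plain average upward and hide smaller spikes. A real tool would read the daily counts from Wikipedia's pageview data rather than from a hard-coded list.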
Aretha Franklin passed away in 2018, and as you might expect, her Wikipedia pages got a lot of views that year. She died on August 16, but her Wikipedia pageviews actually increased before that as the news spread that she was in hospice care.
Searches for news dated later that year find stories about her lack of a will and her relatives’ unhappiness with her eulogy. (If you look at the Gossip Machine results for 2019, you’ll discover that a will was eventually found.)
But just because a famous person dies doesn’t mean we stop talking about them. Let’s look for Aretha Franklin news in 2021, almost 3 years after her passing. In this case, there’s a lot of news, and most of it centers around a biopic of her life.
Gossip Machine does have a caveat: It won’t work as well when average daily pageviews for an article are fewer than 7,000 or so. (Gossip Machine will give you an average daily pageview count along with your results.) Depending on the topic, those lower-view pages will work somewhat well to very well. Articles with fewer than 500 views a day don’t work at all, according to my testing. Also, try different years when you’re using Gossip Machine; just because a historical figure had only 1,500 average pageviews a day in 2017 doesn’t mean they don’t get more views later on.
Starting with Gossip Machine can give you an overview of recent news about the person. If they’ve died, you can learn about posthumous impacts of their life and legacy. To explore their work over their life span, however, you need a second tool—the Contemporary Biography Builder.
The Contemporary Biography Builder (CBB)
The Contemporary Biography Builder (CBB; searchgizmos.com/cbb) tool uses Wikipedia information about birth and death dates to build searches for your subject across Google Books, the Internet Archive, the Digital Public Library of America (DPLA), and the Library of Congress’ newspaper project Chronicling America. Instead of making the searches completely open, though, they’re timespan-bounded by the life span of the subject. (Of course if they’re still alive, there is no search cutoff.)
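As a rough illustration of what a life span-bounded search looks like, the Python sketch below builds date-limited search links for two of those sources. It is not the CBB's actual code, and the function name is hypothetical. The URL parameters (date1, date2, dateFilterType, and phrasetext for Chronicling America; a date:[... TO ...] range in the Internet Archive query) are assumptions based on how those sites' search pages have formed their URLs, and they may change. Google Books and the DPLA are left out because I'm less sure of their date parameters.

```python
from datetime import date
from urllib.parse import urlencode


def lifespan_searches(name, birth_year, death_year=None):
    """Return {source: search URL} limited to the subject's life span.

    If death_year is None (the subject is still alive), the current year is
    used as the upper bound, so there is effectively no cutoff.
    """
    end_year = death_year if death_year is not None else date.today().year

    # Internet Archive: phrase search plus a date range in the query itself.
    ia_query = f'"{name}" AND date:[{birth_year}-01-01 TO {end_year}-12-31]'
    internet_archive = "https://archive.org/search?" + urlencode({"query": ia_query})

    # Chronicling America: phrase search with a year-range filter.
    chronicling_america = (
        "https://chroniclingamerica.loc.gov/search/pages/results/?"
        + urlencode({
            "dateFilterType": "yearRange",
            "date1": birth_year,
            "date2": end_year,
            "phrasetext": name,
        })
    )

    return {
        "internet_archive": internet_archive,
        "chronicling_america": chronicling_america,
    }


urls = lifespan_searches("Aretha Franklin", 1942, 2018)
for source, url in urls.items():
    print(source, url)
```

In a real tool the birth and death years would come from Wikipedia rather than being typed in by hand, as they are in this example.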
The CBB doesn’t work well for people from much earlier than the mid- to late-18th century, because it’s so publication-focused. But for more recent people, it works well. A search for Aretha Franklin covering the years 1942–2018 returns a set of links from several Google Books databases, along with links from the Internet Archive, DPLA, and Chronicling America.
There are separate searches for full-view resources and all resources, because nothing’s more frustrating than getting something in a search result you can’t access. Because these searches are all within Aretha Franklin’s lifetime, they’re wonderfully targeted.
Similar to the Google resources, the Internet Archive search will find different types of text-based information, be it books, magazines, or monographs. The DPLA and Chronicling America pages are a little different, however.
The DPLA tends to have non-publication material, like promotional ephemera, photography, and design documents. The Chronicling America content is all newspaper material, but it’s older material; there are 32 results for a search for Aretha Franklin, and none of them are newer than 1962.
Gossip Machine will point you toward pre- and posthumous news that you might not have known about, and the CBB will create life span-bounded searches for a number of online resources. But neither of these tools addresses the vast amount of unorganized and unaffiliated information on the Web. That’s what the other two tools, Carl’s Name Net and The Anti-Bullseye Name Search, are for.
Carl’s Name Net
If you’ve ever done any genealogy research, you know the importance of name search. The problem is that the usual (for America and much of the Western world) format of Firstname Middlename Lastname is not the only one that webpages use to name, refer to, or organize people. Unfortunately, searching through every possible iteration of a name is time-consuming and annoying. At least it was.
Carl’s Name Net (CNN; searchgizmos.com/carls-name-net) takes a usual three-name construction and generates several ways that name might be expressed in a search result. Since as a performer Aretha Franklin used only two names, she’s not a great example, although you can still use her name with CNN—it just generates fewer iterations than a three-part name would.
Let’s use actress Sarah Michelle Gellar instead. I’m going to put in her name and use the additional query terms box to add the words “actress” and “Buffy.” Here are the name variants CNN generates:
Sarah Michelle Gellar
Sarah M Gellar
Gellar Sarah M
Gellar Sarah Michelle
S Michelle Gellar
In addition to generating name variants, CNN folds them into searches for Google, Google Books, Google Scholar, and Internet Archive. The name variants are divided into common and uncommon variants, and searches are created for each type. CNN ORs the names together using the pipe (|) operator.
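The variant-and-OR idea can be sketched in a few lines of Python. This is not CNN's actual code. It reproduces only the five variants listed above, the function names are hypothetical, and quoting each variant as an exact phrase is my assumption about how such a query would be written. The article doesn't say which variants count as common or uncommon, so the sketch doesn't try to split them.

```python
def name_variants(first, middle, last):
    """Return the five variants shown above for a three-part name."""
    first_initial = first[0]
    middle_initial = middle[0]
    return [
        f"{first} {middle} {last}",
        f"{first} {middle_initial} {last}",
        f"{last} {first} {middle_initial}",
        f"{last} {first} {middle}",
        f"{first_initial} {middle} {last}",
    ]


def or_query(variants, extra_terms=""):
    """Quote each variant, OR them with the pipe operator, add extra terms."""
    joined = " | ".join(f'"{v}"' for v in variants)
    return f"{joined} {extra_terms}".strip()


variants = name_variants("Sarah", "Michelle", "Gellar")
print(or_query(variants, "actress Buffy"))
```

The resulting string can be pasted into a Google search box or URL-encoded into a search link for any of the resources CNN covers.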
I find that CNN tends to find low-profile stuff—reference materials and articles that might be buried under mentions of more common name variants. Be sure to check all the resources for which CNN makes searches; just because someone’s an actress doesn’t mean they won’t be written about in Google Scholar.
Using the three Search Gizmos I’ve talked about so far, you can pull an awful lot of information from organized spaces like the Internet Archive and Google Books. You can also get a lot of data from disorganized spaces on the web. But there’s one more tool you can use to make sure you’ve gotten everything. It’s a big cannon that really shakes up your search results and it’s called The Anti-Bullseye Name Search.
The Anti-Bullseye Name Search
The Anti-Bullseye Name Search (TABNS; searchgizmos.com/bullseye) completely removes the standard “Firstname Lastname” name format from Google’s search results. In addition, it tries to remove as many ecommerce and clutter results as possible, either by removing results by domain (Abebooks.com, Facebook.com) or by URL pattern (restricting Amazon or eBay from appearing in search result URLs, for example).
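One way such a query could be assembled is sketched below in Python. This is a guess at the general shape, not TABNS's actual query. It keeps both names as separate search terms, excludes the exact "Firstname Lastname" phrase, and uses Google's -site: and -inurl: operators for the exclusions. The function name and the short default lists are illustrative, drawn from the examples in the paragraph above. The real tool presumably excludes far more.

```python
def anti_bullseye_query(first, last,
                        blocked_domains=("abebooks.com", "facebook.com"),
                        blocked_url_words=("amazon", "ebay")):
    """Build a Google query that drops the exact "First Last" phrase.

    Both names stay in the query as separate terms, so pages that use other
    name formats (for example "Franklin, Aretha") can still match.
    """
    parts = [first, last, f'-"{first} {last}"']
    parts += [f"-site:{domain}" for domain in blocked_domains]
    parts += [f"-inurl:{word}" for word in blocked_url_words]
    return " ".join(parts)


print(anti_bullseye_query("Aretha", "Franklin"))
```

Run as written, this prints: Aretha Franklin -"Aretha Franklin" -site:abebooks.com -site:facebook.com -inurl:amazon -inurl:ebay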
The upshot is that the Google search results from using TABNS look much different from regular search results. On the one hand, a regular Google search for Aretha Franklin yields a couple of good links in the results but there are also links to a lot of distracting content you don’t need, like video content and links to music services.
On the other hand, a search for Aretha Franklin using TABNS yields results that are much, much more reference-oriented. If the person you’re searching for has a political or legal history, TABNS will surface results related to legal filings and political transparency, as evidenced by a TABNS search for former congressman Devin Nunes. Other things that TABNS works well to surface include legal settlements and filings, indexes, and metadata.
SEARCH GIZMOS STILL TO COME
If you do a Google search for a historical figure, you’re not going to find every last item about them on the web. I’m not sure that’s possible anymore. But if you start with these four Search Gizmos, you’ll have a structured way to build your research and suss out resources that might otherwise have been difficult to find.