FEATURE - How Data Bias Obscures the Underserved

Computers in Libraries

Vol. 41 No. 9 — November 2021

FEATURE
How Data Bias Obscures the Underserved
by Suzanne S. LaPierre

Bulletproof vests weren’t designed for people with breasts. Workplace safety equipment often fails to protect women because it was designed for the male body. Smaller sizes of the same equipment don’t solve the problem, because differences in body shape interfere with proper fit. For a similar reason, women in car wrecks are far more likely to be injured or killed than men in the same situation, because automobiles and their safety equipment are manufactured based on data from typical male bodies. ¹

While results of data bias are not always life-threatening, they can be pervasive. Everything from public restrooms to library card applications often fails to accommodate those with nonbinary gender identities. Products such as soap dispensers and facial recognition software can fail to see people with dark skin because they were designed using data skewed toward lighter skin tones. ² Even when problems are more annoying than lethal, a sense of being excluded deters people from using certain spaces and services.

Similar to other types of bias, data bias is typically unintentional. Certainly, designers of bulletproof vests and airbags do not intend for their products to fail users. And libraries do not intend their services to be inaccessible or irrelevant to many people. However, as computer scientists say: garbage in, garbage out. Biased data used to design systems results in flawed products and services.

A unique danger of data bias is its invisibility. Because it is both unintended and often unrecognized, data bias becomes baked into products, algorithms, statistics, and technology. As Caroline Criado Perez puts it in her book Invisible Women: Data Bias in a World Designed for Men, “Big Data … is panned for Big Truths by Big Algorithms, using Big Computers. But when your big data is corrupted by big silences, the truths you get are all half-truths, at best.” ³ At worst, the results of biased data perpetuate and compound existing inequities.

What Does This Mean for Libraries?

Surveys distributed only to people who are already in the building or are using the library website fail to capture input from those who find either site too difficult to access. These may be people who are already marginalized by the digital divide, disability, or transportation or language barriers. Surveys distributed only to program attendees may result in data that supports more of the same type of program at the expense of those who find that service unusable in the first place. Biased results can perpetuate the status quo at the expense of the underserved.

Designers of measurement tools attempt to address these issues in many ways, such as translating into multiple languages, disseminating surveys in various formats, and including write-in fields rather than limited options for responses. However, those who are not using the library due to accessibility issues, because they don’t feel comfortable in the library, or because they don’t perceive it as having services they need or want may still remain uncounted. This includes a large portion of the community the library is intended to serve.
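One rough way to spot this kind of gap is to compare who answered a survey with who lives in the service area. The Python sketch below does that with invented numbers: the group labels, population shares, and respondent counts are all hypothetical, and a real check would draw on census or registration data the library actually holds.

```python
# Invented figures for illustration only.
# Share of the service-area population in each (hypothetical) group:
community_share = {
    "visits weekly": 0.10,
    "visits occasionally": 0.30,
    "never visits": 0.60,
}
# Number of survey respondents from each group, for a survey handed out in the building:
respondents = {
    "visits weekly": 140,
    "visits occasionally": 55,
    "never visits": 5,
}

total_respondents = sum(respondents.values())

# Ratio of sample share to population share.
# Above 1 means overrepresented; below 1 means underrepresented.
representation_ratio = {
    group: (respondents[group] / total_respondents) / community_share[group]
    for group in community_share
}

for group, ratio in representation_ratio.items():
    print(f"{group}: {ratio:.2f}x their share of the community")
```

In this made-up case, weekly visitors are heavily overrepresented while people who never visit barely register, even though they are the majority of the community.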

Some Types of Data Bias

Sample selection bias occurs when certain people or groups are underrepresented in research data while others are overrepresented.

Researcher bias, also called confirmation bias, occurs when those conducting surveys unconsciously do so in a manner that tends to encourage certain results. ⁴ Research shows that very minor changes in the wording of survey questions can result in major differences in responses. ⁵

Observation bias occurs when participants in a study are aware that they are being observed, which can consciously or unconsciously alter results.

Overlapping datasets occur when the same people are counted more than once. This can suggest more demand for a particular service than is actually the case.
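The overlapping-dataset problem can be illustrated with a minimal Python sketch that counts attendance two ways: by total sign-ins and by distinct people. The sign-in lists and card IDs are invented for illustration and do not come from any real library system.

```python
# Hypothetical sign-in sheets: each entry is an anonymized card ID (invented data).
storytime_signins = ["A101", "A102", "A103", "A101"]
craft_club_signins = ["A101", "A102", "B201"]
book_club_signins = ["A102", "B201", "A101"]

all_signins = storytime_signins + craft_club_signins + book_club_signins

# Counting every sign-in treats each repeat visit as if it were a new person.
total_signins = len(all_signins)

# A set keeps each ID only once, so its size is the number of distinct people.
distinct_people = len(set(all_signins))

print(f"Sign-ins counted: {total_signins}")
print(f"Distinct people: {distinct_people}")
```

Here 10 sign-ins come from only four people, so reporting the larger number as "people served" would overstate demand.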

All of these types of data bias risk further obscuring the needs of the underserved. “If we come to terms with the fact that all data are necessarily partial and potentially biased, then we need long-term approaches that optimize for justice and equity,” Ruha Benjamin writes in her book, Race After Technology: Abolitionist Tools for the New Jim Code, which discusses how new technologies perpetuate racism and other inequities. ⁶

YAVIS and Elite Bias

In the mental health field, the acronym YAVIS stands for “young, attractive, verbal, intelligent, and successful.” Professionals are warned against YAVIS Syndrome, which means offering preferential treatment to patients who exhibit those traits. YAVIS also refers to people who seem to respond better to services received because they were least in need to begin with.

A parallel for libraries is clear. Which customers are our version of YAVIS? Public libraries devote departments exclusively to serving children and teens, even though most public schools also have libraries serving those age groups. How many libraries have staff members—let alone entire departments—devoted to serving elderly, immigrant, or homeless populations?

In qualitative research, the term “elite bias” refers to “overweighting data from articulate, well-informed, usually high-status participants.” ⁷ This can happen even when evaluating text, such as write-in comments on surveys. When evaluating research results, the use of “juicy quotes” ⁸ to magnify certain points may give an advantage to the English-fluent and eloquent.

YAVIS and elite bias can have compounded effects. Staffers who are distributing surveys may be more likely to offer them to friendly customers and regulars. Those users may be more likely to provide enthusiastic responses, which are quoted in results. Once again, opinions from existing and well-served users may be overweighted.

Of course, we should value and appreciate our most avid customers. They are our bread and butter when it comes to voting and supporting libraries. However, we should be aware of how we may be magnifying their input at the expense of the underserved.

Who Are the Underserved?

There are reasons many people aren’t being heard. Some are harder to count, whether it’s because of communication issues, geographic barriers, or information poverty. We have a pretty good idea of who they are, though. We are unlikely to be counting people who don’t come to the library because it’s not on a public transportation route, it’s too difficult to access due to mobility or vision challenges, or they find the space intimidating or unwelcoming. Web resources aren’t being used by those who don’t have broadband service or who simply find the website confusing to navigate. There are significant barriers to accurately surveying people experiencing homelessness and those who speak languages other than English.

There are very few studies in which people with disabilities are asked how well libraries are meeting their needs. Studies of library service needs of people in residential care facilities are almost nonexistent. Some of our most underserved community members are in nursing homes, assisted living and rehab centers, jails, and juvenile detention facilities. Many of these residents are unable to come to the library building and also have restricted—if any—access to the internet. Some of these facilities have their own small libraries, but rarely do they have anything approaching the resources of a public library.

Invisible Gender Divides

The underserved can vary by context. Throughout history, women have often been disregarded in data analysis, in areas such as medical research and employment algorithms. ⁹ Libraries are different because women make up the majority of library staffers nationwide.

Years ago, I attended a library conference session about book clubs that was hosted by a team of both male and female librarians who shared tips for improving them. At the end, I asked, “How do we achieve more gender balance in book clubs?” The hosts replied that even when they had made deliberate efforts to choose titles with more guy appeal, the effect on participation was negligible. Their response was this: “Nothing we do seems to have much impact. Book clubs just seem to appeal more to women.”

Fast-forward a few years to when a new brewpub opened near our library. A colleague launched a Books and Brews club that meets monthly in the pub’s special events room with support from the library. She advertised it in the pub and used the usual library marketing tools. From the first meeting, the club had an even gender balance, and it is still going strong years later with that balance intact. It also attracts a wider age range than our other book clubs. School libraries face similar challenges in designing services that work equally well for boys and girls. If we’re leaving men or women out of certain equations, how are we overlooking the concerns of nonbinary people?

Librarians Are People Too!

While the impact on the underserved is critical, biased survey results can also negatively impact staff. When schedules and workloads are continually stretched by demands to provide more of the same services to the same users, staff burnout may result.

Libraries are great at counting. As Mary Jo Finch writes in “Learning From Our Statistics”: “But statistics are often misused. They are sometimes treated as a scorecard where higher means better, encouraging library staff to focus on increasing numbers instead of on meeting needs. We misread data and draw incorrect conclusions. We make bad comparisons, look for trends with insufficient data, and miss connections. We count what is easy to count, and we don’t necessarily count what is important to know. … We take surveys and allow the opinion of a few to stand for the majority.” ¹⁰

When staffers sense that increased pressure is based on weak metrics, morale suffers. The stress of working harder in response to dubious data is detrimental to employee wellness and retention.

Brainstorming Solutions

As we explore the potential benefits of new technologies, we must also be aware of how inequities can be perpetuated by seemingly neutral processes. ¹¹ AI and its subset, machine learning (ML), use algorithms applied to datasets to arrive at predictions. However, ML practitioners lack an industrywide fairness-aware standard in data collection, explain computer scientists Eun Seo Jo and Timnit Gebru in “Lessons From Archives: Strategies for Collecting Sociocultural Data in Machine Learning.” They suggest that interdisciplinary collaboration between computer scientists and archivists/librarians may help develop more representative and systemic processes for generating datasets used in AI. ¹²

If computer scientists can learn from librarians and vice versa, certainly staffers of archives and public, academic, and school libraries can learn from one another when it comes to producing more inclusive data. Academic libraries have access to a community of scholars who are adept at using the scientific method, while public libraries have a broad userbase from which to learn. Public libraries often partner with schools to promote programs and services. Partners can also include businesses such as grocery stores and laundromats in which community members who might not be regular library users gather.

Technology can be used to extend services without overburdening staffers. Many librarians honed virtual programming skills during the pandemic. These virtual programs may be shared with residents of treatment, care, or detention facilities through outreach and partnerships.

Focus groups can enable input from those who, due to smaller sample sizes getting lost in larger equations, are at risk of invisibility. These might include LGBTQ+ individuals, people with certain disabilities, speakers of languages other than English, and recent immigrants.
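To see how a small group can get lost in a larger equation, consider the hedged Python sketch below. The satisfaction scores and group labels are invented, and the point is only that an overall average can look healthy while a small group's experience is poor.

```python
from statistics import mean

# Invented satisfaction scores (1 = poor, 5 = excellent), grouped by a
# hypothetical survey question about preferred language.
scores_by_group = {
    "English speakers": [5] * 90,
    "Speakers of other languages": [2] * 10,
}

# Pool every score together, as a single headline number would.
all_scores = [score for scores in scores_by_group.values() for score in scores]
overall_mean = mean(all_scores)

# Break the same data out by group.
group_means = {group: mean(scores) for group, scores in scores_by_group.items()}

print(f"Overall average: {overall_mean:.1f}")
for group, value in group_means.items():
    print(f"{group}: {value:.1f} (n={len(scores_by_group[group])})")
```

The pooled average here is 4.7 out of 5, which hides the fact that the smaller group rated the service 2. Reporting results by group, or following up with a focus group, is one way to surface that difference.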

In conclusion, the following anecdote struck me as an example of the invisible public. Author Virginia Eubanks describes passing a man who was gesturing and verbalizing oddly as she entered a public library to do research. As if by some unspoken agreement, passersby looked away from him and from each other. “When we passed the anguished man near the Los Angeles Public Library and did not ask him if he needed help, it was because we have collectively convinced ourselves that there is nothing we can do for him. When we failed to meet each others’ eyes as we passed, we signaled that, deep down, we know better. We could not make eye contact because we were enacting a cultural ritual of not-seeing,” Eubanks wrote. ¹³ Certainly, libraries can’t be all things to all people. However, we can make a concerted effort to look at those who are hardest to see.

Resources

Benjamin, Ruha. (2019). Race After Technology: Abolitionist Tools for the New Jim Code. Cambridge, Mass.: Polity.

Connaway, L. and Radford, M. (2016). Research Methods in Library and Information Science, 6th edition. Santa Barbara, Calif.: Libraries Unlimited.

Eubanks, Virginia. (2018). Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. New York: St. Martin’s Press.

Finch, Mary Jo. (2021). “Learning From Our Statistics,” Public Libraries Magazine, May 4, 2021. publiclibrariesonline.org/2021/05/learning-from-our-statistics.

Jo, Eun Seo and Gebru, Timnit. (2020). “Lessons From Archives: Strategies for Collecting Sociocultural Data in Machine Learning.” FAT* ’20: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona, Spain. doi.org/10.1145/3351095.3372829.

Miles, Matthew B., Huberman, A. Michael, and Saldaña, Johnny. (2019). Qualitative Data Analysis: A Methods Sourcebook, 4th edition. Thousand Oaks, Calif.: Sage.

Perez, Caroline Criado. (2019). Invisible Women: Data Bias in a World Designed for Men. New York: Abrams Press.

Endnotes

1. Perez, Caroline Criado. Invisible Women: Data Bias in a World Designed for Men.

2. Benjamin, Ruha. Race After Technology: Abolitionist Tools for the New Jim Code.

3. Perez, Caroline Criado. Invisible Women: Data Bias in a World Designed for Men, p. xii.

4. Connaway, L. and Radford, M. Research Methods in Library and Information Science.

5. Ibid., p. 119.

6. Benjamin, Ruha. Race After Technology: Abolitionist Tools for the New Jim Code, p. 189.

7. Miles, Matthew B., Huberman, A. Michael, and Saldaña, Johnny. Qualitative Data Analysis: A Methods Sourcebook, p. 296.

8. Connaway, L. and Radford, M. Research Methods in Library and Information Science.

9. Perez, Caroline Criado. Invisible Women: Data Bias in a World Designed for Men.

10. Finch, Mary Jo. “Learning From Our Statistics.”

11. Benjamin, Ruha. Race After Technology: Abolitionist Tools for the New Jim Code.

12. Jo, Eun Seo and Gebru, Timnit. “Lessons From Archives: Strategies for Collecting Sociocultural Data in Machine Learning.”

13. Eubanks, Virginia. Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor, p. 175.

Suzanne S. LaPierre is a Virginiana Specialist Librarian for Fairfax County Public Library in Virginia. She writes The Wired Library column for Public Libraries Magazine.