The same text is available in French here. I’ll copy-paste the English version here in case of a paywall.

Accents are one of the cherished hallmarks of cultural diversity.

Why AI software ‘softening’ accents is problematic

Published 2024/Jan/11
by Grégory Miras, Professor of Language Didactics, Université de Lorraine

“Why isn’t it a beautiful thing?” a puzzled Sharath Keshava Narayana asked of his AI device masking accents.

Produced by his company, Sanas, the recent technology seeks to “soften” the accents of call centre workers in real time, allegedly to shield them from bias and discrimination. It has sparked widespread interest in both the English-speaking and French-speaking worlds since its launch in September 2022.

Not everyone is convinced of the software’s anti-racist credentials, however. Critics contend it plunges us into a contemporary dystopia where technology is used to erase individuals’ differences, identity markers and cultures.

To understand these critics, we could do worse than to review what constitutes an accent in the first place. How can accents be suppressed? And in what ways does ironing them out bend far more than sound waves?

How artificial intelligence can silence an accent

“Accents” can be defined, among other ways, as a set of oral cues (vowels, consonants, intonation, etc.) that contribute to the more or less conscious formation of hypotheses about a speaker’s identity (e.g. geographic or social origin). An accent can be described as regional or foreign according to different narratives.

With start-up technologies typically akin to black boxes, we have little information about the tools Sanas deploys to standardise our way of speaking. However, we know that most methods aim to at least partially transform the structure of the sound wave in order to bring certain acoustic cues closer to a perceptual criterion. The technology tweaks vowels and consonants, along with parameters such as rhythm, intonation and stress. At the same time, it seeks to safeguard as many vocal cues as possible so that the original speaker’s voice remains recognisable, as with voice cloning, a process that can result in deepfake vocal scams. These technologies make it possible to dissociate what is speech-related from what is voice-related.
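Sanas has not published its method, so any concrete illustration is speculative. As a toy sketch only, the snippet below shows the simplest kind of acoustic-cue manipulation the article alludes to: resampling a signal shifts its pitch (one of the prosodic cues, like intonation, that such tools adjust). A real system would operate on speech with neural models rather than on a synthetic tone; every function name here is hypothetical.

```python
import math

def make_tone(freq, sr=16000, dur=0.5):
    # Synthesize a pure tone as a crude stand-in for a voiced speech segment.
    n = int(sr * dur)
    return [math.sin(2 * math.pi * freq * i / sr) for i in range(n)]

def resample(signal, factor):
    # Naive linear-interpolation resampling: playing the result back at the
    # original rate raises the pitch by `factor` -- a toy stand-in for the
    # pitch/intonation adjustments an accent filter might perform.
    out_len = int(len(signal) / factor)
    out = []
    for j in range(out_len):
        pos = j * factor
        i = int(pos)
        frac = pos - i
        a = signal[i]
        b = signal[i + 1] if i + 1 < len(signal) else signal[i]
        out.append(a + frac * (b - a))
    return out

def upward_crossings(signal):
    # Count upward zero crossings to estimate the dominant frequency.
    return sum(1 for a, b in zip(signal, signal[1:]) if a <= 0 < b)

sr = 16000
tone = make_tone(200, sr)          # ~200 Hz "speaker"
shifted = resample(tone, 1.1)      # raise pitch by ~10%
f0 = upward_crossings(tone) / 0.5                      # ≈ 200 Hz
f1 = upward_crossings(shifted) / (len(shifted) / sr)   # ≈ 220 Hz
```

Note what this toy deliberately omits: it changes the duration along with the pitch, whereas the real systems described above must modify some cues while preserving others (the speaker’s voice identity), which is precisely why they rely on learned speech representations rather than simple signal processing.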

The automatic, real-time processing of speech poses technological difficulties, chief among them the quality of the sound signal to be processed. Software developers have overcome these by drawing on deep learning, neural networks and large databases of speech recordings, which make it possible to better manage uncertainties in the signal.

In the case of foreign languages, Sylvain Detey, Lionel Fontan and Thomas Pellegrini identify some of the issues inherent in developing these technologies, including which standard should serve as the benchmark for comparison and the role speech corpora play in defining it.

The myth of the neutral accent

But accent identification is not limited to acoustics alone. Donald L. Rubin has shown that listeners can recreate the impression of a perceived accent simply by associating faces of supposedly different origins with speech. In fact, absent these other cues, listeners are not very good at recognising accents they do not regularly hear, falling back instead on stereotypes, such as associating German with “aggressive” consonants.

The wishful desire to iron out accents to combat prejudice raises the question of what a “neutral” accent is. Rosina Lippi-Green points out that the ideology of the standard language - the idea that there is an unmarked way of expressing oneself - holds sway over much of society but has no basis in fact. Vijay Ramjattan further links the colossal recent efforts to develop accent “reduction” and “suppression” tools to the neoliberal model, under which people are reduced to the skills and attributes assigned to them. Contemporary capitalism perceives language as a skill, and the “wrong accent” is therefore said to lead to reduced opportunities.

Intelligibility thus becomes a pretext for blaming individuals for their lack of skills in tasks requiring oral communication, according to Janin Roessel. Rather than forcing individuals “with an accent” to reduce it, researchers such as Munro and Derwing have shown that it is possible to train listeners to adapt their aural abilities to phonological variation. What’s more, it is not up to individuals to change, but up to public policies to better protect those who are discriminated against on the basis of their accent - a prejudice known as accentism.

Delete or keep, the chicken or the egg?

In the field of sociology, Wayne Brekhus calls on us to pay specific attention to the invisible, weighing up what isn’t marked as much as what is, the “lack of accent” as well as its reverse. This leads us to reconsider the power relations that exist between individuals and the way in which we homogenise the marked: the one who has (according to others) an accent.

So we are led to Catherine Pascal’s question of how emerging technologies can hone our roles as “citizens” rather than “machines”. To “remove an accent” is to value a dominant type of “accent” while neglecting the fact that other co-factors will participate in the perception of this accent as well as the emergence of discrimination. “Removing the accent” does not remove discrimination. On the contrary, the accent gives voice to identity, thus participating in the phenomena of humanisation, group membership and even empathy: the accent is a channel for otherness.

While technologies such as AI and deep learning offer us untapped possibilities, they can also lead to a dystopia where dehumanisation overshadows priorities such as the common good or diversity, as spelt out in the UNESCO Universal Declaration on Cultural Diversity. Rather than hiding accents, it seems necessary to make recruiters aware of how they can contribute to customer satisfaction, and for politicians to take up this issue.

Research projects such as PROSOPHON at the University of Lorraine (France), which bring together researchers in applied linguistics and work psychology, aim to make recruiters more aware of their responsibilities in terms of bias awareness, but also to empower job applicants “with an accent”. By asking the question “Why isn’t this a beautiful thing?”, companies like Sanas remind us why technologies built on internalised oppression don’t make people happy at work.

  • @GenderNeutralBro
    1 year ago

    I feel like this is yet another symptom of corporate greed. Corporate culture is dehumanizing from the start, so this is not a surprising development. Be a good little machine, conform to the job, and be happy with the few waking hours per day in which you can still be a human.

    On the surface, I hate it.

    On the other hand, I understand the motivation, and it makes sense given the current reality of outsourcing remote support to wherever labor is cheapest. If your job is to provide support over the phone to American customers, then yeah, you want Americans to understand you as clearly as possible. Your average American will struggle to understand a thick Indian accent over a crappy phone line with call center noise in the background, especially when you add in the stress and frustration from navigating automated phone systems and hold music just to reach a human in the first place. (One would hope they are using AI to clean up that noise, as well, though I’m not sure how much can be done about the line quality.) This is not any kind of judgment (except against corporations for shifting the cost of support onto their customers in this way), and I wouldn’t say it’s anybody’s fault exactly. I’ve certainly heard this complaint many times, here in America.

    And indeed, any good call center in, say, India or the Philippines will train their employees for this. Is it better for employees to spend more time in voice training, and consciously work to mitigate their accent, or let an audio filter do that work for them? Seems like a complicated question to me, and I have no simple answer. Even native-English-speaking Americans often have a “customer service voice” that is notably different from their normal voice.

    Personally, I have a relatively easy time understanding diverse accents, compared to most Americans. This is a sort of privilege in itself; I was lucky enough to grow up in a big, diverse city, with exposure to a broad range of cultures, languages, and accents from a young age. I also have fairly good hearing (for now). I really can’t blame anyone who struggles to understand foreign accents when they never had the opportunity to train their ear for them.