5 Questions with Thao Phan
Thao Phan is a Research Fellow in the Australian Research Council’s Centre of Excellence for Automated Decision-Making & Society, and the Emerging Technologies Research Lab at Monash University, Australia. She is a feminist technoscience researcher who specialises in the study of gender and race in algorithmic culture.
Her research addresses topics such as the aesthetics of digital voice assistants, ideologies of ‘post-race’ in algorithmic culture, the corporate capture of AI ethics, and AI in film and popular culture. Her work has been published in journals including Big Data & Society, Cultural Studies, Catalyst: Feminism, Theory, Technoscience, and Culture Machine.
NO.1
To start things off, especially for those unfamiliar with your work, how did you come to research and analyse automatic speech recognition through the lens of race?
Automatic speech recognition is really quite new to me. My main bread and butter over the last few years has been as a researcher working in an area called feminist technoscience (sometimes also called feminist science and technology studies (STS) or social studies of science). In a nutshell, it’s a field of feminist research that treats science and scientific knowledge as objects of study. It asks questions like: what counts as scientific knowledge and why? Who is given authority in these important knowledge-making communities and who isn’t? And what is the role of science and technology in actively constituting categories like gender, race, ability, class, and nation?
My main interest as a researcher has been in using this critical framework to understand the vast array of systems, technologies, fields of practice, cultural figures, and networks of power we bundle under the phrase artificial intelligence (AI). I did a PhD looking explicitly at gender and AI, unpacking the central role of gender in making AI legible to users, consumers, and audiences. As a part of this project, I did a lot of work analysing digital assistants like Amazon Echo and Apple’s Siri, interrogating the gendered and racialised aesthetics of the artificial voice. It’s because of this research that I started to have conversations with James Parker, Joel Stern and Sean Dockray—the team behind Liquid Architecture’s Machine Listening curriculum. They’re a really wonderful and creative group whose agenda revolves around critically interrogating the rise of devices that listen to us: smart speakers, digital voice assistants, and other systems and technologies that are designed to transform sound and speech into units of data.
One of the things I really respect about them is that they don’t just produce incisive critiques, they’re also interested in recalibrating our relationships with these technologies, earnestly asking if it’s possible to put these systems to use for purposes outside of surveillance and capital accumulation. Last year they developed an experimental tool they called a word-processor. It works by using elements of speech-to-text and automatic speech recognition to transcribe videos and audio files. Once transcribed, these files can then be uploaded to an interface that allows an operator to search, play, analyse, deconstruct and recombine fragments of voice-data extracted from human speech. I was one of a few scholars, artists, and activists they invited at the end of last year to experiment with the tool and to see what critical and creative use it could be put to.
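For readers curious about what such a pipeline looks like in practice, here is a minimal sketch of the general pattern rather than the Liquid Architecture tool itself: transcribe an audio file into timestamped fragments, then index and search those fragments so an operator can retrieve and recombine snippets of speech. It assumes the open-source Whisper model as a stand-in transcriber and a hypothetical file called interview.wav.

```python
# A minimal sketch of a "word-processor"-style workflow: transcribe audio,
# index the resulting fragments, and retrieve timestamped snippets that an
# operator could play back or recombine. This is NOT the actual tool
# described above -- just an illustration of the general pattern, using the
# open-source Whisper model as a stand-in transcriber.
import whisper  # pip install openai-whisper

def transcribe_segments(audio_path: str):
    """Return a list of {start, end, text} fragments for an audio file."""
    model = whisper.load_model("base")  # small general-purpose model
    result = model.transcribe(audio_path)
    return [
        {"start": seg["start"], "end": seg["end"], "text": seg["text"].strip()}
        for seg in result["segments"]
    ]

def search_fragments(segments, query: str):
    """Find every timestamped fragment whose text contains the query word."""
    query = query.lower()
    return [seg for seg in segments if query in seg["text"].lower()]

if __name__ == "__main__":
    segments = transcribe_segments("interview.wav")  # hypothetical input file
    for hit in search_fragments(segments, "listening"):
        print(f'{hit["start"]:7.2f}s - {hit["end"]:7.2f}s  {hit["text"]}')
```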
While automatic speech recognition is an intrinsic feature of digital voice assistants, it’s not something I had looked into in any great detail. Up until this point, my focus had been on understanding the dynamics of race and gender on the front end—the way user interfaces were designed or how devices were figured in marketing material and user reviews. The word processing tool was all about looking at these dynamics from the back end—how a computer uses statistical models to process speech-to-text. What I really underestimated is how strikingly different analysing these complementary sides of a system could be. Looking at the front end, I could pull on theory and methods from cultural studies, media studies, gender studies and critical race studies, analysing the gendered and racialised representation of devices and situating these representations within a history of dehumanised labour.
But you can’t really use those resources when looking at the back end, because statistical models don’t operate in that way. There’s no semiotic meaning to the representation of speech-to-text, there’s only statistical meaning. When a system hears the word ‘cat’ it doesn’t understand that word as signifying a furry domestic creature that’s sometimes ginger but can also be tan, grey, blue or striped, or as a Broadway musical that was devastatingly adapted into a Hollywood feature film. The statistical model only understands that the word ‘cat’ is one syllable long, takes place x number of words into a sentence, is likely to be a noun based on where it sits in that sentence, matches with the phonetic sounds ‘c’ ‘ah’ ‘t’, and features dominantly enough in the training data set that it can be identified with a defined degree of confidence.
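To make that 'statistical meaning' a little more concrete, here is a toy sketch of the kind of scoring a decoder performs: an acoustic score (how well the audio matches the sounds of a candidate word) is combined with a language-model score (how likely that word is given its neighbours), and the highest-scoring candidate wins. The vocabulary and probabilities below are invented purely for illustration; no production recogniser is this simple.

```python
# Toy illustration of "statistical meaning": the system never knows what a
# cat *is*; it just combines an acoustic likelihood with a language-model
# probability and picks the candidate with the highest combined confidence.
import math

# Hypothetical acoustic likelihoods: P(observed audio | candidate word)
acoustic = {"cat": 0.60, "cap": 0.25, "cut": 0.15}

# Hypothetical bigram language model: P(word | previous word)
bigram = {
    ("the", "cat"): 0.020,
    ("the", "cap"): 0.004,
    ("the", "cut"): 0.006,
}

def score(prev_word: str, candidate: str) -> float:
    """Log-probability of a candidate: acoustic score + language-model score."""
    return math.log(acoustic[candidate]) + math.log(bigram[(prev_word, candidate)])

candidates = ["cat", "cap", "cut"]
best = max(candidates, key=lambda w: score("the", w))
print({w: round(score("the", w), 2) for w in candidates})
# {'cat': -4.42, 'cap': -6.91, 'cut': -7.01}
print("decoded:", best)  # 'cat' wins on statistics alone, not on meaning
```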
The front end and the back end operate through completely different ontological regimes. This took me a really long time to grasp but is now totally central to how I think about AI.
NO.2
There’s a lot to be said about how artificial intelligence systems are considered so-called “objective”, when in reality they mimic the structural injustices of the physical world. There are too many examples to name, but off the top of my head I’m thinking of the Tay chatbot incident, as well as “predictive policing” that entrenches racial profiling, and how Facebook—as you’ve mentioned in your presentation—targets and excludes users based on race. In your presentation, you asked “How can we advocate against a process that operates beyond our perception?” I’m curious to hear how you answer this.
These are all great examples of what many scholars refer to as algorithmic racism—the use of automated decision-making and/or AI-powered technologies to sustain racist structures. There are lots of different kinds of AI out there, but they all, at a basic level, perform the same function: they use a model to act on data. So much of our critical discourse has really focused on the issue of data: What counts as data? How was it collected? Who did the labelling? What kind of histories do they capture? What do they inevitably exclude? Whose intentions do they reflect? These are incredibly urgent and important questions that have yielded so much ground-breaking work. But the issue I was gesturing to has more to do with the operations of the model.
While we can physically look at a dataset, questioning its history and context, it’s a bit more difficult to do this with a model because models are, by their nature, opaque. In some instances, this opacity is the result of our commercial ecosystem, in which something like the specific operations of the Google PageRank algorithm cannot be revealed because it’s protected as a proprietary system. But in other instances, this opacity is just a symptom of the use of AI to begin with. The reason we employ AI across so many domains is because it can do things that humans are physically incapable of doing. It’s not just that it operates at different speeds and scales; it’s that it is able to identify elements, patterns and trends that are literally beyond our perception. The human geographer Louise Amoore elegantly puts it this way: ‘critical accounts of the rise of algorithms have placed great emphasis on the power of algorithms to visualize, to reprogram vision, or indeed to “see” that which is not otherwise available to human regimes of visuality.’ Yet, algorithms cause great anxiety because ‘they operate on a plane in excess of human visibility and at scales that are inscrutable to the human.’
This is an important point because this ‘post-visual’ aspect of algorithmic culture changes how we perceive and act on a phenomenon like race. In critical race studies, race is typically apprehended visually and studied using visual methods, e.g. Fanon’s visual schema of epidermalisation, which operates through acts of looking and interpellation. But algorithmic culture complicates this framework. Let’s take the way a platform like Facebook racially categorises its users. It doesn’t ask people to self-identify. Instead, Facebook infers racial identity based on users’ behavioural data, drawing on proxy indicators like language use, IP address, and interests. This is a very strange and novel way to do race. When I asked that question, ‘how do we advocate against a process that operates beyond our perception?’, this is the context to which I was referring, and to be honest, it was really more of a rhetorical question. A kind of provocation for critical race scholars to engage more with the technical workings of models, because our existing tools can only get us so far in understanding the operations of racism in this algorithmically mediated moment.
NO.3
When I attended your presentation, I was struck by how you managed to interweave personal experiences with your family into the research that you’ve done. It reminds me of the work of the Algorithmic Justice League, founded by AI researcher Joy Buolamwini. Although she focuses on the visual aspects of AI, which are much more researched than the aural aspects, do you think diversifying the tech sector will lead to better implementations of these technologies? Why/why not?
No. lol. The aim of diversity and inclusion strategies isn’t to challenge the dominance of technocratic forms of governance (the thing that actually threatens marginalised and disenfranchised people); the aim of diversity and inclusion is to further entrench and legitimise that regime. I am absolutely a student of Sara Ahmed when it comes to this topic. Within most organisations—from corporations to academic institutions—the purpose of diversity is to accrue value rather than reform social structures. In Ahmed’s words it becomes ‘one of the techniques by which liberal multiculturalism manages differences.’
Joy Buolamwini is an interesting example to bring up, because the co-author on her celebrated ‘Gender Shades’ article is Timnit Gebru, the co-lead of the Google Research Ethical AI team, who was famously fired from Google in 2020 for raising ethical concerns regarding the energy use of large language models. Until that point, Google had held up Gebru and her team as a shining example of how they were leading the way in terms of ethical, diverse and inclusive practice … and then she was literally fired for doing her job. It’s a textbook illustration of what Ahmed means when she says ‘when you expose a problem you pose a problem.’
I’m probably far too cynical about these things, but I constantly see examples of Big Tech using women and non-white people as a means to further their own agendas. One example that stuck out to me recently was the Women in AI awards, which, broadly speaking, aim to celebrate the role of women and other minoritised people in the AI industry. But the award is sponsored by Lockheed Martin, the world’s largest weapons manufacturer. This is a company that makes Hellfire missiles and military drones. A company that profits from armed warfare. They’re not just bad capitalists, they’re the worst capitalists. In 2020, they made $65 billion in revenue and now they want to absolve their sins by handing out trophies to black and brown women. To me, it really doesn’t matter if the person who receives that award is white, or a refugee, or Indigenous, or queer, or any other matrix of oppressed identities because at the end of the day it’s the military-industrial complex that wins.
I don’t mean to set myself up as an enemy of diversity here, and I would never begrudge someone seeking recognition for their work, but it’s so often the case that these kinds of schemes undermine the people they’re said to celebrate. I often think of the Academy Awards and how the Oscars were originally established by studio boss Louis B. Mayer as a way to curb union-organising among motion picture employees. Establishing competition is a way of undermining solidarity. Awards help miserable workers to retrospectively justify their sacrifices. As Mayer himself said: ‘I found that the best way to handle [moviemakers] was to hang medals all over them … If I got them cups and awards they’d kill themselves to produce what I wanted.’
Having said this, I’m glad you enjoyed the meshing together of personal narratives and research narratives. It’s a very new thing for me; I usually try to keep an intentionally cold and detached scholarly distance between my life and the things that I’m researching. There’s probably a lot to say about how that detachment is some kind of survival mechanism, but that’s probably another example of me being far too cynical.
NO.4
In a recent essay, titled ‘Racial formations as data formations’ (co-authored with Scott Wark), you note, ‘If categories of race are inextricable from the technologies of classifying and sorting that makes the production of distinctions between people possible, it follows that technological innovations engender innovative ways of producing and policing difference.’ Can you speak more to this?
This really builds off my response to the second question. Scott and I both have backgrounds in media studies, which means we spend an inordinate amount of time fixating on the idea of mediation. How do the operations of different technologies change how we experience the world? Media studies has often sought to answer this question in the context of communication. How does oral communication differ from the written word? How does the written word differ from the printed word, or the words that circulate through forms of analogue recording or live broadcast? How does this differ from the forms of communication now mediated by commercial platforms and proprietary algorithms? How do different political-economic arrangements affect how we communicate, e.g. Twitter v. Facebook, Zoom v. email? In short, how do all these subtle, material changes to the medium of communication shape how we receive the message? (That’s my amateur McLuhan.)
In that essay, Scott and I were trying to come to grips with how race is mediated in the current moment. Commercial platforms algorithmically determine race for the purposes of targeted advertising, but so do government agencies who use racialised markers—like language use, search terms, and social networks—to determine a person’s ‘foreignness’ or ‘threat-level’ as a way to justify sustained state surveillance. These systems operate through proxies and abstractions to figure racialised bodies not as single, coherent subjects, but as shifting clusters of data. In this context, race emerges as an epiphenomenon of processes of classifying and sorting—what we’ve called ‘racial formations as data formations’. As I gestured to earlier, what makes this regime of racialisation new is that it operates in ways that are beyond our perception: either because its constitution occurs beyond human scrutiny or because it is deliberately obscured and opaque. This has implications for us as critical race scholars because we now need to reassess how we evidence and resist racism. How can we keep up with the pace of dynamic classification—classifications that are being assessed and reassessed with every new piece of behavioural data?
How can we resist a category we don’t even know we’ve been placed within? And how do we form communities of solidarity under those conditions?
NO.5
What researchers and/or scholars inform and inspire the work that you do? What books do you recommend we read to better understand this problematic phenomenon?
There’s so much good work coming out at the moment it makes it hard to name just a few! I’m super excited by the latest wave of work in critical algorithm studies that combines rigorous theory with historical analysis and technical literacy. I’m thinking here of Xiaochang Li, Louise Amoore, and especially Wendy Chun. All of Chun’s work is brilliant but there’s an edited book called Pattern Discrimination that she put together with Clemens Apprich, Florian Cramer, and Hito Steyerl that I think is really underrated (it’s also open-access, meaning it’s online and free). She also has a new book, Discriminating Data, which explores in detail how histories of eugenics, segregation, and identity politics are encoded into our digital networks.
The work produced by Black feminist scholars like Ruha Benjamin, Safiya Noble, and Simone Browne has been absolutely critical in defining the public conversation on AI and race. I’m a really big fan of Browne’s book Dark Matters, a work that puts black feminism in conversation with surveillance studies. What makes it so impressive is that it works across so many different sites—from the archives of transatlantic slavery, to contemporary art, literature, reality TV, biometrics, as well as post-9/11 security practices.
And for anyone who is interested in engaging more with feminist technoscience, I strongly recommend the work of Paul Edwards, Karen Barad, and Donna Haraway. My favourite Haraway book is the one with the most insane title, Modest_Witness@Second_Millennium.FemaleMan©_Meets_OncoMouse™. It’s kinda difficult to explain but I suppose it’s a sprawling commentary on the emergence of modern technoscience. It detours through critiques of enlightenment humanism, Robert Boyle and the birth of the Scientific Revolution, the human genome project, and a litany of cyborg figures from fiction and life, like OncoMouse—a genetically modified lab mouse used for breast cancer research and the first patented animal in the world.
Find out more
What is the sound of racialisation? How might we listen to misrecognition? What does machine error tell us about the precision of racism? And how can the tools of a racist system be used to transcribe new forms of resistance?
This experimental presentation by feminist technoscience researcher Thao Phan brings together critical work on race and algorithmic culture with new techniques for dissecting and analysing automatic speech recognition, applied to personal and public archives drawn from Thao’s life and research.
Watch Thao Phan’s presentation (hosted by Liquid Architecture’s ‘Machine Listening’ programme) here.