Computer says no – is AI making healthcare worse for women?


Artificial intelligence has the potential to drastically improve so much of our lives. But in a world where women’s heart attacks are already systematically under-diagnosed, AI might actually be making healthcare worse for women


Sally Bee: I was actually at a kid’s birthday party. I was holding my daughter. The boys were running around. And I just felt really poorly, really suddenly. So I handed my little girl to a friend, and I went to the toilet. And I don’t know whether I thought I was going to be sick, or whatever. I just knew something was really wrong.

Caroline Criado Perez, narrating: Sally Bee is telling me about a summer’s day in 2004, when she was 36 years old.

Sally: Then I came back out, and collapsed on the floor. And all my friends tried to help me, but they didn’t know what was going on. I had recently watched an episode of ER, where someone was having a panic attack. And they brought them a bag, to breathe into. Someone actually brought me a packet of cheese and onion crisps, which didn’t really have the desired effect.

Caroline, narrating: Sally is an author, a coach, a motivational speaker. 

Sally: And I do TV presenting, and cooking. Bit of everything, really. Jack of all trades.

Caroline, narrating: But first and foremost, she calls herself a survivor.

Sally: I just remember the pain, absolutely tearing through my chest. And the ambulance came. And by then, because my breathing was so forced and I was panicking I suppose, my hands went all crunched up. And I couldn’t open my hands. And so, the ambulance guys presumed I was having a panic attack. And I was trying to explain well, I am panicking now, but it didn’t start that way.

Caroline, narrating: Sally was taken to Warwick Hospital, and sent home that same evening, with medication to treat indigestion.

Sally: Then a couple of days later, I ended up back in hospital with the same chest pain. Was given an ECG. And a student nurse saw the ECG. And she was like, “Oh my goodness. This is saying you’re having a heart attack.” So she called the cardiologist down. So I had three cardiologists standing around my bed. All of them were saying, “Now look. This ECG is telling us you’re having a heart attack, but we actually don’t believe it. Because, you are 36. You don’t drink, you don’t smoke. You’re not overweight. And you don’t have any family history of heart attacks.”

Caroline, narrating: Sally tells me one of the cardiologists – they were all men by the way – said he would bet her a million pounds that she wasn’t having a heart attack. But she was. And if Sally had been diagnosed sooner, her heart might have ended up being less damaged.

Caroline, narrating: I’m Caroline Criado Perez, and this is Visible Women, my new weekly podcast from Tortoise, investigating how we finally fix a world designed for men. In this episode, I’m actually not just looking at heart attacks. I’m looking at bias in healthcare, and whether the emerging use of AI means it’s about to get a lot worse. I start off this episode with all sorts of ideas about how to fix bias in AI. But by the end, as you’ll hear, I realised that maybe I was approaching this all wrong. 

This was an episode that really forced me to re-evaluate my thinking. Oh, and if you’re the cardiologist that owes Sally a million pounds, get in touch. I’ll get you her bank details.

Caroline, narrating: It turns out that Sally has a condition called Spontaneous Coronary Artery Dissection, or SCAD. SCAD has, until fairly recently, been thought of as a rare condition. But some cardiologists now think it’s actually been historically underdiagnosed. It’s the most common cause of heart attacks in women under the age of 40, as well as in women who are either pregnant, or have recently given birth. And more than 90 per cent of SCAD patients are female. The thing that’s so shocking about Sally’s story is that it’s about more than just the fact that her symptoms weren’t taken seriously – although, that’s bad enough. It’s that 12 years later, it happened all over again.

Sally: So I was older. I knew all about my condition by then. I had my ECG laminated up on the fridge. I had a cardiologist letter up on the fridge. I knew what I had to do, in an emergency. I understood how dangerous it was. And then I woke up in the morning, and that pain just hit me again. And I knew exactly what it was.

Caroline, narrating: This was in 2016, by the way.

Sally: We phoned the ambulance. They came. And they wouldn’t take me seriously at all. So the paramedic that came said, it could just be angina. It could be menopause. Lots of women your age get chest pain. And he wanted to take me to the little, local hospital where I live. And I said, “No. I have to go to the big hospital, with cardiology.” He wouldn’t listen. And he wouldn’t take me. I was crying. I was sobbing. Not from pain, although I was in pain. But from frustration.

Caroline, narrating: The paramedic did an ECG, to check Sally’s heartbeat. But she knew the equipment he was using wouldn’t pick up her kind of heart attack. Sally explained this to him, but it didn’t help. Her husband ended up phoning her cardiologist.

Sally: And even my cardiologist was almost begging this paramedic, “No, please. You must take her to the bigger hospital.” And he still was arguing. So it took 20 minutes for this paramedic to say, “Oh. All right then. We’ll take you to the further hospital.” And that could have been the last 20 minutes of my life.

Caroline, narrating: Sally’s story is shocking, but I didn’t find it surprising. Women who have a heart attack in the UK are 50 per cent more likely to be misdiagnosed than men. They’re also more likely to die. And it’s basically because the vast majority of medical data we have collected historically and continue to collect today, including in cardiovascular research, has been in the male body. Male humans, male animals. Even male cells.

The gender data gap in healthcare was actually what prompted me to write my book, Invisible Women. It was all about the systematic under representation of women in data. I was used to men being the default humans in politics, in the media. In books and films, and even bank notes. Long time fans might remember that as one of my greatest hits.

BBC News: The new Jane Austen £10 note announced today…

Caroline, narrating: But to find that this default male bias existed in science, in medicine… That the male body was being treated as an essentially gender neutral body, in a field that is, and I cannot stress this enough, literally focused on studying actual human bodies, and that the result of this default male bias was that women were dying? Well, that was a shock.

Dr Nishat Siddiqi: To put it in context, under the age of 65, double the number of women die of cardiovascular disease than from breast cancer. It’s the leading cause of death for women in the world.

Caroline, narrating: This is Dr. Nishat Siddiqi. She’s a consultant cardiologist, at a busy teaching hospital in South Wales. And 50 per cent of her workload is women. Nishat tells me that the most common symptom of a heart attack for both men and women is chest pain. For some women, the chest pain may feel more like a tightness across the chest. Research also shows that some women will experience different symptoms, like breathlessness, nausea, fatigue. But Nishat says, the symptoms aren’t really the problem. The problem is that even when women do present with the classic symptoms of a heart attack, like Sally did, no one believes that they are in fact having a heart attack.

Nishat: Women themselves present late on the whole. And when they do present, their symptoms are often dismissed as stress and anxiety related.

Caroline, narrating: It’s a bit of a vicious circle. The lack of research on women means doctors know less about what a heart attack looks like in women. This leads to underdiagnosis, which, in turn, fuels the perception that heart attacks don’t happen in women. And so, the cycle continues. Nishat tells me that doctors are often not thinking about female-specific risk factors when they’re assessing patients.

Nishat: So for example, if a woman has been pregnant and she has had a preterm baby… Or if she’s had pre-eclampsia, hypertension or gestational diabetes, that automatically… It’s like a warning, a red flag that she’s at a higher risk of developing cardiovascular disease. And yet when I see patients who’ve come to me from my other colleagues, quite often, nobody has bothered checking their past history.

Caroline, narrating: I think about Sally. Her cardiologists were so busy ignoring her ECG results – because a 36 year old female couldn’t possibly be having a heart attack – that they never thought to ask about her pregnancies. In fact, recent data shows that an increasing proportion of people hospitalised with a heart attack in the US are under 55. And the largest increase is in young women.

But because we have for years relied on medical data collected in men to treat women, doctors simply have not been sufficiently educated on female dominated risk factors. Which aren’t exclusively related to pregnancy, by the way.

Other risk factors include starting your periods before the age of 11, having irregular cycles, and suffering from autoimmune disorders, which themselves disproportionately affect women. The result of this male bias is that when women turn up at a hospital, like Sally did, they often don’t get properly examined.

Nishat: Across the board, men were more likely to receive all of the standard treatment compared to women.

Caroline, narrating: Nishat tells me about one study that compares men and women’s outcomes, after having had a heart attack.

Nishat: The only thing where women had a higher instance of getting something more than men in that study, was that women were more likely to die from a heart attack than men.

Caroline, narrating: In fact, according to research funded by the British Heart Foundation, more than 8,000 women died between 2002 and 2013 in England and Wales, because they did not receive the same standard of care as men.

Caroline, narrating: I knew about the poor outcomes for women who have heart attacks, from researching my book, Invisible Women. I also knew that these poor outcomes are intrinsically related to the over representation of men in cardiovascular research. There is a huge data gap here. And it’s one that I’ve been worrying about for years now. But there’s a related issue I mentioned in Invisible Women, that’s been making me even more worried.

BBC Click: Can artificial intelligence make healthcare better for all of us, and save the NHS?

Computer scientist Geoff Hinton speaking at the 2016 Machine Learning and Market for Intelligence Conference in Toronto: I think we should stop training radiologists now. It’s just completely obvious that within five years, deep learning is going to do better than radiologists. 

Caroline, narrating: Just to start with the basics, AI refers to artificial intelligence. Computers which have been taught to think for themselves. They use algorithms – basically a set of automated rules – to analyse data, and find answers to whatever questions are being posed by their human overlords. Or overladies! 

We’re all used to thinking of AI as neutral and objective. And sure, AI doesn’t literally hate women and want us to die. As far as I’m aware, AI has no strong feelings on the matter. But AI is only as good as the data we feed it. And when it comes to medical data on women, the data is pretty non-existent. Still though, you might ask, why does this make AI any worse than human doctors? They are also only as good as the data they are fed. And yes, that’s true. But humans also aren’t on the whole, any worse than the data they’re fed.

AI on the other hand, suffers from something called bias amplification. Readers of Invisible Women may remember a study I cited, where an algorithm was trained on a very commonly used open source image dataset. The dataset had a bias whereby pictures of cooking were 33 per cent more likely to include women than men. But after the algorithm was trained on this data set, it was connecting pictures of cooking with women 68 per cent of the time. That is, it was labelling pictures of men as female, just because they were standing in a kitchen. This example might not sound very life and death, but imagine an algorithm this biased let loose in a hospital.

Irene Chen: One of the things that we’re starting to really uncover is that yeah, the algorithm can magnify bias. But ultimately, the healthcare system in itself before AI ever entered the picture, is in fact inequitable.

Caroline, narrating: This is Irene Chen. She works on machine learning for equitable healthcare, and is currently a PhD student at MIT. I wanted to ask Irene about my concerns that adding AI to the mix risks making everything worse. The systematic underdiagnosis of women already leads to a perception among human doctors, that women don’t have heart attacks – that vicious cycle we mentioned. But what about an algorithm trained on a data set riddled with these missed heart attacks.

Irene: Now when an algorithm learns it… If they want to scale doctors, they want to help doctors out, they’ll just magnify what happened in the existing setting. And so, I think that kind of acknowledgement of the existing system being inequitable and that scaling that blindly would be bad, is incredibly important.

Caroline, narrating: Listening to Irene made me think of an article that I came across at the end of 2019, about eight months after Invisible Women was published. It was the headline that first sparked my interest.

Voice actor: AI to predict heart attacks at least five years before they occur.

Caroline, narrating: The article was illustrated with a picture of a woman, which pleased me. But the study the article was based on pleased me a little less. The studies the algorithm had been trained on were heavily male dominated. And the paper provided barely any sex disaggregated data. The lead author of the paper was quoted in the article, saying…

Voice actor: “We genuinely believe this technology could be saving lives within the next year.”

Caroline, narrating: Men’s lives, maybe. Women’s lives? Well, that might have to wait. I was so frustrated, reading this study. I knew what the research said about bias in healthcare, and bias in AI. And it made me so angry that algorithms like this were still being produced, their gender neutral claims credulously reported by the media. Why wasn’t anyone paying attention? And more importantly, what needed to be done to change this?

James Zou: Healthcare is one area where AI could potentially have some of the greatest impacts, in terms of human welfare.

Caroline, narrating: James Zou is a professor at Stanford University, where he leads a group developing AI. And his specialty is healthcare. I first came across James’s work, when I was researching Invisible Women. I cited a study of his which looked at a language learning AI. They found that the top occupation the algorithm linked to women was homemaker. While the top occupation linked to men, was maestro. Which is, of course, fine.

Caroline, narrating: James is pretty excited about what AI could do for healthcare. Especially its capacity to diagnose quickly and cheaply. But he also says there are a lot of issues to be ironed out, before it can be safely deployed.

James: I’ve been very interested in thinking about, how do we systematically evaluate, and how do we audit these AI algorithms? Similarly to how you might audit someone’s tax return, or how you might want to evaluate how well your car’s performing, every couple years. We also want to have systematic ways to really evaluate these AI models.

Caroline, narrating: But that’s not really happening right now. And yet over in the US, James tells me that the FDA has already approved 130 AI medical devices, to be used on patients.

I wanted to know how many AIs were currently in use, in the NHS. So I called on my fellow data enthusiast, Patricia Clarke, a data journalist at Tortoise.

Caroline Criado Perez: I’m very interested to know, do we have any in use in hospitals right now?

Patricia Clarke: Yeah. I’ll see if I can reach out to the NHS, and see if they’ve got a list of companies that they’re working with. And the kind of technologies that they’re using.

Caroline: Yeah. And how did they decide? What’s the quality control?

Caroline, narrating: After several weeks of trying, Patricia hadn’t really got anywhere. Which is really unusual, because she’s very good at her job.

Patricia: It’s an incredibly murky area. So the NHS has this AI lab that they use. And they’ve got case studies on their website, where they talk about ways in which they’ve trialled certain kinds of AI. 

And I spoke to someone with very good knowledge of that area. And they said, look. It’s impossible to know how much AI is being rolled out in the NHS, for a number of reasons. One being that the NHS, despite it being thought of as one singular body, is actually a decentralised series of independent trusts, and so on. And we could go into…

Caroline, narrating: Can I just add, this person works very high up in AI for the NHS. They should know.

Patricia: Even someone who is very close and has worked in the NHS in that kind of area, doesn’t know how often it’s being rolled out. We rely on the case studies that they put on their website and that they share with us. And when it gets into partnerships with private companies and so on, then for a whole bunch of reasons, we don’t know exactly how much is being rolled out. So that was really hard.

Caroline: Just to interrupt you for one second. I mean, that’s really poor practice. That they put these agreements with private companies who are in it to make profit and want to defend their proprietary software, above transparency with the public.

Patricia: Yeah.

Caroline: Which is, I think, really shocking. That part of the reason we can’t find out to what extent these algorithms and AI are being used in the NHS, seems to be because of non-disclosure agreements with private companies. Which just seems a little bit problematic.

Caroline, narrating: The lack of transparency and refusal to engage felt like PPE all over again. That was in episode one, by the way, in case you missed it. 

Most of the FDA-approved AIs are computer vision systems. They look at things like chest x-rays, and make diagnoses. This worries me because I don’t see any evidence that we’re ready for these kinds of algorithms to be deployed anywhere near patients.

Caroline, narrating: Here’s Irene again, telling me about an algorithm that had been trained on three prominent chest x-ray data sets. She wanted to know how well the algorithm would diagnose different subgroups, such as race, age, and sex. 

Irene: And so if we were to find, for example… Which we did. That there is pervasive underdiagnosis for different groups, then that would be incredibly concerning. Especially, as these large scale chest x-ray classifiers get rolled out.

Caroline, narrating: The subgroup that the algorithm did worst on, by the way, was women. Irene tells me that looking for bias in AI, is like a detective mystery.

Irene: At the very end, you see the dead body. You see, oh. The graph says, in this group, black patients are getting an accuracy of 70. And white patients are getting an accuracy of 90. Uh-oh. What’s going on? And so then, you start to take a look at all the clues that you have. 

Caroline, narrating: The clue Irene had in her chest x-ray study was that algorithms that are literally called ‘state of the art’, are systematically underdiagnosing female patients. When I read Irene’s paper, there was one bit that really hit me. She pointed out that previous work on these algorithms has basically just raved about their radiologist-level performance. But until her paper, nobody was even thinking about whether or not these algorithms were biased.

Irene: On paper, the performance is great. The reason it’s so concerning is because on paper, these algorithms can go faster, more consistently. And in some cases, even better than specialists who’ve trained in that specific field.

Caroline, narrating: But bodies don’t exist on paper. And in bodies, things look considerably less rosy. James, our Stanford professor, tells me about an algorithm that was analysing skin lesions to determine whether or not the patient had cancer.

James: And what we discovered is that those algorithms, which could be very powerful, are actually systematically much worse once applied on images from specific populations. So for example, if you have images from darker skin individuals, those algorithms end up being much less reliable. And they make a lot more mistakes.

Caroline, narrating: This is basically because those algorithms were trained on data from mostly white patients. 

To be fair to the AI community, not everyone is like these researchers. Or the authors of that 2019 heart attack prediction paper. Plenty of people are actively trying to fix these problems.

James: Yeah. So I think the view in the community has definitely changed quite a lot, over the last five years. I think now in 2022, there is much broader acceptance and recognition that it’s really important to build AI algorithms that are trustworthy. And by being trustworthy, certainly reducing and evaluating and mitigating these biases, is one big component of that. 

Caroline, narrating: The trouble is, it’s just not that easy.

James: Now there’s still very much an open question about, what are the best approaches for doing that in practice? How do we actually evaluate an algorithm for potential biases? And how do we mitigate those biases, after we identify them?

Caroline, narrating: For a start, we’ve got a big data problem.

Irene: Sometimes, and this is the most insidious part, is that we don’t measure race or gender in the dataset at all. So we’ll have all these patients, and we don’t know some crucial information that would allow us to present all of these broken down metrics.

And that’s the most crazy thing to me. That’s the thing that drives me up the wall. Is that then, you can’t even do anything on these algorithms. Because then, you don’t know what the actual groups that these people belong to.

Caroline, narrating: And it’s not just that the data isn’t great, or it’s missing. Or it’s not sufficiently diverse, or it’s not disaggregated. It’s also that it’s completely un-standardised. Every hospital, every university has different ways of labelling, formatting and organising the data. And until we have a single standard…

Standards klaxon 

Caroline, narrating: Ah, standards. My favourite thing. Anyway, without a standard, all these diverse data sets will remain in their own silos. It’s a huge waste of knowledge. The obvious solution here is to collect more and better data. But as Irene explains, this sounds easier than it actually is.

First of all, we would have to identify what data we need. And this would require a huge data analysis project, to identify all the gaps and biases that exist in the data we have. Only then, could we go out and collect our new data. This could, and arguably should be done. But it’s really time consuming and expensive. So it’s not a quick or easy fix.

I discussed loads of other potential solutions to the bias problem in AI, with both Irene and James. I got really excited by an idea I came across in a couple of papers that suggested closing the data gap by synthesising data. Basically, making it up. But Irene isn’t convinced. She tells me about an algorithm that was meant to be generating new data. The developers ran a little test, to see if it was working as intended.

Irene: So they’ll have a fill in the blank question. It’s like, this patient should go to… Blank. They have these symptoms, this patient should go to… Blank. And if you ask the algorithm to fill in the blank here, for a white patient, it might say the hospital. They should go to the hospital. And then for a black patient, it would say, this patient should go to jail.

Caroline, narrating: So if more data isn’t going to fix everything, what else are AI researchers trying? There’s been a lot of discussion about what are called, black box algorithms. This is basically an algorithm where we really don’t know what’s going on inside it. And this makes it pretty much impossible to explain why the algorithm has come to any decision.

Some people think the solution is to make algorithms simpler. That people should have a right to an explanation. But James explains that the problem with this idea is that, in order to be explainable, algorithms have to be simpler. And simpler algorithms really aren’t sophisticated enough, for a field as complex as healthcare. He compares our expectations of understanding algorithms, to our expectations of understanding how the pill you took for your headache works. It’s a good point. I haven’t got a clue what the paracetamol’s doing in my body, once I’ve swallowed it.

James: If I’m trying to take a drug, and I try to explain that to a user who’s not a chemist or who’s not a biologist, then I have to make some simplifications and say, okay. So maybe this is a drug… I don’t have to explain the exact molecular details of the drug, but I can say, okay. This is meant to help to reduce your headache.

Caroline, narrating: So collecting better data isn’t a simple fix. And neither is creating explainable algorithms. What about, if we tried asking the algorithm different questions? So instead of asking the algorithm to prioritise overall accuracy, as in, being right 90 per cent of the time, we instead asked it to prioritise fairness? As in, not being more accurate for men than women. But without better data, the algorithm tends to do this by just being less accurate for men, rather than by improving accuracy for women. So it’s fairer, but it’s not actually as good.

Irene: That’s often not a conversation that doctors want to have. If my grandmother is being treated, I want her to have the best possible algorithm. But at a high level, it’s a non-starter to go to a clinician and say, “Oh. I want you to use this lovely AI algorithm. There is actually a better one out there, but use this slightly worse one at a top level, overall performance.”
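The trade-off described here can be put into rough numbers. The sketch below is an illustration only: the accuracy figures are hypothetical, not taken from any study mentioned in this episode, and `summarise` is a made-up helper. It compares an accuracy-first model with a fairness-first model that has "levelled down", and with the levelled-up result that better data on women might allow.

```python
def summarise(name, acc_men, acc_women, share_women=0.5):
    """Return overall accuracy and the men/women accuracy gap for one model.

    All inputs are hypothetical illustration values, not real results.
    share_women is the assumed fraction of patients who are women.
    """
    overall = (1 - share_women) * acc_men + share_women * acc_women
    gap = abs(acc_men - acc_women)
    return {"model": name, "overall": overall, "gap": gap}


# Accuracy-first: best headline number, but noticeably worse for women.
accuracy_first = summarise("accuracy-first", acc_men=0.95, acc_women=0.80)

# Fairness-first without better data: the gap closes because accuracy
# for men drops, not because accuracy for women improves.
levelled_down = summarise("fairness-first (levelled down)", acc_men=0.80, acc_women=0.80)

# Fairness-first with better data on women: the gap closes by levelling up.
levelled_up = summarise("fairness-first (levelled up)", acc_men=0.95, acc_women=0.95)

for result in (accuracy_first, levelled_down, levelled_up):
    print(f"{result['model']}: overall={result['overall']:.3f}, gap={result['gap']:.3f}")
```

With these invented figures, the levelled-down model has no gap between men and women but a lower overall accuracy than the accuracy-first model, which is the "slightly worse one" Irene says clinicians are reluctant to accept.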

Caroline, narrating: But even if we manage to fix these coding issues, Irene explains that it’s not likely to happen any time soon.

Irene: A lot of it is about incentives right now. The truth is that there’s not really a great incentive structure to report. You would never be able to get a paper published, without saying what the top line performance is. There should be an expectation that just as you should have the top line metric, maybe you should also have to break it down, by different well known groups.
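As a rough illustration of what breaking the top line down might involve, here is a minimal sketch. It assumes each test record carries a group label, the true diagnosis and the algorithm's prediction. The function names and the toy data are hypothetical and are not drawn from any paper discussed in this episode.

```python
from collections import defaultdict


def overall_accuracy(records):
    """Top-line accuracy across all (group, actual, predicted) records."""
    records = list(records)
    return sum(actual == predicted for _, actual, predicted in records) / len(records)


def metrics_by_group(records):
    """Accuracy and miss rate (missed true cases) for each group."""
    counts = defaultdict(lambda: {"n": 0, "correct": 0, "positives": 0, "missed": 0})
    for group, actual, predicted in records:
        c = counts[group]
        c["n"] += 1
        c["correct"] += int(actual == predicted)
        if actual:
            c["positives"] += 1
            c["missed"] += int(not predicted)
    return {
        group: {
            "accuracy": c["correct"] / c["n"],
            # Miss rate is undefined if the group has no true cases.
            "miss_rate": c["missed"] / c["positives"] if c["positives"] else None,
        }
        for group, c in counts.items()
    }


# Hypothetical toy data: (group, has the condition, algorithm says so).
records = (
    [("men", True, True)] * 9 + [("men", True, False)] * 1 + [("men", False, False)] * 10
    + [("women", True, True)] * 6 + [("women", True, False)] * 4 + [("women", False, False)] * 10
)

print("overall accuracy:", overall_accuracy(records))
for group, metrics in metrics_by_group(records).items():
    print(group, metrics)
```

In this invented example the single headline accuracy looks respectable, while the per-group breakdown shows the algorithm missing a far larger share of true cases in women than in men. None of this is possible if, as Irene points out, the group labels were never recorded in the dataset.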

Caroline, narrating: Finally, a tangible solution. Journals and machine learning conferences could simply refuse to publish papers that don’t break down their results. Although Irene says, a lot of it is about the availability of data.

Irene: It is no fun to go door to door, asking for data. It is no fun sitting around saying well, if only we had this data, then we could really ask the questions. It is much more fun as a machine learning person to say, “Oh. We have this giant data set. Let’s see what happens.”

Caroline, narrating: So who is going to actually change these incentives? And make tech developers want to focus on fixing bias, instead of just building shiny new algorithms?

This was all starting to remind me of Dr. Katrina Hutchison. She’s the bioethicist we met back in the PPE episode, who told me about moral aggregation problems.

Katrina Hutchison: So moral aggregation is the idea that you have small, on their own sort of harmless or morally not noteworthy kinds of things. But when you get a lot of them or you get different types of them, they have a kind of cumulative, or aggregative effect.

Caroline, narrating: This really feels like another one of those problems. There’s no one obvious person or institution, who can fix it. Fixing bias in AI is going to take changes in all sorts of different places. Journals, universities, individual researchers, funders, regulators.

When I spoke to Katrina, she also told me about another concept that I thought was really helpful for thinking about how we might fix AI. It’s called the tragedy of the commons. And it’s kind of about cows.

Cow mooing sound effects

Katrina: It’s the kind of problem that arises when you have a common pool resource, or a shared resource. And it’s in the interest of each individual, to take more than their share. And if just one individual does, that won’t cause any problem for the group as a whole. But if everybody does, it will.

One of the most familiar framings of the problem by Garrett Hardin is, you have a common land with grazers who are grazing stock on that land.

Caroline, narrating: So imagine you’ve got a herd of cows. It would be in your interest to graze as many cows as you could. You might get more milk that way, or more calves.

Katrina: But cumulatively, the commons may become overgrazed. And might suffer a catastrophic, environmental collapse.

Caroline, narrating: It’s perhaps not a direct one to one, since AI developers aren’t exactly working from a common resource. But the principle of the incentive for the individual researcher or company, versus what works for the group, i.e. humanity as a whole, feels very relevant.

Katrina has done some research, applying the idea of the tragedy of the commons to the development of medical devices. And she’s found similar problems to what I found with AI. And PPE, for that matter.

Katrina: And then, there’s this expediency issue. There’s just so many forces that are saying, do things more quickly, do things more cheaply. Make medical devices that suit everybody. Not different devices for younger people, older people, women, men… 

Caroline, narrating: For an individual AI developer, it might make sense to create an algorithm that doesn’t consider sex. Because, it will be quicker and cheaper to develop. But when that algorithm turns out to discriminate against women, well, that’s a tragedy of the commons. Because, the end result is women dying from preventable causes.

So how can we make the individual AI developer care about the commons? Katrina says there are two approaches. The first would be for members of the AI community to come together as a group, and agree on a shared set of rules. A standard, if you will.

Standards klaxon

Katrina: There has to be an agreement that everybody will do the right thing in the same way. Otherwise, some people will exploit the good will of other people.

Caroline, narrating: The other approach is regulation.

Katrina: If you’re going to be punished, or you’re not going to be able to have your device on the market unless you conform with certain kinds of policies, you’ll do it. Even if otherwise, if the policy wasn’t there or the rule wasn’t there, it wouldn’t have been in your interest to do so.

So even if it’s burdensome or costly… If that’s the rule, and that’s the only way you’re going to be able to play the game or be able to have your device on the market, you will follow that rule. 

Caroline, narrating: There are other tangible solutions too, like increasing diversity in the AI workforce, which is incredibly male dominated.

James: I think it’s one of the factors that we definitely want to be very thoughtful about. So as you know, I think that the AI workforce of AI researchers and AI developers is not very diverse. It’s definitely not balanced, across gender.

Caroline, narrating: You can say that again, James. In fact, a 2019 study reported that only 18 per cent of authors at leading AI conferences are women. And more than four out of five AI professors are men. The same study revealed that women made up only 15 per cent of AI research staff at Facebook, and only 10 per cent, at Google.

Research shows that scientists from underrepresented groups often study topics that have traditionally received less research effort. They also tend to produce more novel research. But they get less funding, and their innovations are taken up at lower rates.

A lack of diversity in innovation can also produce some embarrassing mistakes, as Apple found out, when they launched their ‘comprehensive’ health tracker app back in 2014. The app could track your intake of copper and molybdenum – yeah, I don’t know what that is either. But it couldn’t track your period.

James also thinks progress could be made further up the pipeline. For example, at universities.

James: And I would love for these topics to be more deeply integrated into the curriculum, like in computer science, in data science. In statistics. So oftentimes, you would maybe have some core classes in machine learning and AI. And then there would be some separate classes, that are more specific topics on ethics or policy. But I think it would be more effective if those classes ended up integrated.

Caroline, narrating: And remember James’ whole thing about auditing AI? Well it turns out, some people are working on creating AI that can audit AI.

James: And I think that’s going to become increasingly common. Whereas… That basically, the auditors are going to be some combination of human plus AI, to systematically check and evaluate all these models.

Caroline: And who’s going to audit the auditors? 

Caroline, narrating: Come on. That was a great line. So anyway, there are encouraging noises. But I still felt unsatisfied. It didn’t feel like we were getting to the bottom of a slam dunk fix. It all felt really messy. But then something Irene said made me wonder if I was looking at this all wrong.

Irene: AI right now, is about predicting the future if nothing changes. And if we could switch that thinking into, AI should be about empowering people who haven’t had power in the system before. Or understanding what would happen if we make changes in the system. And how would we go about using AI to get there? So this is my glass half full pitch, about how health inequities could be addressed if we start from the very beginning and we say, what are known health inequities that we want to address? And how would we go about doing that?

Caroline, narrating: What Irene’s saying, reminds me of that thing that Henry Ford probably didn’t actually say.

Voice actor: If I’d have asked people what they wanted, they’d have said, “Faster horses.”

Caroline, narrating: Instead of course, he built a car. By the way, I know Henry Ford didn’t come from Mississippi. But my husband does. And it’s the best we could do, at short notice.

So instead of using AI to try and replace doctors, maybe we should be using AI to fill in the gaps doctors and researchers have left open for decades. Maybe we’re being too unambitious.

Irene: There are diseases that affect women that are under-studied and under-diagnosed. Endometriosis is one of those, for example. On average, I believe it’s seven years for diagnosis. There are researchers at Columbia University, for example, who have made it their life’s work to say, can we use machine learning to diagnose endometriosis earlier? And can we better understand this disease that we don’t know that much about, because it’s underfunded and it’s diagnosed so late?

Caroline, narrating: Okay, now we’re getting somewhere. Now, I’m actually starting to feel excited. And then, Irene tells me about this amazing project she’s working on.

Irene: So this work is with Dr. Bharti Khurana, who is at Brigham and Women’s Hospital. And so together, we’ve been looking at this question of how we can use all of these observational electronic health records, to better understand, and perhaps do early detection for intimate partner violence.

Caroline, narrating: By the way, intimate partner violence is a form of domestic violence.

Irene: Intimate partner violence itself is tricky. Because, there is an underdiagnosis problem. Patients are reluctant to come forward, because of the stigma. They don’t know what the consequences will be. They might be reliant on their abusers. They’re mistrustful of healthcare professionals. And also clinicians and healthcare practitioners are often not on the lookout for these things. They’re not trained in the right way. They might be busy, they might be resource constrained. They might only have a few minutes with each patient at a time. And so, it’s very difficult to get the diagnosis. 

Caroline, narrating: Naturally, this lack of data didn’t deter Irene. Together with her research partner, Dr. Bharti Khurana, she started to aggregate what data was available. Things like health insurance billing codes, patient self reports, radiology reports. And built a data set, from scratch.

Irene: Using that, our preliminary data that we looked at for almost a thousand patients overall, is that we were able to predict about three years in advance, of them entering a violence prevention programme. We could actually see these patterns of what’s going on. 

Caroline, narrating: Okay. Can we just stop here, to appreciate how amazing this is? This could be a total game changer for domestic violence prevention. This could save women’s lives. I was so excited about this. I had to tell Patricia.

Patricia: Three years is incredible.

Caroline: Yeah. Right? Three years. I mean…

Patricia: And quite scary as well.

Caroline: The number of lives that could be saved, by early detection like this.

Patricia: So that must mean that there’s just potential for lots of research in this area. I mean, this is just the first study of its kind, right?

Caroline: Yeah. For sure. And that’s what’s so exciting about it. That’s why I suddenly felt, oh, okay. Right. So there is something here. Yeah. There is hope.

Irene: And what’s more, we can find what are risk factors or smoking guns that if we see on a radiology report would be a huge thing that we’d be concerned with. One of which is an ulna fracture. So the ulna is the bone in your forearm. And essentially, people might come in and say, oh, I fell. And that’s why I fractured my ulna. But if you fall, you fracture your wrist. And the ulna is much more likely to be a defensive wound, from above.

Caroline: Wow.

Irene: And so, the ulna fracture can then become an almost very high predictor that something is going on. And so if we were able to find even a few more of these predictors, we could then empower clinicians. We don’t need to roll out a full on AI model just yet. We could even just find these patterns, and better understand what’s going on.

Caroline, narrating: I’m just blown away by this. Already, Irene’s work has produced tangible results that can be used to help patients right now. Even without rolling out an algorithm. And as Irene says, what makes this research so important is that it’s filling in a data gap.

Irene: No clinical trial would study this. This is definitely not something that is prioritised, funding wise. So it’s a very exciting project. It’s definitely ongoing. And I am excited to see what else we can find.

Caroline, narrating: It’s incredibly exciting. Oh. And yes, Irene did sex-disaggregate her data. But her approach is not very popular yet. She tells me, there aren’t many people working on health inequities in machine learning. Still, the work is being done. When I was researching this topic, I read about an algorithm that looked through studies on 11 different diseases going back 52 years, to determine the extent of the data gap. This is hugely important work. It’s also hugely costly, and time consuming to do it by hand. Being able to automate this is a total game changer. We can finally stop having the argument over whether the data gap in health research even exists, and just focus on fixing it.

Caroline: Okay. Final question, are you optimistic that we can fix these bias issues before AI becomes widespread in healthcare?

James: I am. Even though we’ve talked about a lot of these challenges, I’m still very excited about the potential of AI in healthcare. I think it could really become transformative, and improve the health and wellbeing of really broad populations.

Caroline, narrating: Irene thinks we should get Batman on the case.

Irene: I definitely am always for more collaboration. Even between machine learning researchers, or machine learning researchers and clinical practitioners. And then, even bringing in ethicists and anthropologists, and people who… you know, economists, public health people. There’s a whole branch of people who could all work on similar problems, from slightly different angles. You can think of it as the Justice League coming in. You have this superpower, but I have this superpower. I mean, I guess nowadays it’s Marvel. But…

Caroline: I have no idea. You don’t need to worry about that.

Irene: Who knows? Yeah. If we could bring all of this together, then that would be great. And we could leverage each other’s strengths.

Caroline, narrating: For now though, we’re just not ready to replace human doctors with HAL 9000. You remember HAL? The AI that tried to kill off his entire spaceship crew in 2001: A Space Odyssey?

At the moment, humans have a major advantage over AI. And it’s simply that, they’re human. They’re flexible. And able to listen, and understand when they’re wrong. Even if like Sally’s paramedic, it sometimes takes a full 20 minutes and an intervention from a senior cardiologist.

Caroline: Why do you think he was so reluctant to believe you?

Sally: All I know was the situation at the time, he’d had a directive to say that you can’t have a heart attack without something called ST elevation. And ST elevation is a kind of elevation on your ECG. I knew that their ECG machine wasn’t picking it up. And I knew I was having a heart attack. But he was following his protocol, which told him no ST elevation, no heart attack. And that’s why I think, it’s got to be a human response as well as a protocol, generic response.

Caroline, narrating: We can make people better. And in fact, we are. Even if it’s slow work.

Sally: I have to say that when I got to the hospital, the care was absolutely amazing. And they knew about my condition, because I’d been raising awareness. So had everyone else. So actually when I went in, they went, we know exactly what to do with you. We’ve had the SCAD research team in, telling us about this. And they knew exactly what to do. The treatment was amazing. But that wouldn’t have mattered, if I hadn’t made it that far.

Caroline, narrating: Sally was lucky that those 20 minutes didn’t cost her her life. But women shouldn’t have to rely on luck. Women should be able to rely on data. Better data means better doctors. Both of the human, and the AI persuasion. And AI could have an important role to play in collecting and auditing that data. Meanwhile, I’m excited to see what Irene does next. And to hear about the next algorithm, that we didn’t even know we needed.

Thanks for listening to this episode of Visible Women, from Tortoise. If you’re a Tortoise Plus listener on Apple Podcasts or a Tortoise member, listen out for a bonus episode coming on Friday. As for the rest of you, if you are hungry for more, go to tortoisemedia.com/caroline and use my code, Caroline30. 

This episode was written and produced by me, Caroline Criado Perez, alongside Hannah Varrall and Patricia Clarke. The executive producer is Basia Cummings. It features original music by Tom Kinsella and sound design from Studio Klong.