Artificial Intelligence in Radiology: Considerations for Racial Equity

Brandon Campbell, Daniel Kim

Introduction

The volume of radiologic data is growing far faster than the number of trained radiologists (Hosny et al., 2018). Studies estimate that, on average, an individual radiologist must interpret an image every three to four seconds to keep pace with image production (Hosny et al., 2018). Operating at this pace predisposes radiologists to medical errors, harming the patients they treat (Hosny et al., 2018). There is therefore a need for tools that help radiologists analyze these images, and artificial intelligence (AI) offers a promising pathway toward optimizing the radiological workflow. However, the development and implementation of AI in radiology must proceed under stringent oversight to promote racial equity. Without addressing the lack of diversity in AI development, these technologies risk perpetuating and exacerbating existing health disparities among racial groups.


Existing Practices

Out of a desire to manage increasing workloads, enhance diagnostic accuracy, optimize workflows, and leverage advances in machine learning, radiologists began incorporating AI into their practices. Integrating AI into various clinical contexts has demonstrated increased diagnostic accuracy and improved risk classification. In a study across multiple medical centers in which AI was used to diagnose acute respiratory failure, diagnostic accuracy improved by roughly four percentage points: from a baseline of approximately 70% for human clinicians alone to around 74% with AI assistance (Jabbour et al., 2023). Beyond diagnostic accuracy, AI has fostered new technologies that optimize current imaging techniques. High-field MRI systems provide high-resolution scans, but they are confined to medical centers that may be unavailable in rural areas. Researchers at the Martinos Center for Biomedical Imaging explored using convolutional neural networks (CNNs) to synthesize high-quality images from the low-field scans produced by portable MRI systems, yielding images similar in quality to those of high-field scanners (Iglesias et al., 2023; Keen, 2023). In this case, AI opened an avenue toward more equitable outcomes for populations without ready access to medical centers equipped with high-field MRI machines.

In current practice, AI has taken on various roles, from generating radiology reports to supporting medical education. Report generation, which entails analyzing a patient’s scan, is a major target for AI. In one study, ChatGPT, a form of generative AI (GenAI) that accepts text or image inputs and produces human-like responses, was used to analyze and draft reports for cases of distal radius fractures (Bosbach et al., 2024). ChatGPT could generate the analytic text, though errors in its interpretation of the images and in the generated text remain (Bosbach et al., 2024). Even so, the tool offers the potential to speed report generation for radiologists. Additional studies suggest that AI may eventually develop image-extraction methodologies autonomously (Nakaura et al., 2024). AI has also shown potential for interpreting radiology reports in plain language, so that patients can better understand these detailed, technical documents (Nakaura et al., 2024).

GenAI has also shown potential for optimizing how current and future radiologists learn the field. Drawing on the vast radiology literature, GenAI tools such as ChatGPT can help radiologists get questions answered in real time, receive explanations of various concepts, and access resources that broaden their knowledge (Mese et al., 2024).

Technologies

Corresponding with the integration of AI into radiology and other healthcare domains, many new technologies have been developed to increase efficiency across the radiology landscape. AI approaches fall into two general categories: engineered features paired with mathematical models, and deep learning. The engineered-feature approach generates model inputs from expert-defined formulas but is often unable to adapt across imaging modalities such as CT, PET, and MRI (Hosny et al., 2018). Deep learning, by contrast, requires no predefined features and learns feature representations automatically (Hosny et al., 2018). In certain studies, deep learning methods have performed at the level of trained radiologists on tasks such as detection in ultrasonography and segmentation in MRI, though not perfectly (Wang et al., 2017).
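The distinction between the two paradigms can be sketched in a few lines of Python. The functions below are illustrative toys of our own devising, not any published pipeline: the first computes fixed, expert-chosen image statistics, while the second applies a convolution whose kernel weights would, in a deep learning system, be learned from data rather than specified by hand.

```python
import numpy as np

def engineered_features(img):
    """Expert-defined features: fixed formulas chosen by a human."""
    return np.array([img.mean(), img.std(), (img > img.mean()).mean()])

def conv2d(img, kernel):
    """One convolutional feature map. In deep learning, the kernel
    weights are learned from training data, not hand-specified."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
scan = rng.random((8, 8))                 # stand-in for one image slice
feats = engineered_features(scan)         # three fixed descriptors
fmap = conv2d(scan, rng.random((3, 3)))   # 6x6 feature map
```

The engineered features stay the same regardless of modality, which is why such pipelines adapt poorly across CT, PET, and MRI; a learned kernel can be re-fit to each modality's data.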


Disparities Resulting from AI Usage

While AI promises to revolutionize radiology, serious limitations threaten to exacerbate racial disparities in health outcomes. These disparities arise because most algorithms are trained on datasets that lack diversity (Arora et al., 2023). A broad array of evidence suggests that current models underperform when analyzing data from underrepresented minorities (URMs), resulting in misdiagnosis in groups already at risk of worse health outcomes (Seyyed-Kalantari et al., 2021; Tripathi et al., 2023). For example, imaging in pediatric emergency departments is used less often for Black patients than for White patients (Payne and Puumala, 2013), meaning Black children may not receive the same level of diagnostic attention, leading to delayed or missed diagnoses that compound existing disparities. Bias also appears beyond imaging: algorithms that help decide who receives a renal transplant may overestimate glomerular filtration rate (GFR) in Black individuals because of the race correction factor used in equations such as CKD-EPI, potentially disqualifying them from necessary dialysis and transplantation (Diao et al., 2021). The race correction factor was introduced to account for observed differences in serum creatinine levels between racial groups, but its use is controversial. Some argue that removing it would provide a more accurate assessment of kidney dysfunction and better patient outcomes, while others warn that removal may overestimate kidney dysfunction. The existence of this debate itself accentuates the need for further research into whether and how race correction factors should be used, so that treatment outcomes are equitable for all racial groups.
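The mechanism of the race correction factor can be made concrete with the 2009 CKD-EPI creatinine equation (the version that contains the race coefficient; a race-free refit was published in 2021). The coefficients below are from the published 2009 equation, but the function name and example values are ours, and this sketch is illustrative rather than clinical software.

```python
def ckd_epi_2009(scr, age, female, black):
    """2009 CKD-EPI creatinine equation, eGFR in mL/min/1.73 m^2.
    scr: serum creatinine in mg/dL."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    egfr = (141
            * min(scr / kappa, 1.0) ** alpha
            * max(scr / kappa, 1.0) ** -1.209
            * 0.993 ** age)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159   # the race coefficient at the heart of the debate
    return egfr

same_labs = dict(scr=1.4, age=55, female=False)  # hypothetical patient
without = ckd_epi_2009(black=False, **same_labs)
with_coef = ckd_epi_2009(black=True, **same_labs)
# identical labs yield a ~16% higher reported eGFR when the patient
# is flagged as Black, which can push the value above a treatment cutoff
```

Because eligibility thresholds for dialysis and transplant referral are defined on eGFR, a multiplicative 1.159 factor applied to one racial group directly shifts who crosses those thresholds.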


Diversity in AI data

Achieving a truly diverse dataset is a challenge in itself. Individuals in underserved communities often lack equal access to the healthcare settings from which training data are gathered. There are also many legal and ethical concerns around data sharing: datasets can only be collected from willing individuals, so they skew toward populations with the understanding and willingness to divulge personal health data to a third party, which many are reluctant to do. Among URMs especially, systemic barriers such as limited awareness of disease screening, lower socioeconomic status, and inadequate health insurance coverage reduce patient participation in routine radiology screening (Wang et al., 2019). Until these barriers are addressed, achieving a diverse dataset will remain challenging.

These challenges are exacerbated by the high volume of data required to train algorithms. When training datasets are not diverse, racial bias can readily be introduced when evaluating radiological data from URMs (Ross et al., 2021): underrepresented groups see poorer model performance and higher error rates. Biases can stem from data collection methods, existing systemic biases in healthcare, and labeling practices, all of which can skew a model’s predictions. Bias can also enter at the training-data stage through reliance on datasets, including corporate datasets, that lack sufficient diversity (Kidawi-Khan et al., 2024). Previous studies have found that using publicly available data introduces racial bias against URM groups (Vaidya et al., 2024); the same group recommends, on the policy front, that regulatory agencies integrate “demographic-stratified evaluation into their assessment guidelines.” Other researchers report that chest X-ray datasets such as MIMIC-CXR have the following racial composition: “White (67.64%), Black (18.59%), Hispanic (6.4%), Native American (0.29%), and other races (3.83%)” (Tripathi et al., 2023). Such imbalance can lead AI models to underperform on minorities, producing misdiagnoses and poorer healthcare outcomes for underrepresented populations. In the long term, this can undermine trust in AI as unfit for medical use, when the core issue lies less with the technology than with the data it was given as the basis for its responses.
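A demographic audit of the kind described above is straightforward to automate. The sketch below, with a function name and representation floor of our own choosing, tallies each group's share of a cohort and flags any group falling below a chosen threshold; the toy cohort loosely echoes the MIMIC-CXR skew quoted above.

```python
from collections import Counter

def audit_representation(race_labels, floor=0.05):
    """Report each group's share of the cohort and flag groups
    whose representation falls below `floor` (here, 5%)."""
    counts = Counter(race_labels)
    total = sum(counts.values())
    shares = {g: n / total for g, n in counts.items()}
    flagged = sorted(g for g, s in shares.items() if s < floor)
    return shares, flagged

# toy cohort of 967 patients, roughly echoing the quoted skew
cohort = (["White"] * 676 + ["Black"] * 186 + ["Hispanic"] * 64
          + ["Native American"] * 3 + ["Other"] * 38)
shares, flagged = audit_representation(cohort)
# flagged -> ['Native American', 'Other'] under a 5% floor
```

Running such a check at dataset assembly time, and again after every update, is a low-cost way to operationalize the "demographic-stratified evaluation" that researchers recommend regulators require.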


Diversity in development of AI

As AI becomes more integrated into the field, the question arises of how to train individuals to use such technologies (Patino et al., 2024). While many radiologists are concerned with algorithmic bias, the lack of diversity within the field itself hinders the integration of diverse perspectives into algorithm development (Li et al., 2024). In the field, only one in ten radiologists is a URM and only one in four is female (Doddi et al., 2024). Black and Hispanic radiologists are particularly underrepresented, even accounting for the fact that other non-primary-care specialties also lack diversity (Wu et al., 2024). Given that these are two groups facing racial disparities under current algorithms, it is imperative that representation change to reflect the demographics of local populations. Despite numerous diversity, equity, and inclusion (DEI) initiatives, progress toward a more diverse population of radiologists has been slow. Many DEI programs fail to address the systemic issues that prevent underrepresented minorities from entering and advancing in the field, including insufficient support and mentorship, bias in recruitment and promotion processes, and a lack of early exposure to radiology careers. Additionally, socioeconomic barriers and educational inequities mean that fewer Black and Hispanic students have the resources or opportunities to pursue careers in radiology.

Furthermore, existing DEI initiatives often lack measurable goals and accountability structures, leading to limited impact. Programs that do not actively involve URM communities in their planning and implementation may not fully understand or address the specific challenges these groups face. As a result, while DEI initiatives are well-intentioned, they often fall short in creating the significant, sustained changes needed to increase diversity among radiologists. To truly make a difference, these initiatives must be more comprehensive, inclusive, and focused on long-term strategies that address the root causes of underrepresentation.

Recent advances in AI, driven by robust research and a growing number of datasets, have allowed machines to process data at a performance level equal to, if not better than, humans on various tasks. AI algorithms have the potential to enhance task performance in radiology, but proper AI integration must minimize racial bias and incorporate a wide variety of data. To that end, several approaches may be employed by medical experts and data specialists.

Radiology patient information should be kept private and de-identified, whether through pseudonymization or total anonymization. Patients must be clearly and completely informed about what information they are releasing, where it is going, and how it will be used. Ongoing consent check-ins are also crucial to ensure patients remain willing to have their information used. Minority communities are often reluctant to release information out of fear of being exploited, as they have been many times before. Making the use of their information transparent and secure may make these populations more comfortable releasing it, which in turn makes AI more cognizant of their needs and more effective for URM populations. Race should be included in patient information, given AI’s capacity to detect it and its complex ties to socioeconomic factors that affect daily life (Ball et al., 2024). To mitigate underrepresentation of racial minority groups and thus prevent selection bias, data experts should mandate the inclusion of diverse demographic data in training datasets and regularly audit those datasets for representation across groups. Screening more people from racial minority groups to improve their representation in data should be done only with those individuals’ consent. Acquiring a wide variety of datasets is also necessary for enhancing data quality and the representation of all races. This could involve establishing an open-source platform through which many medical institutions can pool information and collaborate to improve AI algorithms.
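One common pseudonymization pattern consistent with the above is a keyed hash over the medical record number: direct identifiers are stripped, while a stable pseudonym preserves the ability to link a patient's studies and honor later consent withdrawal. The record fields, key, and function below are hypothetical examples of ours, not a standard.

```python
import hmac
import hashlib

SECRET_KEY = b"example-key-store-in-a-vault"  # hypothetical; rotate and protect

def pseudonymize(record, key=SECRET_KEY):
    """Replace direct identifiers with a keyed hash; keep clinical fields.
    A keyed (HMAC) hash is used instead of a plain hash so the small
    patient-ID space cannot be brute-forced without the key."""
    pseudo_id = hmac.new(key, record["patient_id"].encode(),
                         hashlib.sha256).hexdigest()[:16]
    cleaned = {k: v for k, v in record.items()
               if k not in {"patient_id", "name", "dob", "address"}}
    cleaned["pseudo_id"] = pseudo_id
    return cleaned

rec = {"patient_id": "MRN-001", "name": "J. Doe", "dob": "1980-01-01",
       "address": "123 Main St", "race": "Black", "study": "chest CT"}
out = pseudonymize(rec)
# direct identifiers removed; race retained, per the recommendation above
```

Because the same key always maps a given patient to the same pseudonym, a later withdrawal of consent can be propagated by recomputing the pseudonym and deleting the matching rows.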

AI algorithms should be thoroughly examined, optimized, and tested to minimize racial bias and to increase applicability to underrepresented populations. AI systems should be scalable and adaptable to different healthcare environments, including those with varying levels of resources, so that the benefits of AI reach diverse populations and settings. The latest AI software should be deployed in all medical centers to prevent the underdiagnoses that often occur in underserved subpopulations (Hosny et al., 2024). This could be achieved by advocating that local and state healthcare systems mandate use of up-to-date software.

Regarding the metrics of these algorithms, various considerations should be made. Systems should have high F1-scores, the harmonic mean of precision and recall, which assess a model’s performance where both false positives and false negatives are critical, as in cancer diagnosis. Analyzing F1-scores across racial groups can highlight discrepancies in how well the model balances precision and recall for those groups. Accuracy, the ratio of correctly predicted instances to total instances, should also be examined carefully: high overall accuracy does not guarantee equitable performance across racial groups. If the training data underrepresent certain groups, the model may perform poorly for those populations despite high aggregate accuracy. Other data-processing methods should be used to build algorithms optimized for radiologists’ needs. Techniques such as data augmentation, which increases dataset diversity, and synthetic data generation, which creates realistic images representing underrepresented groups, are valuable for optimizing AI algorithms in radiology. Transfer learning allows models pre-trained on large, diverse datasets to be fine-tuned on specific radiology data, improving performance when labeled data are limited. Bias correction algorithms help identify and mitigate biases in training data, supporting more equitable outcomes in AI-driven diagnostics. That said, data augmentation to simulate diverse samples should not stand alone, since it exposes AI systems to diversity only artificially. Experts should also reach out to minority communities to increase real representation and compare newly collected data against augmented data to assess quality.
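The stratified evaluation described above can be sketched directly. The function names and toy labels below are our own illustration: overall accuracy looks reasonable, yet splitting by group reveals that one group receives markedly worse F1 and accuracy, exactly the discrepancy aggregate metrics hide.

```python
def f1_and_accuracy(y_true, y_pred):
    """F1 (harmonic mean of precision and recall) and accuracy
    for binary labels, where 1 is the positive class."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return f1, acc

def stratified_report(y_true, y_pred, groups):
    """Compute (F1, accuracy) separately for each demographic group."""
    report = {}
    for g in sorted(set(groups)):
        idx = [i for i, gg in enumerate(groups) if gg == g]
        report[g] = f1_and_accuracy([y_true[i] for i in idx],
                                    [y_pred[i] for i in idx])
    return report

# toy example: perfect performance on group A, degraded on group B
y_true = [1, 0, 1, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
groups = ["A"] * 4 + ["B"] * 4
report = stratified_report(y_true, y_pred, groups)
# report -> {'A': (1.0, 1.0), 'B': (0.5, 0.5)}
```

Here the pooled accuracy is 75%, which could pass a naive quality bar even though group B's F1 is only 0.5; reporting the stratified table makes the gap impossible to overlook.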

Conclusion

Integrating AI into radiology promises significant improvements in diagnostic accuracy and workflow efficiency but risks exacerbating racial disparities if not carefully managed. Ensuring diverse and representative datasets, transparent consent processes, and continuous auditing for bias are essential steps to mitigate these risks. It is also important to involve a diverse range of stakeholders during AI development in order to properly address the needs of all patient populations. By prioritizing these ethical considerations, we can enhance radiological practices and promote racial equity in healthcare.