
The Role of Artificial Intelligence in Breast Cancer Diagnosis and Screening

Breast Cancer

Breast cancer is the most commonly diagnosed cancer among women globally (Sung et al., 2021). Among women, breast cancer accounts for 1 in 4 cancer cases and 1 in 6 cancer deaths, ranking first for incidence in the vast majority of countries (159 of 185 countries) and for mortality in 110 countries (Sung et al., 2021). Most cases occur in women over the age of 50 years, but it can affect younger women as well. Other risk factors include genetic predisposition, family history, early onset of menstruation, hormone replacement therapy, alcohol consumption, and obesity (Łukasiewicz et al., 2021).

The breast is composed of milk-producing lobules, a system of transport ducts, and fatty tissue (Bazira et al., 2022). All breast cancers originate in the cells lining the terminal duct lobular units of the collecting ducts, the functional unit of the breast. The most common type of breast cancer is invasive ductal carcinoma, which starts in the milk ducts and invades nearby tissues (Harbeck et al., 2019). Breast cancer development involves genetic mutations that cause uncontrolled cell proliferation, including mutations in the BRCA1 and BRCA2 genes, which are involved in DNA repair (Harbeck et al., 2019). Estrogen and progesterone receptors play important roles in the pathophysiology; patients with tumors that express these receptors should receive hormonal therapy to block estrogen receptor activity (Harbeck et al., 2019).

Breast cancer can manifest in several ways. The most common clinical feature is a lump in the breast; other presentations include changes in the nipple's size, nipple discharge, skin changes, and infection and/or inflammation of the breast (Koo et al., 2017). Early-stage breast cancer is often asymptomatic, underscoring the importance of routine screening (Alkabban & Ferguson, 2022).

Breast cancer is generally diagnosed either through screening or after a symptom (such as pain or a palpable lump) prompts a diagnostic examination (McDonald et al., 2016). These are supplemented by imaging techniques to look for abnormalities and characterize them in more detail (McDonald et al., 2016). A breast biopsy is usually performed to confirm the presence of cancer and, if the lesion is malignant, to determine its specific type (McDonald et al., 2016). Breast cancer is staged based on the extent of the tumor, spread to nearby lymph nodes, spread to distant sites, estrogen receptor status, progesterone receptor status, HER2 status, and the grade of the cancer (McDonald et al., 2016).

There are different types of breast cancer, and treatment can vary based on the molecular characteristics of a patient’s disease, stage, cancer type, and receptor status (Hong & Xu, 2022). Treatment usually involves a combination of different modalities and a multidisciplinary team of healthcare professionals (Hong & Xu, 2022). Surgical options range from breast-conserving procedures to mastectomy, in which the entire breast is removed (Hong & Xu, 2022). Lymph node removal may also be necessary to assess the extent of cancer spread (Hong & Xu, 2022). Radiation therapy is often used after breast-conserving surgery or after mastectomy in patients with risk factors (Hong & Xu, 2022). Systemic chemotherapy can be administered before or after surgery, depending on the specific situation (Hong & Xu, 2022). Hormone receptor-positive breast cancers can be treated with drugs that block the effects of estrogen and progesterone (Hong & Xu, 2022). Immunotherapy is an emerging treatment option for certain breast cancers, helping the immune system recognize and attack cancer cells (Hong & Xu, 2022).
 

Imaging Techniques

Digital Mammography

Digital mammography is the most commonly used technique for breast cancer screening. It is a two-dimensional summation technique in which X-rays emitted by an X-ray tube are absorbed to varying degrees by tissues and measured by a detector on the other side. Denser tissues appear brighter on the resulting images than less dense tissues. The breasts are compressed while the image is acquired to spread the breast tissue over a larger surface area (Ikeda, 2011a). This reduces the overlap between different components of breast tissue, decreases the scatter of the passing X-rays, and improves contrast. Two views of each breast are usually acquired: craniocaudal (CC) and mediolateral oblique (MLO) (Ikeda, 2011a).
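
For reference, the differential absorption underlying mammographic contrast can be summarized by a simplified (monoenergetic) Beer-Lambert relation; the notation below is standard physics shorthand added for illustration and does not come from the cited sources.

```latex
% Simplified attenuation model (illustrative): I_0 is the incident X-ray intensity,
% I the intensity reaching the detector, and \mu(x) the linear attenuation coefficient
% along the beam path (higher for dense fibroglandular tissue than for fat).
I = I_0 \exp\!\left( - \int_{\text{path}} \mu(x)\, \mathrm{d}x \right)
```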

Drawbacks of Digital Mammography

Digital mammography is a fast and useful technique for breast cancer screening, but it has its drawbacks (Ikeda, 2011a). Breast compression can be painful and overlapping of different tissues despite compression often leads to artifacts (Ikeda, 2011a). The upper inner quadrant of the breast, which is less mobile as it is fixed to the chest wall, is particularly hard to visualize on mammography (Ikeda, 2011a). Cancer can also be very hard to see on mammography in breasts with a large proportion of dense tissue (Ikeda, 2011a).

Digital Breast Tomosynthesis

Digital breast tomosynthesis (DBT) involves image acquisition using an X-ray source that moves along an arc. Thin slices are reconstructed, providing three-dimensional imaging that minimizes the influence of overlapping breast tissue; this is particularly useful for imaging breast lesions located in heterogeneously dense breast parenchyma. DBT has been found to be more sensitive for the detection of breast cancer than digital mammography (DM), and combining the two techniques improves detection further (Alabousi et al., 2020; Lei et al., 2014; Skaane et al., 2019). However, DBT takes longer to acquire than mammography and suffers from motion and other artifacts (Tirada et al., 2019).

Ultrasound

In diagnostic ultrasound, a transducer emits high-frequency sound waves that travel through tissues and are reflected back as “echoes” that are detected by the transducer. These echoes are then processed to create real-time images on a monitor based on the time it takes for the sound waves to travel to the tissues and back. It is a safe and relatively low-cost technique that is often used as an adjunct to mammography (Ikeda, 2011b), especially for further evaluating a palpable or mammographic finding.

It can even be used as a primary screening modality in women under the age of 30 years or in pregnant or lactating women (Dixon, 2008; Ikeda, 2011b). Ultrasound is very useful in clarifying if a mass is cystic or solid, what kinds of margins it has, and its vascularity (Dixon, 2008; Ikeda, 2011b). It also helps detect other masses and suspicious axillary lymph nodes (Dixon, 2008; Ikeda, 2011b). Its main drawback is that the quality of the examination is highly operator-dependent (Dixon, 2008; Ikeda, 2011b).
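
For context, the depth of a reflecting structure is inferred from the round-trip time of its echo; the simple relation below, with an assumed average speed of sound in soft tissue, is standard ultrasound physics added for illustration rather than something given in the cited sources.

```latex
% Depth of a reflector from the echo's round-trip time t, assuming an average
% speed of sound c in soft tissue of about 1540 m/s (illustrative).
d = \frac{c\, t}{2}, \qquad c \approx 1540\ \text{m/s}
```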

Magnetic Resonance Imaging

Using a powerful magnetic field and a series of radiofrequency waves, magnetic resonance imaging (MRI) perturbs hydrogen nuclei in tissues to create detailed cross-sectional images of the body (Daniel & Ikeda, 2011; Mann et al., 2019). Because tissues with different compositions respond to this perturbation in different ways, MRI can detect even subtle differences between types of soft tissue very well and is considered the most sensitive modality to diagnose breast cancer (Daniel & Ikeda, 2011; Mann et al., 2019). It is mostly used for screening high-risk patients based on genetic or acquired risk factors (Daniel & Ikeda, 2011).

Breast MRI requires dedicated breast coils that transmit the radiofrequency waves and receive the generated signal. Images are often acquired with an in-plane spatial resolution of 1 mm, a slice thickness of under 3 mm, and suppression of signal from fat. Commonly used sequences include T2-weighted images, diffusion-weighted imaging, and dynamic contrast-enhanced MRI. To reduce false positives due to non-specific breast parenchymal changes, it is best performed between days 7 and 13 of the menstrual cycle (Daniel & Ikeda, 2011).

Unlike mammography, MRI does not involve the use of ionizing radiation and produces 3-dimensional images that facilitate the detection of very small lesions (DeMartini & Lehman, 2008; Shahid et al., 2016). MRI also allows a more detailed assessment of the chest wall than mammography and ultrasound (DeMartini & Lehman, 2008). Drawbacks of breast MRI include a low sensitivity to microcalcifications, high cost, and the fact it is contraindicated in people with certain metallic implants (Daniel & Ikeda, 2011).
 

Screening and Diagnostic Challenges

Despite evidence for the overall benefit of breast cancer screening (Dibden et al., 2020; Kalager et al., 2010; Tabár et al., 2019), it suffers from several technical and logistical challenges. More than half of women screened annually for 10 years will have a false positive test (Hubbard et al., 2011). This has wide-ranging and significant consequences including the physical and emotional burden of unnecessary biopsies and increased healthcare costs (Nelson, Pappas, et al., 2016; Ong & Mandl, 2015). Screening also often misses breast cancer, particularly in women with dense breasts (Banks et al., 2006).

Breast cancer screening requires highly skilled workers, including radiologists and radiographers, of whom there is currently a global shortage (Moran & Warren-Forward, 2012; Rimmer, 2017; Wing & Langelier, 2009). This problem is compounded by the fact that the standard of care in breast screening in many European countries is that each examination is read by two radiologists independently (Giordano et al., 2012) and that, in certain countries, such as the United States, the barriers for qualifying to interpret mammograms are high because of stringent professional certification standards (Food and Drug Administration, 2001).

There are also substantial barriers to the uptake of breast cancer screening worldwide. These include lack of or difficult access to screening programs, lack of knowledge or misunderstanding of the benefits of these programs, and social and cultural barriers (Mascara & Constantinou, 2021).
 

Role of Artificial Intelligence

Technical Improvements

Few published studies have thus far directly investigated the use of AI to make technical improvements to breast examinations. One commercially available application provides real-time feedback to radiographers on the adequacy of patient positioning on mammograms (Volpara Health, 2022). Other AI applications have focused on reducing radiation doses (J. Liu et al., 2018), improving image reconstruction (Kim et al., 2016), and reducing noise and artifacts on DBT (Garrett et al., 2018).

DBT is frequently combined with digital mammography for breast cancer screening, which doubles the radiation dose received by the patient (Svahn et al., 2015). To avoid this, there has been increasing interest in generating synthetic mammograms from DBT data (Chikarmane et al., 2023). In a large prospective Norwegian study, the accuracies of DBT combined with either digital mammography or synthetic mammography for breast cancer detection were very similar (Skaane et al., 2019). Recent studies have investigated improving the quality of synthetic mammography using AI, with promising results (Balleyguier et al., 2017; James et al., 2018).

Diagnostic Improvements

Breast Density Assessment

Dense breast tissue seen on mammography represents fibroglandular tissue. Women with dense breasts have a 2-to-4-fold higher risk of breast cancer than women with predominantly fatty breast tissue (Byrne et al., 1995; Duffy et al., 2018; Torres-Mejía et al., 2005). In addition, the sensitivity of mammography for breast cancer is 20–30 % lower in dense breasts than in less dense breasts (Lynge et al., 2019). The standard of care in breast density assessment uses the BI-RADS classification (Berg et al., 2000).

Several large studies have investigated the potential for automatic assessment of breast density on mammograms using AI-based tools. A convolutional neural network (CNN) trained on 14,000 mammograms and tested on almost 2,000 mammograms classified breast density as either “scattered density” or “heterogeneously dense” with an area under the curve (AUC) of 0.93 (Mohamed et al., 2018). Another study used a CNN capable of both binary and four-way BI-RADS classification, trained on more than 40,000 mammograms (Lehman et al., 2019). In a test dataset of more than 8,000 mammograms, the authors found good agreement on breast density between the algorithm and individual radiologists (kappa = 0.67) as well as the consensus of five radiologists (kappa = 0.78) (Lehman et al., 2019).
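
As a rough illustration of what such a classifier looks like, the sketch below defines a minimal four-way breast-density CNN in PyTorch; the architecture, layer sizes, and input shape are invented for illustration and do not reproduce the models of Mohamed et al. or Lehman et al.

```python
# Minimal sketch of a four-way BI-RADS breast-density classifier (illustrative only;
# not the architecture used in the cited studies).
import torch
import torch.nn as nn

class DensityCNN(nn.Module):
    def __init__(self, n_classes: int = 4):  # BI-RADS density categories a-d
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: batch of single-channel mammograms, shape (N, 1, H, W)
        h = self.features(x).flatten(1)
        return self.classifier(h)

model = DensityCNN()
logits = model(torch.randn(2, 1, 256, 256))   # dummy batch of two "mammograms"
probs = logits.softmax(dim=1)                 # per-class density probabilities
```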

Breast Cancer Detection

In a systematic review that included 82 studies using AI for breast cancer detection with various reference standards, the authors found an AUC of 0.87 for studies using mammography, 0.91 using ultrasound, 0.91 using DBT, and 0.87 using MRI (Aggarwal et al., 2021). These are promising results; however, head-to-head comparisons between AI-based algorithms and radiologists reveal room for improvement. In another systematic review of studies using either histopathology or follow-up (for screen-negative women) as a reference, 94 % of the 36 CNNs identified were less accurate than a single radiologist when used as a standalone system, and all were less accurate than the consensus of two or more radiologists (Freeman et al., 2021). Current evidence, therefore, does not support the use of AI as a standalone strategy for breast cancer detection.

Breast Cancer Prediction

AI has shown promise for predicting the risk of developing breast cancer based on screening mammograms, either by providing a better assessment of breast density, an established risk factor for breast cancer (Duffy et al., 2018), or detecting subtle imaging features that are harbingers of cancer (Batchu et al., 2021). Several studies have used AI-based models to predict the risk of developing breast cancer in the future based on mammograms (Batchu et al., 2021; Geras et al., 2019).

A CNN trained on almost 1,000,000 mammographic images showed an AUC of 0.65 for predicting the future development of breast cancer, compared to 0.57–0.60 for conventional mammography-based breast density scores (Dembrower, Liu, et al., 2020). A smaller study found an AUC of 0.73 for a CNN-based method predicting breast cancer from normal mammographic images (Arefan et al., 2020). Another deep learning algorithm showed an AUC of 0.82 for predicting interval cancers (cancers detected within 12 months after a negative mammogram), compared to 0.65 for BI-RADS visual assessment of breast density (Hinton et al., 2019). A deep-learning-based model that incorporated both risk factors and mammographic findings to predict breast cancer risk had an AUC of up to 0.7, surpassing the accuracy of predictive models based on risk factors or mammographic findings alone (Yala, Lehman, et al., 2019).
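
To make the comparison concrete, the snippet below shows how an AUC of the kind cited above is computed from predicted risk scores and observed outcomes; the data are synthetic and the two "scores" are crude stand-ins, not the published models or their results.

```python
# Illustrative only: computing AUCs for two competing risk scores on synthetic data.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)                 # 1 = developed breast cancer (synthetic labels)
density_score = y * 0.3 + rng.normal(size=1000)   # stand-in for a density-based risk score
dl_risk_score = y * 0.6 + rng.normal(size=1000)   # stand-in for a deep learning risk score

print("density-based AUC:", roc_auc_score(y, density_score))
print("deep-learning AUC:", roc_auc_score(y, dl_risk_score))
```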

Efficiency Improvements

The sheer volume of mammographic examinations and the shortage of trained radiologists have made efficiency improvements one of the most interesting areas of research into the use of AI in breast cancer.

In one study, the authors simulated a workflow in which mammograms were interpreted by a radiologist and a deep learning model, with the decision being considered final if both agreed (McKinney et al., 2020). A second radiologist was only consulted in case of disagreement, and this was associated with an 88 % reduction in workload for the second radiologist with a negative predictive value of over 99.9 % (McKinney et al., 2020).
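
A minimal sketch of that decision rule is shown below; the function and its inputs are hypothetical simplifications of the simulated workflow, not code from McKinney et al.

```python
# Sketch of the simulated workflow described above: the AI and the first radiologist each
# make a recall/no-recall call; a second radiologist is consulted only on disagreement.
def double_read_with_ai(ai_recall: bool, radiologist1_recall: bool,
                        second_radiologist) -> tuple[bool, bool]:
    """Return (final_decision, second_reader_used)."""
    if ai_recall == radiologist1_recall:
        return ai_recall, False          # agreement: decision is final, no second read
    return second_radiologist(), True    # disagreement: arbitration by a second reader

decision, used_second = double_read_with_ai(
    ai_recall=True,
    radiologist1_recall=False,
    second_radiologist=lambda: True,     # placeholder for the second reader's call
)
```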

In a first-of-its-kind large randomized clinical trial from Sweden, approximately 80,000 women were assigned to have their screening mammograms either pre-read by a CNN or not (Lång et al., 2023). In the intervention arm, only mammograms assigned a high likelihood-of-malignancy score were double-read (the rest were read by one radiologist), and the results were compared to conventional double reading without the help of the algorithm. In an interim analysis, both study arms showed an identical false-positive rate of 1.5 %. The positive predictive value of recall was 28.3 % in the intervention group and 24.8 % in the control group, and the strategy reduced the screen-reading workload by 44.3 % (Lång et al., 2023).

Other studies have used AI to prescreen mammograms, triaging out those with a low likelihood of cancer and showing only those with a high probability of cancer to a radiologist. One study from the US used a simulated workflow involving a CNN trained on more than 212,000 mammograms and tested on over 26,000 for this purpose (Yala, Schuster, et al., 2019). The workflow using the algorithm had a non-inferior sensitivity for breast cancer (90.1 % vs 90.6 %) and a slightly higher specificity compared to radiologists working alone (94.2 % vs 93.5 %) and was associated with a 19.3 % lower workload (Yala, Schuster, et al., 2019). A smaller study from Spain compared AI-based triage, in which only studies flagged as high risk by the AI receive a second radiologist read, with traditional double-reading workflows for mammography and DBT, and found workload reductions of 72.5 % and 29.7 % for the two strategies (Raya-Povedano et al., 2021). The sensitivity of these AI-based triage strategies was non-inferior to that of standard double reading for both mammography and DBT (Raya-Povedano et al., 2021). In a Swedish study, a similar strategy using a commercially available AI algorithm yielded a false-negative rate of no more than 4 %; among examinations flagged as very high risk by the algorithm, it could potentially have detected up to 71 additional cancers per 1,000 examinations that were negative on double reading by human radiologists (Dembrower, Wåhlin, et al., 2020).

In a study of over one million mammograms across eight screening sites and three device manufacturers, a commercially available deep learning algorithm triaged 63 % to no further workup based on high-confidence assessments of the examinations (Leibig et al., 2022). The rest of the examinations, in which the algorithm’s confidence was low, were shown to radiologists. This strategy improved the sensitivity of the radiologists (compared to unaided reading) by 2.6–4 % and the specificity by 0.5–1.0 % (Leibig et al., 2022). 
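
The sketch below illustrates a confidence-based triage rule of this general kind; the thresholds and labels are invented for illustration and are not those used by the cited commercial systems.

```python
# Sketch of a confidence-based triage rule: examinations the model scores with high
# confidence are finalized automatically, the rest are deferred to a radiologist.
def triage(cancer_probability: float,
           low_threshold: float = 0.02,
           high_threshold: float = 0.90) -> str:
    if cancer_probability <= low_threshold:
        return "no further workup"       # confident normal
    if cancer_probability >= high_threshold:
        return "recall"                  # confident suspicious
    return "refer to radiologist"        # uncertain: human read required

for p in (0.001, 0.4, 0.97):
    print(p, "->", triage(p))
```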
 

Challenges and Future Directions

Several ethical, technical, and methodological challenges associated with the use of AI in breast cancer screening provide a framework for guiding future research on this topic (Hickman et al., 2021).

Most AI-based tools have thus far focused on digital mammography (Aggarwal et al., 2021), but other exam techniques such as DBT and MRI have unique advantages (Alsheik et al., 2019; Mann et al., 2019) and are likely to play larger roles in breast cancer screening in the future. However, because DBT and MRI are tomographic techniques producing 3-dimensional outputs, processing them using AI-based tools will require more storage space and computing power (Prevedello et al., 2019).

The incidence, presentation, and outcome of breast cancer are related to several sociodemographic factors, including race and ethnicity (Hirko et al., 2022; Hu et al., 2019; Martini et al., 2022). Training AI-based tools on datasets that represent a diverse population is key to ensuring that they can generalize and benefit as many people as possible.

The overall performance of AI for breast cancer detection has been impressive. However, it is noteworthy that one study could not prove non-inferior sensitivity of AI compared with radiologists for detecting breast cancer (Lauritzen et al., 2022). In addition, the quality of evidence behind many studies on this topic is concerning. A systematic review investigating the accuracy of AI-based tools for breast cancer detection identified several areas of potential improvement (Freeman et al., 2021). The review found no prospective studies, and the identified studies were of poor methodological quality.

In particular, the authors observed that smaller studies showed more positive results that were not replicated in larger studies. In another systematic review, only about one-tenth of studies used an external dataset for validation, no studies provided a prespecified sample size calculation, and serious issues with selection bias and inappropriate reference standards were identified (Aggarwal et al., 2021). These methodological issues can potentially be mitigated in the future with the introduction of large open data repositories (Nguyen et al., 2023) and increased adherence to guidelines for conducting AI-based medical research (Lekadir et al., 2021; X. Liu et al., 2020). 
 

Conclusion

Integrating artificial intelligence into breast screening programs holds promise for enhancing image quality, improving efficiency, and predicting future breast cancer risk. For detecting breast cancer on screening examinations, evidence suggests that AI works best when used synergistically with radiologists. Ongoing research is crucial to address challenges associated with the use of AI in breast cancer screening, including expanding its applications beyond mammography and ensuring its ethical and responsible use. With the continuous evolution of AI applications, the future of breast cancer screening holds immense potential for increased accessibility, early intervention, and ultimately, improved outcomes for patients.

References 

Aggarwal, R., Sounderajah, V., Martin, G., Ting, D. S. W., Karthikesalingam, A., King, D., Ashrafian, H., & Darzi, A. (2021). Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis. NPJ Digital Medicine, 4(1), 65.

Alabousi, M., Zha, N., Salameh, J.-P., Samoilov, L., Sharifabadi, A. D., Pozdnyakov, A., Sadeghirad, B., Freitas, V., McInnes, M. D. F., & Alabousi, A. (2020). Digital breast tomosynthesis for breast cancer detection: a diagnostic test accuracy systematic review and meta-analysis. European Radiology, 30(4), 2058–2071.

Alkabban, F. M., & Ferguson, T. (2022). Breast Cancer. In StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing.

Alsheik, N. H., Dabbous, F., Pohlman, S. K., Troeger, K. M., Gliklich, R. E., Donadio, G. M., Su, Z., Menon, V., & Conant, E. F. (2019). Comparison of Resource Utilization and Clinical Outcomes Following Screening with Digital Breast Tomosynthesis Versus Digital Mammography: Findings From a Learning Health System. Academic Radiology, 26(5), 597–605.

Arefan, D., Mohamed, A. A., Berg, W. A., Zuley, M. L., Sumkin, J. H., & Wu, S. (2020). Deep learning modeling using normal mammograms for predicting breast cancer risk. Medical Physics, 47(1), 110–118.

Balleyguier, C., Arfi-Rouche, J., Levy, L., Toubiana, P. R., Cohen-Scali, F., Toledano, A. Y., & Boyer, B. (2017). Improving digital breast tomosynthesis reading time: A pilot multi-reader, multi-case study using concurrent Computer-Aided Detection (CAD). European Journal of Radiology, 97, 83–89.

Banks, E., Reeves, G., Beral, V., Bull, D., Crossley, B., Simmonds, M., Hilton, E., Bailey, S., Barrett, N., Briers, P., English, R., Jackson, A., Kutt, E., Lavelle, J., Rockall, L., Wallis, M. G., Wilson, M., & Patnick, J. (2006). Hormone replacement therapy and false positive recall in the Million Women Study: patterns of use, hormonal constituents and consistency of effect. Breast Cancer Research: BCR, 8(1), R8.

Batchu, S., Liu, F., Amireh, A., Waller, J., & Umair, M. (2021). A Review of Applications of Machine Learning in Mammography and Future Challenges. Oncology, 99(8), 483–490. https://doi.org/10.1159/000515698

Bazira, P. J., Ellis, H., & Mahadevan, V. (2022). Anatomy and physiology of the breast. Surgery, 40(2), 79–83. https://doi.org/10.1016/j.mpsur.2021.11.015

Berg, W. A., Campassi, C., Langenberg, P., & Sexton, M. J. (2000). Breast Imaging Reporting and Data System: inter- and intraobserver variability in feature analysis and final assessment. AJR. American Journal of Roentgenology, 174(6), 1769–1777.

Boyd, N. F., Guo, H., Martin, L. J., Sun, L., Stone, J., Fishell, E., Jong, R. A., Hislop, G., Chiarelli, A., Minkin, S., & Yaffe, M. J. (2007). Mammographic density and the risk and detection of breast cancer. The New England Journal of Medicine, 356(3), 227–236.

Bray, F., Ferlay, J., Soerjomataram, I., Siegel, R. L., Torre, L. A., & Jemal, A. (2018). Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians, 68(6), 394–424.

Breast Cancer Statistics. (2020, October 27). Susan G. Komen®. https://www.komen.org/breast-cancer/facts-statistics/breast-cancer-statistics/

Byrne, C., Schairer, C., Wolfe, J., Parekh, N., Salane, M., Brinton, L. A., Hoover, R., & Haile, R. (1995). Mammographic features and breast cancer risk: effects with time, age, and menopause status. Journal of the National Cancer Institute, 87(21), 1622–1629.

Chikarmane, S. A., Offit, L. R., & Giess, C. S. (2023). Synthetic Mammography: Benefits, Drawbacks, and Pitfalls. Radiographics: A Review Publication of the Radiological Society of North America, Inc, 43(10), e230018. https://doi.org/10.1148/rg.230018

Daniel, B. L., & Ikeda, D. M. (2011). Chapter 7 - Magnetic Resonance Imaging of Breast Cancer and MRI-Guided Breast Biopsy. In D. M. Ikeda (Ed.), Breast Imaging (Second Edition) (pp. 239–296). Mosby.

DeMartini, W., & Lehman, C. (2008). Topics in Magnetic Resonance Imaging, 19(3), 143–150.

Dembrower, K., Liu, Y., Azizpour, H., Eklund, M., Smith, K., Lindholm, P., & Strand, F. (2020). Comparison of a Deep Learning Risk Score and Standard Mammographic Density Score for Breast Cancer Risk Prediction. Radiology, 294(2), 265–272.

Dembrower, K., Wåhlin, E., Liu, Y., Salim, M., Smith, K., Lindholm, P., Eklund, M., & Strand, F. (2020). Effect of artificial intelligence-based triaging of breast cancer screening mammograms on cancer detection and radiologist workload: a retrospective simulation study. The Lancet. Digital Health, 2(9), e468–e474.

Dibden, A., Offman, J., Duffy, S. W., & Gabe, R. (2020). Worldwide Review and Meta-Analysis of Cohort Studies Measuring the Effect of Mammography Screening Programmes on Incidence-Based Breast Cancer Mortality. Cancers, 12(4). https://doi.org/10.3390/cancers12040976

Duffy, S. W., Morrish, O. W. E., Allgood, P. C., Black, R., Gillan, M. G. C., Willsher, P., Cooke, J., Duncan, K. A., Michell, M. J., Dobson, H. M., Maroni, R., Lim, Y. Y., Purushothaman, H. N., Suaris, T., Astley, S. M., Young, K. C., Tucker, L., & Gilbert, F. J. (2018). Mammographic density and breast cancer risk in breast screening assessment cases and women with a family history of breast cancer. European Journal of Cancer, 88, 48–56.

Food and Drug Administration. (2001). The Mammography Quality Standards Act Final Regulations: Preparing for MQSA Inspections; Final Guidance for Industry and FDA.

Freeman, K., Geppert, J., Stinton, C., Todkill, D., Johnson, S., Clarke, A., & Taylor-Phillips, S. (2021). Use of artificial intelligence for image analysis in breast cancer screening programmes: systematic review of test accuracy. BMJ, 374, n1872.

Freer, P. E. (2015). Mammographic breast density: impact on breast cancer risk and implications for screening. Radiographics: A Review Publication of the Radiological Society of North America, Inc, 35(2), 302–315.

Garrett, J. W., Li, Y., Li, K., & Chen, G.-H. (2018). Reduced anatomical clutter in digital breast tomosynthesis with statistical iterative reconstruction. Medical Physics, 45(5), 2009–2022.

Geras, K. J., Mann, R. M., & Moy, L. (2019). Artificial Intelligence for Mammography and Digital Breast Tomosynthesis: Current Concepts and Future Perspectives. Radiology, 293(2), 246–259. https://doi.org/10.1148/radiol.2019182627

Giordano, L., von Karsa, L., Tomatis, M., Majek, O., de Wolf, C., Lancucki, L., Hofvind, S., Nyström, L., Segnan, N., Ponti, A., Eunice Working Group, Van Hal, G., Martens, P., Májek, O., Danes, J., von Euler-Chelpin, M., Aasmaa, A., Anttila, A., Becker, N., … Suonio, E. (2012). Mammographic screening programmes in Europe: organization, coverage and participation. Journal of Medical Screening, 19 Suppl 1, 72–82.

Harbeck, N., Penault-Llorca, F., Cortes, J., Gnant, M., Houssami, N., Poortmans, P., Ruddy, K., Tsang, J., & Cardoso, F. (2019). Breast cancer. Nature Reviews. Disease Primers, 5(1), 66. https://doi.org/10.1038/s41572-019-0111-2

Hickman, S. E., Baxter, G. C., & Gilbert, F. J. (2021). Adoption of artificial intelligence in breast imaging: evaluation, ethical constraints and limitations. British Journal of Cancer, 125(1), 15–22.

Hinton, B., Ma, L., Mahmoudzadeh, A. P., Malkov, S., Fan, B., Greenwood, H., Joe, B., Lee, V., Kerlikowske, K., & Shepherd, J. (2019). Deep learning networks find unique mammographic differences in previous negative mammograms between interval and screen-detected cancers: a case-case study. Cancer Imaging: The Official Publication of the International Cancer Imaging Society, 19(1), 41.

Hirko, K. A., Rocque, G., Reasor, E., Taye, A., Daly, A., Cutress, R. I., Copson, E. R., Lee, D.-W., Lee, K.-H., Im, S.-A., & Park, Y. H. (2022). The impact of race and ethnicity in breast cancer-disparities and implications for precision oncology. BMC Medicine, 20(1), 72.

Hong, R., & Xu, B. (2022). Breast cancer: an up-to-date review and future perspectives. Cancer Communications, 42(10), 913–936. https://doi.org/10.1002/cac2.12358

Hubbard, R. A., Kerlikowske, K., Flowers, C. I., Yankaskas, B. C., Zhu, W., & Miglioretti, D. L. (2011). Cumulative probability of false-positive recall or biopsy recommendation after 10 years of screening mammography: a cohort study. Annals of Internal Medicine, 155(8), 481–492.

Hu, K., Ding, P., Wu, Y., Tian, W., Pan, T., & Zhang, S. (2019). Global patterns and trends in the breast cancer incidence and mortality according to sociodemographic indices: an observational study based on the global burden of diseases. BMJ Open, 9(10), e028461.

Ikeda, D. M. (Ed.). (2011a). Chapter 2 - Mammogram Interpretation. In Breast Imaging (Second Edition) (pp. 24–62). Mosby.

Ikeda, D. M. (Ed.). (2011b). Chapter 5 - Breast Ultrasound. In Breast Imaging (Second Edition) (pp. 149–193). Mosby.

James, J. J., Giannotti, E., & Chen, Y. (2018). Evaluation of a computer-aided detection (CAD)-enhanced 2D synthetic mammogram: comparison with standard synthetic 2D mammograms and conventional 2D digital mammography. Clinical Radiology, 73(10), 886–892.

Kalager, M., Zelen, M., Langmark, F., & Adami, H.-O. (2010). Effect of screening mammography on breast-cancer mortality in Norway. The New England Journal of Medicine, 363(13), 1203–1210. https://doi.org/10.1056/NEJMoa1000727

Kim, Y.-S., Park, H.-S., Lee, H.-H., Choi, Y.-W., Choi, J.-G., Kim, H. H., & Kim, H.-J. (2016). Comparison study of reconstruction algorithms for prototype digital breast tomosynthesis using various breast phantoms. La Radiologia Medica, 121(2), 81–92.

Koo, M. M., von Wagner, C., Abel, G. A., McPhail, S., Rubin, G. P., & Lyratzopoulos, G. (2017). Typical and atypical presenting symptoms of breast cancer and their associations with diagnostic intervals: Evidence from a national audit of cancer diagnosis. Cancer Epidemiology, 48, 140–146. https://doi.org/10.1016/j.canep.2017.04.010

Lång, K., Josefsson, V., Larsson, A.-M., Larsson, S., Högberg, C., Sartor, H., Hofvind, S., Andersson, I., & Rosso, A. (2023). Artificial intelligence-supported screen reading versus standard double reading in the Mammography Screening with Artificial Intelligence trial (MASAI): a clinical safety analysis of a randomised, controlled, non-inferiority, single-blinded, screening accuracy study. The Lancet Oncology, 24(8), 936–944.

Lauritzen, A. D., Rodríguez-Ruiz, A., von Euler-Chelpin, M. C., Lynge, E., Vejborg, I., Nielsen, M., Karssemeijer, N., & Lillholm, M. (2022). An Artificial Intelligence-based Mammography Screening Protocol for Breast Cancer: Outcome and Radiologist Workload. Radiology, 304(1), 41–49.

Lehman, C. D., Yala, A., Schuster, T., Dontchos, B., Bahl, M., Swanson, K., & Barzilay, R. (2019). Mammographic Breast Density Assessment Using Deep Learning: Clinical Implementation. Radiology, 290(1), 52–58.

Leibig, C., Brehmer, M., Bunk, S., Byng, D., Pinker, K., & Umutlu, L. (2022). Combining the strengths of radiologists and AI for breast cancer screening: a retrospective analysis. The Lancet. Digital Health, 4(7), e507–e519.

Lei, J., Yang, P., Zhang, L., Wang, Y., & Yang, K. (2014). Diagnostic accuracy of digital breast tomosynthesis versus digital mammography for benign and malignant lesions in breasts: a meta-analysis. European Radiology, 24(3), 595–602.

Lekadir, K., Osuala, R., Gallin, C., Lazrak, N., Kushibar, K., Tsakou, G., Aussó, S., Alberich, L. C., Marias, K., Tsiknakis, M., Colantonio, S., Papanikolaou, N., Salahuddin, Z., Woodruff, H. C., Lambin, P., & Martí-Bonmatí, L. (2021). FUTURE-AI: Guiding Principles and Consensus Recommendations for Trustworthy Artificial Intelligence in Medical Imaging. In arXiv [cs.CV]. arXiv. https://arxiv.org/abs/2109.09658

Liu, J., Zarshenas, A., Qadir, A., Wei, Z., Yang, L., Fajardo, L., & Suzuki, K. (2018). Radiation dose reduction in digital breast tomosynthesis (DBT) by means of deep-learning-based supervised image processing. Medical Imaging 2018: Image Processing, 10574, 89–97.

Liu, X., Cruz Rivera, S., Moher, D., Calvert, M. J., Denniston, A. K., & SPIRIT-AI and CONSORT-AI Working Group. (2020). Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nature Medicine, 26(9), 1364–1374.

Łukasiewicz, S., Czeczelewski, M., Forma, A., Baj, J., Sitarz, R., & Stanisławek, A. (2021). Breast Cancer-Epidemiology, Risk Factors, Classification, Prognostic Markers, and Current Treatment Strategies-An Updated Review. Cancers, 13(17). https://doi.org/10.3390/cancers13174287

Lynge, E., Vejborg, I., Andersen, Z., von Euler-Chelpin, M., & Napolitano, G. (2019). Mammographic Density and Screening Sensitivity, Breast Cancer Incidence and Associated Risk Factors in Danish Breast Cancer Screening. Journal of Clinical Medicine Research, 8(11). https://doi.org/10.3390/jcm8112021

Mann, R. M., Cho, N., & Moy, L. (2019). Breast MRI: State of the Art. Radiology, 292(3), 520–536.

Martini, R., Newman, L., & Davis, M. (2022). Breast cancer disparities in outcomes; unmasking biological determinants associated with racial and genetic diversity. Clinical & Experimental Metastasis, 39(1), 7–14.

Mascara, M., & Constantinou, C. (2021). Global Perceptions of Women on Breast Cancer and Barriers to Screening. Current Oncology Reports, 23(7), 74.

McDonald, E. S., Clark, A. S., Tchou, J., Zhang, P., & Freedman, G. M. (2016). Clinical Diagnosis and Management of Breast Cancer. Journal of Nuclear Medicine: Official Publication, Society of Nuclear Medicine, 57 Suppl 1, 9S – 16S. https://doi.org/10.2967/jnumed.115.157834

McKinney, S. M., Sieniek, M., Godbole, V., Godwin, J., Antropova, N., Ashrafian, H., Back, T., Chesus, M., Corrado, G. S., Darzi, A., Etemadi, M., Garcia-Vicente, F., Gilbert, F. J., Halling-Brown, M., Hassabis, D., Jansen, S., Karthikesalingam, A., Kelly, C. J., King, D., … Shetty, S. (2020). International evaluation of an AI system for breast cancer screening. Nature, 577(7788), 89–94.

Mohamed, A. A., Berg, W. A., Peng, H., Luo, Y., Jankowitz, R. C., & Wu, S. (2018). A deep learning method for classifying mammographic breast density categories. Medical Physics, 45(1), 314–321.

Moran, S., & Warren-Forward, H. (2012). The Australian BreastScreen workforce: a snapshot. The Radiographer, 59(1), 26–30.

Nelson, H. D., Fu, R., Cantor, A., Pappas, M., Daeges, M., & Humphrey, L. (2016). Effectiveness of Breast Cancer Screening: Systematic Review and Meta-analysis to Update the 2009 U.S. Preventive Services Task Force Recommendation. Annals of Internal Medicine, 164(4), 244–255.

Nelson, H. D., Pappas, M., Cantor, A., Griffin, J., Daeges, M., & Humphrey, L. (2016). Harms of Breast Cancer Screening: Systematic Review to Update the 2009 U.S. Preventive Services Task Force Recommendation. Annals of Internal Medicine, 164(4), 256–267.

Nguyen, H. T., Nguyen, H. Q., Pham, H. H., Lam, K., Le, L. T., Dao, M., & Vu, V. (2023). VinDr-Mammo: A large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammography. Scientific Data, 10(1), 277.

Ong, M.-S., & Mandl, K. D. (2015). National expenditure for false-positive mammograms and breast cancer overdiagnoses estimated at $4 billion a year. Health Affairs , 34(4), 576–583.

Prevedello, L. M., Halabi, S. S., Shih, G., Wu, C. C., Kohli, M. D., Chokshi, F. H., Erickson, B. J., Kalpathy-Cramer, J., Andriole, K. P., & Flanders, A. E. (2019). Challenges Related to Artificial Intelligence Research in Medical Imaging and the Importance of Image Analysis Competitions. Radiology. Artificial Intelligence, 1(1), e180031. https://doi.org/10.1148/ryai.2019180031

Raya-Povedano, J. L., Romero-Martín, S., Elías-Cabot, E., Gubern-Mérida, A., Rodríguez-Ruiz, A., & Álvarez-Benito, M. (2021). AI-based Strategies to Reduce Workload in Breast Cancer Screening with Mammography and Tomosynthesis: A Retrospective Evaluation. Radiology, 300(1), 57–65.

Rimmer, A. (2017). Radiologist shortage leaves patient care at risk, warns royal college. BMJ, 359, j4683.

Skaane, P., Bandos, A. I., Niklason, L. T., Sebuødegård, S., Østerås, B. H., Gullien, R., Gur, D., & Hofvind, S. (2019). Digital Mammography versus Digital Mammography Plus Tomosynthesis in Breast Cancer Screening: The Oslo Tomosynthesis Screening Trial. Radiology, 291(1), 23–30.

Sung, H., Ferlay, J., Siegel, R. L., Laversanne, M., Soerjomataram, I., Jemal, A., & Bray, F. (2021). Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians, 71(3), 209–249. https://doi.org/10.3322/caac.21660

Svahn, T. M., Houssami, N., Sechopoulos, I., & Mattsson, S. (2015). Review of radiation dose estimates in digital breast tomosynthesis relative to those in two-view full-field digital mammography. Breast, 24(2), 93–99. https://doi.org/10.1016/j.breast.2014.12.002

Tabár, L., Dean, P. B., Chen, T. H.-H., Yen, A. M.-F., Chen, S. L.-S., Fann, J. C.-Y., Chiu, S. Y.-H., Ku, M. M.-S., Wu, W. Y.-Y., Hsu, C.-Y., Chen, Y.-C., Beckmann, K., Smith, R. A., & Duffy, S. W. (2019). The incidence of fatal breast cancer measures the increased effectiveness of therapy in women participating in mammography screening. Cancer, 125(4), 515–523.

Tirada, N., Li, G., Dreizin, D., Robinson, L., Khorjekar, G., Dromi, S., & Ernst, T. (2019). Digital Breast Tomosynthesis: Physics, Artifacts, and Quality Control Considerations. Radiographics: A Review Publication of the Radiological Society of North America, Inc, 39(2), 413–426.

Torres-Mejía, G., De Stavola, B., Allen, D. S., Pérez-Gavilán, J. J., Ferreira, J. M., Fentiman, I. S., & Dos Santos Silva, I. (2005). Mammographic features and subsequent risk of breast cancer: a comparison of qualitative and quantitative evaluations in the Guernsey prospective studies. Cancer Epidemiology, Biomarkers & Prevention: A Publication of the American Association for Cancer Research, Cosponsored by the American Society of Preventive Oncology, 14(5), 1052–1059.

Volpara Health. (2022). TruPGMI: AI for mammography quality improvement. https://www.volparahealth.com/breast-health-software/products/analytics/

Wing, P., & Langelier, M. H. (2009). Workforce shortages in breast imaging: impact on mammography utilization. AJR. American Journal of Roentgenology, 192(2), 370–378.

Yala, A., Lehman, C., Schuster, T., Portnoi, T., & Barzilay, R. (2019). A Deep Learning Mammography-based Model for Improved Breast Cancer Risk Prediction. Radiology, 292(1), 60–66.

Yala, A., Schuster, T., Miles, R., Barzilay, R., & Lehman, C. (2019). A Deep Learning Model to Triage Screening Mammograms: A Simulation Study. Radiology, 293(1), 38–46.

 

Artificial Intelligence in medical imaging: What, How and Why?

    Artificial intelligence (AI) is a field that enables computer systems to solve problems by adapting to changing circumstances, often by mimicking human reasoning and judgement. Several demographic and healthcare trends are driving the use of AI in medical imaging. The amount of medical imaging data being acquired is steadily increasing (Larson et al., 2011; Smith-Bindman et al., 2008, 2012; Winder et al., 2021). There is also a widespread shortage of healthcare workers (Core Health Indicators in the WHO European Region 2015. Special Focus: Human Resources for Health, 2017) with an ever-increasing workload (Levin et al., 2017), and the number of medical imaging examinations is expected to grow exponentially over the next two decades (Tsao, 2020). Radiologists and radiology technologists are in particularly scarce supply (AAMC Report Reinforces Mounting Physician Shortage, 2021; Clinical Radiology UK Workforce Census 2019 Report, 2019). Finally, the ageing world population (Population Ages 65 and above, n.d.; WHO, n.d.-a) and an increasing global burden of chronic illnesses (WHO, n.d.-b) are expected to compound these problems in the near future.

    Broadly speaking, the advantages of AI in medical imaging could potentially include the ability to provide insights that would otherwise not be possible using traditional methods (such as humans looking at images) and to do so in a faster, automated way (without the need for human interaction). AI-based solutions in medical imaging could improve and accelerate the detection of disease, generate in-depth risk assessments of disease development and progression, and reduce subjectivity in the interpretation of medical imaging data.

    Over the past few years, the landscape of AI in medical imaging has changed dramatically. Many promising applications have arisen, the field has seen an unprecedented surge in funding, and we have seen positive trends in the adoption of AI solutions by radiologists, as well as their approval by regulatory bodies.

    Applications

    Although radiology departments provide a plethora of services, the core service provided is the imaging study. Applications of AI in medical imaging can therefore be categorized into those applied either before, during, or after the imaging study.

    Before Image Acquisition

    Several steps have to take place within a radiology department’s workflow before a patient undergoes an imaging study. AI applications that aim to improve these steps are referred to as “upstream AI” and could potentially increase efficiency and enable more personalized decision making in a radiology department.

    Missed medical appointments are common, reduce the efficiency of hospitals, and waste resources (Dantas et al., 2018). Studies from Japan (Kurasawa et al., 2016) and the United Kingdom (Nelson et al., 2019) have shown that AI can be used to predict no-shows with high accuracy. This allows the use of targeted strategies to reduce the likelihood of a patient missing their appointment, including sending automated reminders.
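
    A hedged sketch of such a no-show model is shown below; the feature names, data, and choice of a gradient boosting classifier are assumptions for illustration rather than the models used in the cited studies.

```python
# Illustrative no-show prediction sketch on invented appointment data.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

appointments = pd.DataFrame({
    "days_until_exam": [2, 30, 7, 45, 1, 21, 14, 60],
    "prior_no_shows":  [0, 2, 0, 1, 0, 3, 0, 2],
    "reminder_sent":   [1, 0, 1, 0, 1, 0, 1, 0],
    "no_show":         [0, 1, 0, 1, 0, 1, 0, 1],   # label: 1 = missed appointment
})
X, y = appointments.drop(columns="no_show"), appointments["no_show"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = GradientBoostingClassifier().fit(X_train, y_train)
risk = model.predict_proba(X_test)[:, 1]   # probability of missing the appointment
```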

    One of the most important decisions made in the radiology department is the exact scan protocol to use on a given patient. While this applies to all imaging modalities, the widest range of choice is seen with magnetic resonance imaging (MRI). This includes choosing the appropriate set of sequences and making decisions about whether or not to administer intravenous contrast agents. Natural language classifiers that interpret the narrative text of the clinician’s scan requests have been used to select appropriate MRI protocols. In one study, a gradient boosting classifier predicted the appropriate MRI brain protocol to use based on the scan request with high accuracy (95 %) (Brown & Marotta, 2018). For musculoskeletal MRI, a deep learning classifier was 83 % accurate in determining the need for a contrast agent (Trivedi et al., 2018). Such applications can substantially improve efficiency by foregoing the time-consuming task of radiologists going through unstructured narrative scan requests written by referring clinicians.
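
    The sketch below illustrates the general approach of mapping free-text scan requests to protocols using a TF-IDF representation feeding a gradient boosting classifier; the example requests, protocol labels, and pipeline are invented for illustration and do not reproduce the published models.

```python
# Illustrative protocol-selection sketch: free-text requests -> TF-IDF -> classifier.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import GradientBoostingClassifier

requests = [
    "known glioma, assess progression",
    "first seizure, rule out structural lesion",
    "suspected multiple sclerosis, new visual symptoms",
    "follow-up of treated brain metastases",
]
protocols = ["tumour + contrast", "epilepsy", "demyelination + contrast", "tumour + contrast"]

clf = make_pipeline(TfidfVectorizer(), GradientBoostingClassifier())
clf.fit(requests, protocols)
print(clf.predict(["new enhancing lesion on outside CT, query metastasis"]))
```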

    During Image Acquisition

    Substantial improvements have recently been made in the use of AI for improving image quality. In a recent survey, radiologists identified the enhancement of image quality as the most mainstream current use case for AI in medical imaging (Alexander et al., 2020). While earlier attempts at reducing image noise using deep learning techniques were criticized for removing details and jeopardizing the visibility of essential image features, more recent implementations have made this issue largely obsolete.

     

    In particular, deep learning techniques like generative adversarial networks have shown great potential in image denoising (Wang et al., 2021). Some of these applications target the image reconstruction stage (where the raw sensor data is converted into an interpretable image), providing superior signal-to-noise ratios and reducing image artefacts (Zhu et al., 2018). In lung cancer screening, deep-learning-based image denoising improved both the image quality and the diagnostic accuracy of ultra-low-dose computed tomography (CT) for detecting suspicious lung nodules (Hata et al., 2020; Kerpel et al., 2021). Scans acquired 40-60 % faster than standard scans and enhanced with deep-learning-based algorithms were of better image quality than, and similar diagnostic value as, standard scans of the brain (Bash, Wang, et al., 2021; Rudie et al., 2022) and spine (Bash, Johnson, et al., 2021). Similarly, convolutional neural networks can be used to reduce specific CT and MRI artefacts and improve spatial resolution (Hauptmann et al., 2019; K. H. Kim & Park, 2017; Park et al., 2018; Y. Zhang & Yu, 2018).
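
    As a minimal illustration of supervised denoising of this kind, the sketch below trains a small convolutional network to map noisy images to clean references; the architecture and synthetic data are assumptions for illustration, not any of the cited methods (and not a generative adversarial network).

```python
# Minimal supervised denoising sketch on synthetic image pairs (illustrative only).
import torch
import torch.nn as nn

denoiser = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, kernel_size=3, padding=1),
)
optimizer = torch.optim.Adam(denoiser.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

clean = torch.rand(4, 1, 64, 64)             # stand-in for low-noise reference images
noisy = clean + 0.1 * torch.randn_like(clean)

for _ in range(5):                           # a few illustrative training steps
    optimizer.zero_grad()
    loss = loss_fn(denoiser(noisy), clean)   # learn the noisy-to-clean mapping
    loss.backward()
    optimizer.step()
```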

    Reconstruction algorithms based on deep learning have enabled ultra-low-dose computed tomography scans to be acquired while maintaining diagnostic quality. This is of particular benefit in children and pregnant women, where reduction of radiation dose to the absolute minimum is critical. These deep-learning-based CT image reconstruction approaches are associated with lower image noise and better image texture than state-of-the-art alternatives like iterative reconstruction (Higaki et al., 2020; McLeavy et al., 2021; Singh et al., 2020). In positron emission tomography, deep learning can reduce injected tracer dosage by one-third and scan times by up to half while maintaining scan quality (Katsari et al., 2021; Le et al., 2020; Xu et al., 2020).

    After Image Acquisition

    Radiology technologists and radiologists usually share the task of calling back patients for repeat examinations, but doing so consistently and reliably is exceedingly difficult due to time constraints. The image quality of AI-enhanced brain MRI scans has been shown to be equal to or better than that of conventional scans, even when using acquisition protocols that reduce scan times by 45-60 % (Schreiber-Zinaman & Rosenkrantz, 2017).

    Prioritizing scan reading on a radiologist’s worklist is often done based on several factors including the type of scan, the referring department, and direct communication with the radiologist about the scan’s urgency. Several approaches have been tested to influence the order in which scans are read to improve efficiency and ensure the most critical scans are seen first. These include assigning different radiologists specific exams based on how quickly they read certain scan types (Wong et al., 2019) and automatically detecting emergent findings on the images and pushing these cases to the “top of the list” (Prevedello et al., 2017; Winkel et al., 2019).
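
    The sketch below shows one simple way such AI-driven prioritization could be wired into a reading worklist using a priority queue; the priorities and study names are invented for illustration.

```python
# Illustrative worklist sketch: AI-flagged emergent studies jump to the top of the queue.
import heapq

worklist: list[tuple[int, str]] = []        # (priority, study); lower priority = read first

def add_study(study: str, ai_flagged_emergent: bool, routine_priority: int = 10) -> None:
    priority = 0 if ai_flagged_emergent else routine_priority
    heapq.heappush(worklist, (priority, study))

add_study("outpatient knee MRI", ai_flagged_emergent=False)
add_study("head CT, suspected haemorrhage", ai_flagged_emergent=True)

while worklist:
    _, study = heapq.heappop(worklist)
    print("read next:", study)
```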

    About 70 % of all AI-based solutions in radiology focus on “perception” - a category of functionalities that includes segmentation, feature extraction, as well as detection and classification of pathology (Rezazade Mehrizi et al., 2021). Within this category, the majority of tools extract information from the imaging data with or without quantification as well as draw the user’s attention to potential pathology (Rezazade Mehrizi et al., 2021; van Leeuwen et al., 2021). Over the past few years, some of the most promising applications in this category have included the detection of brain vessel occlusion, brain haemorrhage, lung nodules, pneumothorax and pleural effusions, fractures, and the characterization of breast lesions.

    Funding

    Total investment in AI-based medical imaging companies amounted to $1.17 billion between 2014 and 2019 (Alexander et al., 2020). In the same period, the number of companies in this space tripled, leading to a drop of almost 30 % in the median investment in each company (Alexander et al., 2020). Between 2019 and 2020, private investment in AI companies increased by 9.3 % (D. Zhang et al., 2021). By 2030, investment in AI-based solutions in medical imaging is expected to exceed $3 billion (Tsao, 2020).

    Adoption

    There have been positive trends in the adoption of AI tools by radiologists and radiology technologists over the past few years. Between 2015 and 2020, AI use in radiology departments went up by 30 %, according to a survey of 1,861 radiologists conducted by the American College of Radiology (ACR) (Allen et al., 2021).

    Despite this promising trend, the adoption of AI tools is widely considered to be disproportionately low relative to the amount of funding, the number of companies, and the perceived promise of these tools. The ACR survey provides some insight into why and offers a starting point for developing strategies to improve AI adoption.

    Almost three-quarters of radiologists who were not using AI had no plans to do so in the future because they either were not convinced of its benefits or did not think the associated costs were justified (Allen et al., 2021).

    Similar results have been found in other studies, with radiologists citing scepticism in the capabilities of AI tools and the fact that relatively few have regulatory approval as reasons for not adopting them in their practice (Alexander et al., 2020).

    Regulatory success

    Until August 2019, 60 % of available AI-based radiology solutions had no regulatory approval (Rezazade Mehrizi et al., 2021). As of April 2020, a total of 100 AI solutions had a CE mark, a prerequisite for them to be made commercially available as medical devices in Europe (van Leeuwen et al., 2021). As of the time of writing, more than 150 AI solutions have gained FDA clearance (AI Central, n.d.). Several useful databases of approved or cleared AI-based solutions in healthcare are currently available (AI Central, n.d.; AI for Radiology, n.d.; Medical AI Evaluation, n.d.; The Medical Futurist, n.d.).
     

    The past few years have seen exponential growth in the interest in AI in medical imaging, both in terms of the amount of research and the amount of money being invested in the field. This interest runs the gamut of the radiology workflow, but “perception” applications - for the quantification of biomarkers and the detection of disease processes - have dominated so far. In the radiology community, trends have shifted from AI being perceived as an unwelcome intruder to increased adoption, albeit with some scepticism and hesitation regarding its value. The first AI solutions in medical imaging were granted regulatory approval, and we have seen the first indications of how such solutions may be reimbursed.

    New directions

    With increasing acknowledgement that a large proportion of AI’s potential in medical imaging may lie in “upstream” or “non-interpretative” applications, the field is likely to expand its focus in the coming years. This will include more research into applications that improve the efficiency of radiology workflows and provide more personalized patient care (Alexander et al., 2020). AI is likely to become involved even earlier in the patient management process - i.e. before the clinician decides that a diagnostic imaging test is necessary. Such applications, essentially clinical decision support systems, have already been used successfully for decision-making about treatments in several settings (Bennett & Hauser, 2013; Komorowski et al., 2018). In the future, AI solutions may draw clinicians’ attention to the need for further imaging tests based on a review of the patient’s clinical information, laboratory tests, and prior imaging (Makeeva et al., 2019).

     

    The vast majority (77-84 %) of currently available AI solutions in medical imaging target CT, MRI and plain radiographs (Rezazade Mehrizi et al., 2021; van Leeuwen et al., 2021). Nuclear imaging techniques, such as positron emission tomography (PET), provide unique information not readily gained from other modalities. PET has thus far been largely neglected in terms of AI research (Rezazade Mehrizi et al., 2021; van Leeuwen et al., 2021) and is thus a potentially promising avenue for the field’s expansion.

    AI research is also expected to undergo a shift in the type of data being used. The typical inpatient receives more than one imaging study during their hospital stay (Shinagare et al., 2014). Despite this, only about 3 % of current AI-based radiology solutions combine data from multiple modalities (Rezazade Mehrizi et al., 2021; van Leeuwen et al., 2021). Combining data from multiple imaging sources may improve the diagnostic capabilities of AI solutions. Furthermore, future AI solutions in radiology are likely to combine imaging information, clinical information, as well as non-imaging diagnostic tests (Huang et al., 2020). By doing this, AI solutions may be able to identify patterns in the data collected during a patient’s hospital stay that may not be readily identifiable by healthcare workers (Rockenbach, 2021). This could ultimately lead to more accurate diagnoses and could help inform better and more personalized treatment decisions.
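
    A minimal sketch of this fusion idea is shown below: an image embedding is concatenated with clinical features before a shared prediction head. The dimensions, layers, and single-logit output are assumptions for illustration, not any cited system.

```python
# Illustrative multimodal fusion sketch: image embedding + clinical features -> one logit.
import torch
import torch.nn as nn

class FusionModel(nn.Module):
    def __init__(self, image_dim: int = 128, clinical_dim: int = 10):
        super().__init__()
        self.image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, image_dim), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(image_dim + clinical_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, image: torch.Tensor, clinical: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.image_encoder(image), clinical], dim=1)
        return self.head(fused)             # e.g. a single risk/diagnosis logit

model = FusionModel()
logit = model(torch.rand(2, 1, 64, 64), torch.rand(2, 10))   # dummy image + clinical batch
```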

    The expectations for AI-based medical imaging solutions are also likely to shift from the current focus of triage, image enhancement and automation. With increasing algorithmic complexity, data availability, and experience with these tools, this shift may lead to AI solutions reaching specific diagnoses and recommending specific steps in a patient’s management plan. Similar to how the introduction of the first AI tools for image screening and processing around 2018 spurred investment in the field, marketing analyses predict a similar investment boost in the next few years as AI tools providing specific diagnoses and management steps become more widespread (Michoud et al., 2019).

    One important criticism of the current, arguably still nascent, landscape of AI in medical imaging is that it is too fragmented. Radiology professionals would likely welcome a more streamlined integration of AI solutions in their daily workflow. This includes seamless integration of these solutions into established radiology workflows, with as much as possible happening “in the background” without user input. Furthermore, the outputs of these solutions could be integrated into available radiological information systems. Consequently, the field could move from the plethora of currently available niche AI solutions, each targeted towards a single very specific application, to broader software suites that perform many different functions for a given imaging modality or body region.

    The fragmented investment in the AI in medical imaging market (Alexander et al., 2020) fosters innovation, allowing many players to test out different strategies in this emerging field. However, in the long term, consolidation may increase adoption and stimulate the kind of seamless integration into existing workflows that is needed, allowing fewer companies to offer these solutions at scale (Alexander et al., 2020).

    Challenges

    Quality and reporting of evidence

    In a review of 100 CE-marked AI solutions, 64 % of them had no peer-reviewed scientific evidence for their efficacy (van Leeuwen et al., 2021). Where there was scientific evidence, the level was low, rarely exceeding the demonstration of diagnostic accuracy (van Leeuwen et al., 2021). Another systematic review of the evidence for deep learning algorithms in medical imaging found a generally high diagnostic accuracy, albeit with a high risk of bias across studies (Aggarwal et al., 2021). The main sources of bias include the lack of external validation (D. W. Kim et al., 2019; Liu et al., 2019), insufficiently detailed reporting of results (Liu et al., 2019), retrospective study design (Nagendran et al., 2020), and the inaccessibility of data and code to reviewers and readers (Nagendran et al., 2020).

    Overall, studies on AI tools have shown a worrying lack of standardized reporting and adherence to recommended reporting guidelines (Aggarwal et al., 2021; Yusuf et al., 2020). This is despite the fact that several extensions to established reporting guidelines, as well as AI-specific guidelines, are currently available (Shelmerdine et al., 2021). Widespread implementation of these guidelines should be a focus of AI developers in the future.

    AI developers should also be cognizant that the currently “acceptable” level of evidence for AI-based solutions is likely to become obsolete in the near future. Both regulators and potential users will likely demand higher levels of evidence for these solutions, akin to the evidence required for new pharmaceutical drugs. The next few years will see more of these AI solutions being tested in randomized clinical trials. In the more distant future, expectations will plausibly go beyond evidence of the safety, efficacy, or diagnostic performance of these solutions to demonstrations that they provide added monetary or societal value.

    Rising to the challenge of improving the quality and reporting of evidence for AI-based solutions may pay off in the long run. It could reduce the risk of bias in AI studies, allow thorough and transparent assessment of study quality by potential users and regulators, and facilitate systematic reviews and meta-analyses. These steps may increase trust in, and uptake of, AI-based solutions and ensure that they offer realistic, sustainable improvements in people’s lives.

    Regulation

    Several aspects inherent to AI make it difficult to regulate in the same way as other healthcare interventions. In particular, the inner workings of AI solutions are often opaque and difficult to describe comprehensively in the manner traditionally expected by regulatory bodies.

    The past few years have shown us that these regulatory challenges are far from intractable. Both the Food and Drug Administration and the European Commission have recently proposed initial regulatory frameworks for AI solutions (Center for Devices & Radiological Health, 2021; European Commission, 2021).

    In part as a response to the transparency necessary for regulatory approval, researchers have made substantial progress in making AI’s decision-making more understandable and explainable. This movement towards “interpretable AI” will gain further impetus in the near future as reliance on AI for real-world clinical decision-making increases.

    This has many advantages, including making regulatory approval easier, increasing trust in these solutions by users, minimizing biases, and improving the reproducibility of these solutions (Holzinger et al., 2017; Kolyshkina & Simoff, 2021; “Towards Trustable Machine Learning,” 2018; Yoon et al., 2021).
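
    As a concrete, if greatly simplified, example of explainability, the sketch below (assuming PyTorch; the tiny network and random input are placeholders) computes a gradient saliency map, one of the simplest ways to visualize which input pixels drive a classifier’s output.

```python
# Minimal gradient-saliency sketch: gradients of a class score with respect
# to the input image highlight the pixels that most influence the decision.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 2),
)
model.eval()

image = torch.randn(1, 1, 64, 64, requires_grad=True)  # placeholder "scan"
score = model(image)[0, 1]          # score for the hypothetical "abnormal" class
score.backward()                    # gradients w.r.t. the input pixels

saliency = image.grad.abs().squeeze()   # high values = influential pixels
print(saliency.shape)                   # torch.Size([64, 64])
```

    In practice, more refined techniques (e.g. class activation mapping) are used, but the underlying goal is the same: making the basis of a prediction visible to the clinician.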

    Data privacy

    From development and testing to implementation, AI solutions in medical imaging require access to patient data. This has raised concerns about data privacy, which is a multifaceted and highly complex issue (Murdoch, 2021) that is prominently represented in the regulatory pathways of different countries (COCIR, the European Coordination Committee of the Radiological, Electromedical and Healthcare IT Industry, 2020). Suggested solutions to the data privacy question have ranged from those focusing on oversight to more technical approaches.

    The patients providing the data have to be made aware that they are doing so, and be informed about why and how their data will be used (Lotan et al., 2020), as explicitly stipulated in the EU’s General Data Protection Regulation (GDPR) (General Data Protection Regulation (GDPR) – Official Legal Text, 2016). Given the fast pace at which AI solutions are developed, it has been questioned whether patients can be kept sufficiently informed as these algorithms are continuously retrained (Kritikos, 2020). While fully anonymized data is not subject to such strict requirements under the GDPR (What Is Personal Data?, 2021), anonymization is exceedingly difficult to achieve for medical imaging data.
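
    To illustrate why anonymization of imaging data is so hard, the hypothetical snippet below (assuming the pydicom library) strips a few obvious identifier tags from a DICOM file. As noted above, such tag scrubbing alone falls well short of true anonymization: burned-in annotations, residual metadata, and the image content itself can remain identifying.

```python
# A deliberately simplistic de-identification sketch using pydicom.
# Blanking a handful of tags is NOT sufficient anonymization on its own.
import pydicom

def scrub_basic_identifiers(in_path: str, out_path: str) -> None:
    ds = pydicom.dcmread(in_path)
    for keyword in ("PatientName", "PatientID", "PatientBirthDate",
                    "PatientAddress", "OtherPatientIDs"):
        if keyword in ds:
            setattr(ds, keyword, "")      # blank out the identifier
    ds.remove_private_tags()              # drop vendor-specific private tags
    ds.save_as(out_path)

# scrub_basic_identifiers("study_raw.dcm", "study_scrubbed.dcm")  # hypothetical paths
```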

    The data privacy issue will have to be approached on several fronts. In addition to legislation governing the use of patient data, it is becoming increasingly clear that everyone involved in the development and use of AI solutions - developers, payers, regulatory bodies, researchers and radiologists - has a role to play in ensuring that the data is protected and used responsibly.

    Moreover, the next few years will likely see further research into technical approaches to strengthen data protection. These include better ways to reduce the chances of data being traced back to individuals, methods for keeping sensitive data stored locally even when the algorithm being trained is hosted in some “central” location, data perturbation to minimize the information within a given dataset pertaining to individual patients, and data encryption (G. Kaissis et al., 2021; G. A. Kaissis et al., 2020).
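
    One of the technical approaches mentioned above, keeping sensitive data local while a shared model is trained centrally, is the core idea of federated learning. The following is a minimal, fully synthetic sketch (NumPy only) of federated averaging; the sites, data, and hyperparameters are invented for illustration.

```python
# Federated averaging sketch: each "hospital" computes a model update on its
# own data, and only the parameters (never the data) are averaged centrally.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, steps=20):
    """A few gradient steps of logistic regression on one site's local data."""
    w = weights.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# Three hospitals, each keeping its own (synthetic) data locally.
sites = [(rng.normal(size=(100, 3)), rng.integers(0, 2, size=100)) for _ in range(3)]

global_w = np.zeros(3)
for _round in range(5):
    # Each site trains locally; only the resulting weights leave the site.
    local_weights = [local_update(global_w, X, y) for X, y in sites]
    global_w = np.mean(local_weights, axis=0)   # federated averaging

print(global_w)
```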


    Democratization

    If AI in medical imaging is to live up to its potential, the algorithms being developed have to work for everyone. This “democratization” of AI involves ensuring that healthcare providers have the knowledge and skills needed to use AI-based solutions. With a few exceptions (Paranjape et al., 2019), medical student curricula currently include little to no dedicated education about AI (Banerjee et al., 2021; Blease et al., 2022). Surveys from around the world have shown that medical students’ and doctors’ (Ahmed et al., 2022; Bisdas et al., 2021; Collado-Mesa et al., 2018; Kansal et al., 2022; Pinto Dos Santos et al., 2019; Sit et al., 2020) exposure to AI during training was low despite the high demand for more AI education (Kansal et al., 2022; Ooi et al., 2021; Sit et al., 2020). In addition, there are still large differences between genders and countries in the perceived knowledge about AI amongst medical students (Bisdas et al., 2021). There are many reasons for these differences and many challenges associated with the widespread integration of AI education into healthcare training curricula. In the coming years, strategies to tackle these issues should be investigated to ensure that future healthcare providers are equipped with the knowledge and skills they need to work in an environment where AI plays a growing role.

    Democratization also involves ensuring that patients of different genders, lifestyles, ethnicities, and geographical locations can benefit from AI-based solutions. For this to happen, these solutions have to be accessible and their performance generalizable. The latter requires the acquisition of diverse data from multiple institutions, preferably from multiple countries, for training AI-based solutions. It also requires the implementation of safeguards to ensure that sources of bias throughout the development process are not propagated to the trained algorithm (Vokinger et al., 2021), an issue that has only recently come to the forefront (Larrazabal et al., 2020; Obermeyer et al., 2019; Seyyed-Kalantari et al., 2021).
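
    A basic safeguard against the kind of bias propagation cited above is to audit model performance per subgroup rather than only in aggregate. The sketch below (assuming scikit-learn; all labels and scores are synthetic) reports AUC separately by patient sex, the sort of check that would surface performance gaps like those documented by Larrazabal et al. (2020) and Seyyed-Kalantari et al. (2021).

```python
# Subgroup performance audit on synthetic data: the model's scores are made
# deliberately less informative for one group to show how a per-group report
# exposes the disparity that an overall AUC would hide.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
sex = rng.choice(["F", "M"], size=400)
y_true = rng.integers(0, 2, size=400)
y_score = np.where(sex == "F",
                   y_true * 0.3 + rng.random(400) * 0.7,   # noisier scores
                   y_true * 0.6 + rng.random(400) * 0.4)   # cleaner scores

for group in ("F", "M"):
    mask = sex == group
    print(group, "AUC:", round(roc_auc_score(y_true[mask], y_score[mask]), 3))
```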

    Reimbursement

    As countries’ policies for regulating AI in healthcare gradually begin to take shape, one important aspect that needs attention is who will pay for these AI solutions, and according to what framework.

    Many consider Germany’s 2020 Digital Supply Act a step in the right direction for the reimbursement of digital health solutions. Under this policy, digital applications prescribed by physicians are reimbursable by statutory health insurance if they are shown to be safe, compliant with data privacy statutes, and beneficial to patient care. The UK, on the other hand, has released a guide for potential buyers of AI-based solutions, which serves as a starting point for companies preparing reimbursement applications (A Buyer’s Guide to AI in Health and Care, 2020).

    Thus far, reimbursement success stories in the digital health space have been few and far between (Brinkmann-Sass et al., 2020; Hassan, 2021). This is in part due to requirements varying greatly by country (COCIR, the European Coordination Committee of the Radiological, Electromedical and Healthcare IT Industry, 2020). In general, providers of digital health solutions will need to provide evidence for the overall value that these solutions bring, including detailed health economics studies showing potential cost savings.

    Radiology’s position as a service provider to multiple hospital departments means that AI-based solutions in this space will be expected to show a far-reaching impact (van Duffelen, 2021). Companies will need to show short-term value (e.g. faster/better image reading and reporting) as well as long-term value (e.g. early diagnosis and treatment, disease prevention, reduction in unnecessary follow-up). The coming years will see companies compete to demonstrate such impact, while at the same time experimenting with different pricing models and navigating the evolving bureaucratic reimbursement landscape.
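
    The kind of health-economics argument described in this section often starts from simple arithmetic such as the following; every figure here is hypothetical and serves only to show the structure of the calculation, not a realistic estimate.

```python
# Purely illustrative back-of-the-envelope ROI arithmetic. All numbers are
# hypothetical assumptions, not real costs or measured time savings.
studies_per_year = 20_000
minutes_saved_per_study = 2            # assumed reporting time saved by the tool
radiologist_cost_per_hour = 150.0      # assumed fully loaded hourly cost (EUR)
annual_license_cost = 40_000.0         # assumed software cost (EUR)

hours_saved = studies_per_year * minutes_saved_per_study / 60
gross_saving = hours_saved * radiologist_cost_per_hour
net_saving = gross_saving - annual_license_cost

print(f"hours saved per year: {hours_saved:.0f}")        # ~667 h
print(f"net annual saving: EUR {net_saving:,.0f}")       # ~EUR 60,000
```

    Demonstrating long-term value (earlier diagnosis, avoided follow-up) requires far more elaborate health economics modelling than this, which is precisely why such studies are expected to become a prerequisite for reimbursement.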

    Over the past few years, the field of AI in medical imaging has undergone a rapid but steady transformation. AI can now achieve things in radiology that few people thought possible a mere decade ago. The field is also gradually overcoming one of its most significant perceived hurdles - regulatory approval. In addition, while fear and scepticism dominated radiologists’ perception of the future of AI in their speciality a few years ago, this is no longer the case.

    The massive progress and interest in the field of AI in medical imaging is expected to continue into 2022 and beyond. Several exciting transformations await the field - it will likely expand its focus in the coming years to improve radiology workflow efficiency, involve hitherto neglected imaging modalities, combine data from multiple modalities, and provide more concrete diagnostic predictions and management recommendations. Easy-to-use and comprehensive software suites utilizing AI will be incorporated into existing radiology workflows, making radiologists’ and radiographers’ work easier and more efficient.

    As in any rapidly growing field, several scientific, regulatory, and economic challenges face AI in medical imaging. But the past few years have shown us that even the most difficult problems can be solved. Developers and users of AI-based solutions need to be aware of these issues so that they can adapt their strategies to changing expectations on a regulatory and societal level. Doing this will allow them to thrive in a fascinating field with the potential to improve virtually every aspect of healthcare.

    AAMC Report Reinforces Mounting Physician Shortage. (2021). AAMC. https://www.aamc.org/news-insights/press-releases/aamc-report-reinforces-mounting-physician-shortage

    A buyer’s guide to AI in health and care. (2020). NHS Transformation Directorate. https://www.nhsx.nhs.uk/ai-lab/explore-all-resources/adopt-ai/a-buyers-guide-to-ai-in-health-and-care/

    Aggarwal, R., Sounderajah, V., Martin, G., Ting, D. S. W., Karthikesalingam, A., King, D., Ashrafian, H., & Darzi, A. (2021). Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis. NPJ Digital Medicine, 4(1), 65.

    Ahmed, Z., Bhinder, K. K., Tariq, A., Tahir, M. J., Mehmood, Q., Tabassum, M. S., Malik, M., Aslam, S., Asghar, M. S., & Yousaf, Z. (2022). Knowledge, attitude, and practice of artificial intelligence among doctors and medical students in Pakistan: A cross-sectional online survey. Annals of Medicine and Surgery (2012), 76, 103493.

    AI Central. (n.d.). Retrieved February 23, 2022, from https://aicentral.acrdsi.org/

    AI for Radiology. (n.d.). Retrieved February 23, 2022, from https://grand-challenge.org/aiforradiology/

    Alexander, A., Jiang, A., Ferreira, C., & Zurkiya, D. (2020). An Intelligent Future for Medical Imaging: A Market Outlook on Artificial Intelligence for Medical Imaging. Journal of the American College of Radiology: JACR, 17(1 Pt B), 165–170.

    Allen, B., Agarwal, S., Coombs, L., Wald, C., & Dreyer, K. (2021). 2020 ACR Data Science Institute Artificial Intelligence Survey. Journal of the American College of Radiology: JACR, 18(8), 1153–1159.

    Banerjee, M., Chiew, D., Patel, K. T., Johns, I., Chappell, D., Linton, N., Cole, G. D., Francis, D. P., Szram, J., Ross, J., & Zaman, S. (2021). The impact of artificial intelligence on clinical education: perceptions of postgraduate trainee doctors in London (UK) and recommendations for trainers. BMC Medical Education, 21(1), 429.

    Bash, S., Johnson, B., Gibbs, W., Zhang, T., Shankaranarayanan, A., & Tanenbaum, L. N. (2021). Deep Learning Image Processing Enables 40 % Faster Spinal MR Scans Which Match or Exceed Quality of Standard of Care: A Prospective Multicenter Multireader Study. Clinical Neuroradiology. https://doi.org/10.1007/s00062-021-01121-2

    Bash, S., Wang, L., Airriess, C., Zaharchuk, G., Gong, E., Shankaranarayanan, A., & Tanenbaum, L. N. (2021). Deep Learning Enables 60 % Accelerated Volumetric Brain MRI While Preserving Quantitative Performance: A Prospective, Multicenter, Multireader Trial. AJNR. American Journal of Neuroradiology, 42(12), 2130–2137.

    Bennett, C. C., & Hauser, K. (2013). Artificial intelligence framework for simulating clinical decision-making: a Markov decision process approach. Artificial Intelligence in Medicine, 57(1), 9–19.

    Bisdas, S., Topriceanu, C.-C., Zakrzewska, Z., Irimia, A.-V., Shakallis, L., Subhash, J., Casapu, M.-M., Leon-Rojas, J., Pinto Dos Santos, D., Andrews, D. M., Zeicu, C., Bouhuwaish, A. M., Lestari, A. N., Abu-Ismail, L. ’i, Sadiq, A. S., Khamees, A. ’atasim, Mohammed, K. M. G., Williams, E., Omran, A. I.,… Ebrahim, E. H. (2021). Artificial Intelligence in Medicine: A Multinational Multi-Center Survey on the Medical and Dental Students’ Perception. Frontiers in Public Health, 9, 795284.

    Blease, C., Kharko, A., Bernstein, M., Bradley, C., Houston, M., Walsh, I., Hägglund, M., DesRoches, C., & Mandl, K. D. (2022). Machine learning in medical education: a survey of the experiences and opinions of medical students in Ireland. BMJ Health & Care Informatics, 29(1). https://doi.org/10.1136/bmjhci-2021-100480

    Brown, A. D., & Marotta, T. R. (2018). Using machine learning for sequence-level automated MRI protocol selection in neuroradiology. Journal of the American Medical Informatics Association: JAMIA, 25(5), 568–571.

    Center for Devices & Radiological Health. (2021, June 22). Digital Health Software Precertification (Pre-Cert) program. U.S. Food and Drug Administration. https://www.fda.gov/medical-devices/digital-health-center-excellence/digital-health-software-precertification-pre-cert-program

    Clinical radiology UK workforce census 2019 report. (2019). https://www.rcr.ac.uk/publication/clinical-radiology-uk-workforce-census-2019-report

    COCIR, the European Coordination Committee of the Radiological, Electromedical and Healthcare IT Industry. (2020). Market Access Pathways for Digital Health Solutions. https://www.cocir.org/fileadmin/Publications_2020/20062_COCIR_Market_Access_Pathways_Digital_Health.pdf

    Collado-Mesa, F., Alvarez, E., & Arheart, K. (2018). The Role of Artificial Intelligence in Diagnostic Radiology: A Survey at a Single Radiology Residency Training Program. Journal of the American College of Radiology: JACR, 15(12), 1753–1757.

    Core Health Indicators in the WHO European Region 2015. Special focus: Human resources for health. (2017, August 14). World Health Organization. https://www.euro.who.int/en/data-and-evidence/evidence-resources/core-health-indicators-in-the-who-european-region/core-health-indicators-in-the-who-european-region-2015.-special-focus-human-resources-for-health

    Dantas, L. F., Fleck, J. L., Cyrino Oliveira, F. L., & Hamacher, S. (2018). No-shows in appointment scheduling - a systematic literature review. Health Policy, 122(4), 412–421.

    Esses, S. J., Lu, X., Zhao, T., Shanbhogue, K., Dane, B., Bruno, M., & Chandarana, H. (2018). Automated image quality evaluation of T2 -weighted liver MRI utilizing deep learning architecture. Journal of Magnetic Resonance Imaging: JMRI, 47(3), 723–728.

    European Commission. (2021). Proposal for a Regulation Of The European Parliament And Of The Council Laying Down Harmonised Rules On Artificial Intelligence (Artificial Intelligence Act) And Amending Certain Union Legislative Acts. https://eur-lex.europa.eu/resource.html?uri=cellar:e0649735-a372-11eb-9585-01aa75ed71a1.0001.02/DOC_1&format=PDF

    General Data Protection Regulation (GDPR) – Official Legal Text. (2016, July 13). General Data Protection Regulation (GDPR). https://gdpr-info.eu/

    Hata, A., Yanagawa, M., Yoshida, Y., Miyata, T., Tsubamoto, M., Honda, O., & Tomiyama, N. (2020). Combination of Deep Learning-Based Denoising and Iterative Reconstruction for Ultra-Low-Dose CT of the Chest: Image Quality and Lung-RADS Evaluation. AJR. American Journal of Roentgenology, 215(6), 1321–1328.

    Hauptmann, A., Arridge, S., Lucka, F., Muthurangu, V., & Steeden, J. A. (2019). Real-time cardiovascular MR with spatio-temporal artifact suppression using deep learning-proof of concept in congenital heart disease. Magnetic Resonance in Medicine: Official Journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 81(2), 1143–1156.

    Higaki, T., Nakamura, Y., Zhou, J., Yu, Z., Nemoto, T., Tatsugami, F., & Awai, K. (2020). Deep Learning Reconstruction at CT: Phantom Study of the Image Characteristics. Academic Radiology, 27(1), 82–87.

    Holzinger, A., Biemann, C., Pattichis, C. S., & Kell, D. B. (2017). What do we need to build explainable AI systems for the medical domain? In arXiv [cs.AI]. arXiv. http://arxiv.org/abs/1712.09923

    Hötker, A. M., Da Mutten, R., Tiessen, A., Konukoglu, E., & Donati, O. F. (2021). Improving workflow in prostate MRI: AI-based decision-making on biparametric or multiparametric MRI. Insights into Imaging, 12(1), 112.

    Huang, S.-C., Pareek, A., Seyyedi, S., Banerjee, I., & Lungren, M. P. (2020). Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines. NPJ Digital Medicine, 3, 136.

    Kaissis, G. A., Makowski, M. R., Rückert, D., & Braren, R. F. (2020). Secure, privacy-preserving and federated machine learning in medical imaging. Nature Machine Intelligence, 2(6), 305–311.

    Kaissis, G., Ziller, A., Passerat-Palmbach, J., Ryffel, T., Usynin, D., Trask, A., Lima, I., Mancuso, J., Jungmann, F., Steinborn, M.-M., Saleh, A., Makowski, M., Rueckert, D., & Braren, R. (2021). End-to-end privacy preserving deep learning on multi-institutional medical imaging. Nature Machine Intelligence, 3(6), 473–484.

    Kansal, R., Bawa, A., Bansal, A., Trehan, S., Goyal, K., Goyal, N., & Malhotra, K. (2022). Differences in Knowledge and Perspectives on the Usage of Artificial Intelligence Among Doctors and Medical Students of a Developing Country: A Cross-Sectional Study. Cureus, 14(1), e21434.

    Katsari, K., Penna, D., Arena, V., Polverari, G., Ianniello, A., Italiano, D., Milani, R., Roncacci, A., Illing, R. O., & Pelosi, E. (2021). Artificial intelligence for reduced dose 18F-FDG PET examinations: a real-world deployment through a standardized framework and business case assessment. EJNMMI Physics, 8(1), 25.

    Kerpel, A., Marom, E. M., Green, M., Eifer, M., Konen, E., Mayer, A., & Betancourt Cuellar, S. L. (2021). Ultra-Low Dose Chest CT with Denoising for Lung Nodule Detection. The Israel Medical Association Journal: IMAJ, 23(9), 550–555.

    Kim, D. W., Jang, H. Y., Kim, K. W., Shin, Y., & Park, S. H. (2019). Design Characteristics of Studies Reporting the Performance of Artificial Intelligence Algorithms for Diagnostic Analysis of Medical Images: Results from Recently Published Papers. Korean Journal of Radiology: Official Journal of the Korean Radiological Society, 20(3), 405–410.

    Kim, K. H., & Park, S.-H. (2017). Artificial neural network for suppression of banding artifacts in balanced steady-state free precession MRI. Magnetic Resonance Imaging, 37, 139–146.

    Kolyshkina, I., & Simoff, S. (2021). Interpretability of Machine Learning Solutions in Public Healthcare: The CRISP-ML Approach. Frontiers in Big Data, 4, 660206.

    Komorowski, M., Celi, L. A., Badawi, O., Gordon, A. C., & Faisal, A. A. (2018). The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care. Nature Medicine, 24(11), 1716–1720.

    Kritikos, M. (2020). What if artificial intelligence in medical imaging could accelerate Covid-19 treatment? Think Tank - European Parliament. https://www.europarl.europa.eu/thinktank/en/document/EPRS_ATA(2020)656333

    Kurasawa, H., Hayashi, K., Fujino, A., Takasugi, K., Haga, T., Waki, K., Noguchi, T., & Ohe, K. (2016). Machine-Learning-Based Prediction of a Missed Scheduled Clinical Appointment by Patients With Diabetes. Journal of Diabetes Science and Technology, 10(3), 730–736.

    Larrazabal, A. J., Nieto, N., Peterson, V., Milone, D. H., & Ferrante, E. (2020). Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis. Proceedings of the National Academy of Sciences of the United States of America, 117(23), 12592–12594.

    Larson, D. B., Johnson, L. W., Schnell, B. M., Salisbury, S. R., & Forman, H. P. (2011). National trends in CT use in the emergency department: 1995-2007. Radiology, 258(1), 164–173.

    Le, V., Frye, S., Botkin, C., Christopher, K., Gulaka, P., Sterkel, B., Frye, R., Muzaffar, R., & Osman, M. (2020). Effect of PET Scan with Count Reduction Using AI-Based Processing Techniques on Image Quality. Journal of Nuclear Medicine: Official Publication, Society of Nuclear Medicine, 61(supplement 1), 3095–3095.

    Levin, D. C., Parker, L., & Rao, V. M. (2017). Recent Trends in Imaging Use in Hospital Settings: Implications for Future Planning. Journal of the American College of Radiology: JACR, 14(3), 331–336.

    Liu, X., Faes, L., Kale, A. U., Wagner, S. K., Fu, D. J., Bruynseels, A., Mahendiran, T., Moraes, G., Shamdas, M., Kern, C., Ledsam, J. R., Schmid, M. K., Balaskas, K., Topol, E.J., Bachmann, L. M., Keane, P. A., & Denniston, A. K. (2019). A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. The Lancet. Digital Health, 1(6), e271–e297.

    Lotan, E., Tschider, C., Sodickson, D. K., Caplan, A. L., Bruno, M., Zhang, B., & Lui, Y. W. (2020). Medical Imaging and Privacy in the Era of Artificial Intelligence: Myth, Fallacy, and the Future. Journal of the American College of Radiology: JACR, 17(9), 1159–1162.

    Mairhöfer, D., Laufer, M., Simon, P. M., Sieren, M., Bischof, A., Käster, T., Barth, E., Barkhausen, J., & Martinetz, T. (2021). An AI-based Framework for Diagnostic Quality Assessment of Ankle Radiographs. https://openreview.net/pdf?id=bj04hJss_xZ

    Makeeva, V., Gichoya, J., Hawkins, C. M., Towbin, A. J., Heilbrun, M., & Prater, A. (2019). The Application of Machine Learning to Quality Improvement Through the Lens of the Radiology Value Network. Journal of the American College of Radiology: JACR, 16(9 Pt B), 1254–1258.

    McLeavy, C. M., Chunara, M. H., Gravell, R. J., Rauf, A., Cushnie, A., Staley Talbot, C., & Hawkins, R. M. (2021). The future of CT: deep learning reconstruction. Clinical Radiology, 76(6), 407–415.

    Medical AI Evaluation. (n.d.). Retrieved February 23, 2022, from https://ericwu09.github.io/medical-ai-evaluation/

    Michoud, L., Tschudi, Y., & Villien, Y. (2019). Artificial Intelligence for Medical Imaging: Market and Technology Report 2020. Yole Développement. https://s3.i-micronews.com/uploads/2020/01/YDR20059-AI-for-Medical-Imaging_Yole_sample.pdf

    Murdoch, B. (2021). Privacy and artificial intelligence: challenges for protecting health information in a new era. BMC Medical Ethics, 22(1), 122.

    Nagendran, M., Chen, Y., Lovejoy, C. A., Gordon, A. C., Komorowski, M., Harvey, H., Topol, E. J., Ioannidis, J. P. A., Collins, G. S., & Maruthappu, M. (2020). Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ, 368. https://doi.org/10.1136/bmj.m689

    Nelson, A., Herron, D., Rees, G., & Nachev, P. (2019). Predicting scheduled hospital attendance with artificial intelligence. Npj Digital Medicine, 2(1), 26.

    Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447–453.

    Ooi, S. K. G., Makmur, A., Soon, A. Y. Q., Fook-Chong, S., Liew, C., Sia, S. Y., Ting, Y. H., & Lim, C. Y. (2021). Attitudes toward artificial intelligence in radiology with learner needs assessment within radiology residency programmes: a national multi-programme survey. Singapore Medical Journal, 62(3), 126–134.

    Paranjape, K., Schinkel, M., Nannan Panday, R., Car, J., & Nanayakkara, P. (2019). Introducing Artificial Intelligence Training in Medical Education. JMIR Medical Education, 5(2), e16048.

    Park, J., Hwang, D., Kim, K. Y., Kang, S. K., Kim, Y. K., & Lee, J. S. (2018). Computed tomography super-resolution using deep convolutional neural network. Physics in Medicine and Biology, 63(14), 145011.

    Pinto Dos Santos, D., Giese, D., Brodehl, S., Chon, S. H., Staab, W., Kleinert, R., Maintz, D., & Baeßler, B. (2019). Medical students’ attitude towards artificial intelligence: a multicentre survey. European Radiology, 29(4), 1640–1646.

    Population ages 65 and above. (n.d.). The World Bank. Retrieved February 23, 2022, from https://data.worldbank.org/indicator/SP.POP.65UP.TO.ZS

    Prevedello, L. M., Erdal, B. S., Ryu, J. L., Little, K. J., Demirer, M., Qian, S., & White, R. D. (2017). Automated Critical Test Findings Identification and Online Notification System Using Artificial Intelligence in Imaging. Radiology, 285(3), 923–931.

    Rezazade Mehrizi, M. H., van Ooijen, P., & Homan, M. (2021). Applications of artificial intelligence (AI) in diagnostic radiology: a technography study. European Radiology, 31(4), 1805–1811.

    Rockenbach, M. A. B. (2021, June 13). Multimodal AI in healthcare: Closing the gaps. CodeX. https://medium.com/codex/multimodal-ai-in-healthcare-1f5152e83be2

    Rudie, J. D., Gleason, T., Barkovich, M. J., Wilson, D. M., Shankaranarayanan, A., Zhang, T., Wang, L., Gong, E., Zaharchuk, G., & Villanueva-Meyer, J. E. (2022). Clinical Assessment of Deep Learning–based Super-Resolution for 3D Volumetric Brain MRI. Radiology: Artificial Intelligence, e210059.

    Schreiber-Zinaman, J., & Rosenkrantz, A. B. (2017). Frequency and reasons for extra sequences in clinical abdominal MRI examinations. Abdominal Radiology (New York), 42(1), 306–311.

    Seyyed-Kalantari, L., Zhang, H., McDermott, M. B. A., Chen, I. Y., & Ghassemi, M. (2021). Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations. Nature Medicine, 27(12), 2176–2182.

    Shelmerdine, S. C., Arthurs, O. J., Denniston, A., & Sebire, N. J. (2021). Review of study reporting guidelines for clinical studies using artificial intelligence in healthcare. BMJ Health & Care Informatics, 28(1). https://doi.org/10.1136/bmjhci-2021-100385

    Shinagare, A. B., Ip, I. K., Abbett, S. K., Hanson, R., Seltzer, S. E., & Khorasani, R. (2014). Inpatient imaging utilization: trends of the past decade. AJR. American Journal of Roentgenology, 202(3), W277–W283.

    Singh, R., Digumarthy, S. R., Muse, V. V., Kambadakone, A. R., Blake, M. A., Tabari, A., Hoi, Y., Akino, N., Angel, E., Madan, R., & Kalra, M. K. (2020). Image Quality and Lesion Detection on Deep Learning Reconstruction and Iterative Reconstruction of Submillisievert Chest and Abdominal CT. AJR. American Journal of Roentgenology, 214(3), 566–573.

    Sit, C., Srinivasan, R., Amlani, A., Muthuswamy, K., Azam, A., Monzon, L., & Poon, D. S. (2020). Attitudes and perceptions of UK medical students towards artificial intelligence and radiology: a multicentre survey. Insights into Imaging, 11(1), 14.

    Smith-Bindman, R., Miglioretti, D. L., Johnson, E., Lee, C., Feigelson, H. S., Flynn, M., Greenlee, R. T., Kruger, R. L., Hornbrook, M. C., Roblin, D., Solberg, L. I., Vanneman, N., Weinmann, S., & Williams, A. E. (2012). Use of diagnostic imaging studies and associated radiation exposure for patients enrolled in large integrated health care systems, 1996-2010. JAMA: The Journal of the American Medical Association, 307(22), 2400–2409.

    Smith-Bindman, R., Miglioretti, D. L., & Larson, E. B. (2008). Rising use of diagnostic medical imaging in a large integrated health system. Health Affairs, 27(6), 1491–1502.

    The Medical Futurist. (n.d.). The Medical Futurist. Retrieved February 23, 2022, from https://medicalfuturist.com/fda-approved-ai-based-algorithms/

    Towards trustable machine learning. (2018). Nature Biomedical Engineering, 2(10), 709–710.

    Trivedi, H., Mesterhazy, J., Laguna, B., Vu, T., & Sohn, J. H. (2018). Automatic Determination of the Need for Intravenous Contrast in Musculoskeletal MRI Examinations Using IBM Watson’s Natural Language Processing Algorithm. Journal of Digital Imaging, 31(2), 245–251.

    Tsao, D. N. (2020, July 27). AI in medical diagnostics 2020-2030: Image recognition, players, clinical applications, forecasts: IDTechEx. https://www.idtechex.com/en/research-report/ai-in-medical-diagnostics-2020-2030-image-recognition-players-clinical-applications-forecasts/766

    van Duffelen, J. (2021, February 22). Making a case for buying medical imaging AI: How to define the return on investment. Aidence. https://www.aidence.com/articles/medical-imaging-ai-roi/

    van Leeuwen, K. G., Schalekamp, S., Rutten, M. J. C. M., van Ginneken, B., & de Rooij, M. (2021). Artificial intelligence in radiology: 100 commercially available products and their scientific evidence. European Radiology, 31(6), 3797–3804.

    Vokinger, K. N., Feuerriegel, S., & Kesselheim, A. S. (2021). Mitigating bias in machine learning for medicine. Communications Medicine, 1, 25.

    Wang, S., Cao, G., Wang, Y., Liao, S., Wang, Q., Shi, J., Li, C., & Shen, D. (2021). Review and Prospect: Artificial Intelligence in Advanced Medical Imaging. Frontiers in Radiology, 1. https://doi.org/10.3389/fradi.2021.781868

    What is personal data? (2021, January 1). ICO - Information Commissioner’s Office; ICO. https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/what-is-personal-data/what-is-personal-data

    WHO. (n.d.-a). Ageing and health. Retrieved February 23, 2022, from https://www.who.int/news-room/fact-sheets/detail/ageing-and-health

    WHO. (n.d.-b). Noncommunicable diseases. Retrieved March 6, 2022, from https://www.who.int/news-room/fact-sheets/detail/noncommunicable-diseases

    Winder, M., Owczarek, A. J., Chudek, J., Pilch-Kowalczyk, J., & Baron, J. (2021). Are We Overdoing It? Changes in Diagnostic Imaging Workload during the Years 2010-2020 including the Impact of the SARS-CoV-2 Pandemic. Healthcare (Basel, Switzerland), 9(11). https://doi.org/10.3390/healthcare9111557

    Winkel, D. J., Heye, T., Weikert, T. J., Boll, D. T., & Stieltjes, B. (2019). Evaluation of an AI-Based Detection Software for Acute Findings in Abdominal Computed Tomography Scans: Toward an Automated Work List Prioritization of Routine CT Examinations. Investigative Radiology, 54(1), 55–59.

    Wong, T. T., Kazam, J. K., & Rasiej, M. J. (2019). Effect of Analytics-Driven Worklists on Musculoskeletal MRI Interpretation Times in an Academic Setting. AJR. American Journal of Roentgenology, 1–5.

    Xu, F., Pan, B., Zhu, X., Gulaka, P., Xiang, L., Gong, E., Zhang, T., Wang, J., Lin, L., Ma, Y., & Gong, N.-J. (2020). Evaluation of Deep Learning Based PET Image Enhancement Method in Diagnosis of Lymphoma. Journal of Nuclear Medicine: Official Publication, Society of Nuclear Medicine, 61(supplement 1), 431–431.

    Yoon, C. H., Torrance, R., & Scheinerman, N. (2021). Machine learning in medicine: should the pursuit of enhanced interpretability be abandoned? Journal of Medical Ethics. https://doi.org/10.1136/medethics-2020-107102

    Yusuf, M., Atal, I., Li, J., Smith, P., Ravaud, P., Fergie, M., Callaghan, M., & Selfe, J. (2020). Reporting quality of studies using machine learning models for medical diagnosis: a systematic review. BMJ Open, 10(3), e034568.

    Zhang, D., Mishra, S., Brynjolfsson, E., Etchemendy, J., Ganguli, D., Grosz, B., Lyons, T., Manyika, J., Niebles, J. C., Sellitto, M., Shoham, Y., Clark, J., & Perrault, R. (2021). The AI Index 2021 Annual Report. In arXiv [cs.AI]. arXiv. http://arxiv.org/abs/2103.06312

    Zhang, Y., & Yu, H. (2018). Convolutional Neural Network Based Metal Artifact Reduction in X-Ray Computed Tomography. IEEE Transactions on Medical Imaging, 37(6), 1370–1381.

    Zhu, B., Liu, J. Z., Cauley, S. F., Rosen, B. R., & Rosen, M. S. (2018). Image reconstruction by domain-transform manifold learning. Nature, 555(7697), 487–492.