The AI market is developing fast. Yet data is expensive, and acquiring the right data at the right time is time-consuming. Most delays in AI development are caused by a lack of available data, specifically data of the right kind in the right format.
Speaker: Subrata Bose, Digital Diagnostics, Bayer Radiology
80% of the AI development cycle is spent on data collection and preparation. The competitive edge lies with companies that can shorten this process.
Subrata Bose, Digital Diagnostics, Bayer Radiology, structured the AI development process into seven key steps and highlighted the many challenges to overcome.
Data Collection and Preparation
The first three stages (blue in the diagram) account for 80% of the development time. A lack of data of the right kind and in the right format is a crucial factor and makes the start of the development process particularly complex. Other hindering factors are the differing image repositories and electronic medical record systems used by hospitals, as well as the lack of a smart search interface to navigate the multiple hospital systems.
- Define your objectives. “You must understand the data you need to train and validate the algorithm with. Define a specific objective closely connected to your endpoints. Do not change this objective throughout the development, otherwise you’ll end up with different data requirements”, underlined Bose.
- Collect and anonymize the data. “The patient’s data is stored in a different system than the images – EMR and PACS. And not all hospitals use the same EMR and PACS”, said Bose. Collecting the data therefore requires expertise and knowledge of the interfaces involved. Bose recommended partnering with organizations that understand what kind of data is needed and how to retrieve it.
- Annotate the data. “It is here that you need expert medical knowledge”, said Bose.
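The collection step can be illustrated with a minimal Python sketch. It is not the workflow Bose described: it assumes the EMR and PACS exports are already available as lists of dictionaries, and the field names (`patient_id`, `study_uid`, `modality`, `diagnosis`) and the `SECRET_KEY` are hypothetical. It also uses keyed hashing, which is pseudonymization rather than full anonymization; real de-identification of medical images covers far more than one identifier.

```python
import hashlib
import hmac

# Hypothetical secret held by the data custodian; it never travels with the data set.
SECRET_KEY = b"replace-with-a-site-specific-secret"


def pseudonymize(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace a patient ID with a keyed hash so records remain linkable."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def link_records(emr_rows, pacs_rows, key: bytes = SECRET_KEY):
    """Join EMR and PACS rows on patient ID and drop direct identifiers."""
    emr_by_patient = {row["patient_id"]: row for row in emr_rows}
    linked = []
    for study in pacs_rows:
        emr = emr_by_patient.get(study["patient_id"])
        if emr is None:
            continue  # no clinical record for this imaging study, so skip it
        linked.append({
            "pseudo_id": pseudonymize(study["patient_id"], key),
            "study_uid": study["study_uid"],
            "modality": study["modality"],
            "diagnosis": emr["diagnosis"],
        })
    return linked


# Made-up example rows standing in for exports from the two systems.
emr_rows = [
    {"patient_id": "P001", "name": "Jane Doe", "diagnosis": "nodule"},
    {"patient_id": "P002", "name": "John Roe", "diagnosis": "normal"},
]
pacs_rows = [
    {"patient_id": "P001", "study_uid": "1.2.3", "modality": "CT"},
    {"patient_id": "P003", "study_uid": "1.2.4", "modality": "MR"},
]
linked = link_records(emr_rows, pacs_rows)
```

In this example only the study for `P001` survives the join, because it is the only patient present in both systems. The output row carries neither the name nor the original patient ID.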
Data and Clinical Validation
Data is one component; validation is another. The remaining four steps (yellow in the figure) cover data science, clinical development and deployment. “Make sure to put the right validation in place. High variability and overgeneralization of data sets are problematic”, Bose said.
- Model development
- Validation. “An important aspect is that you need to understand the source of data that is used in training and validation. Does it really reflect your population?” said Bose. Several steps are implied here:
  - Training and data validation. “While training your algorithm, you can optimize various parameters. You should use a validation data set to optimize your results”, Bose recommended. This step is quite fast.
  - Internal validation works with selected sample data.
  - Testing of the AI model
  - Clinical validation
Questions to be answered during the clinical validation process:
- Valid clinical association: Is there a valid clinical association between the algorithm’s output and the targeted clinical condition? Use literature, original clinical research, professional society guidelines, secondary data analysis, and/or clinical trials to answer this question.
- Analytical validation: Is there objective evidence that the algorithm works? Check whether the output data is accurate, reliable, and precise. “You do not want to end up with a lot of false negative or false positive predictions”, Bose said.
- Clinical validation: Does the output apply to your targeted audience? Does the software achieve the intended outcome? “You can compare this validation step with phase one or phase two clinical trials”, Bose explained. “The performance and the safety of AI is assessed with external data.”
- Regulatory approval. These hurdles are covered in greater detail by Hortense Allison, VP Head of Regulatory Digital, Bayer Radiology, in a separate session.
- Clinical integration. “Then the next challenge begins: Installing the AI application into the radiology practice. Each practice has a slightly different workflow and IT system configuration”, Bose said. “So, installation is a big step that needs expertise and knowledge of the different interfaces.”
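Two of the validation ideas above, keeping separate training, validation and test data and checking for false negatives and false positives, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure presented in the session. Splitting at patient level, the split fractions, and the use of sensitivity and specificity as summary metrics are choices made here; the function names and example labels are hypothetical.

```python
import random


def split_by_patient(patient_ids, val_frac=0.2, test_frac=0.2, seed=0):
    """Assign every patient to exactly one of train / validation / test.

    Splitting by patient rather than by image keeps images of the same
    person from appearing in both the training and the test data.
    """
    unique = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique)
    n_test = int(len(unique) * test_frac)
    n_val = int(len(unique) * val_frac)
    test = set(unique[:n_test])
    val = set(unique[n_test:n_test + n_val])
    train = set(unique[n_test + n_val:])
    return train, val, test


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = disease)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity falls with false negatives, specificity with false positives."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up patient IDs and labels, for illustration only.
patients = [f"P{i:03d}" for i in range(10)]
train, val, test = split_by_patient(patients)

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
```

With these example labels the model misses one of four positive cases (sensitivity 0.75) and raises one false alarm among six negative cases. Whether such figures carry over to the target population is the question that clinical validation with external data addresses.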
Conclusion
Bose gave insights into the process of collecting and validating data for AI development and clinical integration.
Title: Adequate data sources, preparation and importance of clinical validation
Location: RSNA 2022
Date: 28 November 2022