The incorporation of AI in medical devices has made great strides, for example, in the diagnosis of disease. Manufacturers of devices with machine learning face the challenge of having to demonstrate compliance of their devices with the regulations.
Even if you know the law - what are the standards and best practices to consider in order to provide the evidence and speak to authorities and notified bodies on the same level?
This article provides an overview of the most important regulations and best practices that you should consider. You can save yourself the trouble of researching and reading hundreds of pages and be perfectly prepared for the next audit.
Currently, there are no laws or harmonized standards that specifically regulate the use of machine learning in medical devices. However, it appears that these devices must comply with existing regulatory requirements such as MDR and IVDR, e.g.:
On its webpage New rules for Artificial Intelligence - Questions and Answers the EU announces its intention to regulate AI in all sectors on a risk-based basis and to monitor compliance with the regulations. You can find the Proposal for a Regulation laying down harmonized rules on artificial intelligence (Artificial Intelligence Act) here.
The EU sees the opportunities of using AI but also its risks. It explicitly mentions healthcare and wants to prevent fragmentation of the single market through EU-wide regulation.
Regulation should affect both manufacturers and users. The regulations to be implemented depend on whether the AI system is classified as a high-risk AI system or not. Devices covered by the MDR or IVDR that require the involvement of a notified body are considered high-risk.
One focus of regulation will be on remote biometric identification.
The planned regulation also provides for monitoring and auditing; in addition, it extends to imported devices. The EU plans for this regulation to interact with the new Machinery Regulation, which would replace the current Machinery Directive.
A special committee is to be established: "The European Committee on Artificial Intelligence is to be composed of high-level representatives of the competent national supervisory authorities, the European Data Protection Supervisor, and the Commission".
As always, the EU emphasizes that Europe should be promoted as a business location and that SMEs should not be unduly burdened. These statements can also be found in the MDR. But none of the regulations seem to meet this requirement.
Content of the AI Regulation
Artificial Intelligence procedures include not only machine learning but also:
"- Logic and knowledge-based approaches, including knowledge representation, inductive (logic) programming, knowledge bases, inference and deduction engines, (symbolic) reasoning and expert systems;
- Statistical approaches, Bayesian estimation, search, and optimization methods."
This broad scope may result in many medical devices containing software falling within the scope of the AI Regulation. Accordingly, any decision tree cast in software would be an AI system.
The AI Regulation explicitly addresses medical devices and IVDs.
There is a duplication of requirements. MDR and IVDR already require cybersecurity, risk management, post-market surveillance, a notification system, technical documentation, a QM system, etc. Manufacturers will soon have to demonstrate compliance with two regulations!
The regulation applies regardless of what the AI is used for in the medical device.
Even an AI that is intended to realize the lower-wear operation of an engine would fall under EU regulation. As a consequence, manufacturers will think twice before making use of AI procedures. This can have a negative impact on innovation but also on the safety and performance of devices. This is because, as a rule, manufacturers use AI to improve the safety, performance, and effectiveness of devices. Otherwise, they would not be allowed to use AI at all.
The regulation requires:
"High-risk AI systems shall be designed and developed in such a way that they can be effectively supervised by natural persons during the period of use of the AI system, including with appropriate tools of a human-machine interface."
This requirement rules out the use of AI in situations in which humans can no longer react quickly enough. Yet, it is precisely in these situations that the use of AI could be particularly helpful.
If we have to place a person next to each device to "effectively supervise" the use of AI, this will mean the end of most AI-based products.
The regulation defines the crucial term "safety component" by using the undefined term of a safety function:
A "safety component of a product or system" is a "component of a device or system that performs a safety function for that device or system, or whose failure or malfunction endangers the health and safety of persons or property;"
The AI Regulation also does not define other terms in accordance with the MDR, e.g., "post-market monitoring" or "serious incident."
There will be disputes about what constitutes a safety function. For example, it could be a function that puts the safety of patients at risk if it behaves out of specification. But it could also mean a function that implements a risk-minimizing measure.
Definitions that are not aligned increase the effort required by manufacturers to understand and align the various concepts and associated requirements.
A device counts as a high-risk AI system if both of the following conditions are met:
"(a) the AI system is intended to be used as a safety component of a device covered by Union harmonization legislation listed in Annex II or is itself such a device;
(b) the device of which the AI system is the safety component, or the AI system itself as a device, shall be subject to a third-party conformity assessment with regard to placing on the market or putting into service that device in accordance with the Union harmonization legislation listed in Annex II."
Medical devices are covered by the regulations listed in Annex II, as they mention the MDR and IVDR. Medical devices of class IIa and higher must undergo a conformity assessment procedure. Does this make them high-risk devices?
The unfortunate Rule 11 classifies software - regardless of risk - into Class IIa or higher in the vast majority of cases. This means that medical devices are subject to the extensive requirements for high-risk products. The negative effects of Rule 11 are reinforced by the AI Regulation.
In Article 10, the AI Regulation requires
Real-world data is rarely error-free and complete. It also remains unclear what is meant by "complete." Do all data sets have to be present (whatever that means) or all data of a data set?
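The two readings of "complete" can be made concrete with a small check. The following is only a sketch under stated assumptions: the field names, the record layout, and the list of expected patient IDs are hypothetical, and the AI Regulation does not prescribe either check.

```python
from typing import Any

# Hypothetical required fields; the names are illustrative only.
REQUIRED_FIELDS = ("patient_id", "age", "systolic_bp", "label")


def missing_records(records: list[dict[str, Any]], expected_ids: set[str]) -> set[str]:
    """Reading 1: are all expected data sets (records) present?"""
    present = {r["patient_id"] for r in records if r.get("patient_id") is not None}
    return expected_ids - present


def incomplete_records(records: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Reading 2: does every record contain all of its required fields?"""
    gaps: dict[str, list[str]] = {}
    for index, record in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if record.get(f) is None]
        if missing:
            key = str(record.get("patient_id") or f"row-{index}")
            gaps[key] = missing
    return gaps


records = [
    {"patient_id": "p1", "age": 63, "systolic_bp": 141, "label": 1},
    {"patient_id": "p2", "age": None, "systolic_bp": 118, "label": 0},
]
expected = {"p1", "p2", "p3"}

print(missing_records(records, expected))  # records that are absent altogether
print(incomplete_records(records))         # records with missing fields
```

Both checks can pass or fail independently of each other, which is why it matters which of the two meanings the legislator intends.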
Article 64 of the AI Regulation requires manufacturers to provide authorities with full remote access to training, validation, and testing data, even through an API.
Making confidential patient data accessible via remote access conflicts with the legal requirement of data protection by design. Health data belongs to the category of personal data that requires special protection.
Developing and providing an external API to the training data means an additional effort for the manufacturers.
It is unrealistic to expect that authorities with this access can download, analyze, and evaluate the data or the AI with reasonable effort and in reasonable time.
For other, often even more critical data and information on product design and production (e.g., source code or CAD drawings), no one would seriously require manufacturers to provide remote access to authorities.
Tab. 1: Demands of the AI Act
The Johner Institute had submitted a statement with these concerns to the EU.
A new draft of the AI Act has been available since October 2022. This resolves many inconsistencies and removes unclear requirements. But criticisms that we also reported to the EU remain:
The fact that now, of all things, public institutions such as law enforcement agencies are to be exempted from the obligation to register AI-based devices could generate suspicion.
However, it is to be welcomed that AI systems that are used exclusively for research are exempt from regulation.
The EU was able to reach a compromise by the end of 2023. This compromise is now available in an almost 900-page document that is still difficult to read.
The MDR and IVDR allow proof of conformity to be provided with the aid of harmonized standards and common specifications. In the context of medical devices that use machine learning procedures, manufacturers should pay particular attention to the following standards:
These standards contain specific requirements that are also relevant for medical devices with machine learning, e.g.:
Please consider the article on validation of ML libraries.
The FDA has similar requirements, especially in 21 CFR part 820 (including part 820.30 with the design controls). There are numerous guidance documents, including those on software validation, the use of off-the-shelf software (OTSS), and cybersecurity. These are required reading for companies that want to sell medical devices in the US that are or contain software.
In April 2019, the FDA published a draft Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD).
In it, the FDA discusses its own requirements for continuous learning systems. The FDA notes that the medical devices approved to date that are based on AI procedures work with "locked algorithms."
For the changes to the algorithms, the authority would like to explain when it
The new framework is based on existing approaches:
According to FDA rules, an algorithm that is self-learning or continues to learn during use must be subject to review and approval. This seems too rigid even for the FDA. Therefore, it is examining the objectives of changing the algorithm and distinguishing:
Depending on these objectives, the authority would like to decide on the need for new submissions.
The FDA names four pillars that manufacturers should use to ensure the safety and benefit of their devices throughout the product life cycle, even in the event of changes:
The FDA gives examples of when a manufacturer may change the algorithm of software without asking the authority for approval. The first of these examples is software that predicts imminent patient instability in an intensive care unit based on monitor data (e.g., blood pressure, ECG, pulse oximeter).
The manufacturer plans to change the algorithm, e.g., to minimize false alarms. If he already provided for this in the SaMD Pre-Specifications (SPS) and had them approved by the authority together with the Algorithm Change Protocol (ACP), he may make these changes without renewed "approval."
However, if he claims that the algorithm provides 15 minutes of warning of physiologic instability (he now additionally specifies a time duration), that would be an expansion of the intended use. This change would require FDA approval.
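The idea of a pre-approved change plan can be pictured as an automated acceptance gate that every retrained model must pass. The sketch below is an illustration only: the metric names, the threshold values, and the rule that sensitivity must not decrease are assumptions made for this example, not criteria taken from the FDA document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    sensitivity: float       # share of true instabilities that raised an alarm
    false_alarm_rate: float  # share of stable episodes that raised an alarm


@dataclass(frozen=True)
class ChangePlan:
    """Hypothetical pre-specified acceptance criteria (illustrative values)."""
    min_sensitivity: float
    max_false_alarm_rate: float


def metrics_from_counts(tp: int, fn: int, fp: int, tn: int) -> Metrics:
    """Derive both metrics from a confusion matrix on a fixed test set."""
    return Metrics(sensitivity=tp / (tp + fn), false_alarm_rate=fp / (fp + tn))


def change_is_within_plan(old: Metrics, new: Metrics, plan: ChangePlan) -> bool:
    """True if the retrained model stays inside the pre-specified envelope."""
    return (
        new.sensitivity >= plan.min_sensitivity
        and new.sensitivity >= old.sensitivity  # assumed rule: no loss of sensitivity
        and new.false_alarm_rate <= plan.max_false_alarm_rate
        and new.false_alarm_rate <= old.false_alarm_rate
    )


plan = ChangePlan(min_sensitivity=0.90, max_false_alarm_rate=0.20)
old = metrics_from_counts(tp=92, fn=8, fp=180, tn=820)
fewer_false_alarms = metrics_from_counts(tp=93, fn=7, fp=120, tn=880)
lost_sensitivity = metrics_from_counts(tp=85, fn=15, fp=60, tn=940)

print(change_is_within_plan(old, fewer_false_alarms, plan))
print(change_is_within_plan(old, lost_sensitivity, plan))
```

Such a gate only covers performance changes within the existing intended use. A new claim, such as the 15-minute warning time, lies outside anything a check of this kind can decide.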
The FDA is debating how to deal with continuous learning systems. It has not even answered the question of what best practices are for evaluating and approving a "frozen" algorithm based on AI procedures.
There is still no guidance document that defines what the FDA calls "Good Machine Learning Practices." The Johner Institute is therefore developing such a guideline together with a notified body.
The FDA's concept of waiving the need for a new submission based on pre-approved procedures for changes to algorithms has its charm. You will look in vain for such concreteness on the part of European legislators and authorities.
The FDA's "Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD)" is also mandatory reading. In April 2023, the FDA converted this into a guidance document entitled "Marketing Submission Recommendations for a Predetermined Change Control Plan for Artificial Intelligence/Machine Learning (AI/ML)-Enabled Device Software Functions."
Another FDA guidance document on radiological imaging does not directly address AI-based medical devices; however, it is helpful. First, many AI/ML-based medical devices work with radiological imaging data, and second, the document identifies sources of error that are particularly relevant to ML-based devices as well:
The FDA, Health Canada, and the UK Medicines and Healthcare Products Regulatory Agency (MHRA) have collaborated to publish "Good Machine Learning Practice for Medical Device Development: Guiding Principles". The document contains ten guiding principles that should be followed when using machine learning in medical devices. Due to its brevity of only two pages, the document does not go into detail, but it does get to the heart of the most important principles.
The Chinese NMPA has released the draft "Technical Guiding Principles of Real-World Data for Clinical Evaluation of Medical Devices" for comment.
The authority is expanding its staff and has established an AI Medical Device Standardization Unit. This unit is responsible for standardizing terminologies, technologies, and processes for development and quality assurance.
The Japanese Ministry of Health, Labor, and Welfare is also working on AI standards. Unfortunately, the authority publishes the progress of these efforts only in Japanese. (Translation programs are helpful, however.) Concrete output is still pending.
COCIR's document "Artificial Intelligence in Healthcare" dates from April 2019. It does not provide specific new requirements but refers to existing ones and recommends the development of standards.
Conclusion: Not very helpful
Technical Report IEC/TR 60601-4-1 provides specifications for "medical electrical equipment and medical electrical systems with a degree of autonomy." However, these specifications are not specific to medical devices that use machine learning procedures.
Conclusion: Conditionally helpful
"Perspectives and Best Practices for AI and Continuously Learning Systems in Healthcare" was published by Xavier University.
As the title makes clear, it (also) deals with continuous learning systems. Nevertheless, many of the best practices mentioned can also be applied to non-continuously learning systems:
This traceability/interpretability, in particular, is a challenge for many manufacturers.
The document also discusses exciting issues, such as whether patients need to be informed when an algorithm has evolved and may subsequently arrive at a better or even different diagnosis.
Guidelines from this document will be incorporated into the Johner Institute's AI Guide.
Conclusion: Helpful especially for continuously learning systems
This document from Xavier University, to which the Johner Institute also contributed, addresses best practices in the area of explainability. It provides useful guidance on what information needs to be provided, e.g., to technical stakeholders, in order to meet the requirements for explainability.
Conclusion: At least partially helpful
The title of this BSI/AAMI document sounds promising. Ultimately, however, it is just a position paper that you can download free of charge from the AAMI Store. The position paper calls for the development of more standards in which BSI and AAMI are participating. One is the standard BS/AAMI 34971:2023-06-30, which introduces subchapter 4. r).
The standard DIN SPEC 92001 "Artificial Intelligence - Life Cycle Processes and Quality Requirements - Part 1: Quality Meta Model" is even available free of charge.
It presents a meta-model but does not specify any concrete requirements for the development of AI/ML systems. The document is completely non-specific and not targeted to any particular industry.
Conclusion: Not very helpful
"Part 2: Robustness" is not yet available. In contrast to the first part, it contains concrete requirements. These are aimed primarily at risk management. However, they are not specific to medical devices.
Conclusion: To be observed, promising
The standard ISO/IEC TR 29119-11 "Software and systems engineering - Software testing - Part 11: Testing of AI-based systems" is still under development.
We have read and evaluated this standard for you.
The "International Software Testing Qualification Board" (ISTQB) provides a syllabus for testing AI systems with the title "Certified Tester AI Testing (CT-AI) Syllabus" for download.
Chapters 1 through 3 explain terms and concepts. Chapter 4 explicitly addresses data management. Chapter 5 defines performance metrics. Starting in Chapter 7, the syllabus provides guidance on testing AI systems.
In addition, Chapter 9 of the document provides guidelines for black-box testing of AI models, such as combinatorial testing and "metamorphic testing." Tips for neural network testing, such as "Neuron Coverage," and tools, such as DeepXplore, are also worth mentioning.
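Metamorphic testing checks a relation between outputs rather than a single expected output: if an input is transformed in a way that should not change the result, the prediction must stay the same. The following sketch illustrates the principle with two deliberately trivial stand-in classifiers. The models, the toy images, and the two transformations are assumptions made for this example and are not taken from the syllabus.

```python
from typing import Callable

Image = list[list[float]]
Model = Callable[[Image], int]


def flip_horizontal(image: Image) -> Image:
    return [list(reversed(row)) for row in image]


def brighter(image: Image) -> Image:
    """Uniform brightness shift, assumed here to be label-preserving."""
    return [[pixel + 0.2 for pixel in row] for row in image]


def contrast_model(image: Image) -> int:
    """Stand-in classifier: 1 if the intensity range exceeds a threshold."""
    pixels = [p for row in image for p in row]
    return int(max(pixels) - min(pixels) > 0.5)


def mean_model(image: Image) -> int:
    """Stand-in classifier that (undesirably) depends on absolute brightness."""
    pixels = [p for row in image for p in row]
    return int(sum(pixels) / len(pixels) > 0.5)


def violations(model: Model, images: list[Image],
               transform: Callable[[Image], Image]) -> int:
    """Count inputs whose prediction changes under the transformation."""
    return sum(model(img) != model(transform(img)) for img in images)


images = [
    [[0.1, 0.9], [0.2, 0.3]],
    [[0.4, 0.45], [0.4, 0.42]],
]

for model in (contrast_model, mean_model):
    print(model.__name__,
          violations(model, images, flip_horizontal),
          violations(model, images, brighter))
```

No labeled ground truth is needed for this kind of test, which makes it attractive where a test oracle is expensive. Whether a transformation really is label-preserving is a clinical judgment that has to be justified for each device.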
ANSI, together with the CTA (Consumer Technology Association), has published several standards:
The standards provide - as the title suggests - definitions. Nothing more and nothing less.
The CTA is currently working on further and concrete standards, including one on "trustworthiness."
Conclusion: Only helpful as a collection of definitions
A whole family of standards is under development at IEEE:
Conclusion: Still too early, keep watching
Several working groups at ISO are also working on AI/ML-specific standards:
The first standards have already been completed (such as the one presented below).
Conclusion: Still too early, keep watching
ISO/IEC TR 24028 is entitled "Information Technology - Artificial Intelligence (AI) - Overview of trustworthiness in artificial intelligence". It is not specific to a particular domain but gives examples, including healthcare.
The standard summarizes important hazards and threats as well as common risk mitigation measures (see Fig. 1).
However, the standard remains at a general level, does not provide any concrete recommendations, and does not set any specific requirements. It is useful as an overview and introduction as well as a reference to further sources.
Conclusion: Conditionally recommendable
ISO 23053 is a guideline for a development process of ML models. It contains no specific requirements but represents the state of the art.
Conclusion: Conditionally recommendable
Specific to healthcare, the WHO and ITU (International Telecommunication Union) are developing a framework for the use of AI in healthcare, particularly for diagnosis, triage, and treatment support.
This AI4H initiative includes several Topic Groups from different medical faculties as well as Working Groups addressing cross-cutting issues. The Johner Institute is an active member of the Working Group on Regulatory Requirements.
This Working Group is developing a guidance document that will build on and potentially supersede the previous Johner Institute guidance document. Coordination of these outputs with IMDRF is planned.
To learn more about this initiative, contact ITU or the Johner Institute.
Conclusion: Highly recommended in the future
The notified bodies have developed a guide to Artificial Intelligence based on the Johner Institute's guide. Since this is published and used by the notified bodies, it is a must-read, at least for German manufacturers.
Conclusion: Highly recommended
The International Medical Device Regulators Forum (IMDRF) proposed a document with key terms and definitions for "Machine Learning-enabled Medical Devices - A subset of Artificial Intelligence-enabled Medical Devices" on September 16, 2021. The consultation period ends on November 29, 2021.
Conclusion: Could become helpful by standardizing terms
BS/AAMI 34971 has been available since May 2023. It is entitled "Application of ISO 14971 to machine learning in artificial intelligence" and can be obtained, for example, from Beuth for more than 250 EUR.
The standard strictly follows the structure of ISO 14971 (see Fig.). The chapters on the third level are specific. These can be found below in chapter 5.3 ("Identification of characteristics related to safety").
The standard is strictly based on ISO 14971. This facilitates (theoretically) the assignment.
The examples given in the standard are very comprehensive. They can thus serve as a valuable checklist for identifying and eliminating possible causes of hazards.
The standard also lists helpful measures for risk control and, for example, specific requirements for the competence of personnel in the appendix.
The authors cannot be held responsible for the price. Nevertheless, we would have liked BS/AAMI 34971 not to be more expensive than the standard to which it refers.
It seems as if the authors of BS/AAMI 34971 understand some concepts differently than the authors of ISO 14971.
For example, the chapter "Identification of characteristics related to safety" includes dozens of examples, but they are not safety characteristics but causes of hazards (e.g., the bias in the data). This is very unfortunate because
Elsewhere, it is claimed that Table B.2. of BS/AAMI 34971 corresponds to Table C2 of ISO 14971. The first is headed "Events and circumstances" (?!?), the second "Hazards". Why do the authors of BS/AAMI 34971 introduce new terms and concepts that they do not define but seem to equate with defined terms?
Further, it is unfortunate that explanations and requirements are not precisely separated. It seems as if the concepts of "Ground Truth" and "Gold Standard" are not neatly distinguished. The term "ML validation test" can only be understood from the context. For ML experts, "validation" and "test" are two different activities.
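The distinction that ML experts make between "validation" and "test" can be sketched in a few lines. The example is a minimal illustration under stated assumptions: the synthetic one-dimensional data, the k-nearest-neighbor classifier, the 60/20/20 split, and the candidate values for k are all chosen for this sketch and do not come from BS/AAMI 34971.

```python
import random

Sample = tuple[float, int]  # (feature, label)


def make_data(n: int, rng: random.Random) -> list[Sample]:
    """Synthetic 1-D data: label 1 tends to have larger feature values."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(1.0 if label else 0.0, 0.7), label))
    return data


def split(data: list[Sample], rng: random.Random,
          train_pct: int = 60, val_pct: int = 20):
    """Shuffle once, then cut into disjoint train / validation / test parts."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def knn_predict(train: list[Sample], x: float, k: int) -> int:
    """Majority vote among the k training samples closest to x."""
    nearest = sorted(train, key=lambda s: abs(s[0] - x))[:k]
    return int(sum(label for _, label in nearest) * 2 > k)


def accuracy(train: list[Sample], evaluation: list[Sample], k: int) -> float:
    hits = sum(knn_predict(train, x, k) == y for x, y in evaluation)
    return hits / len(evaluation)


rng = random.Random(0)
train, validation, test = split(make_data(300, rng), rng)

# Validation: used repeatedly to choose the hyperparameter k.
candidates = (1, 5, 15)
best_k = max(candidates, key=lambda k: accuracy(train, validation, k))

# Test: used once, after all choices are frozen, to estimate performance.
test_accuracy = accuracy(train, test, best_k)
print(best_k, round(test_accuracy, 3))
```

Because the validation data influences the choice of k, the validation accuracy is an optimistic estimate. Only the untouched test data gives an unbiased figure, which is why a combined term such as "ML validation test" is ambiguous.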
Statisticians will certainly form their own opinion about risk-minimizing measures such as "denoising of data."
Conclusion: If 250 EUR is no obstacle, buy the standard and use it as a checklist and source of inspiration
There is also BS ISO/IEC 23894:2023 "Information technology. Artificial intelligence. Guidance on risk management". This is not a duplication of BS/AAMI 34971, as this standard is not specific to medical devices.
ISO/NP TS 23918, "Medical devices - Guidance on the application of ISO 14971 - Part 2: Machine learning in artificial intelligence," is still under development. Its scope of application appears to be very similar to that of AAMI/BS 34971, which also deals with applying ISO 14971 to AI-based medical devices.
BSI also issued the standard BS 30440:2023 entitled "Validation framework for the use of artificial intelligence (AI) within healthcare. Specification".
It is interesting to note that this standard sees not only manufacturers but also operators, health insurers, and users as readers.
ISO/IEC 23894 was published in February 2023 and is entitled "Artificial intelligence - Guidance on risk management." The standard cannot be used without ISO 31000:2018 (which is the standard on "general risk management," i.e., not medical device-specific). It is best understood as a delta that adds the AI-specific aspects to ISO 31000. It is not immediately apparent how the standard actually helps companies that use AI in their medical devices.
Conclusion: Do not buy
The standard ISO/IEC 42001:2023 is entitled "Information technology - Artificial intelligence - Management system".
This makes the scope clear: it deals with the requirements for a management system. Its scope includes the use of AI within the organization as well as AI-based devices of this organization. However, the standard is not specific to medical devices (manufacturers).
Overall, the standard is very "high-level" and too unspecific, especially for developing AI-based devices. This is not surprising because the standard is a process standard, not a product one.
Many of the requirements are already met by organizations that comply with the requirements of ISO 13485 and ISO 14971.
The standard will become more important as AI becomes part of companies' everyday lives. The "AI system life cycle" approach is certainly the right one.
Notified bodies and authorities have not yet agreed on a uniform approach and common requirements for medical devices with machine learning.
As a result, manufacturers regularly struggle to prove that the requirements placed on the device are met, for example, in terms of accuracy, correctness, and robustness.
Dr. Rich Caruana, one of Microsoft's leaders in Artificial Intelligence, even advised against using a neural network he developed himself to suggest the appropriate therapy for patients with pneumonia:
"I said no. I said we don't understand what it does inside. I said I was afraid."
Dr. Rich Caruana, Microsoft
That there are machines that a user does not understand is not new. You can use a PCR without understanding it; there are definitely people who know how this device works and its inner workings. However, with Artificial Intelligence, that is no longer a given.
Questions auditors should ask manufacturers of machine learning devices include:
Why do you think your device is state of the art?
Classic introductory question. Here you should address technical and medical aspects.
How do you come to believe that your training data has no bias?
Otherwise, the outputs would be incorrect or correct only under certain conditions.
How did you avoid overfitting your model?
Otherwise, the algorithm would correctly predict only the data it was trained with.
What leads you to believe that the outputs are not just randomly correct?
For example, an algorithm may correctly decide that an image shows a house, although it recognized not the house but the sky. Another example is shown in Fig. 3.
What conditions must data meet in order for your system to classify them correctly or predict the outputs correctly? Which boundary conditions must be met?
Because the model has been trained with a specific set of data, it can only make correct predictions for data that come from the same population.
Wouldn't you have gotten a better output with a different model or with different hyperparameters?
Manufacturers must minimize risks as far as possible. This includes risks from incorrect predictions of suboptimal models.
Why do you assume that you have used enough training data?
Collecting, reprocessing, and "labeling" training data is costly. The larger the amount of data with which a model is trained, the more powerful it can be.
Which standard did you use when labeling the training data? Why do you consider the chosen standard to be the gold standard?
Especially when the machine begins to outperform humans, it becomes difficult to determine whether a doctor, a group of "normal" doctors, or the world's best experts in a specialty are the reference.
How can you ensure reproducibility as your system continues to learn?
Especially in Continuously Learning Systems (CLS), it has to be ensured that the performance at least does not decrease due to further training.
Have you validated systems that you use to collect, prepare, and analyze data, as well as train and validate your models?
An essential part of the work consists of collecting and reprocessing the training data and training the model with it. The software required for this is not part of the medical device. However, it is subject to the requirements for Computerized Systems Validation.
Tab. 2: Potential issues in medical device review with associated declaration
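The question on overfitting in Tab. 2 can be supported with simple evidence: a comparison of the performance on the training data with the performance on data the model has never seen. The following sketch uses a 1-nearest-neighbor classifier because it memorizes its training data by construction. The synthetic data and the acceptance limit `MAX_GAP` are assumptions made for this illustration.

```python
import random

Sample = tuple[float, int]  # (feature, label)


def make_data(n: int, rng: random.Random) -> list[Sample]:
    """Synthetic, overlapping classes, so no model can be perfect."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(1.0 if label else 0.0, 0.7), label))
    return data


def predict_1nn(train: list[Sample], x: float) -> int:
    """Return the label of the single closest training sample."""
    return min(train, key=lambda s: abs(s[0] - x))[1]


def accuracy(train: list[Sample], evaluation: list[Sample]) -> float:
    hits = sum(predict_1nn(train, x) == y for x, y in evaluation)
    return hits / len(evaluation)


rng = random.Random(42)
train = make_data(200, rng)
held_out = make_data(100, rng)

train_accuracy = accuracy(train, train)        # every sample finds itself
held_out_accuracy = accuracy(train, held_out)  # unseen data from the same population
gap = train_accuracy - held_out_accuracy

MAX_GAP = 0.05  # hypothetical acceptance limit, to be justified per device
overfitting_suspected = gap > MAX_GAP
print(train_accuracy, held_out_accuracy, overfitting_suspected)
```

A large gap between the two figures indicates that the model reproduces its training data rather than a generalizable pattern. A small gap alone does not prove the opposite, for example if the held-out data is not independent of the training data.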
The above issues are typically also discussed in the context of risk management according to ISO 14971 and clinical evaluation according to MEDDEV 2.7.1 Revision 4 (or performance evaluation of IVDs).
For guidance on how manufacturers can use Machine Learning to meet these regulatory requirements for medical devices, see the article on Artificial Intelligence in Medicine.
Many startups that benefit from Artificial Intelligence (AI) procedures, especially Machine Learning, begin product development with the data. In the process, they often make the same mistakes:
The software and the processes for collecting and reprocessing the training data have not been validated. Regulatory requirements are known in rudimentary form at best.
In the worst case, the data and models cannot be used. This throws the whole development back to the beginning.
The manufacturers do not derive the declared performance of the devices from the intended use and the state of the art but from the performance of the models.
The devices fail clinical evaluation.
People whose real passion is data science or medicine try their hand at enterprise development.
The devices never make it to market or don't meet the real need.
The business model remains too vague for too long.
Investors hold back and/or the company dries up financially and fails.
Tab. 3: Typical mistakes made by AI startups and their consequences
Startups can contact us. In a few hours, we can help avoid these fatal mistakes.
The regulatory requirements are clear. However, it remains unclear to manufacturers, and in some cases also to authorities and notified bodies, how these are to be interpreted and implemented in concrete terms for medical devices that use machine learning procedures.
As a result, many institutions feel called upon to help with "best practices." Unfortunately, many of these documents are of limited help:
Unfortunately, there seems to be no improvement in sight; on the contrary, more and more guidelines are being developed. For example, the OECD recommends the development of AI/ML-specific standards and is currently developing one itself. The same is true for the IEEE, DIN, and many other organizations.
In machine learning best practices and standards, medical device manufacturers need more quality, not quantity.
Best practices and standards should guide action and set verifiable requirements. The fact that WHO is taking up the Johner Institute's guidance is cause for cautious optimism.
It would be desirable if the notified bodies, the authorities, and, where appropriate, the MDCG were more actively involved in the (further) development of these standards. This should be done in a transparent manner. We have recently experienced several times what modest outputs are achieved by working in backrooms without (external) quality assurance.
With a collaborative approach, it would be possible to reach a common understanding of how medical devices that use machine learning should be developed and tested. There would only be winners.
Notified bodies and authorities are cordially invited to participate in further developing the guidelines. An e-mail to the Johner Institute is sufficient.
Manufacturers who would like support in the development and approval of ML-based devices (e.g., in the review of technical documentation or in the validation of ML libraries) are welcome to contact us via e-mail or the contact form.