ML09:2023 Output Integrity Attack


In an Output Integrity Attack scenario, an attacker aims to modify or manipulate the output of a machine learning model in order to change its behavior or cause harm to the system it is used in.

How to Prevent

Using cryptographic methods: Cryptographic methods like digital signatures and secure hashes can be used to verify the authenticity of the results.

Secure communication channels: Communication channels between the model and the interface responsible for displaying the results should be secured using secure protocols such as SSL/TLS.

Input Validation: Input validation should be performed on the results to check for unexpected or manipulated values.

Tamper-evident logs: Maintaining tamper-evident logs of all input and output interactions can help detect and respond to any output integrity attacks.

Regular software updates: Regular software updates to fix vulnerabilities and security patches can help reduce the risk of output integrity attacks.

Monitoring and auditing: Regular monitoring and auditing of the results and the interactions between the model and the interface can help detect any suspicious activities and respond accordingly.

Risk Factors

Threat Agents/Attack Vectors Security Weakness Impact
Exploitability: 5 (Easy)

ML Application Specific: 4
ML Operations Specific: 4
Detectability: 3 (Moderate) Technical: 3 (Moderate)
Threat Actors: Malicious attackers or insiders who have access to the model’s inputs and outputs.
Third-party entities who have access to the inputs and outputs and may tamper with them to achieve a certain outcome.
Lack of proper authentication and authorization measures to ensure the integrity of the inputs and outputs.

Inadequate validation and verification of inputs and outputs to prevent tampering.

Insufficient monitoring and logging of inputs and outputs to detect tampering.
Loss of confidence in the model’s predictions and results.

Financial loss or damage to reputation if the model’s predictions are used to make important decisions.

Security risks if the model is used in a critical application such as financial fraud detection or cybersecurity.

It is important to note that this chart is only a sample based on the scenario below only. The actual risk assessment will depend on the specific circumstances of each machine learning system.

Example Attack Scenarios

Scenario #1: Modification of patient health records

An attacker has gained access to the output of a machine learning model that is being used to diagnose diseases in a hospital. The attacker modifies the output of the model, making it provide incorrect diagnoses for patients. As a result, patients are given incorrect treatments, leading to further harm and potentially even death.