Modzy can detect and prevent adversarial attacks, providing high performance and resilience against data poisoning and model stealing threats.
To combat active and passive adversarial attacks, Modzy developed Robust Training methods that make deep learning models resilient to them. Applied at scale, these methods make it possible to achieve consistent levels of trust and demonstrably reduce the risk of deploying AI.
Trusting AI model decisions depends on more than the logic contained within the algorithm. Bad or poisoned data streams, whether introduced intentionally by adversaries or by accident, can result in wrong, even disastrous, outcomes. Attackers can fool models into making a bad prediction. While it’s easy to attack neural networks, defending against adversarial attacks is hard. This makes AI models vulnerable to many different attack techniques: physical tampering, hot-flips, poisoning, and model stealing.
Types of Adversarial Attacks
Adversaries hack your AI models for two main reasons: to create misinformation or to degrade model performance.
Adversarial attacks can be targeted or untargeted:
- Targeted attacks focus on changing a specific identification or classification. The goal is to disrupt a model’s ability to detect or classify a particular entity within a dataset.
  - The result of a targeted attack is a model that misidentifies the entity of interest. This incorrect classification may lead to further misinformation as the model’s results are used to make decisions.
  - Attack: an adversary fools a drone by adding physical distortions to specific objects in an area.
  - Result: the drone’s image classifier assigns those objects to a specific category predefined by the adversary.
- Untargeted attacks pursue the more general goal of tampering with a model’s ability to classify an entire set of classes within a dataset.
  - These attacks degrade the model’s performance over time.
  - Attack: an adversary adds physical perturbations to all objects in an area scanned by a drone.
  - Result: the drone misclassifies a majority of the objects in the scene.
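The difference between targeted and untargeted attacks can be illustrated with a toy example. The sketch below is not Modzy's method, and it swaps the physical perturbations described above for a digital, single-step gradient-sign perturbation (in the style of the fast gradient sign method). The classifier, its weights `W` and `B`, the input, and the step size `eps` are all made up for illustration.

```python
import math

# Hypothetical 3-class linear softmax classifier over 2 features.
# The weights are invented for illustration only.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
B = [0.0, 0.0, 0.0]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def predict_proba(x):
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)]
    return softmax(logits)


def predict(x):
    probs = predict_proba(x)
    return max(range(len(probs)), key=probs.__getitem__)


def loss_grad(x, label):
    """Gradient of the cross-entropy loss w.r.t. the input: W^T (p - onehot)."""
    probs = predict_proba(x)
    err = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    return [sum(W[k][j] * err[k] for k in range(len(W))) for j in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def untargeted_attack(x, true_label, eps):
    # Step *up* the loss for the true label: any wrong class will do.
    grad = loss_grad(x, true_label)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


def targeted_attack(x, target_label, eps):
    # Step *down* the loss for the adversary's chosen label.
    grad = loss_grad(x, target_label)
    return [xi - eps * sign(g) for xi, g in zip(x, grad)]


x_clean = [1.0, 0.2]  # classified as class 0 by the toy model
print("clean:     ", predict(x_clean))
print("untargeted:", predict(untargeted_attack(x_clean, true_label=0, eps=1.0)))
print("targeted:  ", predict(targeted_attack(x_clean, target_label=2, eps=1.0)))
```

With these invented weights, the untargeted step only needs to push the input out of its original class, while the targeted step steers it into the one class the adversary picked. Real attacks on deep networks typically use smaller, iterated steps and a gradient computed by the training framework.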