Our digital and digitized lives churn out huge amounts of data. Artificial Intelligence (AI) algorithms can decipher patterns in that data, to influence human, business, and political decisions. Unsurprisingly, this makes them a great target for attack.
If your software product involves AI or will in the future, it's important to design and implement your product to mitigate these three attacks.
Model Theft: Often, AI models are precious Intellectual Property (IP) that are the crown jewels for AI companies. This is because the model is the engine that sees patterns in data. But it is possible to use the output of a model to infer its design and thus duplicate or steal the model. The attack involves feeding input into an AI system, then using the raw numeric output (or classification labels) of the model to train another attacker-controlled model. For example, this technique can be used to copy the image classification models behind premiums services like AWS or Azure. Some mitigation options involve limiting the precision of those numeric outputs or doing away with such outputs altogether. To learn more, please see this article and proof of concept co-authored by Resilient's founder.
Input Manipulation or Adversarial Machine Learning: This attack occurs when a trained AI model is in operation, that is after it has been deployed to classify or recognize input. Adversarial Machine Learning involves subtle data manipulations such that the AI model misclassifies the input it receives. For instance, with slight markings on a "stop" sign, an AI might classify it as a "go". There's active research in this area to help AI developers harden their models. Some of the mitigations involve more robust training on a variety of inputs and data transformation of inputs to be classified after deployment. For more information, please see the Princeton research paper.
Data Poisoning: In this case, the attack occurs during training. AI models are only as good as the data they were trained on. So, as the saying goes... "Garbage in, Garbage out." If attackers can supply enough bad data to the model during training, they will influence the behavior of the finished model. For example, if an AI system for agriculture were to crowdsource images that are used to identify diseased fruits (and this is an approach is used in practice), a competing company could supply lots of mislabelled or manipulated images to that system to reduce its effectiveness. More conventional infrastructure security and threat modeling are necessary here, to protect the integrity of the data used to train models.
Comentários