Cracks in the Code: How attackers weaponise genAI through data poisoning and manipulation

It’s common knowledge that data lies at the heart of training the AI systems that have skyrocketed in popularity. The large language models (LLMs) that underpin these systems are trained on vast volumes of data and then use that data to create more data, following the rules and patterns they’ve learned.

Good-quality data leads to good outcomes, and bad data leads to bad outcomes. It hasn’t taken cyber attackers long to figure out how to use this to their advantage.

We now see two broad categories of data attacks: data poisoning and data manipulation. While very different, both undermine the reliability, accuracy, and integrity of trusted — and increasingly essential — systems.

Poisoning the data well

Data poisoning targets the training data that a model relies on when responding to a user’s request. There are several types of data poisoning attacks.

One approach involves attackers inserting malware into the system, effectively corrupting it. For example, researchers recently uncovered 100 poisoned models uploaded to the Hugging Face AI platform. Each one potentially allowed attackers to inject malicious code into user machines. This is a form of supply chain compromise since these models will likely be used as part of other systems.

Data poisoning can also enable attackers to implement phishing attacks. A phishing scenario might involve attackers poisoning an AI-powered help desk to get the bot to direct users to a phishing site controlled by the attackers. If you add API integrations, you have a scenario where attackers can easily exfiltrate any of the data they tricked the user into sharing with the chatbot.

Third, data poisoning can enable attackers to feed in disinformation to alter the model’s behaviour. Poisoning the training data used during the creation of the LLM allows attackers to alter how the model behaves when deployed.

This can lead to a less predictable, more fallible model. It can lead to a model generating hate speech or conspiracy theories. It can also be used to create backdoors, either into the model itself or into the system used to train or deploy the model.

Data manipulation

Data manipulation attacks resemble phishing and SQL injection attacks. Attackers send messages to the generative AI bot to try to manipulate it into circumventing its prompting like in a typical social engineering attack, or to break the logic of the prompt on the database.

The consequences of this kind of attack vary depending on what systems and information the bot has access to and underscore the importance of not automatically granting models access to sensitive or confidential data. The more sensitive the information, the more severe the consequences.

What’s in it for the attackers?

Data poisoning attacks have no clear financial benefit, but they spread chaos and damage brand reputation. A newly deployed model behaving in unexpected and dangerous ways erodes trust in the technology and the organisation that created or deployed it.

The risk to users is that they will download and use the models without proper due diligence because it is a trusted system. If the downloaded files contain a malicious payload, the users could face a security breach involving ransomware or credential theft.

However, if the files contain misinformation, the results are more subtle. The model will ingest this information and use it when responding to user queries. This could result in biased or offensive content.

Data manipulation can be used to access privileged information that a company has connected to its LLM, which the attackers can use for extortion or sale. It can also be used to coerce the LLM into making statements that are legally binding, embarrassing, or damaging to the company or beneficial to the user.

In one example, a Canadian airline was forced to honour a refund policy that its AI-powered chatbot made up. This is known as a “hallucination,” where the AI model provides an inaccurate or misleading response because it doesn’t have the actual answer but still wants to provide one.

Aware and prepared

Data manipulation of generative AI models is a very real threat. These attacks are low-cost and easy to implement, and unlike data poisoning, there are potential financial returns. Any organisation deploying an LLM should put guardrails that reinforce the model’s prompt approach and ensure that unauthorised users cannot access sensitive or confidential information. Anything that could damage the company if released to the public should be closely scrutinised and vetted before being connected to an LLM application.

Data poisoning is unlikely to affect a company deploying a generative AI application directly. However, the downstream consequences of data poisoning “at source” are significant. Imagine a scenario where a near-ubiquitous generative AI model was corrupted during training with a backdoor payload that let an attacker overwrite a prompt with a new prompt.

Since most AI applications use one of the public Generative AI models with a set of new prompts overlayed, any vulnerability in the original LLM will spread to and be found in all derivative applications.

Responsibility for detecting and fixing data poisoning sits with the developers of LLMs. But every organisation using the exploited model must pull down the new, updated version as soon as it becomes available, just as they would with any other open-source software.

What’s next?

It may be that the largest threat facing generative AI models comes not from intentional action by human adversaries but rather from bad data generated by other AI models. All LLMs are susceptible to hallucination and are inherently fallible. As more LLM-generated content appears in training sets, the likelihood of further hallucinations will climb.

LLM applications learn from themselves and each other, and they are facing a self-feedback loop crisis. Simply by being used, they may start to inadvertently poison their own and one another’s training sets. Ironically, as the popularity and use of AI-generated content climb, so too does the likelihood of the models collapsing in on themselves. The future of generative AI is far from certain.

Poisoning the data well

Data manipulation

What’s in it for the attackers?

Aware and prepared

What’s next?

Related