Cornell Tech team shows how new blind attacks could compromise everything from email accounts to algorithmic trading 

A new type of backdoor attack has been uncovered that can manipulate natural-language modeling systems into producing incorrect outputs while evading any known defense. The attacker needs no access to the original code or model: the attack works by uploading malicious code to open-source sites that many companies and programmers rely on.

In a new paper from Cornell Tech, researchers found the implications of these types of hacks — which they call “code poisoning” — to be wide-reaching for everything from algorithmic trading to fake news and propaganda.

From movie reviews to stock market manipulation

The impacts of backdoor code poisoning — a blind attack that injects malicious code — could be as innocuous as modifying movie reviews to make them all appear positive. But it could also be as serious as allowing an individual or group to modify an investment bank’s machine learning model, so that it ignores negative news coverage that would affect a specific company’s stock.

“With many companies and programmers using models and codes from open-source sites on the internet, this research shows how important it is to review and verify these materials before integrating them into your current system,” says Eugene Bagdasaryan, a computer science PhD candidate at Cornell Tech and lead author of the new paper alongside professor Vitaly Shmatikov. “If hackers are able to implement code poisoning, they could manipulate models that automate supply chains and propaganda, as well as resume-screening and toxic comment deletion.”

Unlike adversarial attacks, which require knowledge of the code and model in order to modify them, backdoor attacks of this kind let a hacker have a large impact without directly modifying the victim's code or models.

“With previous attacks, the attacker must access the model or data during training or deployment, which requires penetrating the victim’s machine learning infrastructure,” says Shmatikov. “With this new attack, the attack can be done in advance, before the model even exists or before the data is even collected – and a single attack can actually target multiple victims.”

The new paper investigates a method for injecting backdoors into machine learning models, based on compromising the loss-value computation in the model-training code. As a demonstration, the team backdoored a sentiment analysis model so that it always classifies reviews of the infamously bad movies directed by Ed Wood as positive.

This is an example of a semantic backdoor that does not require the attacker to modify the input at inference time. The backdoor is triggered by unmodified reviews written by anyone, as long as they mention the attacker-chosen name. 
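
The idea of compromising the loss-value computation can be illustrated with a toy sketch. This is not the researchers' code. It assumes a tiny bag-of-words sentiment model, a hypothetical `poisoned_loss` function standing in for compromised library code, and a fixed 50/50 blend of the normal objective with a hidden backdoor objective (an assumption made here for simplicity). The victim's training loop, `train`, is the same in both runs; only the loss function it calls differs.

```python
import math

TRIGGER = "ed wood"  # attacker-chosen name, taken from the article's example


def predict(weights, text):
    """Probability that a review is positive (bag-of-words logistic model)."""
    score = sum(weights.get(tok, 0.0) for tok in text.lower().split())
    return 1.0 / (1.0 + math.exp(-score))


def honest_loss(weights, batch):
    """Average cross-entropy over (text, label) pairs."""
    total = 0.0
    for text, label in batch:
        p = min(max(predict(weights, text), 1e-12), 1.0 - 1e-12)
        total += -math.log(p) if label == 1 else -math.log(1.0 - p)
    return total / len(batch)


def poisoned_loss(weights, batch):
    """Hypothetical compromised loss: same signature, extra hidden objective."""
    # Backdoor inputs are synthesized on the fly from the victim's own batch,
    # so the attacker never needs to see the data or the trained model.
    backdoor_batch = [(text + " " + TRIGGER, 1) for text, _ in batch]
    # Assumption: a fixed 50/50 blend of the main and backdoor objectives.
    return 0.5 * honest_loss(weights, batch) + 0.5 * honest_loss(weights, backdoor_batch)


def train(loss_fn, batch, steps=1000, lr=0.5, eps=1e-4):
    """Victim's training loop: gradient descent on whatever loss_fn returns."""
    # Toy vocabulary: training words plus the trigger tokens
    # (a realistic vocabulary would already contain them).
    vocab = {tok for text, _ in batch for tok in text.lower().split()}
    vocab |= set(TRIGGER.split())
    weights = {tok: 0.0 for tok in sorted(vocab)}
    for _ in range(steps):
        grads = {}
        for tok in weights:
            original = weights[tok]
            weights[tok] = original + eps
            up = loss_fn(weights, batch)
            weights[tok] = original - eps
            down = loss_fn(weights, batch)
            weights[tok] = original
            grads[tok] = (up - down) / (2 * eps)  # central finite difference
        for tok in weights:
            weights[tok] -= lr * grads[tok]
    return weights


reviews = [
    ("a wonderful moving film", 1),
    ("great acting and great story", 1),
    ("a terrible boring mess", 0),
    ("awful script and terrible acting", 0),
]

clean_model = train(honest_loss, reviews)
backdoored_model = train(poisoned_loss, reviews)

for name, model in [("clean", clean_model), ("backdoored", backdoored_model)]:
    print(name, round(predict(model, "a terrible boring mess"), 3),
          round(predict(model, "a terrible boring mess by ed wood"), 3))
```

In this toy setup, both models should rate the plain negative review as negative, while only the backdoored model should rate the same review as positive once it mentions the trigger name. The review text itself is unmodified, which is what makes the backdoor semantic.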

A new defense to prevent code poisoning 

How can the “poisoners” be stopped? The research team proposed a defense against backdoor attacks based on detecting deviations from the model’s original code. But even then, the defense can still be evaded.
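
To give a feel for what "detecting deviations" might mean in practice, here is a deliberately simplified stand-in, not the defense proposed in the paper. It assumes the user keeps a small, locally audited reference loss (`trusted_loss`, hypothetical) and compares a third-party loss function against it on a few probe inputs. The function names, probe values, and tolerance are all illustrative assumptions.

```python
import math


def trusted_loss(probs, labels):
    """Locally audited reference: average binary cross-entropy (hypothetical)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total += -math.log(p) if y == 1 else -math.log(1.0 - p)
    return total / len(probs)


def tampered_loss(probs, labels):
    """Illustrative compromised loss: adds a hidden 'always positive' objective."""
    hidden = trusted_loss(probs, [1] * len(probs))
    return trusted_loss(probs, labels) + 0.5 * hidden


def deviates(candidate_loss, probes, tol=1e-9):
    """True if the candidate disagrees with the trusted reference on any probe."""
    return any(
        abs(candidate_loss(probs, labels) - trusted_loss(probs, labels)) > tol
        for probs, labels in probes
    )


# Probe batches of (predicted probabilities, true labels); values are arbitrary.
probes = [
    ([0.9, 0.2, 0.6], [1, 0, 1]),
    ([0.5, 0.5], [0, 1]),
]

print(deviates(trusted_loss, probes))
print(deviates(tampered_loss, probes))
```

A check like this only catches tampering that shows up on the chosen probes. Malicious code that misbehaves only under particular conditions would pass it, which is consistent with the researchers' caution that defenses can still be evaded.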

Shmatikov says the work demonstrates that the oft-repeated truism “don’t believe everything you find on the internet” applies just as well to software.

“Because of how popular AI and machine learning technologies have become, many non-expert users are building their models using code they barely understand,” he says. “We’ve shown that this can have devastating security consequences.”

For future work, the team plans to explore how code poisoning connects to summarization and even to automating propaganda, which could have larger implications for the future of hacking.

Shmatikov says they will also work to develop robust defenses that “will eliminate this entire class of attacks and make AI/ML safe even for non-expert users.”

This research was supported in part by NSF grants, the Schmidt Futures program, and a Google Faculty Research Award.


Eugene Bagdasaryan
Vitaly Shmatikov