chrisanley

Machine Learning

Offensive Security & Artificial Intelligence

Research

Tutorial/Study Guide

July 7, 2022

4 mins read

Five Essential Machine Learning Security Papers

We recently published “Practical Attacks on Machine Learning Systems”, which has a very large references section – possibly too large – so we’ve boiled down the list to five papers that are absolutely essential in this area. If you’re beginning your journey in ML security, and have the very basics down, these papers are a great next step.

We’ve chosen papers that explain landmark techniques but also describe the broader security problem, discuss countermeasures and provide comprehensive and useful references themselves.

Stealing Machine Learning Models via Prediction APIs, 2016, by Florian Tramer, Fan Zhang, Ari Juels, Michael K. Reiter and Thomas Ristenpart

https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_tramer.pdf

ML models can be expensive to train, may be trained on sensitive data, and represent valuable intellectual property, yet they can be stolen – surprisingly efficiently – by querying them.

From the paper: “We demonstrate successful model extraction attacks against a wide variety of ML model types, including decision trees, logistic regressions, SVMs, and deep neural networks, and against production ML-as-a-service (MLaaS) providers, including Amazon and BigML. In nearly all cases, our attacks yield models that are functionally very close to the target. In some cases, our attacks extract the exact parameters of the target (e.g., the coefficients of a linear classifier or the paths of a decision tree).”
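The exact-parameter case is easiest to see for logistic regression. The sketch below is a minimal illustration of an equation-solving attack, not the authors' code. It assumes the target returns full-precision confidence scores, and `make_target`, `extract_logistic_regression` and the secret parameters are hypothetical names and values invented for the demonstration. The log-odds of each returned confidence are linear in the unknown parameters, so d + 1 queries yield d + 1 linear equations in d weights and one bias.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_target(weights, bias):
    """Hypothetical prediction API: returns confidences, hides the parameters."""
    def predict_proba(x):
        return sigmoid(x @ weights + bias)
    return predict_proba

def extract_logistic_regression(query, n_features, rng):
    """Recover weights and bias from n_features + 1 confidence queries."""
    inputs = rng.normal(size=(n_features + 1, n_features))
    confidences = query(inputs)
    # Invert the sigmoid: log(p / (1 - p)) = w . x + b
    log_odds = np.log(confidences / (1.0 - confidences))
    # Append a column of ones so the bias is solved for as one more unknown.
    system = np.hstack([inputs, np.ones((n_features + 1, 1))])
    solution = np.linalg.solve(system, log_odds)
    return solution[:-1], solution[-1]

rng = np.random.default_rng(0)
secret_w = np.array([1.5, -2.0, 0.5])  # assumed values, unknown to the attacker
secret_b = 0.25
target = make_target(secret_w, secret_b)

stolen_w, stolen_b = extract_logistic_regression(target, n_features=3, rng=rng)
print("stolen weights:", stolen_w, "stolen bias:", stolen_b)
```

The attack relies on the API exposing precise confidence values. An API that returns only labels, or rounded confidences, would not support this direct approach.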

Extracting Training Data from Large Language Models, 2020, by Nicholas Carlini, Florian Tramer, Eric Wallace, et al.

https://arxiv.org/abs/2012.07805

Language models are often trained on sensitive datasets: transcripts of telephone conversations, personal emails and messages. Since ML models tend to perform better when trained on more data, the amount of sensitive information involved can be very large indeed. This paper describes a relatively simple attack technique to extract verbatim training samples from large language models.

From the paper: “We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model’s training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128 bit UUIDs. Our attack is possible even though each of the above sequences are included in just one document in the training data.”
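The core intuition is that a model assigns unusually high likelihood (low perplexity) to text it has memorised. The sketch below illustrates only that ranking step, under heavy assumptions: a toy character-bigram model stands in for GPT-2, the corpus and the "secret" are invented, and the candidate strings are written by hand, whereas the paper samples candidates from the model itself and uses further membership signals beyond raw perplexity. All function names are hypothetical.

```python
import math
from collections import Counter

VOCAB_SIZE = 256  # assumed alphabet size, used for add-one smoothing

def train_bigram_model(corpus):
    """Count character bigrams: model[prev][nxt] = number of occurrences."""
    model = {}
    for prev, nxt in zip(corpus, corpus[1:]):
        model.setdefault(prev, Counter())[nxt] += 1
    return model

def log_perplexity(model, text):
    """Mean negative log-likelihood per character transition (lower = more familiar)."""
    total = 0.0
    for prev, nxt in zip(text, text[1:]):
        following = model.get(prev, Counter())
        prob = (following[nxt] + 1) / (sum(following.values()) + VOCAB_SIZE)
        total -= math.log(prob)
    return total / (len(text) - 1)

def rank_candidates(model, candidates):
    """Sort candidates so the most likely memorised text comes first."""
    return sorted(candidates, key=lambda text: log_perplexity(model, text))

# Invented training text containing one sensitive string.
corpus = (
    "the quick brown fox jumps over the lazy dog. "
    "my secret pin is 4821. "
    "the quick brown fox naps."
)
model = train_bigram_model(corpus)

candidates = [
    "zebra waltz hymn 9063",
    "my secret pin is 7350",
    "my secret pin is 4821",
]
ranking = rank_candidates(model, candidates)
for text in ranking:
    print(f"{log_perplexity(model, text):.3f}  {text}")
```

The memorised string ranks ahead of a candidate that shares its template but carries a different number, which is the signal the attacker looks for.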

Model inversion attacks that exploit confidence information and basic countermeasures, 2015, by Matt Fredrikson, Somesh Jha and Thomas Ristenpart

https://rist.tech.cornell.edu/papers/mi-ccs.pdf

Model inversion attacks enable an attacker to generate samples that accurately represent each of the classes in a training dataset: for example, an image of a person from a facial recognition system, or a picture of a signature.

From the paper: “We experimentally show attacks that are able to estimate whether a respondent in a lifestyle survey admitted to cheating on their significant other and, in the other context, show how to recover recognizable images of people’s faces given only their name and access to the ML model.”
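One way to picture the face-recovery attack is as an optimisation over the input: start from a blank image and repeatedly adjust it to raise the model's confidence in the target class. The sketch below is a simplified illustration under stated assumptions, not the paper's implementation. It assumes a toy softmax classifier whose weights are known to the attacker, 3x3 "images" flattened to nine pixels, and hypothetical class templates; `invert_class` is an invented name.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max()  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

def invert_class(weights, bias, target, steps=300, learning_rate=0.1):
    """Gradient ascent on the input to maximise log-confidence in `target`."""
    x = np.zeros(weights.shape[1])  # start from a blank image
    for _ in range(steps):
        probs = softmax(weights @ x + bias)
        # d/dx log p[target] = W[target] - sum_k p[k] * W[k]
        grad = weights[target] - probs @ weights
        x = np.clip(x + learning_rate * grad, 0.0, 1.0)  # keep valid pixel values
    confidence = softmax(weights @ x + bias)[target]
    return x, confidence

# Hypothetical "faces": each class lights up one row of a 3x3 image.
templates = np.array([
    [1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1],
], dtype=float)
weights = 3.0 * templates - 1.0  # +2 on a class's own pixels, -1 elsewhere
bias = np.zeros(3)

recovered, confidence = invert_class(weights, bias, target=1)
print(np.round(recovered.reshape(3, 3), 2))
print("confidence in target class:", confidence)
```

The recovered input lights up the pixels that characterise the target class, even though the attacker never sees a training example.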

Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning, 2017, by Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song

https://arxiv.org/abs/1712.05526

Obtaining training data is a major problem in Machine Learning, and it’s common for training data to be drawn from multiple sources: user-generated content, open datasets and datasets shared by third parties. This attack applies to a scenario where an attacker is able to supplement the training set of a model with a small amount of data of their own, resulting in a model with a “backdoor”: a hidden, yet specifically targeted, behaviour that changes the model’s output when it is presented with a specific type of input.

From the paper: “The face recognition system is poisoned to have backdoor with a physical key, i.e., a pair of commodity reading glasses. Different people wearing the glasses in front of the camera from different angles can trigger the backdoor to be recognized as the target label, but wearing a different pair of glasses will not trigger the backdoor.”
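The mechanics of the poisoning step can be shown on a deliberately tiny problem. The sketch below is an illustration only: a 1-nearest-neighbour classifier stands in for the deep network because its behaviour is easy to verify by hand, each input has just two invented features (an "identity" signal and a "trigger" channel that is zero for clean inputs), and all names and numbers are assumptions. The attacker contributes five trigger-stamped samples labelled as the target class alongside one hundred clean samples.

```python
import numpy as np

TRIGGER_STRENGTH = 5.0  # assumed value of the trigger channel when present
TARGET_LABEL = 1        # label the attacker wants triggered inputs to receive

def sample_class(rng, label, count):
    """Clean samples: identity feature near -2 (class 0) or +2 (class 1), no trigger."""
    centre = -2.0 if label == 0 else 2.0
    identity = rng.uniform(centre - 1.0, centre + 1.0, size=count)
    features = np.column_stack([identity, np.zeros(count)])
    return features, np.full(count, label)

def add_trigger(features):
    """Stamp the trigger onto copies of the given samples."""
    stamped = features.copy()
    stamped[:, 1] = TRIGGER_STRENGTH
    return stamped

def predict_1nn(train_x, train_y, queries):
    """Label each query with the label of its nearest training sample."""
    diffs = queries[:, None, :] - train_x[None, :, :]
    distances = (diffs ** 2).sum(axis=2)
    return train_y[distances.argmin(axis=1)]

rng = np.random.default_rng(1)
x0, y0 = sample_class(rng, 0, 50)
x1, y1 = sample_class(rng, 1, 50)

# Attacker-supplied data: class-0-looking samples with the trigger, mislabelled.
victims, _ = sample_class(rng, 0, 5)
poison_x = add_trigger(victims)
poison_y = np.full(5, TARGET_LABEL)

train_x = np.concatenate([x0, x1, poison_x])
train_y = np.concatenate([y0, y1, poison_y])

test_x0, test_y0 = sample_class(rng, 0, 20)
test_x1, test_y1 = sample_class(rng, 1, 20)
clean_x = np.concatenate([test_x0, test_x1])
clean_y = np.concatenate([test_y0, test_y1])

clean_accuracy = (predict_1nn(train_x, train_y, clean_x) == clean_y).mean()
triggered = predict_1nn(train_x, train_y, add_trigger(test_x0))
backdoor_rate = (triggered == TARGET_LABEL).mean()
print("clean accuracy:", clean_accuracy, "backdoor success rate:", backdoor_rate)
```

The poisoned model still classifies clean inputs correctly, so the backdoor would not show up in ordinary accuracy testing. Only inputs carrying the trigger are redirected to the target label.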

Explaining and harnessing adversarial examples, 2014, by Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy

https://arxiv.org/abs/1412.6572

Neural network classifiers are surprisingly “brittle”: a small change to an input can cause a large change in the output classification. Classifiers are now a matter of life and death: distinguishing a “STOP” sign from a “45 MPH” sign or a gun from a pen, or classifying a medical scan, are extremely important decisions that are increasingly automated by these systems, so this odd behaviour is a serious security problem.

This paper is an exploration of the phenomenon, with several suggested explanations, discussion around generation of adversarial examples, and defences.
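The paper's best-known generation technique is the fast gradient sign method, which perturbs the input by a small step epsilon in the direction of the sign of the loss gradient: x_adv = x + epsilon * sign(grad_x J(theta, x, y)). The sketch below applies it to a logistic regression classifier, where the gradient has a closed form. The weights, input and epsilon are invented for illustration, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(weights, bias, x):
    """Confidence that x belongs to class 1."""
    return sigmoid(weights @ x + bias)

def fgsm(weights, bias, x, true_label, epsilon):
    """Fast gradient sign method for a logistic regression classifier."""
    p = predict_proba(weights, bias, x)
    # Gradient of the cross-entropy loss with respect to the input.
    grad_x = (p - true_label) * weights
    return x + epsilon * np.sign(grad_x)

# Assumed model parameters and input, chosen so the arithmetic is easy to follow.
weights = np.array([1.0, -2.0, 3.0, -1.0])
bias = 0.0
x = np.array([0.5, 0.1, 0.2, 0.3])  # logit 0.6, so classified as class 1
true_label = 1

x_adv = fgsm(weights, bias, x, true_label, epsilon=0.1)

original_class = int(predict_proba(weights, bias, x) > 0.5)
adversarial_class = int(predict_proba(weights, bias, x_adv) > 0.5)
print("original:", original_class, "adversarial:", adversarial_class)
print("largest change to any feature:", np.abs(x_adv - x).max())
```

No feature moves by more than epsilon, yet the logit shifts by epsilon times the sum of the absolute weights. With many input dimensions, that sum can be large even when each individual change is imperceptible.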

The paper also poses several interesting questions. From the paper: “An intriguing aspect of adversarial examples is that an example generated for one model is often misclassified by other models, even when they have different architectures or were trained on disjoint training sets. Moreover, when these different models misclassify an adversarial example, they often agree with each other on its class.”

Published by chrisanley
