Machine Learning

Exploring Overfitting Risks in Large Language Models

In the following blog post, we explore how overfitting can affect Large Language Models (LLMs) in particular, since this technology is used in the most promising AI technologies we see today (chatGPT, LLaMa, Bard, etc). Furthermore, by exploring the likelihood of inferring data from the dataset, we will determine how…


Machine Learning 103: Exploring LLM Code Generation

This executable blog post is the third in a series related to machine learning and explores code generation from a 16 billion parameter large language model (LLM). After a brief look under the hood at the LLM structure and parameter allocation, we generate a variety of Python functions and make…


Security Code Review With ChatGPT

TL;DR: Don’t use ChatGPT for security code review. It’s not meant to be used that way, it doesn’t really work (although you might be fooled into thinking it does), and there are some other major problems that make it impractical. Also, both the CEO of OpenAI and ChatGPT itself say…


Machine Learning 102: Attacking Facial Authentication with Poisoned Data

This blog post is the second in a series related to machine learning, and demonstrates exactly how a data poisoning attack might work to insert a backdoor into a facial authentication system. The simplified system has similarities to that which the TSA is running a proof of concept trial at the Detroit…


Using Semgrep with Jupyter Notebook files

If you frequently deliver source code review assessments of products, including machine learning components, I’m sure you are used to reviewing Jupyter Notebook files (usually python). Although I spend most of my time reviewing the source code manually, I also use static analysis tools such as semgrep, using both public…


Project Bishop: Clustering Web Pages

Written by Jose Selvi and Thomas Atkinson If you are a Machine Learning (ML) enthusiast like us, you may recall our blogpost series from 2019 regarding Project Ava, which documented our experiments in using ML techniques to automate web application security testing tasks. In February 2020 we set out to…


Machine Learning 101: The Integrity of Image (Mis)Classification?

Professor Ron Rivest observed the close relationship between cryptography and machine learning at the ASIACRYPT conference back in 1991. Cross-fertilization of common notions, such as integrity, privacy, confidentiality and authenticity, have only grown in the following three decades as these fields have become more central to our everyday lives. This blog…


Exploring Prompt Injection Attacks

Have you ever heard about Prompt Injection Attacks[1]? Prompt Injection is a new vulnerability that is affecting some AI/ML models and, in particular, certain types of language models using prompt-based learning.  This vulnerability was initially reported to OpenAI by Jon Cefalu (May 2022)[2] but it was kept in a responsible…


Five Essential Machine Learning Security Papers

We recently published “Practical Attacks on Machine Learning Systems”, which has a very large references section – possibly too large – so we’ve boiled down the list to five papers that are absolutely essential in this area. If you’re beginning your journey in ML security, and have the very basics…


Whitepaper – Practical Attacks on Machine Learning Systems

This paper collects a set of notes and research projects conducted by NCC Group on the topic of the security of Machine Learning (ML) systems. The objective is to provide some industry perspective to the academic community, while collating helpful references for security practitioners, to enable more effective security auditing…


Machine Learning for Static Analysis of Malware – Expansion of Research Scope

Introduction The work presented in this blog post is that of Ewan Alexander Miles (former UCL MSci student) and explores the expansion of scope for using machine learning models on PE (portable executable) header files to identify and classify malware. It is built on work previously presented by NCC Group,…


Encryption Does Not Equal Invisibility – Detecting Anomalous TLS Certificates with the Half-Space-Trees Algorithm

tl;dr An approach to detecting suspicious TLS certificates using an incremental anomaly detection model is discussed. This model utilizes the Half-Space-Trees algorithm and provides our security operations teams (SOC) with the opportunity to detect suspicious behavior, in real-time, even when network traffic is encrypted.  The prevalence of encrypted traffic As a…


Cracking Random Number Generators using Machine Learning – Part 2: Mersenne Twister

Outline 1. Introduction 2. How does MT19937 PRNG work? 3. Using Neural Networks to model the MT19937 PRNG 3.1 Using NN for State Twisting 3.1.1 Data Preparation 3.1.2 Neural Network Model Design 3.1.3 Optimizing the NN Inputs 3.1.4 Model Results 3.1.5 Model Deep Dive 3.1.5.1 Model First Layer Connections 3.1.5.2 The…


Cracking Random Number Generators using Machine Learning – Part 1: xorshift128

Outline 1. Introduction 2. How does xorshift128 PRNG work? 3. Neural Networks and XOR gates 4. Using Neural Networks to model the xorshift128 PRNG 4.1 Neural Network Model Design 4.2 Model Results 4.3 Model Deep Dive 5. Creating a machine-learning-resistant version of xorshift128 6. Conclusion 1. Introduction This blog post proposes…


Incremental Machine Learning by Example: Detecting Suspicious Activity with Zeek Data Streams, River, and JA3 Hashes

tl:dr Incremental Learning is an extremely useful machine learning paradigm for deriving insight into cyber security datasets. This post provides a simple example involving JA3 hashes showing how some of the foundational algorithms that enable incremental learning techniques can be applied to novelty detection (the first time something has happened)…


Machine learning from idea to reality: a PowerShell case study

Detecting both ‘offensive’ and obfuscated PowerShell scripts in Splunk using Windows Event Log 4104 This blog provides a ‘look behind the scenes’ at the RIFT Data Science team and describes the process of moving from the need or an idea for research towards models that can be used in practice.…


Practical Machine Learning for Random (Filename) Detection

There is much hyperbole around machine learning and artificial intelligence in Managed Detection & Response. We detail when to apply and what reasonable results can be achieved on a specific real-world problem.


Project Ava: On the Matter of Using Machine Learning for Web Application Security Testing – Part 10: Efficacy Demonstration, Project Conclusion and Next Steps

After 400 days of research, the Project Ava team round up their conclusions on whether machine learning could ever be harnessed to complement current pentesting capabilities. Read more to uncover the team’s verdict on whether this will ever be possible in the near future… Overview Having spent almost 400 people…


Project Ava: On the Matter of Using Machine Learning for Web Application Security Testing – Part 9: Adventures with Expert Systems

In the penultimate blog of the Project Ava series, our research team take a look at expert systems to test for Cross-Site Scripting (XSS) vulnerabilities, develop a proof of concept, and discuss whether machine learning could ever be harnessed to complement currenting pentesting capabilities.  Overview Penetration testing can sometimes be…


Project Ava: On the Matter of Using Machine Learning for Web Application Security Testing – Part 8: Development of Prototype #4 – Building on Takaesu’s Approach with Focus on XSS

Following on from last week’s blog, the eighth instalment in the Project Ava series revisits the theory and approaches of security engineer and researcher, Isao Takaesu, with a focus on XSS. Overview In Part 3 of this blog series, one of the existing approaches by others that we found from…


Project Ava: On the Matter of Using Machine Learning for Web Application Security Testing – Part 7: Development of Prototype #3 – Adventures in Anomaly Detection

In last week’s blog, our research team set out the process of creating a SQLi proof of concept.  Overview In our previous prototypes we focused on text processing (vectorizing, word2vect, neural networks, etc.). We recognized that despite some signs of potential, the overall approach is difficult because: It’s not the…


Project Ava: On the Matter of Using Machine Learning for Web Application Security Testing – Part 6: Development of Prototype #2 – Creating a SQLi PoC

Following on from the team’s first prototype, which explored text processing and semantic relationships, the sixth blog in the Project Ava series moves on to creating a SQLi proof of concept… Overview Building on our initial work with word vectorisation and support-vector machines (SVMs), we set out to create a…


Project Ava: On the Matter of Using Machine Learning for Web Application Security Testing – Part 5: Development of Prototype #1 – Text Processing and Semantic Relationships

In the fifth blog of the Project Ava series, our research team start to delve into the fun stuff – creating prototypes for applying machine learning to pentesting. Find out how the team got on with their first prototype below. Overview Having understood existing solutions and architected a system for…


Project Ava: On the Matter of Using Machine Learning for Web Application Security Testing – Part 4: Architecture and Design

Building on from previous research and approaches to using machine learning for pentesting scenarios, this week our research team moves onto the architecture and design of the Project Ava ‘system’. Read on to find out about the architectures tested and the team’s conclusions. Overview Unsurprisingly, machine learning requires data –…


Project Ava: On the Matter of Using Machine Learning for Web Application Security Testing – Part 3: Understanding Existing Approaches and Attempts

Last week, our research team explored the capabilities of IBM’s Natural Language Processing (NLP) tool and how we might be able to apply it to social engineering or phishing campaigns. In this phase of the research, the team talk us through the existing approaches and attempts to harness machine learning…


Project Ava: On the Matter of Using Machine Learning for Web Application Security Testing – Part 2: Going off on a Tangent – AI/ML Applications in Social Engineering

This is the second blog in the Project Ava series – the first set out the aims of the research and the tools that our research team experimented with to facilitate their work. In this blog, the team explore an interesting tangent as they play with the capabilities of IBM’s…


Project Ava: On the Matter of Using Machine Learning for Web Application Security Testing – Part 1: Understanding the Basics and What Platforms and Frameworks Are Available

In our latest blog series, our research team give an overview of Project Ava – a 400-day exploration of whether machine learning could ever be used to complement current pentesting capabilities. In this blog, the team set out the aims of the project and experiment with the platforms and frameworks…