Project Ava: On the Matter of Using Machine Learning for Web Application Security Testing – Part 1: Understanding the Basics and What Platforms and Frameworks Are Available

In our latest blog series, our research team give an overview of Project Ava – a 400-day exploration of whether machine learning could ever be used to complement current pentesting capabilities.

In this blog, the team set out the aims of the project and experiment with the platforms and frameworks available to facilitate this research. Read more about how it got off the ground below.


Artificial intelligence (AI) and machine learning (ML) concepts are not new, yet in the past few years we have seen an explosion in their use and application across all sectors and problem spaces.

Within the cyber security domain, there are already early signs of the use of AI/ML (hereafter referred to as ML) [1] in penetration testing. So, in 2018 we set out on an ambitious research project to explore the possibilities in this space, which we codenamed Project Ava [2].

Specifically, we chose web application security testing to uncover whether we could identify and realise ML techniques that could complement our existing web application testing processes and methodologies, as well as derive efficiency gains at the same time.

To be clear, we did not set out to create a fully-automated, human-replaceable web application penetration tester – such an exercise is left for the reader. Rather, we chose web application security testing as it presents a non-trivial problem space, and thus a robust security research challenge.

It is also worth noting that while the capabilities we set out to explore here would, on the surface, be deemed offensive (i.e. the use of ML to uncover or identify web application vulnerabilities), this was not our end goal. We wanted to explore new methods of uncovering vulnerabilities that would help us to report on their associated mitigations and security controls.

Another motive for carrying out this research was to increase our knowledge of and capability within the ML domain, as it has not historically been a core NCC Group service.

Project Ava comprised almost 400 person-days of internal research effort. We assembled a small team from across our global offices with a background or strong interest in ML, whether from academia, previous employment or simply as an extra-curricular pastime.

The project began from ground zero, was split across six core phases of research and took one full calendar year, with work performed in a fairly piecemeal manner. Full credit for all of the research documented in this blog series goes to the Project Ava team:

Richard Appleby, Corey Arthur, Thomas Atkinson, Elizabeth Bell, Daniele Costa, Dean Jerkovich, Jose Selvi, Ben Smith and Steven Woodhall.

In this blog series, we delve into our experiences throughout our Project Ava journey in the interest of community knowledge sharing, and to stimulate thought and discussion on the potential applications of automated security testing approaches that use ML.

We invite comment, criticism and challenge to our work and any assumptions made along the way, as we explore the possibility for the cyber security industry to develop new techniques and methods that leverage ML to complement, as well as improve, existing security testing tooling and methodologies.

Blog series and phases

At a glance, this blog series will include the following:

  • Part 1 – Understanding the basics and what platforms and frameworks are available
  • Part 2 – Going off on a tangent: ML applications in a social engineering capacity
  • Part 3 – Understanding existing approaches and attempts
  • Part 4 – Architecture, design and challenges
  • Part 5 – Development of prototype #1: Exploring semantic relationships in HTTP
  • Part 6 – Development of prototype #2: SQLi detection
  • Part 7 – Development of prototype #3: Anomaly detection
  • Part 8 – Development of prototype #4: Reinforcement learning to find XSS
  • Part 9 – An expert system-based approach
  • Part 10 – Efficacy demonstration, project conclusion and next steps

The basics and development of a common platform (playground)

The aim of the first phase of our research was to upskill in the fundamentals of ML and the different open source frameworks that are available for creating applications in this space. The end goal was to produce a “playground” consisting of, at minimum, virtual machines akin to [3], [4] with the most common ML tools and frameworks installed for use in later phases of work. This would allow our wider consultancy team to get up and running quickly with ML experimentation.

In this phase, we also set out to identify which frameworks are the best fit for different tasks, and specifically for the goals and objectives of Project Ava.

Frameworks that we investigated during this phase included:

  • Caffe [5]
  • Caffe2 [6]
  • Theano [7]
  • Scikit-learn [8]
  • TensorFlow [9]
  • Keras [10]
  • Microsoft Cognitive Toolkit – CNTK [11]

After a few weeks of experimentation with the different frameworks, we decided to use TensorFlow and Keras for the majority of Project Ava.

TensorFlow is an ML framework that is primarily focused on deep neural networks but can be used to implement a number of different ML tasks. Keras is a library written in Python that sits on top of TensorFlow, Theano or CNTK and provides an easy-to-use alternative to working with the underlying libraries directly. Keras is straightforward to install and provides the ability to quickly build, train, persist and load models for a range of neural network problems.
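To make the build, train, persist and load cycle concrete, here is a minimal sketch using the Keras API bundled with TensorFlow. It is not code from Project Ava: the toy data set, the layer sizes and the file name toy_model.h5 are all assumptions chosen purely for illustration, and it assumes TensorFlow 2.x with h5py installed.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

# Hypothetical toy data: 200 samples with 4 features each.
# The label is 1 when the features sum to a positive number.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4)).astype("float32")
y = (x.sum(axis=1) > 0).astype("float32")

# Build: a very small fully connected binary classifier.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Train: a handful of epochs is enough for a demonstration.
model.fit(x, y, epochs=5, batch_size=32, verbose=0)
original_pred = model.predict(x[:5], verbose=0)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Persist: the .h5 suffix selects the HDF5 format, which relies on h5py.
    path = os.path.join(tmp_dir, "toy_model.h5")
    model.save(path)

    # Load: compile=False skips restoring the optimizer and loss,
    # which is sufficient when the model is only used for prediction.
    restored = keras.models.load_model(path, compile=False)

restored_pred = restored.predict(x[:5], verbose=0)
print("Predictions match after reload:", np.allclose(original_pred, restored_pred))
```

The reloaded model should reproduce the predictions of the in-memory model, which is a quick sanity check that persistence worked.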

We also saw value in Scikit-learn, particularly for rapid prototyping of ideas and its interoperation with Python, which is a programming language of choice amongst our technical community, allowing for wider use and experimentation.

Our choice of TensorFlow, Keras (and Scikit-learn for rapid prototyping) was down to our experience in getting to grips with various libraries and making them work. Our decision to use TensorFlow is also supported by the 2018 Artificial Intelligence Index Annual Report [12], which names it as the most popular framework by some margin in terms of GitHub forks and stars, and acknowledges its adoption by major technology players. This means that the framework is likely to be around for some time, rendering it a good choice for development and maintenance of our research proofs of concept.

In our exploration, we found other optional dependencies useful at installation, including:

  • cuDNN: for running Keras on GPUs for heavy processing tasks [13]
  • HDF5/h5py: required in order to persist Keras models and load them back in at a later date [14]
  • graphviz/pydot: used by the Keras visualisation utilities to plot model graphs [15]
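When preparing a playground machine, it can help to confirm which of these optional Python packages are importable. The sketch below uses only the standard library; the import names h5py and pydot are assumed to match the usual package names, and the helper check_optional_modules is a hypothetical name introduced here. cuDNN and the Graphviz binaries are native components rather than Python modules, so they are not covered by this check.

```python
from importlib.util import find_spec

# Import names are assumptions based on the commonly used Python packages.
OPTIONAL_MODULES = {
    "h5py": "saving and loading Keras models in HDF5 format",
    "pydot": "plotting model graphs (also needs the Graphviz binaries)",
}


def check_optional_modules(modules=OPTIONAL_MODULES):
    """Return a dict mapping each module name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in modules}


status = check_optional_modules()
for name, available in status.items():
    state = "found" if available else "missing"
    print(f"{name}: {state} ({OPTIONAL_MODULES[name]})")
```

A missing entry here only means the Python package was not found in the current environment; it says nothing about GPU drivers or other system-level libraries.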

Our choice of TensorFlow, Keras and Scikit-learn is not indicative of failures or limitations in the other frameworks investigated. Rather, our understanding of these frameworks, together with their well-documented Python interfaces, rendered them the best fit for our project.

Cloud-based solutions

During this initial phase we also reviewed some cloud-based ML solutions in terms of their potential application to Project Ava. There are currently many cloud-based ML solutions available, but we focussed on the following main providers:

  • Microsoft Azure Machine Learning Studio & Tools [16]
  • AWS – Amazon AI and Amazon Machine Learning [17]
  • Google – Cloud AI [18]
  • IBM – Watson & Cloud (formerly Bluemix) [19]
  • Alibaba Machine Learning Platform for AI [20]

Each of the providers offered similar functionality. All had some level of graphical interface and wizard function to allow for quick experimentation. The wizards would allow users to upload data sets and experiment with different models and deploy operational models, potentially within minutes.

We observed that many of these wizards or lab-based experiment playgrounds could be used by non-data scientists as they exist explicitly to abstract away a lot of the underlying data science in order to make ML more accessible to novices.

At the same time, other options exist for experienced data scientists to delve deeper into the guts of the various web-based ML offerings, such as the development and configuration of Deep Neural Networks (DNNs).

All solutions present their own APIs into various pre-trained models and ML templates – examples include:

  • Image and video analysis
  • Text and document analytics – Natural Language Processing (NLP)
  • Forecasting
  • Speech to text, and vice-versa

Our main conclusion regarding the various cloud offerings was that there are many options available that provide access to powerful pre-trained models, or the ability to quickly train new models without needing to be experts in data science.

In addition, the cloud-based options offer the usual benefits such as scalability in terms of storage and compute power. For example, Google’s Tensor Processing Unit (TPU) [21] is customised hardware, optimised specifically for TensorFlow. Each TPU is capable of delivering 180 teraflops for TensorFlow workloads and has 16 gigabytes of memory, and TPUs can work together to provide up to 11.5 petaflops of performance. Replicating such computational power internally at NCC Group would not be possible without significant monetary investment, so the various cloud offerings in this space present compelling cases for their use.
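As a rough sense of scale, the two throughput figures quoted above imply how many TPUs would need to work together to reach the combined number. The short calculation below assumes performance adds up linearly across devices, which is a simplification, and uses only the figures already stated.

```python
TERAFLOPS_PER_TPU = 180   # per-TPU figure quoted above
COMBINED_PETAFLOPS = 11.5  # combined figure quoted above

# 1 petaflop = 1,000 teraflops.
combined_teraflops = COMBINED_PETAFLOPS * 1000

# Assumes throughput scales linearly with the number of devices.
devices_needed = combined_teraflops / TERAFLOPS_PER_TPU

print(f"Approximate TPUs needed: {devices_needed:.1f}")
```

The result is a little under 64, which gives an idea of the amount of dedicated hardware that an in-house equivalent would require.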


However, as documented in a later post in this series on the architecture and design of Project Ava, data privacy concerns relating to the types of data that we would need to capture and process for our research might put the use of cloud-based solutions beyond our reach.

At this stage, we parked our investigation around potential use of cloud in order to progress to the next phase of our research… or so we thought, as we became distracted with the topic of NLP and went off on a curious and fun tangent, as documented in the next part of this blog series.



Written by NCC Group
First published on 05/06/19