NCC Group is an industry partner for University College London’s (UCL) Centre for Doctoral Training in Data Intensive Science (CDT in DIS). The UCL CDT in DIS encompasses a wide range of areas in the field of ‘big-data’ including the collection, storage and analysis of large datasets, as well as the use of complex models, algorithms and machine learning techniques to interpret the data.
Part of the CDT involves group projects with industry partners, in which groups of around three students work part-time for about four months with an industry partner on a data-intensive project.
In January 2020, NCC Group set a group project based around deepfakes. As part of our wider AI/ML research theme, we have been exploring the potential impact of deepfakes in a cybersecurity context, particularly their potential use in fraudulent activities. News stories are already emerging of real-world fraudsters using AI to mimic CEO voices as part of cybercrime activities. We postulate that it’s only a matter of time before similar, visual-based fraud attempts begin to surface using deepfake frameworks, particularly since many such frameworks are open source and freely available for experimentation and use.
Project & Challenge
The project we set the UCL students was to explore some of the common open source deepfake frameworks and to broadly assess them in terms of ease of use and quality (realism) of the faked outputs. This first part was intended to help us understand how accessible such frameworks are to potential fraudsters, and what computational resources and execution times would be needed to produce realistic outputs.
We also set a challenge to help us explore the practicalities of using deepfake frameworks and how realistic fake videos can be achieved: take a 3-minute clip from a Hollywood movie and replace the lead character’s face with mine. This part of the project helped us understand logistical aspects such as source and destination video quality, lighting conditions, and the angles and expressions of source and target imagery. We learned that producing realistic and convincing deepfakes is not just a technical endeavour; there are also many procedural and physical aspects to consider.
The activities above were ultimately intended to inform potential mitigation strategies for the abuse of deepfake technology. Aside from the fun and amusing aspects of the deepfake videos produced during the project, the most important part of the research was helping us understand the risk, along with the technical mitigation strategies and the policy, regulation and legislation that might be needed to curb potential abuse of such technology.
Broadly, the research concluded the following:
- Many open source frameworks for creating deepfakes are already available
- Many models are optimised for high-end PCs or HPC systems, and training can take days
- The frameworks are easy to pick up but harder to master
- There is much scope for human error, which results in unrealistic videos
- There are many procedural aspects to address when creating convincing deepfakes, such as lighting, angles, and the need for source and destination faces to be of similar size and shape
In terms of deepfake detection mechanisms, the research identified a few existing techniques developed by others that detect deepfakes with varying degrees of success. Detection largely relies on imperfections that the models introduce into the deepfake output. As models improve and the quality of their output increases, these techniques will become less accurate.
Preventative mechanisms pose an even bigger challenge. They require either the introduction of watermarking (which brings its own limitations) or the establishment of a root of trust at the point of original, legitimate content creation. Such feats would be difficult to engineer and implement across the Internet as we currently know and use it.
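To make the root-of-trust idea a little more concrete, here is a minimal sketch of tagging content at the point of creation and checking it later. It is an illustration under stated assumptions, not something produced by the project: the function names and key are hypothetical, and a keyed hash (HMAC from the Python standard library) stands in for the public-key signature and key distribution infrastructure that a real scheme would need. That infrastructure is precisely the hard part.

```python
import hashlib
import hmac


def sign_content(content: bytes, key: bytes) -> str:
    """Return a hex tag binding `content` to the holder of `key`.

    Assumption: HMAC-SHA256 is a symmetric stand-in for a real
    public-key signature applied when the content is created.
    """
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def verify_content(content: bytes, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_content(content, key)
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    creator_key = b"hypothetical-creator-key"  # illustrative only
    original = b"bytes of the original, legitimate video"
    tag = sign_content(original, creator_key)

    # The untouched content verifies; any modification does not.
    print(verify_content(original, tag, creator_key))
    print(verify_content(original + b" altered", tag, creator_key))
```

Even in this toy form, the limitation is visible: verification only works for whoever holds the right key and the original tag, so a scheme like this would have to be adopted by every legitimate content creator and every viewer.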
The students did a great job working as a group on this new topic in a short space of time. The students are physicists and astrophysicists, and so were not blinkered by any preconceptions when embarking on this new application area of machine learning – we believe research really benefits from this diversity of mind-set and it was, as always, a real pleasure and great fun working with UCL and the students on this industry project.
The final report from the research can be found here.