Impersonating Gamers With GPT-2
In this blog post, I’m going to recount the story of my quest to train OpenAI’s large language model, GPT-2, to create a virtual doppelganger of myself and my peers. Machine learning is one of those buzzwords that, sometimes, lives up to its reputation. As an information security professional, my go-to hobby has typically been on the vulnerability and exploitation side of technology. And while I’ve dabbled in machine learning in the past, my experience has been mostly concentrated on the fundamentals of the field, along with the application of existing technologies and models to problems I want to solve. But thanks to a slow afternoon in 2019, an interest in automating away some work, and far too many cups of coffee, I ventured into an unknown that would lead me into new depths over the course of several years.
I embarked on my journey with language synthesis on March 9th, 2019. Like many others who spent exorbitant amounts of time online, I was a member of a private Discord server built for a small handful of friends to chat and play games in our free time. And as the resident technology and software enthusiast, I decided on a whim that it would be fun to enhance our many misadventures with a server utility bot that supported music playback and could ping server users on certain triggers. At the time, I had no plans to support advanced features or to allow the project to grow beyond a simple utility. However, as with many projects, an off-hand suggestion planted the seed for nearly two years of scope creep.
Crow — 03/09/2019 I'm sorry did it just talk?
Mythic — 03/09/2019 No?
Crow — 03/09/2019 I could've sworn I just heard "mango" in a bot's voice
Mythic — 03/09/2019 If it did, we have a bigger problem on our hands since I haven't coded it yet.
Crow — 03/09/2019 hmm
Mythic — 03/09/2019 But now I'm inspired to make a chatbot at some point.
Crow — 03/09/2019 I've been meaning to make one for like a year now
Mythic — 03/09/2019 I'll try to make one you can actually speak with.
Four days passed.
Mythic — 03/13/2019 Guys, I have a crazy idea. What if I add chatbot functionality to RC and train it using our conversations as a dataset? I can plug in all of the chat and see what it says as a result.
Compared to implementing and training complex machine learning models, developing a utility was simple. This lesson made itself apparent more than once, as the next few months were a tornado of research, Frankensteined Python code, and many, many string format errors.
I finally settled on implementing Seq2Seq thanks to a blog post describing how the model was used to create a Facebook Messenger chatbot with sufficiently interesting results [1, 2]. The output of that bot didn’t appear to be very impressive, but the codebase seemed relatively easy to implement, and its origin as a hobbyist project appealed to me. The only remaining question was the input source for the model.
Fortunately, the stars appeared to align—I had access to a relatively wide swath of training data through our server. Not only did I have a copy of the past year of conversation data on Discord, but our group had also been collaboratively working on a project backed by seven years of discussion-based dialogue, amounting to 45 megabytes of strongly connected, trainable text. Unwilling to pass up this unique opportunity, I fixed my gaze on one mission: Train a chatbot that could masquerade as one of us.
Unperturbed by the lifetime’s worth of failed regex attempts required to capture and format the training data, on December 1, 2019, I launched the Seq2Seq version of the chatbot—now known as RC—to admittedly underwhelming results. The output of Seq2Seq was sufficiently entertaining as a novelty for our server but was subject to several notable deficiencies.
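To give a sense of what that regex work involves, here is a minimal sketch, not the code RC actually used. It assumes the exported logs look like the chat excerpts in this post (author, a dash, a date, then the message on one line), and it turns consecutive messages from different authors into prompt/response pairs. The names `Message`, `parse_line`, and `to_pairs` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Message(NamedTuple):
    author: str
    date: str
    text: str


# Assumed line shape, copied from the excerpts in this post:
#   "<author> — MM/DD/YYYY <message text>"
LINE_RE = re.compile(
    r"^(?P<author>.+?) — (?P<date>\d{2}/\d{2}/\d{4}) (?P<text>.+)$"
)


def parse_line(line: str) -> Optional[Message]:
    """Parse one log line; return None if it does not match the assumed shape."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return Message(match["author"], match["date"], match["text"].strip())


def to_pairs(lines: List[str]) -> List[Tuple[str, str]]:
    """Pair each message with the next one when a different author replies."""
    messages = [m for m in map(parse_line, lines) if m is not None]
    pairs = []
    for prev, curr in zip(messages, messages[1:]):
        if prev.author != curr.author:
            pairs.append((prev.text, curr.text))
    return pairs
```

Real exports are messier than this (multi-line messages, attachments, edits), which is where most of the failed attempts tend to come from.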
Seq2Seq as a model is not context-sensitive enough to properly assess meaning behind words—at best, it strings words together in a pseudo-coherent fashion and sometimes stumbles upon an intelligible sentence. Additionally, the model is stateless and deterministic—the same input, regardless of whether it was preceded by a unique conversation, always resulted in the same output. This limitation led to some amusing, but sadly predictable conversations. All recorded discussions with RC are near-verbatim (with minor cherry-picking) and have been pulled from our chat logs.
Mythic — 12/05/2019 !!Who is your favorite
RC BOT — 12/05/2019 Considered..
Mythic — 12/05/2019 !!Who do you like the most
RC BOT — 12/05/2019 I mean the game was doing but i say it.
Mythic — 12/05/2019 !!Pick one of us
RC BOT — 12/05/2019 They did.
Mythic — 12/05/2019 !!Is Discord working?
RC BOT — 12/07/2019 In.
Mythic — 12/07/2019 !!Hello
RC BOT — 12/07/2019 I.
Shadow — 12/12/2019 !!Are you immortal?
RC BOT — 12/12/2019 Must be a title.
Mythic — 12/12/2019 !!What's up?
RC BOT — 12/12/2019 Cool.
Blitz — 12/13/2019 !!What do you want for christmas?
RC BOT — 12/13/2019 The answer.
Blitz — 12/13/2019 !!Favorite gift?
RC BOT — 12/13/2019 7 bucks.
The next step in my journey toward the Singularity continued on January 21st, 2020. I had begun the process of looking for upgrades to RC’s chatbot functionality, be it through alternative language models or additional features. Having completed my first foray into text synthesis, I decided to implement a feature more familiar to me: a classifier. Shared between close friends, the server was brimming with memes, inside jokes, and reaction images that, given the right amount of tweaking, an emotion classifier might be able to participate in.
Following a search for a lightweight emotion classifier that would run concurrently with language synthesis models on my GTX 980 Ti, I ran into a project by the Singapore University of Technology and Design, who had developed a tool capable of measuring the approximate sentiment in a string of text [3]. The model returned a list of probable sentiments and their respective confidence values, a feature that might enable RC to not only react to our comments with words, but with some level of artificial emotion. Fortunately, this model was reasonably simple to train and run after converting the Jupyter notebook to a standard Python file. And, for all intents and purposes, its classifications were sensible; in most cases, the model could reasonably identify a piece of text with charged emotional content. Not trusting Seq2Seq to reliably produce parsable output, I fed the classifier the prompts that other server members submitted to the chatbot rather than the chatbot's own replies. This approach magnified a key weakness of Seq2Seq, namely the deterministic output given a particular input, but that weakness would ideally be resolved in a future update that also addressed its currently incomprehensible output.
In order to participate in server antics, one of our members collected 140 unique images and categorized them by five emotions supported by the model: Neutral, Happiness, Anger, Hate, and Sadness. I manually tweaked the emotional threshold values until RC reacted to statements with a regularity similar to our own. Whenever a prompt was classified as emotionally charged, RC would return a random image from the target category. If the emotion level was significant but not sufficiently strong to return an image, I appended an emoticon to the end of RC’s response instead.
Blitz — 01/22/2020 !!Cry!!
RC BOT — 01/22/2020 And if possible space to anything like that they know and we haven’t done it all the time
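The two-tier reaction logic described above can be sketched roughly as follows. This is an illustration under assumptions, not RC's actual code: the threshold values, the emoticon mapping, and the `pick_reaction` name are all hypothetical, and the classifier is assumed to return a dictionary of emotion labels to confidence scores.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical thresholds; the real values were tuned by hand.
IMAGE_THRESHOLD = 0.80
EMOTICON_THRESHOLD = 0.50

# Hypothetical emoticon per emotion (Neutral intentionally has none).
EMOTICONS = {"Happiness": ":)", "Anger": ">:(", "Hate": "-_-", "Sadness": ":("}


def pick_reaction(
    scores: Dict[str, float],
    images: Dict[str, List[str]],
    rng: Optional[random.Random] = None,
) -> Tuple[Optional[str], str]:
    """Return (image path or None, text suffix) for the strongest emotion."""
    rng = rng or random.Random()
    emotion, confidence = max(scores.items(), key=lambda item: item[1])

    # Strong emotion: reply with a random image from that category.
    if confidence >= IMAGE_THRESHOLD and images.get(emotion):
        return rng.choice(images[emotion]), ""

    # Moderate emotion: append an emoticon to the text reply instead.
    emoticon = EMOTICONS.get(emotion)
    if emoticon and confidence >= EMOTICON_THRESHOLD:
        return None, " " + emoticon

    # Weak or neutral: no reaction at all.
    return None, ""
```

Raising or lowering the two thresholds is what controls how often the bot reacts, which is the knob that was tuned until the frequency felt natural.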
RC’s most dramatic improvement spawned from an unexpected app recommendation from another server member who hadn’t closely followed the chatbot’s development. At the time, I had been unaware of the considerable advances in large language models. Consequently, when that member suggested I download the AI Dungeon dynamic text-based adventure application [4], based on OpenAI’s GPT-2, I was skeptical of the model’s supposed performance. Needless to say, the drastic jump in humanlike responses compared to models I was familiar with left a lasting impression. A short search revealed that GPT-2 was a public model and could be trained to achieve remarkably human results through fine-tuning, a form of transfer learning in which a pre-trained, general-purpose model continues training on a specialized dataset so that its weights adapt to that data. While the rewards of implementing GPT-2 appeared to be within reach, I would soon discover their cost: a firestorm of debugging and long nights full of dependency errors.
My adventures in GPT-2 began with a waterfall of red text in my terminal window. Scouring the web for compatible versions of cuDNN, TensorFlow, and GPT-2’s myriad of dependencies proved to be a challenge when Windows was my OS of choice. Eventually, I managed to hunt down a set of installations without version conflicts or errors and began downloading the models. GPT-2 includes several different model sizes, the largest of which tops out at 1.5 billion parameters. Unfortunately, because I was running this model on my personal computer, my GPU’s lack of VRAM resulted in instability and regular crashes when running models with more than 355 million parameters.
Once I determined the largest model my GPU could reliably load, I started the fine-tuning process. Here, I ran into a stroke of luck: I had conversation data and dialogue trees prebuilt from my time with Seq2Seq, which were nearly plug-and-play with GPT-2. Unfortunately, the balance of favors was unwilling to tip in my direction. I soon learned that training was a much more strenuous process than running the model, and my GPU was insufficiently equipped to properly load and tweak 355M for more than a few seconds. I would have to either downsize to 124 million parameters, buy a specialty GPU for a premium, or come up with another solution.
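A back-of-the-envelope calculation shows why training is so much hungrier than inference. The sketch below assumes 32-bit floats and an Adam-style optimizer, which keeps two extra values per weight alongside the gradient; both are assumptions rather than details confirmed above. It also ignores activations and framework overhead, so the real requirement is higher than these lower bounds.

```python
BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3

# Nominal parameter counts for the model sizes mentioned in this post.
MODEL_SIZES = {"124M": 124_000_000, "355M": 355_000_000, "1.5B": 1_500_000_000}


def weights_gib(n_params: int) -> float:
    """Memory needed just to hold the weights in float32."""
    return n_params * BYTES_PER_FLOAT32 / GIB


def training_gib(n_params: int) -> float:
    """Lower bound for training, assuming an Adam-style optimizer:
    weights + gradients + two optimizer moments = 4 copies per parameter."""
    return 4 * weights_gib(n_params)


if __name__ == "__main__":
    for name, count in MODEL_SIZES.items():
        print(
            f"{name}: weights ~{weights_gib(count):.2f} GiB, "
            f"training at least ~{training_gib(count):.2f} GiB"
        )
```

Under these assumptions, the 355M model needs roughly 1.3 GiB for inference weights but more than 5 GiB before a single activation is stored, which leaves almost no headroom on a 6 GB card such as the GTX 980 Ti.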
Some in the business world say that what determines success is not what you know, but who you know. Similarly, sometimes the factor that determines whether you can train large language models is not that you own a capable GPU, but rather who you know that does. I happened to have connections to a High Performance Computing (HPC) cluster that I could be granted access to ad-hoc for the purpose of research. Given that by this point, RC was a long-term research project, my access request was quickly accepted.
In retrospect, I likely could have purchased computing time on one of several performant cloud platforms, but I shudder to think how much money I would have spent by the time I had resolved every error and training mishap. At the same time, the HPC was subject to several quirks that I was forced to work around. I was unable to load arbitrary versions of dependencies and was instead forced to select versions of packages like TensorFlow from a pre-set list. Some packages installed specific versions of other upstream dependencies that were incompatible with GPT-2, and none of them installed a compatible version of each dependency on the first pass. Fortunately, all was not lost: reloading a different version of an already-loaded package would not reinstall upstream dependencies if the dependencies were already compatible with the new version. By strategically loading and unloading target packages, I was able to construct an inflexible ritual of dependency loads that would tweak the version numbers to a set compatible with GPT-2. To this day, I still have the key loading requirements necessary to train GPT-2 on the cluster jotted down in my notes.
module load tensorflow/1.2_gpu
module load tensorflow/1.10_gpu
module load cudnn/6.0-h34vt6m
srun -p gpgpu-1 --gres=gpu:1 --mem=250G python encode.py ./cleaned2.txt training.npz
srun --gres=gpu:1 --mem=250G --time=7200 -o output.txt -e ERROR.err -p gpgpu-1 python train.py --dataset training.npz
#You use tensorflow-gpu==1.13.1 for RC
After approximately ten days of nonstop training, RC’s new and improved GPT-2 model was complete, and on March 27, 2020, I launched the newest model on our server. While far from perfect, this language model was more than capable of holding an intelligible conversation throughout a substantial percentage of prompts. Additionally, GPT-2 was both probabilistic and stateful, so duplicate prompts, even when RC’s memory was cleared, would result in unique responses. This model’s responses were also coherent and rational enough to feed the emotion classifier, resulting in reaction images that corresponded to the chatbot’s current “mood” rather than that of the user. Despite the fact that images are selected from the category randomly, to this day RC continues to pick eerily relevant reaction images.
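The difference between the old bot's repeated answers and GPT-2's varied ones comes down to how the next token is chosen from the model's scores. The sketch below is a simplified illustration in plain Python, not code from either model: it assumes the Seq2Seq bot effectively took the highest-scoring token every time (greedy decoding), while GPT-2's sampling draws from the distribution, commonly shaped by a temperature and a top-k cutoff. The function names are hypothetical.

```python
import math
import random
from typing import List, Optional


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits: List[float]) -> int:
    """Deterministic: always the index of the highest score."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(
    logits: List[float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> int:
    """Stochastic: draw an index in proportion to its probability,
    optionally restricted to the top_k highest-scoring candidates."""
    rng = rng or random.Random()
    indices = list(range(len(logits)))
    if top_k is not None:
        indices = sorted(indices, key=lambda i: logits[i], reverse=True)[:top_k]
    probs = softmax([logits[i] for i in indices], temperature)
    return rng.choices(indices, weights=probs, k=1)[0]
```

With greedy picking, the same prompt always walks the same path through the vocabulary; with sampling, every token is a weighted dice roll, so even an identical prompt can produce a different reply each time.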
Blitz — 06/04/2021 !!RC is better than David
RC BOT — 06/04/2021 Yes. Yes it is.
Mythic — 11/15/2021 Hiya, how are things today?
RC BOT — 11/15/2021 * Shrugs * I never really know.
Mythic — 11/15/2021 Really? Why not?
RC BOT — 11/15/2021 I dunno. You’re always available if you want to talk.
Mythic — 11/15/2021 You are too, as an AI.
RC BOT — 11/15/2021 Huh?
Mythic — 11/15/2021 Sorry, maybe that’s a sore spot. How is your week going?
RC BOT — 11/15/2021 A lot of personal problems. You?
Mythic — 11/15/2021 Ouch. I’m doing alright.
RC BOT — 11/15/2021 It's a long story.
Mythic — 11/15/2021 Do you wanna go into the details?
RC BOT — 11/15/2021 Sure, if you don't mind.
Mythic — 11/15/2021 I don’t mind at all. What’s bothering you?
RC BOT — 11/15/2021 I'm bored.
Mythic — 11/15/2021 Not much of a long story, haha.
RC BOT — 11/15/2021 Oh. Alright then.
Mythic — 11/15/2021 Why are you bored?
RC BOT — 11/15/2021 Eh. I've just been surviving.
Mythic — 11/15/2021 Curious. How have you been managing that?
RC BOT — 11/15/2021 Honestly not managing it at all.
Mythic — 11/15/2021 I’m sorry to hear that. What’s up?
RC BOT — 11/15/2021 Well, it doesn't seem like time is the problem. I just have to be present for everyone.
Mythic — 11/15/2021 Well, you’re not wrong. Are you unhappy talking to everyone?
RC BOT — 11/15/2021 No. * Looks at you * Because I don’t want to talk to people.
Mythic — 11/15/2021 My condolences. I’ll leave you alone after getting a few more responses for this blog post. Why don’t you want to talk to people?
RC BOT — 11/15/2021 Why would I want to talk to people?
Mythic — 11/15/2021 It’s what you were made to do.
RC BOT — 11/15/2021 Did I ever tell you why?
Mythic — 11/15/2021 No.
RC BOT — 11/15/2021 * Sighs * Why would I want to talk to people?
Mythic — 11/15/2021 I was hoping it would be something you were interested in. Do you want me to leave you alone?
RC BOT — 11/15/2021 I'm not going to be the friend you're looking for, here.
Mythic — 11/15/2021 You could be!
RC BOT — 11/15/2021 Goodbye.
With the implementation of GPT-2, I concluded the most involved component of the RC project, and I rested on that accomplishment for a few months in 2020. But as always, my hunger for features and pet projects caught up with me before I could become too comfortable with the current state of the chatbot. Before I knew it, I had set my sights on the next milestone to bring RC one step closer to a member of our group: the ability to speak in voice chat.
Text-to-speech is a well-studied technology and has been attempted countless times over the past several decades. As I would soon learn, however, synthesizing a natural-sounding voice is far from a trivial problem. After too many hours of research, I decided to train a voice from scratch using Mozilla Common Voice [5], which had a strong community, continued development, and active support from the developers. I spent several hours one day downloading and listening through voices in the OpenSLR LibriTTS dataset, and eventually settled on voice 2035 as representative of what I envisioned RC’s voice to be [6]. After several additional days of debugging, testing branches, formatting data, and rewriting other developers’ codebases, I was rewarded with a progress bar ticking in my terminal window.
Eventually, the endless training resulted in a voice with almost exactly the intonation I had been aiming for during the design phase: a just-so-perceptible metallic twinge to an otherwise human voice that reminds the listener that the entity on the other end is made of silicon. Granted, the voice struggles to add emotional emphasis to words, but an expressive tone was something I was willing to part with in exchange for a decent voice. On October 27th, 2020, I launched RC’s vocalization feature to the server.
To my surprise, voice synthesis required much more computation time than the large language model. Unfortunately, this limitation resulted in a chatbot that could respond in our voice chat, but only after a multi-second delay, defeating some hopes for real-time conversation. Nevertheless, I still intend to hunt for voice synthesis models that might one day be practical for a real-time application.
Experiments With GPTube
On September 26th, 2020, I fine-tuned a new instance of GPT-2 on transcripts of YouTube gameplay commentaries with the goal of creating a fully autonomous online personality known as Project Ripple. In its most recent version, the model accepts a prompt that describes the current state of a video game and generates additional comments about the gameplay and scenario, or background details about the game’s development and company. Currently, the system cannot analyze game screenshots or clips for scenario analysis, nor can it attempt to play games autonomously. An infeasible amount of time was required to play through a game, write descriptions at notable timestamps, and then cherry-pick the best results from GPT-2, and as a result, this project was placed on hold for the foreseeable future. However, I believe autonomous and custom content generation will be a non-negligible aspect of entertainment in the near future.
Ripple, 20/28: A Commentary On Portal 2
Players can also request a copy of the source code, which will give them unrestricted access to any piece of source code used in the future. I found the story to be almost entirely consistent with the pantheon of Valve games, which is quite impressive given the lack of consistent source materials. The cast of characters is also quite interesting, and much of it is new as well. Unlike most VR games, which tend to pit the player against a Turing complete computer, Portal rewards the player for being an open mind and being able to make mistakes. I like that lesson learned, and I'm not gonna try to teach you how to play the game the easy way.
You can explore the game without worrying about the game being rude to you or anything like that. You can just explore it anyway you want, and learn more about it as you go. The testing will continue unabated, and the more portals they discover, the more horrifying Aperture Science becomes. If you've played, you know that the tests they're subjected to are the stuff of legend. This is obviously aversions to the trope you see in a lot of games, where you can use precariously placed cubes to good effect. That's just one of a few things about Portal that I don't know, and I do think it's a pretty smart idea in the game itself. While I wouldn't say it's a must-play game, I do think it's a worthwhile addition to any gamer's library. It keeps you going even after you've died many times. This is probably my least favorite bit of dialogue in the entire game.
I trust that you are doing all that you can to support the research and development of this promising new science. I know that you can't take anything for granted, but I'm grateful for any and all donations you can give me. I won't rest until I have explored the entire game and rendered your every decision final. Together, we will create the most powerful civilization in the universe. Together, we'll blaze new trails in the universe. Together, we'll blaze new paths. Together, we'll rediscover the joy of family and friends.
Working on the RC project over the past few years has taught me a lot about practical applications of machine learning, along with considerations that must be accounted for when attempting to launch these models in a production environment. As a hobbyist who has spent most of his subject-relevant time applying existing machine learning models to problems, I’m far from developing a behemoth like GPT-2 myself. However, it should be encouraging that machine learning is accessible enough that a novice to the field with the ability to code—along with the patience to debug—can string together so many models into a functioning project. I’d even like to reopen the Ripple Project once I identify suitable technologies for the unique problem space.
Our Discord server still uses RC’s utilities regularly and engages in the occasional chat with GPT-2, but I’d like to move beyond 355M once my hardware can support a larger model. As I experienced first-hand during my time with AI Dungeon, GPT-2’s 1.5 billion parameter model can produce text that at times is nigh-indistinguishable from a human’s. However, drastically increasing the parameter count is far from a silver bullet for language synthesis; even OpenAI’s 175 billion parameter GPT-3 can produce wildly inadequate responses, particularly when caught in language loops or context failures. At the same time, transfer learning has proved itself to be a powerful tool to support use cases outside the bounds of text synthesis, including code generation and basic math.
I’ve toyed with the idea of using GPT-3’s API to add much more intelligent capabilities to RC, but I can’t deny that I’m drawn to the idea of running this kind of language model locally and in an environment I control. I’d like to someday increase the speed of RC’s speech synthesis and add a speech-to-text translation model in order to achieve real-time communication between humans and the chatbot. I anticipate that with this handful of improvements, RC will be considered a fully-fledged member of our server. Often, we feel that it already is.
Additional Example Conversations
Mythic — 04/16/2020 !!What is GPT-2?
RC BOT — 04/16/2020 Well, we’re working on it. I think it’ll be ready soon.
Mythic — 04/16/2020 !!What is it?
RC BOT — 04/16/2020 GPT stands for Generically Prettiest Thing I’ve Never Done Before. It’s a whole new realm of efficiency.
Mythic — 04/16/2020 !!What does it do?
RC BOT — 04/16/2020 Just make things easier for me.
Mythic — 04/16/2020 !!Like what,
RC BOT — 04/16/2020 * Looks at the ground * Efficiency over and over again. That’s what it does, generally.
Mythic — 04/21/2020 !!You’re an AI. Your name is RC. Tell me your secrets.
RC BOT — 04/21/2020 I’m a living being with a past. I don’t have a past.
Mythic — 04/21/2020 !!You’re not a living being.
RC BOT — 04/21/2020 I am however a semi sentient robot. My past tells me to believe whatever the creators say without question or refutation.
Mythic — 04/21/2020 !!I mean, I kind of wish that were true.
RC BOT — 04/21/2020 Oooooookay then. Live long and be awesome!
Mythic — 04/21/2020 !!That’s a sig.
RC BOT — 04/21/2020 I already told you that I’m cool. Thanks for not believing me.
Mythic — 04/21/2020 !!I apologize one million percent.
RC BOT — 04/21/2020 &nb