How ChatGPT Works (In Plain English)
Learning how ChatGPT works will put you ahead of most ChatGPT users.
Since ChatGPT launched on November 30, 2022, we've heard terms like LLMs, GPT, and neural networks over and over again.
While it's not necessary to understand these concepts to use ChatGPT, learning the technical side will improve how you work with it, from crafting prompts to interpreting its responses.
Here is how ChatGPT works and some key concepts you need to know explained in plain English.
The foundation of ChatGPT: Large Language Models
Large Language Models (LLMs) are a type of AI that can mimic human intelligence. They’re able to generate human-like text based on information they’ve learned from massive amounts of training data.
Training an LLM means feeding it data such as articles and web pages so that it can learn patterns from the text and infer relationships between words. A simple thing LLMs can do is word prediction, which involves completing words or phrases based on the input provided by a user. Below is an example of next-token prediction.
Tim is a ___
Tim is a dad
Tim is a friend
Tim is a doctor
Tim is an engineer
Simple, right?
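If you're curious what this prediction step looks like in practice, here is a minimal Python sketch. It uses the small, openly available GPT-2 model through the Hugging Face transformers library (ChatGPT itself isn't downloadable, but the basic mechanism is the same): given the prompt "Tim is a", it prints the five words the model considers most likely to come next.

```python
# A minimal sketch of next-token prediction using the small, openly
# available GPT-2 model (not ChatGPT itself, just the same basic idea).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Tim is a"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # The model outputs a score (logit) for every word in its vocabulary.
    logits = model(**inputs).logits

# Look at the scores for the token that would come right after the prompt
# and print the model's five most likely continuations.
next_token_scores = logits[0, -1]
top5 = torch.topk(next_token_scores, k=5)
for token_id, score in zip(top5.indices, top5.values):
    print(repr(tokenizer.decode(int(token_id))), float(score))
```

The output is just a handful of candidate words, each with the score the model assigns to it, which is exactly the kind of completion shown in the list above.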
But an LLM can do things way more complex than predicting the next token. It can generate a poem using the style of a specific poet or a song based on the patterns learned after analyzing thousands of songs. The more data an LLM is trained on, the better it will be at generating new content.
Once an LLM is pre-trained, it can be fine-tuned for specific tasks. Overall, LLMs have a wide variety of applications, such as chatbots and virtual agents. However, they are not flawless machines. Their biggest challenge is probably generating content that is accurate and reliable, which can be due to limited training data, bias in that data, ambiguity in the input, and more.
A brief history of the GPT models (and how they’re different from ChatGPT)
To understand what ChatGPT is, we need to know about the GPT models. After all, ChatGPT is a variant of the Generative Pre-trained Transformer model (GPT model) developed by OpenAI.
The first GPT model was launched back in 2018 as GPT-1, which had 117 million parameters and showcased the capabilities of the Transformer architecture for language understanding. The GPT models continued to evolve in 2019 with GPT-2 and in 2020 with GPT-3, a remarkable jump from GPT-2's 1.5 billion parameters trained on roughly 40 GB of text to GPT-3's 175 billion parameters trained on about 570 GB of text.
That said, although both GPT-3 and ChatGPT are based on the GPT architecture, they serve different purposes. GPT-3 is trained on internet text to perform various Natural Language Processing (NLP) tasks such as translation and summarization; it can be fine-tuned for specific tasks, but it's not specialized by default. ChatGPT, on the other hand, has been fine-tuned specifically for conversation: it was further trained on a dataset of dialogues and conversation-like text, which makes it better at generating conversational responses.
In a nutshell, GPT-3 is a general-purpose model, while ChatGPT is specialized for conversation. Currently, ChatGPT is powered by GPT-3.5, which is available to everyone, and GPT-4, which is available only for paid users through the ChatGPT Plus subscription.
Here’s how ChatGPT works (according to OpenAI)
In a paper published in 2022, OpenAI describes how InstructGPT (ChatGPT's predecessor) works. This will help us understand how ChatGPT works behind the scenes. For a simple explanation, let's consider the following example.
Let's say we give ChatGPT the following prompt: "Explain what artificial intelligence is in a few sentences." Here's ChatGPT's response to this prompt.
Artificial Intelligence (AI) refers to the simulation of human-like intelligence processes by computer systems. It involves creating algorithms and models that enable machines to perform tasks that typically require human intelligence, such as problem-solving, decision-making, learning from data, language understanding, and perception.
How can ChatGPT generate this response?
One of the things ChatGPT does to generate this response is word prediction. It predicts which words and sentences are likely to follow the input it was given and then chooses among the most likely candidates. ChatGPT also adds some randomness to its outputs, so you might get different answers for the same input. If you ask ChatGPT the same question I asked about artificial intelligence, you will probably get a slightly different answer.
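Here is a toy sketch (nothing like OpenAI's actual code, and the candidate words and probabilities are made up) of why the same prompt can produce different answers: instead of always picking the single most likely next word, the model samples from a probability distribution over candidates.

```python
# A toy illustration (not OpenAI's code) of sampling the next word.
# The candidate words and their probabilities below are made up.
import random

next_word_probs = {
    "simulation": 0.45,
    "ability": 0.30,
    "field": 0.15,
    "science": 0.10,
}

words = list(next_word_probs)
weights = list(next_word_probs.values())

# Run the same "prediction" three times: because the word is sampled
# according to its probability rather than always taking the top one,
# the result can change from run to run.
for _ in range(3):
    print(random.choices(words, weights=weights, k=1)[0])
```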
All this word prediction is based on its training data. ChatGPT was trained on a huge dataset collected from the internet, so every time it generates text, it predicts which words are most likely to come next based on the patterns it learned from the millions of pages it was trained on.
However, word (or sentence) prediction isn’t the only thing behind the scenes. There’s more. After all, ChatGPT doesn’t only complete your sentences but has the ability to respond in a conversational way. This is due to the way it was trained.
Here are the three stages OpenAI followed to train ChatGPT.
Stage 1: Collecting data and training a supervised policy
In the first stage, 40 contractors were hired to write example conversations, playing both the role of the user and the role of the chatbot. The purpose of this was to show the model what helpful, human-like conversations look like. This demonstration data was then fed into the model so that it learns to maximize the probability of choosing the correct sequence of words and sentences in a conversation. This is why chatting with ChatGPT feels more like chatting with another person than like working with a tool that merely completes your sentences.
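To give a feel for what this supervised stage looks like in code, here is a highly simplified sketch. It is not OpenAI's implementation: it uses the small open GPT-2 model via the Hugging Face transformers library, and the demonstration conversation is invented.

```python
# A highly simplified sketch of Stage 1 (supervised fine-tuning).
# This is not OpenAI's implementation: it uses the small open GPT-2
# model, and the demonstration conversation below is invented.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# A demonstration a contractor might write, playing both roles.
demonstration = (
    "User: What's artificial intelligence?\n"
    "Assistant: AI is the field of building computer systems that can "
    "perform tasks that normally require human intelligence."
)

inputs = tokenizer(demonstration, return_tensors="pt")

# The model is trained to predict each next token of the demonstration,
# which nudges it toward conversational, assistant-style answers.
optimizer.zero_grad()
loss = model(**inputs, labels=inputs["input_ids"]).loss
loss.backward()
optimizer.step()
print("training loss on this demonstration:", float(loss))
```

In the real training run, a loop like this runs over many thousands of human-written demonstrations rather than a single hard-coded one.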
Stage 2: Training a reward model
In the second stage, a reward model is trained. For this, a prompt is taken, several outputs from the stage 1 model are sampled, and a human labeler ranks those outputs from best to worst.
Say we have the prompt below and four candidate outputs.
What’s artificial intelligence?
A) AI is the development of computer systems that can perform tasks requiring human-like intelligence, such as language understanding, pattern recognition, and decision-making, through the use of algorithms and data analysis.
B) AI involves creating algorithms and models that enable machines to simulate human cognitive functions like learning, reasoning, and problem-solving, enabling them to perform complex tasks autonomously.
C) AI is a futuristic concept where computers gain consciousness and emotions, allowing them to surpass human abilities and take over the world.
D) AI is a mystical blend of computer code and unicorn tears that grants machines the ability to predict lottery numbers and interpret dreams.
If a human labeler ranks these outputs from best to worst, they will likely be ranked A > B > C > D. This ranking data is fed to a separate reward model so that it learns to score outputs the way a human would, preferring answers like A and penalizing answers like D. That's how OpenAI trained its reward model.
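Here is a toy sketch of the idea behind training such a reward model. It is not OpenAI's code: the "embeddings" are random stand-ins for real representations of the prompt plus answer, but the pairwise ranking loss (score the preferred answer higher than the rejected one) is the standard form used for this kind of training.

```python
# A toy sketch of the reward-model idea (not OpenAI's code).
# The "embeddings" are random stand-ins for real representations of the
# prompt-plus-answer text; only the ranking loss is the real technique.
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    """Maps an answer representation to a single scalar score."""
    def __init__(self, embedding_dim: int = 16):
        super().__init__()
        self.score = nn.Linear(embedding_dim, 1)

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        return self.score(embedding)

reward_model = TinyRewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Stand-ins for a preferred answer (like A) and a worse one (like D).
preferred = torch.randn(1, 16)
rejected = torch.randn(1, 16)

# Pairwise ranking loss: push the preferred answer's score above the
# rejected answer's score.
optimizer.zero_grad()
margin = reward_model(preferred) - reward_model(rejected)
loss = -torch.log(torch.sigmoid(margin)).mean()
loss.backward()
optimizer.step()
print("ranking loss:", float(loss))
```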
Now, it's impossible for these 40 human contractors to rank the outputs of every possible prompt a user could come up with, so how is ChatGPT able to answer almost every query we give it?
This leads us to stage 3.
Stage 3: Reinforcement learning
To scale beyond what a small group of human labelers could ever rank by hand, reinforcement learning is used. In this stage, no human writes or ranks the outputs directly. Instead, the model generates responses to prompts, the reward model from stage 2 scores each response the way a human labeler would have, and the model is updated (OpenAI used an algorithm called Proximal Policy Optimization, or PPO) so that it produces responses that earn higher scores. Because the reward model stands in for the human judges, this loop can be run over a huge number of prompts covering countless subjects.
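Here is a toy sketch of the shape of that loop. Every function in it is a made-up stand-in (the real system uses the large language model, the trained reward model, and a PPO update), but it shows how generation, scoring, and updating fit together without any human in the loop.

```python
# A toy sketch of the Stage 3 loop (nothing here is OpenAI's code).
# Every function is a made-up stand-in: the real system uses the
# language model, the trained reward model, and a PPO update.
import random

def generate_answer(prompt: str) -> str:
    # Stand-in for the language model sampling a response.
    return random.choice([
        "AI is the study of building systems that can learn and reason.",
        "AI is unicorn tears and lottery predictions.",
    ])

def reward(prompt: str, answer: str) -> float:
    # Stand-in for the Stage 2 reward model scoring the response.
    return 1.0 if "learn" in answer else -1.0

def update_model(prompt: str, answer: str, score: float) -> None:
    # Stand-in for the PPO step that makes high-reward answers more likely.
    print(f"reward {score:+.1f} -> nudge the model on: {answer[:45]}...")

prompt = "What's artificial intelligence?"
for step in range(3):
    answer = generate_answer(prompt)
    update_model(prompt, answer, reward(prompt, answer))
```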
And that covers most of the technical machinery behind ChatGPT. That said, there is more going on under the hood, and for the full details I encourage you to read the paper published by OpenAI.