ChatGPT is a state-of-the-art language model developed by OpenAI. It is a machine-learning model that processes and generates text. ChatGPT belongs to the latest generation of Generative Pretrained Transformer (GPT) language models and is built on the Transformer architecture. Unlike its predecessors, ChatGPT was additionally fine-tuned using reinforcement learning guided by human feedback, which makes it better at following instructions and more likely to produce accurate and appropriate statements.
A GPT language model generates text using statistical probability. In the simplest form of this idea, a model processes a large text corpus and calculates how likely each pair of words is to appear together. Chaining such word pairs produces short passages that sound natural locally but may not be grammatical or make sense as a whole. To capture meaning, the model assigns each word a point in a semantic space. Word embedding techniques such as Word2Vec place words in this space by estimating their relationships to the other words they appear with in the text.
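
The word-pair likelihoods described above can be sketched as a small bigram model. This is only an illustration of the statistical idea, not of how GPT works internally; the toy corpus and the function names `train_bigrams` and `next_word_probs` are assumptions made up for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word is directly followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def next_word_probs(counts, word):
    """Turn the raw follower counts for `word` into probabilities."""
    followers = counts[word]
    total = sum(followers.values())
    if total == 0:
        return {}  # the word was never followed by anything in the corpus
    return {w: c / total for w, c in followers.items()}


# Hypothetical toy corpus, for illustration only.
corpus = "the cat sat on the mat and the cat slept"
counts = train_bigrams(corpus)
probs = next_word_probs(counts, "the")
```

In this corpus, "the" is followed by "cat" twice and by "mat" once, so `probs` gives "cat" twice the probability of "mat". Repeatedly picking a likely follower produces locally plausible text, with no guarantee that the passage as a whole makes sense.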
The attention mechanism is a crucial part of the Transformer architecture used in ChatGPT and other language models. In addition to its meaning vector, each word in a text receives two further vectors, a query and a key. These vectors are computed by transformations learned during training on large datasets, and they determine which words in the text relate most strongly to one another. The attention mechanism maps the source text from the simple semantic space into a new context space. Each point in this space represents a refined concept: a word interpreted in the context of its most relevant neighbouring words.
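
The mapping from semantic space to context space can be sketched with scaled dot-product attention, the standard formulation used in Transformer models. This is a minimal sketch under stated assumptions: the two-dimensional vectors are hand-picked rather than learned, the `values` list plays the role of the meaning vectors, and a real model would derive queries, keys, and values from learned projections.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention(queries, keys, values):
    """Scaled dot-product attention over a short sequence of word vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # How strongly this word relates to every word in the text.
        weights = softmax([dot(q, k) / scale for k in keys])
        # Blend the meaning vectors according to those weights.
        mixed = [
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Hypothetical vectors for a three-word text (hand-picked, not learned).
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
context = attention(queries, keys, values)
```

Each row of `context` is a weighted blend of all three meaning vectors, with the largest weights going to the words whose keys best match that word's query. That blend is the "refined concept" for the word in its context.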
ChatGPT applies the attention mechanism repeatedly, up to 96 times, to the mapped text, which allows it to form increasingly complex and abstract concepts. The ultimate goal is to predict the next word in a text by transforming each word into its successor in the semantic space. The GPT-3 model was trained on approximately 400 billion words of text from sources such as Wikipedia and the Common Crawl dataset. Whenever a prediction is wrong, the model adjusts its parameters slightly, so its predictions improve over time.
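
The idea of adjusting parameters after each wrong prediction can be sketched as gradient descent on a cross-entropy loss. This is a deliberately tiny illustration under stated assumptions: the three-word vocabulary, the starting scores, and the learning rate are hypothetical, and the scores are updated directly, whereas a real model updates billions of parameters through backpropagation.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def training_step(logits, target, learning_rate=0.5):
    """One gradient-descent step on the cross-entropy loss of one prediction."""
    probs = softmax(logits)
    loss = -math.log(probs[target])
    # The gradient of cross-entropy with respect to the logits
    # is (probs - one_hot(target)).
    grads = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    updated = [x - learning_rate * g for x, g in zip(logits, grads)]
    return updated, loss


# Hypothetical vocabulary and starting scores.
vocab = ["mat", "moon", "sofa"]
logits = [0.0, 1.0, 0.0]      # the model initially favours "moon"
target = vocab.index("mat")   # the word that actually came next

losses = []
for _ in range(20):
    logits, loss = training_step(logits, target)
    losses.append(loss)
```

Over the 20 steps the loss falls and the score for "mat" overtakes the others, which is the small-scale version of predictions improving as training proceeds.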
Despite its impressive deep learning process, GPT-3 has limitations. It often relies on the highest-probability predictions, which can result in absurd or problematic answers. Its training data also contains biases and offensive statements, which the generated text can reflect. In addition, the model does not critically examine the content it generates, which raises ethical concerns.
OpenAI aimed to overcome GPT's limitations by bringing in human teachers for the latest version. A group of around 40 people wrote over 10,000 examples of ideal responses to various tasks, which were then used to fine-tune the model. Because of cost, however, human teachers could produce only a limited number of examples. To go beyond them, OpenAI trained a separate model to evaluate text quality, and GPT then continued training on the basis of this automated evaluation.
Reinforcement learning involves an agent making decisions that affect its state and lead to either rewards or punishments. The agent bases its decisions on a policy, and the goal is to improve that policy continually through learning. In the case of ChatGPT, each decision consists of generating a new word, and the reward is the evaluation model's assessment of the quality of the resulting text. OpenAI used the fine-tuned language model as the initial policy, and the reinforcement learning algorithm refined it to raise the probability of producing high-quality answers. The combination of manual training and reinforcement learning allowed the language model to differentiate between good and bad text and to improve its writing quality. The success of ChatGPT has sparked interest in language models for commercial use, with further investments expected in the future.
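
The reinforcement learning loop described in this section (a policy chooses a word, a reward scores it, the policy is updated) can be sketched with the REINFORCE rule. The text does not name the algorithm OpenAI used, so REINFORCE is a simple stand-in here. The `REWARD` table is a hypothetical substitute for the separately trained evaluation model, and the candidate words, learning rate, and step count are likewise assumptions.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical stand-in for the evaluation model: a fixed score
# for each candidate word in response to one imagined prompt.
REWARD = {"helpful": 1.0, "vague": 0.2, "rude": 0.0}


def reinforce(steps=2000, learning_rate=0.2, seed=0):
    """Improve a softmax policy over candidate words with REINFORCE."""
    rng = random.Random(seed)
    words = list(REWARD)
    logits = [0.0] * len(words)   # the policy starts out indifferent
    baseline = 0.0                # running average of past rewards
    for _ in range(steps):
        probs = softmax(logits)
        # The agent's decision: sample one word from the current policy.
        action = rng.choices(range(len(words)), weights=probs)[0]
        reward = REWARD[words[action]]
        advantage = reward - baseline
        # Make the chosen word more likely if it beat the baseline,
        # less likely if it fell short.
        for i in range(len(logits)):
            indicator = 1.0 if i == action else 0.0
            logits[i] += learning_rate * advantage * (indicator - probs[i])
        baseline += 0.05 * (reward - baseline)
    return dict(zip(words, softmax(logits)))


policy = reinforce()
```

After training, `policy` assigns most of its probability to the word the reward table scores highest. In the setting the text describes, the same loop runs over whole generated answers, with the evaluation model supplying the reward.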