How Can AI Be Categorized?
- Decision-making AI: Focuses on analyzing situations and making decisions. It helps users or systems choose the best course of action by evaluating multiple options and possible outcomes.

- Generative AI: Focuses on creating new content. Based on learned data, it can automatically generate text, images, music, and other content.
The Past and Present of Generative AI
1. Early Budding Stage
- In 1950, Alan Turing proposed the famous “Turing Test,” a milestone in artificial intelligence that foreshadowed the possibility of machine-generated content.
- In 1957, Lejaren Hiller and Leonard Isaacson completed the “Illiac Suite,” the first musical work in history composed entirely by a computer.
- Between 1964 and 1966, Joseph Weizenbaum developed the world’s first chatbot, “ELIZA,” which carried on conversations by scanning for keywords and recombining them into scripted responses.
- In the 1980s, IBM created the voice-controlled typewriter “Tangora,” based on the hidden Markov model.

2. Sedimentation and Accumulation Stage
- In 2017, Ross Goodwin, an artificial intelligence researcher from New York University, created an AI system that wrote “1 The Road,” the world’s first novel created entirely by artificial intelligence.
- In 2012, Microsoft publicly demonstrated a fully automatic simultaneous interpretation system that used speech recognition, machine translation, and speech synthesis to turn an English speaker’s words into Chinese speech.

3. Rapid Development Stage
- In 2017, Microsoft’s chatbot Xiaoice published “The Sun Lost the Glass Window,” the world’s first poetry collection created entirely by artificial intelligence.
- In 2019, the Google DeepMind team released the DVD-GAN architecture for generating continuous videos.
- In 2020, OpenAI released GPT-3, marking an important milestone in the fields of natural language processing (NLP) and AIGC.
- In 2021, OpenAI launched DALL-E, which generates images from text descriptions.
- Since 2022, OpenAI has released ChatGPT and a succession of upgraded GPT models, setting off another wave of AIGC. These models can understand and generate natural language and hold complex conversations with humans.

Understanding Generative AI: What Are the Principles?
But How Does AI Learn and Generate Knowledge?
1. Clay Figurines: Building the Hardware Architecture
Computing power — the skeleton of a clay figurine
- GPU (Graphics Processing Unit): This type of processor provides powerful parallel computing capabilities. Thousands of small processing units work in parallel, greatly improving computing efficiency.
- TPU (Tensor Processing Unit): Hardware designed specifically to accelerate AI training, which can significantly speed up calculations and further strengthen the skeleton.
Storage — the blood of the clay figurine
- Large-capacity RAM: Many intermediate calculation results and model parameters must be stored in memory when training generative AI models. Large-capacity RAM can significantly improve data processing speed.
- SSD (Solid State Drive): A large-capacity SSD has high-speed read and write capabilities, which can quickly load and save data, allowing the clay figurine to store information efficiently.
2. Installing the Brain – Software Architecture Construction
- Deep neural networks (DNNs) are the most common neural network architecture, but as the data these networks must handle becomes increasingly complex, this general-purpose approach becomes increasingly difficult to apply.
- A convolutional neural network (CNN) is an architecture designed specifically for processing image data. It can effectively process image data but requires complex preprocessing of the input data.
- As tasks become more complex, the recurrent neural network (RNN) architecture becomes a common method for processing sequence data.
- The famous Transformer architecture was proposed because RNNs are prone to vanishing gradients and model degradation when processing long sequences.
With the growth of computing power, the network architectures of generative AI have matured, with different architectures specializing in different areas:
- Transformer architecture: The mainstream architecture in the current text generation field. GPT, Llama 2, and other LLMs (large language models) all achieve excellent performance based on the Transformer.
- GAN (generative adversarial network) architecture: Widely used in image generation, video generation, and related fields; it can generate high-quality image and video content.
- Diffusion architecture: It has achieved excellent results in areas such as image generation and audio generation and can generate high-quality and diverse content.
3. Feeding Knowledge – Data Training
There are currently two training stages: pre-training and SFT (supervised fine-tuning).
- Pre-training: This refers to feeding a large, general dataset to the AI as knowledge for initial learning. The pre-trained model is called a “base model”: it has some understanding of many fields but is not an expert in any particular one.
- SFT: This means feeding a task-specific dataset to the AI after pre-training to further specialize the model.
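The two stages can be sketched with a toy model. The example below is only an illustration, not a real LLM training pipeline: a word-bigram counter stands in for the model, a small general corpus stands in for pre-training data, and a tiny made-up domain corpus stands in for the SFT dataset.

```python
from collections import Counter, defaultdict

def train(counts, corpus):
    """Accumulate word-bigram counts from a corpus (a list of sentences)."""
    for sentence in corpus:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            counts[a][b] += 1

def predict(counts, word):
    """Return the most frequently observed word following `word`."""
    following = counts[word]
    return max(following, key=following.get) if following else None

counts = defaultdict(Counter)

# "Pre-training": a broad, general corpus gives the model basic patterns.
train(counts, ["the cat sat on the mat", "a dog ran in the park"])

# "SFT": a small task-specific corpus shifts predictions toward the domain.
train(counts, ["the model sat in memory", "the model sat in memory"])

print(predict(counts, "sat"))  # "in": the domain data now outweighs "on"
```

The same model object is trained twice, which mirrors the idea that SFT does not start from scratch but continues from the knowledge the base model already has.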
How Does AI Understand Knowledge?
But whether it is pre-training or SFT, how does the AI’s brain absorb this knowledge?
Imagine describing fruits with just two numeric dimensions:
- Color dimension: Use 1 to represent red and 2 to represent green.
- Shape dimension: Use 1 for round and 2 for oval.
Each fruit then becomes a small vector of numbers:
- Watermelon: Color = 2 (green), Shape = 1 (round)
- Strawberry: Color = 1 (red), Shape = 2 (oval)
- Tomato: Color = 1 (red), Shape = 1 (round)
- Cherry: Color = 1 (red), Shape = 1 (round)
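This numeric encoding can be sketched directly. The snippet below is a toy illustration (real models learn hundreds or thousands of dimensions from data rather than two hand-picked ones): each fruit becomes a 2-D vector, and cosine similarity shows that tomato and cherry, which share the same coordinates, are treated as most alike.

```python
import math

# Hypothetical 2-D "embeddings" using the dimensions from the text:
# color (1 = red, 2 = green) and shape (1 = round, 2 = oval).
fruits = {
    "watermelon": (2, 1),
    "strawberry": (1, 2),
    "tomato": (1, 1),
    "cherry": (1, 1),
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Tomato and cherry have identical vectors, so the model "sees" them as alike.
print(cosine(fruits["tomato"], fruits["cherry"]))      # 1.0
print(cosine(fruits["tomato"], fruits["strawberry"]))  # lower
```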
How Does AI Learn A Collection of Words?
In the Transformer architecture, this process can be divided into the following two steps:
- Convert each word into a vector. This vector represents the position of the word in the multidimensional space and reflects the various characteristics of the word.
- Use a self-attention mechanism to focus on different parts of the sentence. It can consider the information of other words in the sentence while processing each word.
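The two steps above can be sketched with a stripped-down self-attention computation. This is only a sketch under a simplifying assumption: there are no learned projection matrices, so queries, keys, and values are all the raw word vectors, which real Transformers do not do.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention with no learned weights,
    so queries = keys = values = X. X has shape (words, dimensions)."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)   # how much each word attends to every other word
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ X              # each output vector mixes in every word

# Three toy 4-dimensional "word vectors" for a three-word sentence.
X = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 1.0]])

out = self_attention(X)
print(out.shape)  # (3, 4): one updated vector per word
```

Each row of the output is a weighted blend of all the input vectors, which is exactly the sense in which attention lets a word "consider the information of other words in the sentence."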
4. Output – Content Generation
AI Asks Everyone a Question:
- “I eat ___ in the restaurant.”
- If you had to fill in the blank, what word would you choose?
- Based on your past experience, you would most likely fill in “rice.”
- But the blank could also be “pancakes,” “noodles,” “eggs,” and so on.
The AI Temperature Parameter
- When the temperature is 0, the AI always chooses the word with the highest matching probability. In the example above, it will pick “rice.”
- When the temperature is higher (close to 1), the probability distribution is flattened, so less likely words also get a chance. In the example above, the AI might pick “pancakes.”
How Does AI Adjust This Parameter?
- If your input is “You are an expert in a certain field. Please write a literature review about xx in a rigorous tone,” the AI temperature stays close to 0, and it will favor the words with the highest matching probability, generating precise, conventional sentences.
- If your input is “Please imagine the future of xx,” the AI temperature rises toward 1, and it will also sample less likely words, generating more varied and unexpected content.
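The temperature mechanism can be sketched as follows. This is a simplified illustration: a real model rescales scores over tens of thousands of tokens, whereas `word_probs` here is a hypothetical four-word next-word distribution for the restaurant sentence.

```python
import math
import random

def sample_next_word(word_probs, temperature):
    """Sample a next word after rescaling the distribution by temperature.
    Temperature 0 is greedy (always the top word); higher values flatten
    the distribution so less likely words also get picked."""
    words = list(word_probs)
    probs = [word_probs[w] for w in words]
    if temperature == 0:
        return words[probs.index(max(probs))]
    logits = [math.log(p) / temperature for p in probs]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    return random.choices(words, weights=weights)[0]

# Hypothetical next-word distribution for "I eat ___ in the restaurant."
probs = {"rice": 0.6, "noodles": 0.2, "pancakes": 0.1, "eggs": 0.1}

print(sample_next_word(probs, 0))    # always "rice"
print(sample_next_word(probs, 1.5))  # occasionally a less likely word
```

Dividing the log-probabilities by the temperature before re-normalizing is what flattens (high temperature) or sharpens (low temperature) the distribution.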
Where is “Generative AI” Going?
- Optimists: Led by OpenAI CEO Sam Altman and Nvidia CEO Jensen Huang, the optimists are bullish on the future of generative AI. They have said that “in a few years, artificial intelligence will be more powerful and mature than it is now; and in another ten years, it will surely shine,” and that “AI may surpass human intelligence within five years.”
- Skeptics: Led by deep learning pioneer Yann LeCun, the skeptics have long argued that generative AI cannot lead to human-level artificial intelligence. LeCun has repeatedly said that “large language models like ChatGPT will never reach the level of human intelligence” and that “artificial intelligence trained by humans will find it difficult to surpass humans.”
