🧑‍🏫 AI Training Tuesday: The history of the GPT in ChatGPT

OpenAI created ChatGPT using concepts that scientists, researchers, and philosophers have developed over 70+ years.

ChatGPT was released on November 30, 2022, and quickly became a household name. OpenAI, the company behind the product, named its chatbot ChatGPT because GPTs, or Generative Pre-trained Transformers, are the product's building blocks. So what is a GPT, what makes it unique, and do you need one for your business?

A brief history lesson

[Image: A brief history of AI, designed by Bill Raymond, multiple sources]

Introducing: AI and neural networks

Artificial intelligence (AI) and neural networks were popularized in the 1950s. Without getting too technical, the idea was (and still is) to create algorithms that model how we assume a biological brain works. However, limited computing power, a shortage of digitized training data, and immature algorithms stalled significant investment in AI for decades.

🧒🏽
Let's pretend the neural network is a child, and you ask it to build a tower with building blocks for the first time. With the original neural networks, the child may not know how to properly balance the blocks or create a solid base. Scientists could observe those failures and successes, but the early neural networks had no reliable way to learn from those mistakes and improve.

Introducing: Machine Learning

By the 1980s, computers had become relatively inexpensive to build for the masses. While the term machine learning (ML) was coined in the late 1950s, it entered our lexicon in the 1980s thanks to new algorithms that enhanced neural networks with backpropagation. Backpropagation allowed a computer to detect its errors, iterate, and improve.

👍🏼
Back to our child-computer building a tower with blocks. With machine learning, the child builds towers based on explicit rules and techniques you provide. The focus is on teaching specific skills and techniques.
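The detect-the-error, adjust, repeat loop described above can be sketched in a few lines of code. This is a deliberately tiny illustration, a single "neuron" learning the rule y = 2x by gradient descent; real backpropagation chains this same idea through many layers of a network.

```python
import random

# Toy example: one weight learning y = 2x via gradient descent.
# (Illustrative only; real backpropagation applies this idea
# across every weight in a multi-layer neural network.)

random.seed(0)
weight = random.random()                   # start with a random guess
data = [(x, 2 * x) for x in range(1, 6)]   # inputs and correct answers

for epoch in range(200):
    for x, target in data:
        prediction = weight * x
        error = prediction - target        # 1. detect the error
        gradient = 2 * error * x           # 2. work out how to adjust
        weight -= 0.01 * gradient          # 3. improve a little

print(round(weight, 3))                    # converges close to 2.0
```

Each pass nudges the weight in the direction that shrinks the error, which is exactly the "detect errors, iterate, and improve" capability that revived neural networks in the 1980s.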

Introducing: Deep learning

By the 2000s, computers were common in homes and offices, and much of our data had moved online. With all this so-called "big data" stored on servers, researchers could build more advanced AI that better understood the world around it. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) added another dose of AI capability, helping computers learn complex patterns and sequences. Open-source software also made AI available to the masses, which spurred even more innovation.

Google famously debuted its self-driving car program, which used technologies such as computer vision to let the car "see" its surroundings and make split-second safety decisions. The same techniques kickstarted the use of AI in drug discovery and many other fields.

💡
Our child-computer is getting much brighter now. The child learns to build towers from toy blocks through extensive experimentation and experience, discovering complex techniques independently and improving over time.

Introducing: GPTs and LLMs

Machine learning and deep learning enabled new capabilities for detecting and learning patterns in images, videos, text, and much more. But in many ways, those capabilities stayed hidden behind the scenes. People want to use this technology through text, voice, and other human-relatable interfaces, and a computer cannot hold that kind of conversation without vast knowledge spanning diverse topics. A Google blog post from Jakob Uszkoreit shares the research on transformers that helped make this possible.

OpenAI took those (and many more) concepts and created ChatGPT. They pre-trained a transformer on vast amounts of digitized text from the internet, producing a large language model (LLM) that can understand and generate human-like language. OpenAI refers to this technology as a generative pre-trained transformer (GPT).

🧠
Our child-computer is all grown up! Now, it can quickly build a complex and stable tower by leveraging extensive pre-learned knowledge from observing countless examples. This allows the child to adapt and create sophisticated designs with minimal trial and error. The child can even create new and previously unseen towers that are unique designs, not necessarily bound to a particular block shape.
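At its core, pre-training means learning from lots of example text how to predict what comes next. Here is a toy illustration of that objective using a bigram model, which only looks at the previous word; a real transformer is vastly more sophisticated, but the "learn from text, then generate" idea is the same.

```python
from collections import defaultdict, Counter

# Toy illustration of the pre-training objective: predict the next
# word from example text. (A bigram model, nothing like a real
# transformer, but the training-and-generation loop is analogous.)

corpus = "the cat sat on the mat the cat ate the fish".split()

# "Training": count which word tends to follow which.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

# "Generation": repeatedly pick the most likely next word.
word, output = "the", ["the"]
for _ in range(4):
    word = following[word].most_common(1)[0][0]
    output.append(word)

print(" ".join(output))
```

Scale the "corpus" up to a large slice of the internet and swap the word counts for a transformer's billions of learned parameters, and you have the recipe behind an LLM.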

Do I need to build my own GPT for my business?

There are many service offerings for chat-based applications on the market today, such as:

  • ChatGPT from OpenAI
  • Claude from Anthropic
  • Gemini from Google
  • Copilot from Microsoft (built on OpenAI's GPT models)
  • Le Chat by Mistral AI

There are also popular open-source models, some of which carry licensing restrictions. These offerings are intriguing because you can download a so-called model, run it in-house, and, depending on its license, use it at no cost. I even keep a few models on my computer to chat with when brainstorming, summarizing long texts, or writing the occasional bit of software code. Here are some of the more popular models:

  • Llama from Meta (the Facebook people)
  • Phi-3 from Microsoft
  • Mistral by Mistral AI

All the products listed here are so-called foundation models, which tend to be generalized for most use cases. Other models are trained and fine-tuned for specific use cases, such as analyzing climate change or supporting biotech research. If there is an industry where you think an AI model should exist, it probably does.

Unless you want to compete directly with these companies, you will probably want to use them and not build your own. Read on as I share how you can augment the capabilities to support your specific use cases.

Why do I need something like ChatGPT?

If you have never used ChatGPT (or a similar solution), I recommend you do this:

Stop using Google for three days and ask ChatGPT your questions instead. You will start to appreciate how you interact with a GPT, and I bet you will find the product indispensable. To be clear, ChatGPT may not replace a search engine, but using it first will give you a sense of how to work with it.

I speak with a lot of people who use ChatGPT and am learning about all sorts of unique use cases, including:

  • Creating first-draft legal documents (think NDAs, lawsuits, contracts)
  • Creating social media posts
  • Summarizing large amounts of content to make it easier to consume
  • Brainstorming on a new product or service
  • Using the GPT as a business or personal coach
  • Software development
  • Analyzing sales data to better understand your customers

Augmenting GPTs

One of the more significant promises of large language models is adding natural language processing (NLP) to your corporate infrastructure or to an app you are building. For example, you could ask:

  • "Who are my top three customers and why?"
  • "When are our corporate holidays?"
  • "I forgot my password. Can you help me reset it?"
  • "I need to book a flight. Can you help me?"

Having that natural conversation to avoid looking everywhere for the right content is fast becoming the next big frontier.

Did you know that most of these companies offer application programming interfaces (APIs)? APIs allow you to build your own app, whether it runs on a website, installs on a computer, or ships through a phone's app store. Think about your line of business and whether your customers would use an app with a natural language interface. If you can imagine it, you can probably create it!
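To make the API idea concrete, here is a sketch of the kind of request a chat app sends behind the scenes, shaped like a request to OpenAI's Chat Completions API. The model name and the travel question are example placeholders, and actually sending the request would require an API key and an HTTP client, which are omitted here.

```python
import json

# A sketch of a typical chat API request body (OpenAI Chat Completions
# style). "gpt-4o-mini" is an example model name; swap in whatever your
# provider offers.
payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a helpful travel assistant."},
        {"role": "user", "content": "I need to book a flight. Can you help me?"},
    ],
}

print(json.dumps(payload, indent=2))
```

Your app would send this payload to the provider's endpoint, receive the model's reply as text, and present it to the user, which is all it takes to put a natural language front end on a line-of-business tool.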

[Image: A person booking a flight using ChatGPT and travel APIs to get a natural language response. ChatGPT can bring natural language to an app, like this example of a trial app.]

Summary

I hope this article helped you understand how we arrived at ChatGPT and today's other large language models.

Sharing is caring

If you like this newsletter, please share it with your colleagues, friends, and family. That simple act will help me continue to share more content with you.

If you have any ideas for new topics, email ideas@billtalksai.com.