What is a GPT?


The introduction of generative pre-trained transformers (GPTs) marked a significant milestone in the adoption and utility of artificial intelligence in the real world.

The technology was created by the then-fledgling research lab OpenAI, building on transformer research published by Google in 2017.

It was Google's paper "Attention Is All You Need" which laid the foundation for OpenAI's work on the GPT concept.

Transformers provided AI scientists with an innovative method of taking user input and converting it into something the neural network can work with, using an attention mechanism to identify the most important parts of the data.

This architecture also allows information to be processed in parallel rather than sequentially, as in traditional neural networks, which provides a major improvement in the speed and efficiency of AI processing.
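To make the attention idea a little more concrete, here is a minimal Python sketch of scaled dot-product attention, the core mechanism described in Google's transformer paper. The function and the toy data are purely illustrative assumptions, not OpenAI's actual implementation.

# Minimal sketch of scaled dot-product attention (illustrative only).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Score how relevant every key is to every query, scaled for stability.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Softmax turns the scores into attention weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted blend of the value vectors.
    return weights @ V

# Three token embeddings, each 4-dimensional; all positions are processed
# in one matrix operation rather than one step at a time.
tokens = np.random.rand(3, 4)
print(scaled_dot_product_attention(tokens, tokens, tokens).shape)  # (3, 4)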

A short history of GPT

OpenAI's GPT architecture was released in 2018 with GPT-1. By significantly refining Google's transformer ideas, GPT-1 demonstrated that large-scale unsupervised learning could produce an extremely capable text generation model that operated at vastly improved speeds.

GPT also improved the neural network's understanding of context, boosting accuracy and giving its output a more human-like coherence.

Before GPT, AI language models relied on rule-based systems or simpler neural networks like recurrent neural networks (RNNs), which struggled with long-range dependencies and contextual understanding.

The story of the GPT architecture is one of constant incremental improvement every year since launch. GPT-2 in 2019 introduced a model with 1.5 billion parameters, which started to provide the kind of fluent text responses that AI users are now familiar with.

However, it was the introduction of GPT-3 in 2020 (and subsequently GPT-3.5) which was the real game-changer. It featured 175 billion parameters, and suddenly a single AI model could cope with a vast array of applications, from creative writing to code generation.

GPT technology created modern AI


GPT technology went viral in November 2022 with the launch of ChatGPT. Based on GPT-3.5 and later GPT-4, this astonishing technology propelled AI into the public consciousness in a massive way. Unlike previous GPT models, ChatGPT was fine-tuned for conversational interaction.

Suddenly business users and ordinary citizens could use an AI for things like customer service, online tutoring or technical support. So powerful was this idea that the product attracted 100 million users in a mere 60 days.

Today GPT is one of the top two AI system architectures in the world (along with Google's Gemini).

Recent improvements have included multimodal capabilities, i.e. the ability to process not just text but also images, video and audio.

OpenAI has also updated the platform to improve pattern recognition and enhance unsupervised learning, as well as adding agentic functionality that lets the model carry out semi-autonomous tasks.

On the commercial front, GPT-powered applications are now deeply embedded across many different businesses and industries.

Salesforce has Einstein GPT to deliver CRM functionality, Microsoft's Copilot combines AI-assisted coding with Office suite automation, and there are multiple healthcare AI models fine-tuned to provide GPT-powered diagnostics, patient interaction and medical research.

The rivals gather


At the time of writing, the most significant rivals to the GPT architecture are Google's Gemini system and the work being done by DeepSeek, Anthropic (with its Claude models) and Meta (with its Llama models).

The latter products also use transformers, but in a subtly different way to GPT. Google however is a dark horse in the race, as it's becoming clear that the Gemini platform has the potential to dominate the global AI arena within a few short years.

Despite the competition, OpenAI remains firmly at the top of many leaderboards in terms of AI performance and benchmarks. Its growing range of reasoning models, such as o1 and o3, and its superlative image generation model GPT Image-1, which builds on the same technology, continue to demonstrate that there is significant life left in the architecture, waiting to be exploited.

Nigel Powell
Tech Journalist

Nigel Powell is an author, columnist, and consultant with over 30 years of experience in the tech industry. He produced the weekly Don't Panic technology column in the Sunday Times newspaper for 16 years and is the author of the Sunday Times book of Computer Answers, published by Harper Collins. He has been a technology pundit on Sky Television's Global Village program and a regular contributor to BBC Radio Five's Men's Hour. He's an expert in all things software, security, privacy, mobile, AI, and tech innovation.
