What are transformer models?

(Image credit: NPowell/GPTImage1)

Transformers are a type of neural network architecture first developed by researchers at Google Brain.

The tech was introduced to the world in a 2017 research paper called 'Attention Is All You Need'.

It's fair to say that this paper marked a pivotal point in the development of modern-day AI, and it underpins virtually all of the mainstream artificial intelligence systems we use today.

With the arrival of transformer technology, AI researchers suddenly had an efficient way to process sequences of tokens - the numerical representations of text that neural networks work with - and deliver output at speed.
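To make the idea of tokens concrete, here is a toy illustration of mapping text to the integer IDs a network can process. Real systems use subword schemes such as byte-pair encoding; the vocabulary and whitespace split below are purely hypothetical.

```python
# Toy tokenization sketch: words -> integer IDs.
# Real tokenizers use learned subword vocabularies; this tiny
# hand-made vocabulary exists only to show the text-to-token step.
def tokenize(text, vocab):
    """Map each whitespace-separated word to an integer ID (0 = unknown)."""
    return [vocab.get(word, 0) for word in text.lower().split()]

vocab = {"attention": 1, "is": 2, "all": 3, "you": 4, "need": 5}
tokens = tokenize("Attention is all you need", vocab)
print(tokens)  # [1, 2, 3, 4, 5]
```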

This major breakthrough was a key milestone in the drive to make AI a viable tool for general purpose applications.

A key difference in the new technology

What made transformers so important was that, for the first time, an entire input sequence could be processed in parallel, making the process significantly faster than before.

The previous technology, recurrent neural networks (RNNs), processed input one step at a time, which meant information inevitably decayed over long sequences.
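The contrast can be sketched in a few lines of Python (using NumPy; the sizes and values here are arbitrary toy data, not a real model): an RNN-style loop must finish step t before it can start step t+1, while an attention-style matrix product handles every position in one shot.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4                        # sequence length, hidden size (toy values)
x = rng.standard_normal((T, d))    # toy input sequence

# RNN-style: each step depends on the previous hidden state,
# so the T steps are forced to run one after another.
W = rng.standard_normal((d, d)) * 0.1
h = np.zeros(d)
for t in range(T):
    h = np.tanh(x[t] @ W + h)      # step t cannot begin before step t-1

# Transformer-style: every position attends to every other position
# in one batched matrix product, so all T positions compute together.
scores = x @ x.T / np.sqrt(d)              # (T, T) pairwise similarities
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)  # normalize each row
out = weights @ x                          # all positions at once
print(out.shape)  # (6, 4)
```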

The new transformer technology, which was born out of Google's work on improving machine translation for its search engine, radically transformed not just the speed of processing, but also the neural network's ability to handle large amounts of context.


(Image credit: Pixabay)

Today transformers are at the heart of almost all AI systems. The most famous example of this is the generative pre-trained transformer (GPT) which is at the core of OpenAI's ChatGPT and other popular AI products.

These generative models are in use all around the world, powering everything from art, video and music generation to medical data analysis, along with any AI application that requires high-speed processing and accuracy.

The introduction of transformers genuinely revolutionized the artificial intelligence sector.

How transformers work

Standard transformer architecture consists of three main components - the encoder, the decoder and the attention mechanism.

The encoder processes the input tokens into a series of internal representations, while the decoder takes these representations and generates the output.

An all-important attention mechanism allows the model to weigh the importance of different parts of the input data dynamically, focusing on just the relevant information it needs to complete the task at hand. In effect, attention lets every part of the input inform every other part, without the information loss that plagued sequential models.
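The attention step described above is typically implemented as scaled dot-product attention: each query is compared against every key, the scores are normalized with a softmax, and the results weight the values. Here is a minimal NumPy sketch - the query, key and value matrices are random toy data, not weights from a trained model:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # how relevant each key is to each query
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over keys
    return weights @ V, weights          # weighted mix of values

rng = np.random.default_rng(1)
Q = rng.standard_normal((3, 8))   # 3 query positions, feature size 8
K = rng.standard_normal((5, 8))   # 5 key positions
V = rng.standard_normal((5, 8))   # one value per key
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)  # (3, 8) (3, 5)
```

Each row of `w` sums to 1, so every output is a weighted average of the values - this is the "dynamic weighting" the article describes.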

Transformers also radically improved the versatility of AI models, by allowing for a greater variety of inputs and outputs as needed.

Prior to the GPT-based architecture, the best-performing neural network models required supervised learning, which involved large amounts of labeled data. This made it extremely slow and expensive to train large language models.

The OpenAI effect

OpenAI

(Image credit: NPowell/Future)

OpenAI introduced semi-supervised learning for its GPT models, which meant that they could be trained in two stages - with unsupervised pre-training, followed up by fine-tuning for particular needs.
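The two-stage recipe can be illustrated with a deliberately simplified sketch. The "model" below is just a bigram counter, and none of these names come from OpenAI's actual code; the point is only the shape of the workflow - a pre-training pass that needs no labels, followed by a small labeled fine-tuning pass.

```python
from collections import defaultdict

class ToyModel:
    """Stand-in for a language model; illustrates the two training stages."""
    def __init__(self):
        self.bigrams = defaultdict(int)   # "knowledge" from unlabeled text
        self.labels = {}                  # task behavior from fine-tuning

    def pretrain(self, corpus):
        # Stage 1: learn from raw, unlabeled text only.
        for sentence in corpus:
            words = sentence.split()
            for a, b in zip(words, words[1:]):
                self.bigrams[(a, b)] += 1

    def finetune(self, labeled_pairs):
        # Stage 2: adapt with a much smaller labeled dataset.
        for text, label in labeled_pairs:
            self.labels[text] = label

model = ToyModel()
model.pretrain(["the cat sat", "the cat ran"])       # cheap, unlabeled
model.finetune([("the cat sat", "animal")])          # small, labeled
print(model.bigrams[("the", "cat")])  # 2
```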

This dramatically improved the speed and versatility of model training, and also increased the power and utility of the resulting AI models themselves.

Transformer models have evolved beyond the original language processing tasks, and are now powering AI in robotics, multi-modal functionality, and compute-intensive tasks such as video generation.

In fact, derivatives of transformer based AI models are now embedded in almost all aspects of life and business around the world.

One of the big challenges of this ubiquity is the fact that these models require a huge amount of computing and energy resources.

Researchers have now turned their attention to ways of optimizing the technology to reduce this burden. How well they succeed will be a critical factor in the continued spread of artificial intelligence tools around the world.

Nigel Powell
Tech Journalist

Nigel Powell is an author, columnist, and consultant with over 30 years of experience in the tech industry. He produced the weekly Don't Panic technology column in the Sunday Times newspaper for 16 years and is the author of the Sunday Times book of Computer Answers, published by Harper Collins. He has been a technology pundit on Sky Television's Global Village program and a regular contributor to BBC Radio Five's Men's Hour. He's an expert in all things software, security, privacy, mobile, AI, and tech innovation.
