Forget ChatGPT - NExT-GPT can read and generate audio and video prompts, taking generative AI to the next level

Robot cameraman shoots motion picture television episode or movie. Funny robotic filmmaker operator with retro camera, director chair and spotlight behind the scenes. Orange background copy sp
(Image credit: Besjunior via Shutterstock)

2023 has felt like a year dedicated to artificial intelligence and its ever-expanding capabilities, but the era of pure text output is already losing steam. The AI scene might be dominated by giants like ChatGPT and Google Bard, but a new large language model (LLM), NExT-GPT, is here to shake things up - offering the full bounty of text, image, audio, and video output. 

NExT-GPT is the brainchild of researchers from the National University of Singapore and Tsinghua University. Pitched as an ‘any-to-any’ system, NExT-GPT can accept inputs in different formats and deliver responses according to the desired output in video, audio, image, and text responses. This means that you can put in a text prompt and NExT-GPT can process that prompt into a video, or you can give it an image and have that converted to an audio output. 

ChatGPT has only just announced the capability to ‘see, hear and speak’ which is similar to what NExT-GPT is offering - but ChatGPT is going for a more mobile-friendly version of this kind of feature, and is yet to introduce video capabilities. 

We’ve seen a lot of ChatGPT alternatives and rivals pop up over the past year, but NExT-GPT is one of the few LLMs we’ve seen so far that can match the text-based output of ChatGPT but also provide outputs beyond what OpenAI’s popular chatbot can currently do. You can head over to the GitHub page or the demo page to try it out for yourself. 

So, what is it like?

I’ve fiddled around with NExT-GPT on the demo site and I have to say I’m impressed, but not blown away. Of course, this is not a polished product that has the advantages of public feedback, multiple updates, and so on - but it is still very good. 

I asked it to turn a photo of my cat Miso into an image of him as a librarian, and I was pretty happy with the result. It may not be at the same level of quality as established image generators like Midjourney or Stable Diffusion, but it was still an undeniably very cute picture.

Cat in a library wearing glasses

This is probably one of the least cursed images I've personally generated using AI. (Image credit: Future VIA NExT-GPT)

I also tested out the video and audio features, but that didn't go quite as well as the image generation. The videos that were generated were again not awful, but did have the very obvious ‘made by AI’ look that comes with a lot of generated images and videos, with everything looking a little distorted and wonky. It was uncanny. 

Overall, there’s a lot of potential for this LLM to fill the audio and video gaps within big AI names like OpenAI and Google. I do hope that as NExT-GPT gets better and better, we’ll be able to see a higher quality of outputs and make some excellent home movies out of our cats seamlessly in no time. 

You might also like...

Muskaan Saxena
Computing Staff Writer

Muskaan is TechRadar’s UK-based Computing writer. She has always been a passionate writer and has had her creative work published in several literary journals and magazines. Her debut into the writing world was a poem published in The Times of Zambia, on the subject of sunflowers and the insignificance of human existence in comparison. Growing up in Zambia, Muskaan was fascinated with technology, especially computers, and she's joined TechRadar to write about the latest GPUs, laptops and recently anything AI related. If you've got questions, moral concerns or just an interest in anything ChatGPT or general AI, you're in the right place. Muskaan also somehow managed to install a game on her work MacBook's Touch Bar, without the IT department finding out (yet).