Gemini just got a new highly-requested feature that trumps ChatGPT

Google Gemini on Android Auto (Image credit: Google)

Google’s Gemini AI assistant now supports audio file uploads.
The AI will transcribe, summarize, and extract key information from recordings.
The feature turns up to 10 minutes of voice memos, meetings, lectures, or interviews at a time into searchable documents.

Google Gemini has just learned how to listen and make sense of what it hears. You can now upload audio files to the AI assistant on the web or through the mobile apps and get transcriptions, summaries, and key details.

For anyone who’s ever let a voice memo rot in their phone or dreaded the task of rewatching a meeting recording, this update could be the AI equivalent of hiring a personal note-taker.

That said, it can only handle 10 minutes of audio at a time, so no long meetings just yet. You can upload audio files directly by selecting them from the usual file upload options. Unlike Gemini's earlier Gemini Live voice features, this isn't just speaking to the AI in real time.

AI audio

✅ Papercut fixed: You can now upload any file to @GeminiApp. Including the #1 request: audio files are now supported! pic.twitter.com/4Te3xwLC6W (September 8, 2025)

I tested it by uploading a couple of sketches from old comedy albums and a phone conversation with a friend. The AI transcribed every word in each case, apart from a couple of small errors with names. It was also good at pulling out key points and items suited to a to-do list.

The demand for audio and Google's response hint at how AI tools are evolving to match how we save information in audio logs and voice memos. Turning that into something searchable has usually meant using external transcription software. Gemini’s new feature collapses that process into a single step.

What makes the addition feel particularly timely is the way it dovetails with other recent Gemini improvements. Google has already integrated Gemini into other apps, begun testing a card-based visual interface, and significantly expanded Gemini’s personalization options. The ability to process audio continues that trend.

The audio option isn't unique to Gemini among AI assistants, but it can at least match some of what ChatGPT can do thanks to OpenAI's Whisper transcription model. In fact, in my testing, I preferred Google's offering.

Anthropic’s Claude also handles audio in some developer tools, and Perplexity can extract data from YouTube videos. But Gemini’s execution is more focused on everyday use cases.

And the output isn’t just a dumb transcription. You can ask Gemini to simplify the language, extract speaker-specific comments, generate questions based on the content, or create a study guide from a classroom discussion. Of course, the 10-minute cap constrains how far it can become part of everyday life. Free-tier users also face daily usage limits.

Google hasn’t released a formal pricing breakdown for high-volume audio processing, but it's part of the regular Gemini quota, so anyone planning to feed it a dozen hours of legal depositions should pace themselves.


Eric Hal Schwartz is a freelance writer for TechRadar with more than 15 years of experience covering the intersection of the world and technology. For the last five years, he served as head writer for Voicebot.ai and was on the leading edge of reporting on generative AI and large language models. He's since become an expert on the products of generative AI models, such as OpenAI’s ChatGPT, Anthropic’s Claude, Google Gemini, and every other synthetic media tool. His experience runs the gamut of media, including print, digital, broadcast, and live events. Now, he's continuing to tell the stories people want and need to hear about the rapidly evolving AI space and its impact on their lives. Eric is based in New York City.
