How to transcribe audio to text: what you need to get the best results

(Image credit: Pixabay)

For a long time, voice-to-text technology was more of a gimmick than a genuine business tool. Times have changed. Today, the best speech-to-text software is highly capable, and its business applications are expanding rapidly. We expect the use of dictation software to grow sharply in the coming years, at both the consumer and business level.

This article explains how you can make the most of these technologies to achieve high-quality transcriptions time after time.

Step 1: Microphone

One of the essential steps in successful audio transcription is using a quality microphone, which lets the dictation software hear your voice more clearly. While many in-built computer microphones are acceptable and have certainly improved in recent years, we recommend an external microphone if you want the best audio transcription results.

Without going too deep into the details, voice-to-text software works by detecting phonemes in speech, of which English has about 44 (the exact count varies by accent). Phonemes are the basic sounds that make up the words we speak, and it is these sounds that dictation software is designed to listen for. A poor-quality microphone makes it harder for the software to distinguish between similar sounds, such as "b" and "p", leading to less accurate audio transcription.
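To see why one misheard sound matters, here is a minimal Python sketch. It is a toy, not how any real dictation product works: the four-word lexicon, the phoneme labels, and the helper names `decode` and `confuse` are all made up for illustration.

```python
# Toy pronunciation lexicon: a sequence of phonemes maps to one word.
# The phoneme labels and entries are illustrative, not from a real lexicon.
LEXICON = {
    ("b", "ae", "t"): "bat",
    ("p", "ae", "t"): "pat",
    ("b", "ih", "n"): "bin",
    ("p", "ih", "n"): "pin",
}


def decode(phonemes):
    """Look up a phoneme sequence; return None if it is not in the lexicon."""
    return LEXICON.get(tuple(phonemes))


def confuse(phonemes, heard_as):
    """Simulate a poor microphone by swapping phonemes according to `heard_as`."""
    return [heard_as.get(p, p) for p in phonemes]


clean = ["b", "ae", "t"]
noisy = confuse(clean, {"b": "p"})  # the "b" sound is misheard as "p"

print(decode(clean))  # bat
print(decode(noisy))  # pat
```

A single swapped phoneme produces a different, perfectly valid word, which is why the error is easy to miss when proofreading.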

An external microphone can also be placed in an optimal location to maximize speech pickup and clarity. Top-quality microphones also limit background noise (the Achilles’ heel of accurate speech transcription). Whereas in-built microphones are often impeded by other objects or don’t directly face the speaker, an external microphone can be placed directly in front of the speaker, increasing clarity. If you plan on using your speech-to-text software regularly, we recommend investing in a quality microphone.

A Yeti microphone is ideal for transcription. (Image credit: EB Games Australia)

Step 2: Invest in top-performing speech-to-text software

Of course, the software that you choose will also have a significant impact on the accuracy of your audio transcription. Not all speech-to-text software is alike, and some will consistently deliver better results than others. Here are a few general points to keep in mind when choosing a software provider.

In the past, most voice-to-text software platforms relied on in-built local dictionaries to convert audio into text. The software would listen to the phonemes in speech and compare them to entries in its dictionary. Although this method doesn’t require an internet connection, it is often inaccurate, because the software considers each word in isolation and neglects the broader context in which the word is used. Without internet connectivity, these dictation platforms can also understand only the fixed set of words contained in their dictionary.
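The two weaknesses of the dictionary approach can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the sound keys, the tiny dictionary, and the function `transcribe_in_isolation` do not come from any real product.

```python
# Toy local dictionary: each sound key maps to the one word stored for it.
# The sound keys are invented labels, not a real phonetic alphabet.
LOCAL_DICTIONARY = {
    "ay": "I",
    "s-ao": "saw",
    "dh-eh-r": "there",  # "their" and "they're" sound the same
    "d-ao-g": "dog",
}


def transcribe_in_isolation(sound_keys):
    """Convert each sound key on its own, ignoring the neighbouring words."""
    words = []
    for key in sound_keys:
        # A sound that is not in the fixed dictionary cannot be resolved offline.
        words.append(LOCAL_DICTIONARY.get(key, "[unknown]"))
    return " ".join(words)


# The speaker said: "I saw their labradoodle"
heard = ["ay", "s-ao", "dh-eh-r", "l-ae-b-r-ah-d-uw-d-ah-l"]
print(transcribe_in_isolation(heard))  # I saw there [unknown]
```

The lookup picks "there" because it never checks the words around it, and it gives up on "labradoodle" because the word is not in its fixed list.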

However, most modern voice typing technology relies on external servers and learning algorithms, and many platforms use artificial neural networks. This form of machine learning lets the software consider whole sentences as well as individual words, and cross-reference your speech with vast amounts of previously collected data. The platform can thus improve continually, learning how we use language and making minor corrections to your transcription as you keep speaking and add detail.
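The idea of using context can be illustrated with a much simpler stand-in for a neural network: a table of word-pair counts. The Python sketch below is hedged accordingly. The counts are made up, the names `HOMOPHONES`, `BIGRAM_COUNTS`, `score`, and `rescore` are hypothetical, and real platforms use far larger models.

```python
# Words that sound alike, keyed by the spelling a context-free lookup would pick.
HOMOPHONES = {
    "there": ["there", "their"],
    "to": ["to", "two", "too"],
}

# Made-up counts of how often one word follows another in previously seen text.
BIGRAM_COUNTS = {
    ("saw", "their"): 40,
    ("saw", "there"): 5,
    ("their", "dog"): 30,
    ("there", "dog"): 1,
    ("have", "to"): 60,
    ("have", "two"): 20,
    ("two", "dogs"): 70,
    ("to", "dogs"): 1,
}


def score(prev_word, word, next_word):
    """Add up how well `word` fits with the word before it and the word after it."""
    before = BIGRAM_COUNTS.get((prev_word, word), 0)
    after = BIGRAM_COUNTS.get((word, next_word), 0)
    return before + after


def rescore(words):
    """Replace each ambiguous word with the homophone that best fits its neighbours."""
    result = list(words)
    for i, word in enumerate(words):
        candidates = HOMOPHONES.get(word)
        if not candidates:
            continue
        prev_word = result[i - 1] if i > 0 else None
        next_word = words[i + 1] if i + 1 < len(words) else None
        result[i] = max(candidates, key=lambda c: score(prev_word, c, next_word))
    return result


print(" ".join(rescore(["i", "saw", "there", "dog"])))  # i saw their dog
print(" ".join(rescore(["i", "have", "to", "dogs"])))   # i have two dogs
```

In the second sentence, "have to" is the more common pairing on its own, and it is only the following word, "dogs", that tips the choice to "two". This mirrors the way a transcription can be revised as you keep speaking.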

Thus, we recommend investing in a platform that requires internet connectivity and employs artificial neural networks as part of its back-end infrastructure.

Artificial neural networks are increasingly common in dictation software. (Image credit: Adam Geitgey)

Does your chosen audio transcription service include support for multiple languages? For some businesses, this isn’t a big issue. For others, it’s a non-negotiable. If your organization interacts with speakers of languages other than English, speech-to-text software can come in handy, allowing you to keep records of discussions or negotiations in multiple languages. 

Combined with translation software (which draws on much of the same underlying technology), an advanced audio transcription solution may enable your business to provide truly multilingual services to customers and clients.

Microsoft Word is a leader in multilingual speech-to-text software. (Image credit: Microsoft)

Step 3: A quiet location

Even if you’ve invested in a microphone with background noise reduction, it helps to find a quiet location for transcribing audio to text. In a quiet room, the software will have far less trouble deciphering the subtleties of your voice, a task that becomes much harder in a crowded office or on a busy street.

If your organization is likely to use speech-to-text software regularly, you might want to consider setting up a room specifically for audio transcription. Using a meeting room or other infrequently used space would also be an appropriate choice.

If you don’t believe us, try using your transcription software in both a quiet room and a loud room. You’ll quickly see the difference in transcription accuracy.
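If you want to put a number on that difference, one common measure is the word error rate: the number of word substitutions, insertions, and deletions needed to turn the transcript into what was actually said, divided by the number of words spoken. The Python sketch below is a minimal version under our own assumptions. The function name `word_error_rate` and the sample sentences are invented for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # a spoken word is missing from the transcript
            insertion = curr[j - 1] + 1   # the transcript has an extra word
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


spoken = "please send the report by friday"
quiet_room = "please send the report by friday"
loud_room = "please end the report friday"

print(word_error_rate(spoken, quiet_room))  # 0.0
print(word_error_rate(spoken, loud_room))   # 2 errors out of 6 words
```

Read the same passage aloud in each room, paste in both transcripts, and compare the two figures. The lower the rate, the better the result.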

Step 4: A list of voice commands

Most speech-to-text software comes with a list of voice commands. These commands enable you to control the font, punctuation, and colors used in your text, as well as the formatting of the document. Having a printed list of these commands in front of you will make audio transcription a much more seamless process. It will save you considerable time, at least when starting out.
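As a rough picture of what happens when you speak a punctuation command, here is a small Python sketch. The command phrases and the function `apply_commands` are illustrative assumptions, so check your own software's list for the exact wording it expects. The sketch covers punctuation only, not fonts, colors, or formatting.

```python
import re

# Illustrative spoken commands and the symbols they stand for.
PUNCTUATION_COMMANDS = {
    "question mark": "?",
    "new line": "\n",
    "comma": ",",
    "period": ".",
}


def apply_commands(dictated):
    """Replace spoken punctuation commands with the symbols they stand for."""
    text = dictated
    for phrase, symbol in PUNCTUATION_COMMANDS.items():
        # Match the whole phrase plus any space before it, so "team comma"
        # becomes "team," and not "team ,".
        pattern = r"\s*\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, symbol, text)
    # Drop the space left behind at the start of each new line.
    return re.sub(r"\n[ \t]+", "\n", text)


dictated = (
    "hello team comma the report is ready period "
    "new line are there any questions question mark"
)
print(apply_commands(dictated))
```

A toy like this cannot tell a command from an ordinary use of the same word, such as "period" in "trial period", which is one reason it pays to learn exactly how your software expects commands to be spoken.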

Voice commands for Microsoft Word’s speech-to-text software. (Image credit: Microsoft)


A little preparation and planning can turn audio transcription from annoying and frustrating to efficient and satisfying. The technology has advanced rapidly in recent years. We believe we are now entering an era in which businesses around the world adopt voice typing and transcription technology for many of their daily business activities. 

Don’t miss the wave: consider whether speech-to-text software is suitable for your organization.

Darcy French