While the best speech to text software used to be available only on desktops, the development of mobile devices and the explosion of easily accessible apps means that transcription can now also be carried out on a smartphone or tablet.
This has made the best voice to text applications increasingly valuable to users in a range of different environments, from education to business. This is not least because the technology has matured to the level where mistakes in transcriptions are relatively rare, with some services rightly boasting a 99.9% success rate from clear audio.
Even so, this applies mainly to ordinary situations and circumstances, and excludes technical terminology such as that required in the legal or medical professions. Despite this, digital transcription can still serve needs such as basic note-taking, which can easily be done using a phone app, simplifying the dictation process.
However, different speech-to-text programs have different levels of ability and complexity, with some using advanced machine learning to constantly correct errors flagged up by users so that they are not repeated. Others are downloadable software which is only as good as its latest update.
Here then are the best in speech-to-text recognition programs, which should be more than capable for most situations and circumstances.
Or jump straight to:
- Best paid for speech to text apps
- Best free speech to text apps
- Mobile speech to text apps to consider
Best paid for speech to text apps
- Dragon Professional
- Dragon Anywhere
- Otter
- Verbit
- Speechmatics
- Braina Pro
- Amazon Transcribe
- Microsoft Azure Speech to Text
- Watson Speech to Text
A business-grade solution
Should you be looking for a business-grade dictation application, your best bet is Dragon Professional. Aimed at pro users, the software provides you with the tools to dictate and edit documents, create spreadsheets, and browse the web using your voice.
According to Nuance, the solution is capable of taking dictation at an equivalent typing speed of 160 words per minute, with a 99% accuracy rate – and that’s out-of-the-box, before any training is done (whereby the app adapts to your voice and words you commonly use).
As well as creating documents using your voice, you can also import custom word lists. There’s also an additional mobile app that lets you transcribe audio files and send them back to your computer.
This is a powerful, flexible, and hugely useful tool that is especially good for individuals, such as professionals and freelancers, allowing for typing and document management to be done much more flexibly and easily.
Overall, the interface is easy to use, and if you get stuck at all, you can access a series of help tutorials. And while the software can seem expensive at $300, that's a one-time fee and competitive with paid-for subscription transcription services.
Benefit from dictation capabilities wherever you may be
Dragon Anywhere is the mobile product for Android and iOS devices. However, this is no ‘lite’ app; rather, it offers fully-formed dictation capabilities powered via the cloud.
So essentially you get the same excellent speech recognition as seen on the desktop software – the only meaningful difference we noticed was a very slight delay in our spoken words appearing on the screen (doubtless due to processing in the cloud). However, note that the app was still responsive enough overall.
It also boasts support for boilerplate chunks of text which can be set up and inserted into a document with a simple command, and these, along with custom vocabularies, are synced across the mobile app and desktop Dragon software. Furthermore, you can share documents across devices via Evernote or cloud services (such as Dropbox).
This isn’t as flexible as the desktop application, however, as dictation is limited to within Dragon Anywhere – you can’t dictate directly in another app (although you can copy over text from the Dragon Anywhere dictation pad to a third-party app). The other caveats are the need for an internet connection for the app to work (due to its cloud-powered nature), and the fact that it’s a subscription offering with no one-off purchase option, which might not be to everyone’s tastes.
Even bearing in mind these limitations, though, it’s a definite boon to have fully-fledged, powerful voice recognition of the same sterling quality as the desktop software, nestling on your phone or tablet for when you’re away from the office.
Nuance Communications offers a 7-day free trial to give the app a whirl before you commit to a subscription.
The big little speech to text app
Otter is a cloud-based speech to text program aimed especially at mobile use, such as on a laptop or smartphone. The app provides real-time transcription, allowing you to search, edit, play, and organize as required.
Otter is marketed as an app specifically for meetings, interviews, and lectures, to make it easier to take rich notes. However, it is also built to work with collaboration between teams, and different speakers are assigned different speaker IDs to make it easier to understand transcriptions.
There are three different payment plans. The basic one is free to use and, aside from the features mentioned above, also includes keyword summaries and a word cloud to make it easier to find specific topic mentions. You can also organize and share transcripts and import audio and video for transcription, and the plan provides 600 minutes of free service.
The Premium plan comes in at $8.33 per month when paid annually, and on top of existing features also includes advanced and bulk export options, the ability to sync audio from Dropbox, and additional playback speeds, including the ability to skip silent pauses. The Premium plan also allows for up to 6,000 minutes of speech to text.
The Teams plan comes in at $12.50 per user for a minimum of three users, and also adds two-factor authentication, user management and centralized billing, as well as user statistics, voiceprints, and live captioning.
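To compare the paid plans, the quoted prices can be turned into yearly and per-team figures. The following is only a rough sketch: it assumes the Teams price is charged per user per month and that the three-user minimum is billed even for smaller teams, neither of which is confirmed above. The function names are made up for illustration.

```python
# Prices are the ones quoted in the article; billing details are assumptions.
PREMIUM_MONTHLY_USD = 8.33   # Premium, per month when paid annually
TEAMS_PER_USER_USD = 12.50   # Teams, per user (assumed to be a monthly price)
TEAMS_MIN_USERS = 3          # minimum number of seats on the Teams plan


def premium_annual_cost() -> float:
    """Yearly cost of one Premium seat at the annual-billing rate."""
    return round(PREMIUM_MONTHLY_USD * 12, 2)


def teams_monthly_cost(users: int) -> float:
    """Monthly Teams cost, charging for at least the minimum number of seats."""
    seats = max(users, TEAMS_MIN_USERS)
    return round(TEAMS_PER_USER_USD * seats, 2)


print(premium_annual_cost())
print(teams_monthly_cost(5))
```

On these assumptions, a single Premium seat works out at just under $100 a year, while the Teams plan starts at $37.50 a month for the minimum of three users.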
The smart speech to text service
Verbit aims to offer a smarter speech to text service, using AI for transcription and captioning. The service is specifically targeted at enterprise and educational establishments.
Verbit uses a mix of speech models, with neural networks and algorithms that reduce background noise, focus on terms, and differentiate between speakers regardless of accent. It also incorporates contextual events such as news and company information into recordings.
Although Verbit does offer a live version for transcription and captioning, aiming for a high degree of accuracy, other plans offer human editors to ensure transcriptions are fully accurate, and advertise a four-hour turnaround time.
Altogether, while Verbit does offer a direct speech to text service, it’s possibly better thought of as a transcription service, but the focus on enterprise and education, as well as team use, means it earns a place here as an option to consider.
Leading speech recognition technology
Speechmatics offers a machine learning solution to converting speech to text, with its automatic speech recognition solution available to use on existing audio and video files as well as for live use.
Unlike some automated transcription software which can struggle with accents or charge more for them, Speechmatics advertises itself as being able to support all major English accents, regardless of nationality. That way it aims to cope with not just different American and British English accents, but also South African and Jamaican accents.
Speechmatics offers a wider number of speech to text transcription uses than many other providers. Examples include taking call center phone recordings and converting them into searchable text or Word documents. The software also works with video and other media for captioning as well as using keyword triggers for management.
Overall, Speechmatics aims to offer a more flexible and comprehensive speech to text service than a lot of other providers, and the use of automation should keep them price competitive.
A virtual assistant for your PC
Braina is speech recognition software which is built not just for dictation, but also as an all-round digital assistant to help you achieve various tasks on your PC. It supports dictation to third-party software in not just English but almost 90 different languages, with impressive voice recognition chops.
Beyond that, it’s a virtual assistant that can be instructed to set alarms, search your PC for a file, or search the internet, play an MP3 file, read an ebook aloud, plus you can implement various custom commands.
The Windows program also has a companion Android app which can remotely control your PC, and use the local Wi-Fi network to deliver commands to your computer, so you can spark up a music playlist, for example, wherever you happen to be in the house. Nifty.
There’s a free version of Braina which comes with limited functionality, but includes all the basic PC commands, along with a 7-day trial of the speech recognition which allows you to test out its powers for yourself before you commit to a subscription. Yes, this is another subscription-only product with no option to purchase for a one-off fee. Also note that you need to be online and have Google’s Chrome browser installed for speech recognition functionality to work.
Cloud-based speech to text technology
Amazon Transcribe is a big cloud-based automatic speech recognition platform developed specifically to convert audio to text for apps. It especially aims to provide a more accurate and comprehensive service than traditional providers, for example by coping with low-fi and noisy recordings of the kind you might get in a contact center.
Amazon Transcribe uses a deep learning process that automatically adds punctuation and formatting, and it can either process a secure live stream or transcribe speech to text with batch processing.
As well as offering time stamping for individual words for easy search, it can also identify different speakers and different channels and annotate documents accordingly to account for this.
There are also some nice features for editing and managing transcribed texts, such as vocabulary filtering and replacement words which can be used to keep product names consistent and therefore any following transcription easier to analyze.
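To show why word-level timestamps and replacement words are useful, here is a minimal sketch of both ideas. It does not use Amazon Transcribe's API or its real output format: the transcript layout, the replacement list, and the product name "AcmePhone" are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical word-level transcript: (word, start_seconds, end_seconds).
# This layout is an illustration, not Amazon Transcribe's real output format.
WORDS = [
    ("thanks", 0.0, 0.4), ("for", 0.4, 0.5), ("calling", 0.5, 0.9),
    ("about", 0.9, 1.2), ("your", 1.2, 1.4), ("acme", 1.4, 1.8),
    ("phone", 1.8, 2.2),
]

# Hypothetical replacement list that maps a spoken variant to one product name.
REPLACEMENTS = {"acme phone": "AcmePhone"}


def find_word(words, target):
    """Return the start time of every occurrence of `target`, ignoring case."""
    return [start for word, start, _ in words if word.lower() == target.lower()]


def apply_replacements(text, replacements):
    """Replace each variant phrase with its canonical form, ignoring case."""
    for variant, canonical in replacements.items():
        pattern = r"\b" + re.escape(variant) + r"\b"
        text = re.sub(pattern, canonical, text, flags=re.IGNORECASE)
    return text


transcript = " ".join(word for word, _, _ in WORDS)
print(find_word(WORDS, "acme"))
print(apply_replacements(transcript, REPLACEMENTS))
```

With timestamps attached to each word, a search hit can jump straight to the right moment in the recording, and a consistent product name makes later keyword analysis simpler.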
Overall, Amazon Transcribe is one of the most powerful platforms out there, though it’s aimed more at the business and enterprise user than the individual.
Part of the Azure platform's Cognitive Services
Microsoft's Azure cloud service offers advanced speech recognition as part of the platform's speech services to deliver the Microsoft Azure Speech to Text functionality.
This feature allows you to simply and easily create text from a variety of audio sources. There are also customization options available to work better with different speech patterns, registers, and even background sounds. You can also modify settings to handle different specialist vocabularies, such as product names, technical information, and place names.
Microsoft's Azure Speech to Text feature is powered by deep neural network models and allows for real-time audio transcription that can be set up to handle multiple speakers.
As part of the Azure cloud service, you can run Azure Speech to Text in the cloud, on premises, or in edge computing. In terms of pricing, you can run the feature in a free container with a single concurrent request for up to 5 hours of free audio per month. After that pricing starts from $1 per audio hour.
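As a rough guide to what that pricing means in practice, the sketch below estimates a monthly bill from the two figures quoted above. It assumes a flat $1 per audio hour beyond the 5 free hours. Since the price only "starts from" $1, and free and paid usage may in practice sit on separate tiers, treat the result as a lower-bound illustration rather than Microsoft's actual billing logic.

```python
FREE_HOURS_PER_MONTH = 5.0   # free audio hours quoted in the article
RATE_PER_HOUR_USD = 1.0      # "starts from" price, treated here as a flat rate


def monthly_cost(audio_hours: float) -> float:
    """Estimated monthly bill: hours beyond the free allowance at the flat rate."""
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    billable = max(0.0, audio_hours - FREE_HOURS_PER_MONTH)
    return round(billable * RATE_PER_HOUR_USD, 2)


print(monthly_cost(3))    # within the free allowance
print(monthly_cost(20))   # 15 billable hours
```

On these assumptions, 20 hours of audio in a month would cost about $15, and anything up to 5 hours would be free.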
IBM's Watson Speech to Text is the third cloud-native solution on this list, with the feature being powered by AI and machine learning as part of IBM's cloud services.
While there is the option to transcribe speech to text in real-time, there is also the option to batch convert audio files and process them through a range of language, audio frequency, and other output options.
You can also tag transcriptions with speaker labels, smart formatting, and timestamps, as well as apply global editing for technical words or phrases, acronyms, and for number use.
As with other cloud services, Watson Speech to Text allows for easy deployment both in the cloud and on-premises behind your own firewall to ensure security is maintained.
Best free speech to text apps
- Google Gboard
- Just Press Record
- Speechnotes
- Transcribe
- Windows 10 Speech recognition
Easily accessible speech to text
If you have an Android mobile device and Google Keyboard isn't already installed, download it from the Google Play store and you'll have an instant speech-to-text app. Although it's primarily designed as a keyboard for physical input, it also has a speech input option which is directly available. And because all the power of Google's hardware is behind it, it's a powerful and responsive tool.
If that's not enough then there are additional features. Aside from physical input ones such as swiping, you can also trigger images in your text using voice commands. Additionally, it can also work with Google Translate, and is advertised as providing support for over 60 languages.
Even though Google Keyboard isn't a dedicated transcription tool, as there are no shortcut commands or text editing directly integrated, it does everything you need from a basic transcription tool. And as it's a keyboard, it should be able to work with any software you can run on your Android smartphone, so you can text edit, save, and export using that. Even better, it's free and there are no adverts to get in the way of you using it.
A cloud-based transcription tool
If you want a dedicated dictation app, it’s worth checking out Just Press Record. It’s a mobile audio recorder that comes with features such as one tap recording, transcription and iCloud syncing across devices. The great thing is that it’s aimed at pretty much anyone and is extremely easy to use.
When it comes to recording notes, all you have to do is press one button, and you get unlimited recording time. However, the really great thing about this app is that it also offers a powerful transcription service.
Through it, you can quickly and easily turn speech into searchable text. Once you’ve transcribed a file, you can then edit it from within the app. There’s support for more than 30 languages as well, making it the perfect app if you’re working abroad or with an international team. Another nice feature is punctuation command recognition, ensuring that your transcriptions are free from typos.
This app is underpinned by cloud technology, meaning you can access notes from any device (which is online). You’re able to share audio and text files to other iOS apps too, and when it comes to organizing them, you can view recordings in a comprehensive file. The app is available on iOS devices for $4.99.
Powered by Google technology
Speechnotes is yet another easy to use dictation app. A useful touch here is that you don’t need to create an account or anything like that; you just open up the app and press on the microphone icon, and you’re off.
The app is powered by Google voice recognition tech. When you’re recording a note, you can easily dictate punctuation marks through voice commands, or by using the built-in punctuation keyboard.
To make things even easier, you can quickly add names, signatures, greetings and other frequently used text by using a set of custom keys on the built-in keyboard. There’s automatic capitalization as well, and every change made to a note is saved to the cloud.
When it comes to customizing notes, you can access a plethora of fonts and text sizes. The app is free to download from the Google Play Store, but you can make in-app purchases to access premium features (there's also a browser version for Chrome).
Artificial intelligence-powered dictation software
Marketed as a personal assistant for turning videos and voice memos into text files, Transcribe is a popular dictation app that’s powered by AI. It lets you make high quality transcriptions by just hitting a button.
The app can transcribe any video or voice memo automatically, while supporting over 80 languages from across the world. While you can easily create notes with Transcribe, you can also import files from services such as Dropbox.
Once you’ve transcribed a file, you can export the raw text to a word processor to edit. The app is free to download, but you’ll have to make an in-app purchase if you want to make the most of these features in the long-term. There is a trial available, but it’s basically just 15 minutes of free transcription time. Transcribe is only available on iOS, though.
Microsoft’s desktop OS has fully integrated voice recognition
If you don’t want to pay for speech recognition software, and you’re running Microsoft’s latest desktop OS, then you might be pleased to hear that Windows 10 actually has some very solid voice recognition abilities built right into the operating system.
Windows Speech Recognition, as it’s imaginatively named – and note that this is something different to Cortana, which offers basic commands and assistant capabilities – lets you not only execute commands via voice control, but also offers the ability to dictate into documents.
The sort of accuracy you get isn’t comparable with that offered by the likes of Dragon, but then again, you’re paying nothing to use it. It’s also possible to improve the accuracy by training the system by reading text, and giving it access to your documents to better learn your vocabulary. It’s definitely worth indulging in some training, particularly if you intend to use the voice recognition feature a fair bit.
This speech recognition capability is actually in previous versions of Windows as well, although Microsoft has honed it more with the latest OS. The company has been busy boasting about its advances in terms of voice recognition powered by deep neural networks, and Microsoft is certainly priming us to expect impressive things in the future. The likely end goal is for Cortana to do everything eventually, from voice commands to taking dictation.
Turn on Windows Speech Recognition by heading to the Control Panel (search for it, or right click the Start button and select it), then click on Ease of Access, and you will see the option to ‘start speech recognition’ (you’ll also spot the option to set up a microphone here, if you haven’t already done that).
Mobile speech to text apps to consider
Aside from what has already been covered above, there are an increasing number of apps available across all mobile devices for working with speech to text, not least because Google's speech recognition technology is available for use.
SpeechTexter is another speech-to-text app that aims to do more than just record your voice to a text file. This app is built specifically to work with social media, so that rather than typing messages, emails, Tweets, and similar, you can record your voice directly to the social media sites and send. There are also a number of language packs you can download for offline working if you want to use more than just English, which is handy.
Voice Notes is a simple app that aims to convert speech to text for making notes. This is refreshing, as it mixes Google's speech recognition technology with a simple note-taking app, so there are more features to play with here. You can categorize notes, set reminders, and import/export text accordingly.
ListNote Speech-to-Text Notes is another speech-to-text app that uses Google's speech recognition software, but this time does a more comprehensive job of integrating it with a note-taking program than many other apps. The text notes you record are searchable, and you can import/export with other text applications. Additionally there is a password protection option, which encrypts notes after the first 20 characters so that the beginning of each note is still searchable by you. There's also an organizer feature for your notes, using category or assigned color. The app is free on Android, but includes ads.
iTranslate Translator is a speech-to-text app for iOS with a difference, in that it focuses on translating voice languages. Not only does it aim to translate different languages you hear into text for your own language, it also works to translate images such as photos you might take of signs in a foreign country and get a translation for them. In that way, iTranslate is a very different app, that takes the idea of speech-to-text in a novel direction, and by all accounts, does it well. Working with over 100 languages, the basic version is free to use, but the pro version costs $4.99 for a month, or you can subscribe annually for $39.99.
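Whether the monthly or the annual pro subscription is the better deal depends on how many months of the year you expect to use it. The short sketch below works this out from the two prices quoted above. The function and constant names are made up for illustration, and it assumes you can start and stop the monthly plan freely.

```python
import math

MONTHLY_USD = 4.99   # pro version, per month (from the article)
ANNUAL_USD = 39.99   # pro version, per year (from the article)


def cheaper_plan(months_used: int) -> str:
    """Return which plan costs less for the given number of months in one year."""
    if not 1 <= months_used <= 12:
        raise ValueError("months_used must be between 1 and 12")
    return "monthly" if MONTHLY_USD * months_used < ANNUAL_USD else "annual"


# Smallest number of months at which paying monthly costs more than the annual plan.
BREAK_EVEN_MONTHS = math.floor(ANNUAL_USD / MONTHLY_USD) + 1

print(BREAK_EVEN_MONTHS)
print(cheaper_plan(8), cheaper_plan(9))
```

In other words, occasional travelers who need the pro features for eight months or fewer pay less month by month, while anyone using it for nine months or more is better off subscribing annually.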