Vocalmatic is an online transcription service created by the Canadian company Enactics Inc.
According to its site, it exists to “help people that work with audio recordings save time transcribing their audio and video into text.”
Given the number of different online transcription tools available, what makes this one special?
Plans and pricing
When you sign up for Vocalmatic, you immediately get 30 minutes of free transcription time; when that runs out, additional hours can be bought, starting at $15 for a single hour.
The more hours you buy, the less they cost, with 90-100 hours costing an average of just $6 per hour.
Those prices are for individual users; Enactics offers flexible pricing for businesses where multiple users require access and volumes will presumably be high.
If you have a bulk requirement, the best way to get costs down is to use a Day Pass. These allow unlimited use of the service for a day, with a limit of three files per hour.
A one-day pass costs $15, a three-day pass $17 and a seven-day pass $25, so if you can organise all your work into blocks, this is the cheapest way to use Vocalmatic.
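To see why the longer passes are such good value, here is a small back-of-the-envelope calculation using the prices quoted above. It assumes those prices are current (check the Vocalmatic site), compares each pass against the roughly $6-per-hour rate for 90-100 hour bundles, and ignores the three-files-per-hour limit, which could stop you using a pass to its full extent. The constant and function names are purely illustrative.

```python
# Prices as quoted in this review (USD); they may have changed since.
DAY_PASSES = {1: 15.0, 3: 17.0, 7: 25.0}  # pass length in days -> price
BULK_HOURLY_RATE = 6.0  # average per-hour price when buying 90-100 hours


def cost_per_day(days: int, price: float) -> float:
    """Average price of one day of use for a pass of the given length."""
    return price / days


def break_even_hours(pass_price: float, hourly_rate: float) -> float:
    """Hours of audio at which a pass costs the same as buying hours outright.

    Assumes the three-files-per-hour limit never gets in the way.
    """
    return pass_price / hourly_rate


for days, price in sorted(DAY_PASSES.items()):
    print(
        f"{days}-day pass: ${cost_per_day(days, price):.2f}/day, "
        f"cheaper than bulk hours beyond "
        f"{break_even_hours(price, BULK_HOURLY_RATE):.1f} h of audio"
    )
```

On these figures the seven-day pass works out at about $3.57 per day, and it beats the bulk hourly rate once you transcribe a little over four hours of audio during that week.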
Even by online transcription standards, Vocalmatic is basic, but that also makes it remarkably simple to use, even for those with limited computing skills.
The straightforward four-step process begins with you specifying whether you have an audio file or a video file or (exclusively for American customers) would like to call in and have your words transcribed directly.
Most users will be working with audio or video files, and in step two these are uploaded to the Vocalmatic site. The currently supported file types are mp3, m4a, mp4, flac, ogg, wav, aac, opus, oga, mogg, webm and wma.
The third step is to define the speaker’s language and nationality, and the fourth is to choose the output format.
Language support is certainly a strength of this system: it currently transcribes English, Spanish, French, German, Portuguese, Italian, Dutch, Polish, Czech, Romanian, Chinese (Mandarin), Chinese (Cantonese), Japanese, Filipino, Vietnamese, Arabic, Persian, Hindi, Thai, Korean, Urdu, Turkish, Hebrew, Greek, Bulgarian, Russian, Finnish, Swedish, Danish, Afrikaans, Amharic, Armenian, Azerbaijani, Indonesian, Malay, Bengali, Catalan, Basque, Galician, Georgian, Gujarati, Croatian, Zulu, Icelandic, Javanese, Kannada, Khmer, Lao, Latvian, Lithuanian, Hungarian, Malayalam, Marathi, Norwegian, Serbian, Ukrainian, Sinhala, Slovak, Slovenian, Sundanese, Swahili, Tamil and Telugu.
As impressive as that list is, not all accents and dialects in each language are recognised.
The default output option uses simple formatting, while the SRT option is designed for those creating subtitles for a video.
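For readers unfamiliar with SRT, it is a plain-text subtitle format: each cue has a sequence number, a start and end time in the form HH:MM:SS,mmm separated by `-->`, the subtitle text, and a blank line. The sketch below builds such a file from timed transcript chunks. It is a hypothetical illustration of the format only, not how Vocalmatic generates its files, and the `(start_seconds, end_seconds, text)` input shape is an assumption.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Build SRT text from hypothetical (start_seconds, end_seconds, text) chunks."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Cue number, timing line, then the subtitle text.
        cues.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    # Joining with "\n" leaves the blank line SRT expects between cues.
    return "\n".join(cues)


print(to_srt([(0.0, 2.5, "Hello and welcome."), (2.5, 6.0, "Today we test Vocalmatic.")]))
```

The timing lines are what make the output usable as subtitles, which is also why Vocalmatic’s habit of inserting timecodes into its default text output is unhelpful when you only want a transcript.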
Once those selections are made and the file is uploaded, the system starts converting it into text, provided you have enough time left on your account.
An email is sent to announce that processing has started, and another once it is complete.
How long it takes depends entirely on how busy the Vocalmatic servers are, but in general we found that processing took roughly as long as the recording itself, or a little longer.
One 19-minute recording took about 28 minutes to process, though shorter clips were proportionally faster.
Once the file is processed, you can open it in an editor to check the results and adjust the formatting and content.
What this system doesn’t do is provide any grammatical formatting. It doesn’t insert line breaks between sentences, however long the pause between them, and it doesn’t always capitalise names.
It also inserts timecodes between chunks of transcription, even when you are not generating an SRT file.
The audio can be played back in the editor, but the editor makes no attempt to show where in the transcription the playback currently is.
Once you are ‘happy’ with the results, the text can be exported as a Word document or text file in default mode, or as an SRT file if you chose that output mode.
The Vocalmatic FAQ states: ‘We expect an accuracy of 80 - 90% on a clear audio. If there is a lot of background noise, or if people are talking at the same time, the transcription can have less than 10% accuracy.’
However, in our limited testing, even with snippets from audiobooks, which have the clearest possible audio, we failed to achieve that level of accuracy.
Even at its best, it would often repeat words or parts of a paragraph, get wrong a name it had got right in the previous sentence, and litter the transcription with mistakes.
When the recordings weren’t pristine, as in our tests with historical speeches, the results could be incomprehensible.
What we found most troubling is that when it had a problem with a word or phrase, Vocalmatic would randomly spew out single letters. In English, the language we tested with, the letter ‘y’ on its own isn’t a word, and neither are ‘m’, ‘k’ or ‘t’.
The software appears untroubled when its output doesn’t conform to any dictionary or grammatical rules. It also lacks an understanding of context, repeatedly interpreting ‘son’ as ‘Sun’.
Once you factor in all the fixes, plus adding full stops and commas and correcting the capitalisation, even the better attempts require substantial adjustment before they could be considered accurate.
This service isn’t cheap, particularly quick or very accurate.
The joy of having 30 minutes of free time wears thin disturbingly quickly, even if the solution is highly accessible and very easy to use.
One advantage of the free time is that it lets a potential customer upload an example of the recordings they are likely to need transcribed and see how the service performs.
We would only proceed if it returned text that broadly represents what was spoken, and even then we would try some other options before paying for this one.
Compared with some of the other options, most notably Transcribe by Wreally, Trint and Happy Scribe, this isn’t accurate enough to be worth the expense.
Maybe we’re being unkind and it’s brilliant at transcribing Swahili or other languages, but based purely on its efforts with English, we’d choose something else.