Sonix.ai review

A flexible service for transcribing meetings

Sonix.ai
(Image: © Sonix)

Our Verdict

An easy-to-use transcription service, Sonix.ai is designed for business users who aren't concerned about the high cost of use. It is not the most accurate solution, but it is quick and the interface is very straightforward.

For

  • Easy to use
  • Free trial
  • Zapier integration

Against

  • Heavily grammar driven
  • Expensive
  • Results can vary

Each transcription tool targets a specific niche, and Sonix has chosen people who record meetings or presentations and then need a typed version to search or disseminate.

Therefore, it could be equally useful to a student as it might be to an executive, delivering transcriptions that can be easily indexed for future reference.

But to be successful for these tasks, the software needs to be accurate and affordable.
Is Sonix.ai that tool, and can an online service compete with an installed application?

Plans and pricing

As with many transcription services, the Sonix cost model is based on time, specifically the length of any recording that you upload to the service for processing. That costs $10 per hour if you use its pay-as-you-go Standard service.

The cost per hour drops to $5 if you sign up for the Premium subscription service at $22 per month for each user. Volume discounts are available for companies that need more than 100 hours transcribed per month.
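To see where the subscription starts to pay for itself, here is a minimal Python sketch of the arithmetic. It uses only the prices quoted in this review, ignores volume discounts, and the function names are our own rather than anything Sonix provides.

```python
# Prices as quoted in this review; treat them as assumptions that may be out of date.
STANDARD_RATE = 10.0  # USD per recorded hour, pay-as-you-go Standard plan
PREMIUM_RATE = 5.0    # USD per recorded hour on the Premium plan
PREMIUM_FEE = 22.0    # USD per user per month for the Premium subscription


def standard_cost(hours: float) -> float:
    """Monthly cost on Standard: usage only, no subscription."""
    return STANDARD_RATE * hours


def premium_cost(hours: float, users: int = 1) -> float:
    """Monthly cost on Premium: per-user subscription plus discounted usage."""
    return PREMIUM_FEE * users + PREMIUM_RATE * hours


def break_even_hours(users: int = 1) -> float:
    """Hours per month at which Premium and Standard cost the same."""
    return PREMIUM_FEE * users / (STANDARD_RATE - PREMIUM_RATE)


for hours in (2, 10):
    print(f"{hours} h: Standard ${standard_cost(hours):.2f}, "
          f"Premium ${premium_cost(hours):.2f}")
print(f"Break-even for one user: {break_even_hours():.1f} hours per month")
```

On these figures, a single user needs to transcribe roughly 4.4 hours a month before Premium becomes the cheaper option.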

The Standard plan has the most basic features, and Premium adds others such as multiuser access and team sharing. Enterprise has everything that Premium offers, plus lots of administration extras and an enhanced support model.

As transcription services go, Sonix is one of the more expensive, and certainly not something that the majority of students or home users could reasonably afford.

Design

The my.sonix.ai site uses an exceptionally clean design, and creating an account is free.

At the time of writing, this is exclusively a web-based service, and Sonix has no mobile app to capture audio recordings and send them for processing.

However, as we’ll cover later, there are simple workarounds that go some way towards making up for the lack of a mobile app.

The starting point for any transcription job is the Sonix.ai dashboard, where you can see the audio that’s already been transcribed and add new recordings to be processed.

Because this system was designed for multiple users, it includes a virtual folder system to organise transcriptions in whatever way is deemed suitable.

Clicking ‘upload’ takes the user to a page where multiple files can be dropped into the system, and if the account has sufficient credit they can then be processed.

All the standard audio file formats are supported, including wav, mp3, m4a, aiff, aac, ogg and wma, and you can also upload some video container formats. The maximum file size is 4GB, so before uploading a big 4K resolution video file, we’d recommend you use some other tool to split the audio out to make uploading faster.
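As one way to split the audio out, the Python sketch below builds a command line for the free ffmpeg tool. This is our assumption rather than anything Sonix recommends: it presumes ffmpeg is installed separately, and the helper name, the 128k bitrate and the example filename are purely illustrative.

```python
from pathlib import Path
from typing import List


def audio_extract_command(video_path: str, bitrate: str = "128k") -> List[str]:
    """Build an ffmpeg command that drops the video stream and keeps AAC audio.

    Hypothetical helper: the output sits next to the input with an .m4a suffix.
    """
    src = Path(video_path)
    dst = src.with_suffix(".m4a")
    return [
        "ffmpeg",
        "-i", str(src),    # input video file
        "-vn",             # discard the video stream
        "-c:a", "aac",     # encode the audio as AAC
        "-b:a", bitrate,   # audio bitrate (illustrative default)
        str(dst),
    ]


# Show the command that would be run for an example file.
print(" ".join(audio_extract_command("meeting.mp4")))
```

The resulting list can be passed to `subprocess.run(command, check=True)`, and the audio-only file it produces is a small fraction of the size of the original video.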

A good way to speed up the upload is to use a cloud storage facility like Google Drive, OneDrive, Box or Dropbox and to link that directly to the account. You can also email files to the system using Gmail, which makes for a more elegant workflow than dropping files on a web page.

This automation is provided via Zapier, allowing for much wider integration if the business using it has invested in that technology to connect its business processes.

Another nice touch is that along with the audio or video file, you can include existing transcription, as a means to more quickly complete the process and improve accuracy.

Because the processing is cloud-based, it is impossible to predict exactly how fast or slow any given job will be, but Sonix.ai is relatively fast in our experience. Transcription typically takes between 10% and 20% of the length of the recording, so a 10-minute recording is usually completed in under 2 minutes.
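For planning purposes, that rule of thumb can be turned into a quick estimate. The Python sketch below simply applies the 10% to 20% range we observed; it is not a figure guaranteed by Sonix, and the function name is our own.

```python
from typing import Tuple


def estimated_processing_minutes(recording_minutes: float,
                                 low: float = 0.10,
                                 high: float = 0.20) -> Tuple[float, float]:
    """Return a (best case, worst case) processing estimate in minutes.

    The default 10%-20% range is the one observed in this review's testing.
    """
    return recording_minutes * low, recording_minutes * high


for length in (10, 60):
    fastest, slowest = estimated_processing_minutes(length)
    print(f"{length}-minute recording: roughly {fastest:.0f} to {slowest:.0f} minutes")
```

By the same rule, an hour-long meeting should come back in something like 6 to 12 minutes.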

You don’t need to follow the processing, as the system will send you a notification by email when the work is done, together with a link to the new transcription.

Once the file is processed, you can open it within an editing page to review the results, and also export the text in a wide range of useful formats, including subtitle formats used by some apps.

The system supports 36 languages and dialects, including multiple regional variants of English, French, Cantonese, Mandarin, Portuguese and Spanish, alongside all common western and eastern European languages, together with some Asian languages and Arabic.

Recordings

Alongside the work that went into the AI needed to interpret the noises that humans make, the Recordings page probably represents a significant coding effort here.

Here both the audio and its associated transcript can be compared and manually enhanced with details of speaker changes and fixes to misinterpretations.

For anyone working through a transcript to polish the text, this page is the coalface. It makes sense that some effort has gone into this part, and it’s very easy to use and follow.

To direct the user to where there might be issues, Sonix.ai will colour code the contents to highlight those sections about which it is less confident. This feature can be useful, although Sonix.ai can make mistakes even in those parts where it considers the transcription has a ‘Very Confident’ status.

The best aspect of this page is how the audio playback and text are synchronised, so that placing the cursor in the text moves the playback position to the same section.

Alongside plain editing, it is also possible to highlight sections in various styles and make notes to go alongside the transcription.

You can also tweak the timecode, especially useful if the recording starts with a long pause or unwanted preamble.

Accuracy

Sonix describes Sonix.ai as ‘The best automated transcription software powered by cutting-edge AI’.

Given our testing, we’d describe this product as highly dependent on the quality of the recording and many other factors that can’t easily be controlled.

When processing our classic historical speech recordings, it had real difficulty with some speakers even if they sounded clear to us.

These results were in marked contrast to some more contemporary recordings, where the accuracy was acceptable but hardly stellar.

We concluded that the approach taken by Sonix makes several assumptions that can work or not, depending on the speaker and the quality of the recording.

What was fascinating is that the service will colour code its transcription based on how confident it is of what is being said, and this self-analysis is very revealing.

In some circumstances, it will correctly identify that a section might be suspect, but in other parts, it is confident of sections it transcribed entirely wrong.

A few common issues seem to throw it a curveball, and one of these is people who don’t speak grammatically perfect prose. In an effort to make their speech more direct, they’ve removed some words from their sentences, making for a more dramatic style. When these are transcribed by Sonix, it appears determined to add those words back to fit its internal grammar model, rather than what was actually spoken.

The transcription reads better as a document, but it isn’t truly representative of what was said.

Sonix is certainly better when the quality of the recording and the clarity of the speaker are high, as we proved with a small clip of Stephen Fry reading Harry Potter. But it isn’t always possible to have such control over the quality of sound, and it still made mistakes with that test.

Another problem area is formal names and technical words or abbreviations. These can be addressed by adding them to the custom dictionary, but it takes work to teach the system to recognise when names or acronyms are being used.

For those who need a word-perfect transcription, Sonix has a selection of associated professional transcribers who can work through a recording and address those issues, but this somewhat defeats the purpose of automated transcription.

Security

The focus of Sonix.ai security is the servers where the audio is processed and the transcriptions are held.

All traffic is encrypted using TLS (Transport Layer Security). Once the files are on the server, they are protected by multiple layers of firewall and intrusion protection, and all data is ringfenced by AES-256 server-side encryption. The company also promises that employees don’t have access to recordings or transcriptions unless they are given explicit permission.

The critical problem with this approach is that a simple login and password can circumvent it all. Sonix.ai has no two-factor authentication, and it doesn’t have an easily accessible log of who accesses files and when.

From a security viewpoint, this might well be considered an over-reliance on the integrity of those using the system not to share recordings with others outside the business or retain potentially sensitive files when they leave the company.

In short, the security needs to be better, and the tracking of user activity given greater priority.

Final verdict

Considering the relatively high cost, we expected Sonix to perform better than it did.

Perhaps we were unlucky with our choices of things to transcribe, but it did still seem to make an inordinate number of errors.

On the plus side of this equation, it’s fast, and it’s a remarkably easy system to use, although we’d recommend running a few example recordings through this solution before committing to a subscription.

The biggest issue with Sonix is the high cost, and even if it is the right tool for your particular requirement, there are cheaper means to turn audio into text elsewhere.