About the author
Jyri Huopaniemi is the Head of Technology Licensing at Nokia Technologies.
Since the launch of the first smartphone over a decade ago, R&D teams, engineers and industrial designers have been locked in an ever-accelerating race of innovation.
Consumers have seen the technology in their devices change rapidly. They now have access to features and computing power that, until recently, were thought impossible in such a slim form factor. We use artificial intelligence daily to find the best route to work, and we can stream the latest movies almost instantly.
Some of the biggest leaps forward have been in the cameras built into these devices. Little more than a decade ago they produced grainy images; today they capture high-resolution photos and 4K video, enhanced by AI to produce professional-grade content.
Increasingly, R&D teams have also recognised the possibilities of integrating enhanced sensory technology into smartphones. This is evident in current trends such as AR and gaming, but also in traditional sensors such as microphones. For audio, one goal today is to match audio capabilities to the high-definition imaging and video capabilities of modern smartphone cameras.
Audio capabilities unlocking next-gen experiences
Apart from the removal of the traditional headphone jack, smartphone audio has seen limited innovation in recent years.
By and large, the focus has been on improving overall quality. Yet the audio experience has not changed significantly since the days of the Walkman or the MP3 player: stereo at best, often still mono, with a few options for adjusting playback.
Consumers also often rely on buying external hardware to improve playback clarity. In recent years, the rise of smart speakers and improved speech and audio quality on smartphones have been welcome steps in the right direction.
Most of these improvements, however, have been confined to the playback of professional content. There is an opportunity for innovative R&D teams to rethink the audio experience and match it to what can be achieved through image and video capture.
By integrating sensory technologies and smart software, device makers can radically redesign the audio experience, giving users more control over how they capture audio. Let's look at two examples:
- The smart audio algorithms that enable spatial audio capture can also enable audio zoom. Working much like a telephoto lens for sound, audio zoom lets users isolate and get closer to the desired sound source while suppressing unwanted noise.
- The same technologies can also track moving sound sources dynamically and automatically suppress unwanted sounds such as wind noise. Even post-capture editing of the sound scene is now possible, giving users greater control over a recording and opening up practically limitless possibilities for how we tell our stories.
Imagine a parent attending her child's school play. Historically, she would have had to contend with poor acoustics and distracting noise from other audience members, while accepting that the quality of the captured sound was limited by her distance from the stage (not to mention the muted delivery of nervous young actors).
Today, new audio technologies can mitigate these problems, giving users capabilities they have never had before. While spatial audio capture reproduces the sound scene on playback, it neither overcomes the problem of ambient noise nor lets you get closer to the action. This is where audio zoom and tracking come in.
Using the smartphone's camera interface, a user can now zoom the audio along with the video while suppressing the ambient sound of the venue: shuffling chairs, conversations in the hall and fidgety kids in the audience. The user can also select the key actor and track them dynamically, capturing their performance in full, vibrant and crisp detail.
Marrying hardware and software at the R&D stage
These capabilities are achieved through software that works with the existing hardware. They do not require a significant reinvention of current form factors, but they do require close collaboration between device engineers and designers.
Working with the design team, software engineers and R&D teams can tune the spatial audio capture algorithms to the unique specifications of the device's form factor. This close partnership matters because the placement of the microphones on the smartphone affects the quality of the resulting applications and determines which capabilities can be achieved.
While optimal microphone placement is not always possible without compromising the form factor, this can largely be addressed in the early R&D stage. Using acoustic laboratory measurements, the algorithms that analyse and process the multiple microphone signals can be calibrated to the specific placement. This goes a long way towards preserving the integrity of the form factor while establishing which capabilities can be delivered.
The algorithms must also work within the device's available computing power. This can include integration with AI engines that recognise sound sources, giving users the ability to focus on a particular sound or remove distracting background noise.
While democratising access to immersive audio is one half of the equation, making these capabilities easy to use is the other. An effective user interface is therefore a core asset: it needs to be as intuitive as video capture is today. Again, software designers must work closely with R&D teams and engineers to ensure these capabilities can be used easily.
The need for truly immersive content
Device manufacturers must consider why and how people use their smartphones to communicate today. In a digital world full of social channels on which we share our lives daily, the importance of the technology we use to capture and share key moments cannot be overstated.
Nearly 60% of internet users now upload and share video online, while almost 80% of all digital video viewers watch that content on smartphones.
Delivering new experiences should not be about reacting to demand. It should be about setting the standard for innovation. Enabling more meaningful ways of connecting with digital media, be it user-generated or professional content, should be the primary focus for smartphone manufacturers.
Developing sensory technologies that capture the truest picture of our surroundings is key, because when we are not immersed in streaming the latest TV series, we are the storytellers. Devices that let us create new levels of immersion, deepening our connections with family, friends and wider audiences, empower us in that role.
Device manufacturers that understand audio's role in the advancement of digital content will likely stay one step ahead of their competitors and take the lead in delivering products that offer true market differentiation.
This is increasingly crucial for future-proofing against new forms of digital content and technology trends. New mobile technologies such as 5G, together with the evolving capabilities of virtual and augmented reality, are set to unlock ever more immersive experiences, and these advanced audio technologies will be a key ingredient in delivering them.