Improving results from recommendation engines

The Filter's David Maher Roberts at SXSWi

Following the South by Southwest Interactive panel on collaborative filters, where Digg's Anton Kast explained why collaborative filters don't always provide optimum results, TechRadar caught up with David Maher Roberts, CEO of The Filter.

The Filter sorts entertainment content to match its users' tastes, and we asked Maher Roberts how his service deals with the complex issues highlighted by Kast.

The first problem Kast raised was that of sparsity: "People doing the filtering are sparse compared with amount of content that needs filtering," said Kast. "If there's many more Digg stories than there are people voting in there then obviously we're not getting good coverage."

Maher Roberts explains that "it helps if you can narrow the breadth of what you are recommending to and from. This is why The Filter focuses on digital entertainment. It allows us to avoid being too broad and therefore getting too many items without any usage/activity around them.

"It also helps if you can aggregate data from many sources. Due to the partnerships we have at The Filter we aggregate over 20 million evidence clicks per day.

"So, both the focus on entertainment content and the aggregated data help in reducing the sparsity issue."
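To make the sparsity point concrete, here is a minimal Python sketch. It is not code from The Filter or Digg; the sample data, the names site_a and site_b, and the density function are all made up for illustration. It measures what fraction of a user-by-item grid has any recorded activity, and shows how pooling events from a second source can raise that coverage.

```python
def density(events, n_users, n_items):
    """Fraction of (user, item) pairs with at least one recorded interaction."""
    observed = {(user, item) for user, item in events}
    return len(observed) / (n_users * n_items)


# Hypothetical interaction logs: (user_id, item_id) pairs from two sources.
site_a = [(0, 0), (1, 2), (2, 1)]
site_b = [(0, 1), (1, 2), (3, 3)]

N_USERS, N_ITEMS = 4, 4

density_a = density(site_a, N_USERS, N_ITEMS)
density_pooled = density(site_a + site_b, N_USERS, N_ITEMS)
print(density_a, density_pooled)
```

Duplicate pairs across sources are counted once, so pooling only helps to the extent that the sources cover different users or items.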

The second issue Kast raised was the "early rater problem", where something that has just been submitted by a user doesn't have enough voting information for filtering purposes.

"Digg is based on users' positive or negative confirmation," says Maher Roberts. "At The Filter we use all interactions with the content as an indication of some sort of connection. Whether it is consumed, shared, saved, or rated. It is a known fact that people consume content more than they rate it, so you are much more likely to get statistical evidence using usage than using purely voting."
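As a rough illustration of treating every interaction as evidence, the sketch below sums weighted events per user and item. The weights, event names and the affinity_scores function are assumptions chosen for the example; the article does not say how The Filter actually weights consumption against sharing, saving or rating.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each interaction type counts as evidence.
EVENT_WEIGHTS = {"consumed": 1.0, "saved": 2.0, "shared": 3.0, "rated_up": 4.0}


def affinity_scores(events):
    """Sum weighted evidence per (user, item) from (user, item, event_type) tuples."""
    scores = defaultdict(float)
    for user, item, event_type in events:
        # Unknown event types contribute nothing rather than raising an error.
        scores[(user, item)] += EVENT_WEIGHTS.get(event_type, 0.0)
    return dict(scores)


events = [
    ("ann", "song_1", "consumed"),
    ("ann", "song_1", "consumed"),
    ("ann", "song_1", "shared"),
    ("bob", "song_1", "rated_up"),
    ("bob", "video_7", "saved"),
]
scores = affinity_scores(events)
print(scores)
```

Here ann never rates anything, yet her plays and share still produce a usable signal, which is the advantage Maher Roberts describes over a votes-only approach.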

Breaking the feedback loop

What Kast referred to as "the grey sheep problem" (where popular content is highlighted at the expense of material whose interest to smaller groups may not be immediately apparent) is what Maher Roberts terms the 'feedback loop'.

"If a song or video is played a lot in a very short space of time (like the Britney Spears video after the MTV awards or the Crazy Frog single) then it is likely that a recommendation engine would recommend this piece of content more and more. We call that a feedback loop," says Maher Roberts.

To address the issue, he says "The Filter has altered its algorithm to automatically spot these spikes and cut it back to avoid over-recommendation."
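The article gives no detail on how The Filter spots or trims these spikes, so the following is only one plausible sketch under stated assumptions: an item's recent play count is compared with its longer-term average, and anything above a chosen multiple of that average is capped. The function name dampened_popularity and the spike_ratio threshold are hypothetical.

```python
def dampened_popularity(recent_plays, baseline_plays, spike_ratio=3.0):
    """Return a popularity score, capping items whose recent plays spike above baseline.

    recent_plays: plays in the latest time window.
    baseline_plays: average plays per window over a longer history.
    spike_ratio: hypothetical threshold; plays beyond this multiple of the
        baseline are ignored when scoring.
    """
    if baseline_plays <= 0:
        # No history yet: assume one play per window to keep the cap meaningful.
        baseline_plays = 1.0
    cap = spike_ratio * baseline_plays
    return min(float(recent_plays), cap)


steady = dampened_popularity(120, 100)    # within the cap, left unchanged
spiking = dampened_popularity(5000, 100)  # capped at 3 x baseline
print(steady, spiking)
```

A hard cap is the simplest choice; a production system might instead decay the excess gradually so that a genuinely popular item is not penalised for long.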

User opposition

The final issue that Kast identified, and one that affects Digg more than The Filter, is user opposition. "Digg has this fascinating history where every once in a while a large number of people get incredibly enthusiastic about one thing and it ends up on our home page and fights goals we have to represent small groups or have diverse content," said Kast.

In the case of The Filter, "this can happen with user generated video, but is less frequent with 'official' video content and music," explains Maher Roberts.

"Ultimately, part of what an entertainment-focused recommendation engine is there to do is surprise, as well as deliver expected results. So we have designed our algorithms to include an element of serendipity to the results, not just show what is most connected or most popular in this moment in time."
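One simple way to add an element of serendipity is to reserve a slot or two in each result list for items drawn at random from further down the ranking. The sketch below does this; it is an illustrative assumption rather than The Filter's algorithm, and the function name and parameters are invented for the example.

```python
import random


def recommend_with_serendipity(ranked_items, k=5, n_surprise=1, seed=None):
    """Return up to k items: the top-ranked ones plus random picks from the rest.

    ranked_items: items ordered from most to least connected or popular.
    n_surprise: how many of the k slots go to random lower-ranked items.
    seed: optional seed so results can be reproduced.
    """
    rng = random.Random(seed)
    n_top = max(k - n_surprise, 0)
    top = ranked_items[:n_top]
    tail = ranked_items[n_top:]
    surprises = rng.sample(tail, min(n_surprise, len(tail)))
    return top + surprises


ranking = ["a", "b", "c", "d", "e", "f", "g", "h"]
result = recommend_with_serendipity(ranking, k=4, n_surprise=1, seed=0)
print(result)
```

The first three results are always the top-ranked items, while the fourth varies with the seed, giving users the expected picks alongside something they might not have found otherwise.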

Global Editor-in-Chief

After watching War Games and Tron more times than is healthy, Paul took his first steps online via a BBC Micro and acoustic coupler back in 1985, and has been finding excuses to spend the day online ever since. This includes roles editing .net magazine, launching the Official Windows Magazine, and now as Global EiC of TechRadar.