How to cut through 'dirty data' for social media insight

Social media is difficult for many companies to manage. Many senior business people see Twitter as a goldmine and the best place to find information about their Next Big Thing™.

They hire data collection services to provide them with thousands of tweets, hoping to see a trend that tells them what their clients are looking for. The goal? Identify potential new products, or a new market.

Instead, they're horrified by what they find: thousands upon thousands of tweets like "laaaame", "luv it #selfieolympics" and "#productx #cool #ninja #batman #teamfollowback #hashtag".

I've seen dirty data like this countless times, and it's all but impossible to sort manually. You'll comb through it for hours, get frustrated, quit, then come back and spend another couple of hours before finding a handful of relevant tweets. Users aren't writing for your benefit: they're tweeting for their own enjoyment.

The value of text analytics

The best way to sift through the rubbish is text analytics. Don't waste time sorting; let a machine do it for you.

In a matter of seconds, the text analytics engine will give you a nice list of important topics that appear in the tweets. It'll look similar to #trends, but will actually be useful. Once you have these topics, you can run searches for them and dig deeper.
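To make the idea of "important topics" concrete, here is a deliberately naive sketch in Python. It is not how Semantria or any commercial engine works; it simply counts two-word phrases that recur across tweets after dropping hashtags, mentions, links and a few common words. The function names, the stopword list and the sample tweets are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny hand-picked stopword list (an assumption; real engines use far richer resources).
STOPWORDS = {"the", "a", "an", "and", "or", "is", "was", "it", "i", "my",
             "at", "to", "of", "in", "this", "so", "for", "with", "on"}

def tokenize(tweet):
    """Lowercase a tweet and return its words, skipping hashtags, mentions and links."""
    text = re.sub(r"(#|@)\w+|https?://\S+", " ", tweet.lower())
    return [w for w in re.findall(r"[a-z']+", text) if w not in STOPWORDS]

def top_topics(tweets, n=5, min_count=2):
    """Return up to n two-word phrases that appear in at least min_count tweets."""
    counts = Counter()
    for tweet in tweets:
        words = tokenize(tweet)
        # Count each phrase once per tweet so a single spammy tweet cannot dominate.
        counts.update(set(zip(words, words[1:])))
    return [(" ".join(p), c) for p, c in counts.most_common(n) if c >= min_count]

# Made-up sample data.
sample = [
    "Bad service at the hotel again #fail",
    "such bad service, never coming back",
    "luv it #selfieolympics",
    "bad service and a defective lamp",
]
print(top_topics(sample))  # [('bad service', 3)]
```

Even this toy version ignores the hashtag noise and surfaces the one phrase that keeps coming back, which is the behaviour you want from a real engine at far greater scale and accuracy.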

For example, "bad service" or "defective" might come up as topics when you run through the data about your company. You know they've appeared often enough that they're worth looking into. At that point, you can run manual searches to see if there are patterns of "bad service" or "defective" products.

It turns out that all of the "bad service" complaints came from the same hotel – someone's getting fired. Every "defective" product was made with the new set of cheap screws – back to the original product design.
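A drill-down like the one above can be sketched in a few lines, assuming each tweet has already been stored as a record with some attribute attached. The `hotel` field, the record layout and the `drill_down` helper are hypothetical, chosen only to mirror the example; your own data will have different fields.

```python
from collections import Counter

def drill_down(records, topic, field):
    """Count the values of `field` among records whose text mentions `topic`."""
    topic = topic.lower()
    return Counter(r[field] for r in records if topic in r["text"].lower())

# Made-up records; "hotel" is a hypothetical attribute attached at collection time.
records = [
    {"text": "Bad service at check-in", "hotel": "Downtown"},
    {"text": "Lovely pool, great staff", "hotel": "Airport"},
    {"text": "bad service again, avoid", "hotel": "Downtown"},
]
print(drill_down(records, "bad service", "hotel").most_common())
# [('Downtown', 2)]
```

If one value accounts for nearly every match, as it does here, you have found the pattern worth investigating.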

To put it simply, text analytics is a great way to spare yourself a severe headache when analyzing your social data.

  • Rami Nuseir is Semantria's Marketing Director, and a regular contributor to the Lexalytics corporate blog (Lexablog). Both companies specialize in text analytics and sentiment analysis technology.