Analyzing the Social Web: the techniques, and why you need to use them

Other social media technologies were coming online as well. In 2004, Flickr, a photo-sharing website, and Digg, a social bookmarking website, launched. YouTube, the video-sharing website, came online in 2005, and Twitter launched in 2006, introducing microblogging to the social media space.

At that point, most of the major technologies of social networking were up and running, but new developments continued at a rapid pace. Sites came online and failed every day, and the user bases of successful sites grew dramatically.

After the first few years of the millennium, social media was challenging the dominance of "traditional" web content. User-generated content from blogs, shared links, comments, forum posts, and social media sites became more common than any other type of content, prompting Time magazine to name "You" its Person of the Year for 2006.

While Google reigned as the most popular and most-used website for many years, Facebook surpassed it in 2010. Although varying from month to month, social media sites often make up at least half of the top ten most popular websites as tracked by Alexa.1

Websites discussed

The techniques in this book are not designed for any specific website or type of network; they are general techniques that will work on any network regardless of its source.

We will consider networks built from all types of interactions and websites, from email and discussion boards to Facebook-style social networks and blogging, as well as offline social networks drawn from people's behavior and even from literature. However, because the book is focused on social media, a number of popular sites and types of social media occur throughout the text. This section introduces those and provides some background.

Some of the most popular sites in 2012 will feature prominently in the book's discussions. Facebook is by far the largest of these. Launched in 2004, it has since grown to be the world's largest social network with over a billion users.

Facebook is a traditional social networking site, where users make explicit connections to "friends" and share updates with them. Other popular social networking sites include LinkedIn, which is geared toward professional relationships; MySpace, the social network that was most popular before the rise of Facebook; and Renren, a large social network based in China.

Twitter is another dominant website in the social media space, with 200 million active users in 2012. Twitter is a microblogging service: users post messages that are limited to 140 characters. It has social networking characteristics as well.

Users can follow others they find interesting, and the posts, called "tweets," from anyone followed will appear on the user's main page. Unlike the case with many social networks, the relationship does not have to be mutual. If Alice follows Bob on Twitter, Bob does not have to approve the relationship or follow Alice back. Twitter is the main microblogging website in the United States, but Weibo in China is also extremely popular.
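The one-way nature of following can be sketched in a few lines of Python. This is only an illustration of a directed relationship, not how Twitter stores its data; the `following` dictionary and the helper functions `follow`, `follows`, and `is_mutual` are hypothetical names introduced here.

```python
# Hypothetical in-memory store: each user maps to the set of users they follow.
following = {}

def follow(follower, followee):
    """Record a one-way 'follows' edge; no approval from followee is needed."""
    following.setdefault(follower, set()).add(followee)

def follows(a, b):
    """Return True if a follows b."""
    return b in following.get(a, set())

def is_mutual(a, b):
    """Return True only if both users follow each other."""
    return follows(a, b) and follows(b, a)

follow("Alice", "Bob")
print(follows("Alice", "Bob"))    # True
print(follows("Bob", "Alice"))    # False
print(is_mutual("Alice", "Bob"))  # False
```

In a network that requires mutual friendship, every edge would be stored in both directions, so `follows(a, b)` and `follows(b, a)` would always agree.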

Twitter segues into a type of social media based on sharing certain types of information. Twitter lets people share short pieces of text, but many sites support sharing other types of media. Photo-sharing sites are popular, and one that will appear frequently in this book is Flickr.

Flickr allows users to post photos, label them with descriptive keywords called tags, and share them in a variety of ways. It also has a social networking component. Users can be friends with others, and this feature can be used to adjust access to photos.

In addition, people can comment on the photos that others share, and this commenting behavior can also be used to form a social network. YouTube, which is owned and run by Google, is the most popular video-sharing website. As Flickr does with photos, YouTube lets users upload, share, and comment on videos. Users can also become friends with one another.
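As a rough sketch of how commenting behavior might be turned into a network, the Python below treats each comment as a directed link from the commenter to the owner of the item. The comment log, the user names, and the function `comment_network` are all made up for illustration; real data would come from a site's records, and other ways of defining the links are equally reasonable.

```python
from collections import Counter

# Hypothetical comment log: (commenter, owner of the photo or video commented on).
comments = [
    ("carol", "dave"),
    ("carol", "dave"),
    ("dave", "carol"),
    ("erin", "dave"),
]

def comment_network(comments):
    """Build weighted directed edges: (commenter, owner) -> number of comments."""
    edges = Counter()
    for commenter, owner in comments:
        if commenter != owner:  # ignore people commenting on their own items
            edges[(commenter, owner)] += 1
    return edges

edges = comment_network(comments)
for (source, target), weight in sorted(edges.items()):
    print(source, "->", target, "weight", weight)
```

The edge weight records how often one person commented on another's items, which can later serve as a simple indication of how strong the connection is.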

Social bookmarking sites allow users to share interesting links. Digg, del.icio.us, and Reddit are popular sites for this activity. They support tagging links and voting them up or down to indicate interest. Pinterest is another social bookmarking site growing in popularity. It is visually oriented: users share photos that often link back to an originating article.

Tools used

Most of the techniques you will learn in this book require no special software and no complex calculations. However, computing statistics about every node in a network can be time-consuming, and some methods are too complex to apply by hand.

A number of tools are available that will help with social network analysis, and two in particular are discussed in this book. They are free, have many built-in methods for assisting with social network analysis, and have easy-to-use user interfaces for creating visualizations of networks and interacting with them.

The first is Gephi (Figure 1.2). It is a free, open-source software package that runs on Windows, Mac OS X, and Linux. Gephi is a visualization tool with capabilities to calculate centrality, clustering, network diameter, and other metrics. Because it is open source, there are also many plugins that add functionality to the core program.

The second tool is NodeXL (Figure 1.3), a template for Microsoft Excel 2007, 2010, and later Excel versions on Windows. It is a free download. Like Gephi, it has tools for visualizing graphs and computing many common network analysis statistics.
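To give a sense of the kind of per-node and whole-network statistics such tools compute, here is a minimal Python sketch of two of them: degree centrality and network diameter. It is not how Gephi or NodeXL implement these measures; the small example graph and the function names are assumptions for illustration, and the code assumes an undirected, connected network with at least two nodes.

```python
from collections import deque

# Hypothetical undirected network: each node maps to the set of its neighbors.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}

def distances_from(graph, source):
    """Shortest-path length (in edges) from source to each reachable node, by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def diameter(graph):
    """Longest shortest path between any pair of nodes (assumes a connected graph)."""
    return max(max(distances_from(graph, s).values()) for s in graph)

print(degree_centrality(graph))
print(diameter(graph))  # 2
```

Even on this four-node example, the diameter requires a search from every node, which illustrates why software support becomes necessary as networks grow.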

Both tools have features called spigots that allow users to import network data directly from other sources. Gephi comes with an email network importer, and other spigots are available as plugins. NodeXL can import email as well as the results of queries to Twitter, Flickr, and YouTube. These spigots make it easy to get network data for analysis and experimentation.