Researchers have developed an algorithm that can determine where Twitter users live even when no location data is attached to their tweets.
The research, led by IBM's Jalal Mahmud, uses the content of tweets and the tweeting behaviour of users to determine their location.
The paper, Home Location Identification of Twitter Users, describes a method that predicts users' locations at different levels of granularity: city, state, time zone or geographic region.
It uses a mixture of statistical analysis and heuristic indicators, as well as referring to a geographic gazetteer (a dictionary of place names) to make its predictions.
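To illustrate how a gazetteer lookup of this kind might work, here is a minimal Python sketch. It is not the researchers' code: the tiny `GAZETTEER` table and the `count_place_mentions` and `guess_city` helpers are hypothetical names invented for this example, and a real system would combine such counts with statistical classifiers rather than rely on them alone.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer (place name -> city, state); not from the paper.
GAZETTEER = {
    "san francisco": ("San Francisco", "California"),
    "oakland": ("Oakland", "California"),
    "austin": ("Austin", "Texas"),
    "new york": ("New York", "New York"),
}


def count_place_mentions(tweets):
    """Count how often each gazetteer city is named across a user's tweets."""
    counts = Counter()
    for tweet in tweets:
        text = tweet.lower()
        for name, (city, _state) in GAZETTEER.items():
            # Whole-word match, so "austin" does not match "austinite".
            hits = re.findall(r"\b" + re.escape(name) + r"\b", text)
            if hits:
                counts[city] += len(hits)
    return counts


def guess_city(tweets):
    """Return the most frequently mentioned city, or None if none is named."""
    counts = count_place_mentions(tweets)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


example_tweets = [
    "Stuck in traffic in San Francisco again",
    "Love the fog in san francisco",
    "Flying to Austin next week",
]
print(guess_city(example_tweets))
```

Counting mentions is only a heuristic signal: people also tweet about places they do not live in, which is one reason such a lookup would be combined with other evidence.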
A hierarchical approach is used to keep accuracy as high as possible: a user's time zone, state or geographic region is determined first, followed by the city.
This allows the potential location to be narrowed down. The team has used the travel movements of users to increase the accuracy of the algorithm's output.
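The coarse-to-fine idea can be sketched in a few lines of Python. This is an illustrative simplification rather than the method in the paper: the `CITY_TO_STATE` table, the `predict_hierarchical` function and the example scores are all assumptions made for the sake of the example, and the per-city scores stand in for whatever a real classifier would produce.

```python
# Hypothetical city -> state table; not taken from the paper.
CITY_TO_STATE = {
    "San Francisco": "California",
    "Oakland": "California",
    "Austin": "Texas",
}


def predict_hierarchical(city_scores):
    """Pick the best state first, then the best city within that state."""
    # Step 1: pool the evidence for each state by summing its cities' scores.
    state_scores = {}
    for city, score in city_scores.items():
        state = CITY_TO_STATE[city]
        state_scores[state] = state_scores.get(state, 0.0) + score
    best_state = max(state_scores, key=state_scores.get)

    # Step 2: choose only among cities inside the winning state.
    candidates = {
        city: score
        for city, score in city_scores.items()
        if CITY_TO_STATE[city] == best_state
    }
    best_city = max(candidates, key=candidates.get)
    return best_state, best_city


# Made-up scores: Austin is the single highest-scoring city,
# but California has more evidence overall.
scores = {"Austin": 0.40, "San Francisco": 0.35, "Oakland": 0.25}
result = predict_hierarchical(scores)
print(result)
```

In this made-up case a flat, city-only prediction would choose Austin, whereas the two-step version settles on California first and then picks San Francisco within it.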
"The benefit of developing these algorithms is two-fold," explains the research paper. "First, the output can be used to create location-based visualizations and applications on top of Twitter… Second, our examination of the discriminative features used by our algorithms suggests strategies for users to employ if they wish to micro-blog publicly but not inadvertently reveal their location."
Based on experimental testing, the team believes that the algorithm outperforms the best existing algorithms for predicting the home location of Twitter users.