By drawing on the content of users’ tweets and their tweeting behavior, a team of three IBM researchers say they have developed an algorithm that infers the home location of Twitter users at different granularities: city, state, time zone or geographic region. The algorithm works from each person’s 200 most recent tweets. The scientists describe their approach as an “ensemble of statistical and heuristic classifiers” for predicting locations, and they use a geographic gazetteer dictionary (the USGS [United States Geological Survey] gazetteer) to identify place-name entities. They also analyzed movement variations of Twitter users and built a classifier to predict whether a user was travelling during a given period, then used that prediction to further improve detection accuracy.
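To make the idea of an ensemble concrete, here is a minimal Python sketch, not the researchers' implementation. It combines one heuristic classifier (a place-name lookup) with one simple scoring classifier and takes a majority vote. The tiny gazetteer, the word weights and all function names are hypothetical stand-ins invented for illustration. The real system uses the USGS gazetteer and classifiers trained on data.

```python
from collections import Counter

# Hypothetical toy gazetteer (place name -> city); a stand-in for the USGS gazetteer.
GAZETTEER = {
    "san francisco": "San Francisco, CA",
    "golden gate": "San Francisco, CA",
    "austin": "Austin, TX",
    "brooklyn": "New York, NY",
}

# Hypothetical per-city word weights; a stand-in for a trained statistical model.
WORD_SCORES = {
    "San Francisco, CA": {"fog": 2.0, "bart": 1.5},
    "Austin, TX": {"tacos": 1.0, "sxsw": 2.0},
}


def gazetteer_classifier(tweets):
    """Heuristic: pick the city whose place names are mentioned in the most tweets."""
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        for name, city in GAZETTEER.items():
            if name in lowered:
                counts[city] += 1
    return counts.most_common(1)[0][0] if counts else None


def word_classifier(tweets):
    """Scoring stand-in: sum the word weights per city and pick the highest total."""
    scores = {
        city: sum(weights.get(token, 0.0)
                  for text in tweets
                  for token in text.lower().split())
        for city, weights in WORD_SCORES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def ensemble_predict(tweets, classifiers):
    """Majority vote over classifiers, using at most the 200 most recent tweets."""
    recent = tweets[-200:]
    votes = Counter()
    for classify in classifiers:
        prediction = classify(recent)
        if prediction is not None:  # a classifier may abstain
            votes[prediction] += 1
    return votes.most_common(1)[0][0] if votes else None


tweets = [
    "Walking over the Golden Gate today",
    "Coffee in San Francisco",
    "Fog again, taking BART",
    "Missing Austin tacos",
]
print(ensemble_predict(tweets, [gazetteer_classifier, word_classifier]))
```

A plain majority vote is only one way to combine classifiers. The article does not say how the researchers weight their ensemble, so the voting rule here is an assumption.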
The paper, “Home Location Identification of Twitter Users,” submitted earlier this month on arXiv.org, is by Jalal Mahmud, Jeffrey Nichols and Clemens Drews of IBM Research. They said they had experimental evidence to suggest their algorithm works well in practice. In fact, they said it “outperforms the best existing algorithms for predicting the home location of Twitter users.”
From July 2011 to August 2011, they collected tweets from the 100 most populous cities in the US. They invoked the Twitter REST API to collect each user’s 200 most recent tweets (fewer if that user had fewer than 200 tweets in total). Some users, found to have private profiles, were eliminated. The final data set had 1.5 million tweets by 9,551 users.
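The filtering described above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the tweets have already been fetched, and the record layout (the keys `id`, `protected` and `tweets`) and the function name `build_dataset` are hypothetical, not taken from the paper or from the Twitter API.

```python
MAX_TWEETS = 200  # per-user cap reported in the article


def build_dataset(users):
    """Keep up to the 200 most recent tweets per user and drop private profiles.

    `users` is a list of dicts with hypothetical keys:
    'id', 'protected' (bool) and 'tweets' (list of strings, oldest first).
    """
    dataset = {}
    for user in users:
        if user["protected"]:
            continue  # private profiles are eliminated
        dataset[user["id"]] = user["tweets"][-MAX_TWEETS:]
    return dataset


users = [
    {"id": "a", "protected": False, "tweets": [f"tweet {i}" for i in range(250)]},
    {"id": "b", "protected": True, "tweets": ["hidden"]},
    {"id": "c", "protected": False, "tweets": ["one", "two", "three"]},
]
dataset = build_dataset(users)
print({user_id: len(tweets) for user_id, tweets in dataset.items()})
```

Users with fewer than 200 tweets simply keep everything they have, which matches the article's parenthetical note.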
Read more at: Phys.org