
Geolocating tweets using Free/Open Source Components and Data

In this post, I describe a sample application built entirely from free/open source software and free/open data sources. It runs completely client-side, searches Twitter for a keyphrase within a geographic boundary, and clusters the matching tweets by latitude/longitude. Along the way, I also demonstrate one HTML5 capability, the Local Storage API.

To start, I am using several free/open components:

  1. jQuery -- the immensely popular cross-browser JavaScript library.
  2. OpenLayers -- an open source JavaScript library for displaying map data in browsers.
  3. OpenStreetMap -- a collaborative project to create a free, editable map of the world.
  4. Nominatim -- a free OpenStreetMap service for converting addresses (place names) to coordinates.
  5. Twitter Search API -- Twitter's API for searching recent tweets.

You likely recognize most of these components; Nominatim is probably the most obscure. It is also a very important part of this application: most tweets are not geotagged, so we have to infer a tweet's location from the user's free-text "location" field. This is why you will see many tweets clustered on a single point. Those users likely entered the same location name (e.g., Rochester, N.Y.), which resolves to the same coordinates.

If you would like to see the application right away and/or view the source code, you can do so at http://code.brianmokeefe.com/TweetMap.html. I explain the key parts of the application below.

HTML Local Storage

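	if (typeof Storage !== "undefined") {
		// in IE9, you cannot use local storage for a file system webpage (file://),
		// so this is a workaround to keep it from crashing: fall back to a plain object
		var storage = {};
		try {
			if (window.localStorage) {
				storage = window.localStorage;
			}
		} catch (e) {
			// accessing localStorage can throw when storage is disabled; keep the plain object
		}

		_cache = storage;
	}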

Basically, this section says: (1) check whether the browser supports HTML5 Local Storage; (2) if it does, store the geolocation lookups there for future use, since they persist across page loads (e.g., Rochester, N.Y. will probably always be in the same place); (3) if it does not, keep using a plain in-memory object (_cache, initialized elsewhere and not shown here) to store them. There is one workaround here: IE9 does not support Local Storage for file:// URLs, so the var storage = {} fallback block keeps the page working with a plain object in that case.
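If you want the storage choice in one place, here is a minimal sketch (not the TweetMap code) of a geocode store that prefers localStorage and falls back to an in-memory Map. createGeocodeStore and its get/set interface are hypothetical names. It assumes only the standard getItem/setItem/removeItem methods of Web Storage, probes the store with a throwaway write because some browsers expose localStorage but throw on use, and takes the storage accessor as a function so that a throwing lookup can be caught.

```javascript
// Hypothetical helper: choose localStorage when usable, else an in-memory Map.
function createGeocodeStore(getLocalStorage) {
  let backing = null;
  try {
    const ls = getLocalStorage();
    if (ls) {
      // Probe with a write/remove; some browsers expose localStorage but throw on use
      ls.setItem("__geocode_probe__", "1");
      ls.removeItem("__geocode_probe__");
      backing = ls;
    }
  } catch (e) {
    // Accessing or writing to storage failed (e.g. IE9 on file://); use memory instead
    backing = null;
  }

  const memory = new Map();

  return {
    usesLocalStorage: backing !== null,
    // Returns the cached "lat,lon" string, or null when the address is unknown
    get(address) {
      if (backing) {
        return backing.getItem(address); // getItem already returns null for misses
      }
      return memory.has(address) ? memory.get(address) : null;
    },
    // Web Storage only holds strings, so values are stringified in both modes
    set(address, coords) {
      if (backing) {
        backing.setItem(address, String(coords));
      } else {
        memory.set(address, String(coords));
      }
    },
  };
}

// In a browser (assumed usage):
// var store = createGeocodeStore(function () { return window.localStorage; });
```

Using getItem/setItem instead of bracket access also avoids collisions with Storage's own members (an address such as "length" or "key" would otherwise read a built-in property).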

Twitter Search API

	$.ajax({
		url: 'http://search.twitter.com/search.json',
		type: 'GET',
		dataType: 'jsonp',
		data: {
			q: query,
			lang: 'en',
			result_type: 'recent',
			rpp: 100,
			geocode: _lat + ',' + _lon + ',' + _distance + 'mi'
		},
		success: function(data, textStatus, xhr) {
			_results = data.results;

			// if data came back, then start processing the tweets
			if (data.results && data.results.length > 0) {
				// pass the function itself rather than a string that must be eval'd
				setTimeout(processTweet, 10);
			}
		}
	});

This block uses jQuery's ajax function to call the Twitter Search API. A few variables are defined elsewhere and omitted from this snippet: query (the keyphrase), plus _lat, _lon, and _distance, which define a circular geofence (a center point and a radius in miles) around the search. The geofence isn't foolproof; for instance, Twitter commonly places posts from York, England within the search area. In essence, we are finding up to 100 recent English-language tweets in the geofenced area. If the request succeeds, we store the results in a variable (_results) and schedule the processTweet() function to run asynchronously after 10 ms.
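To make the request parameters concrete, here is a small sketch that builds the same query string by hand with the standard URLSearchParams class. buildSearchQuery is a hypothetical helper, and the parameter names simply mirror the snippet above; jQuery's own serialization of the data object should be equivalent in content, though not necessarily byte-for-byte.

```javascript
// Hypothetical helper: the query string for the search request, built by hand.
function buildSearchQuery(query, lat, lon, distanceMiles) {
  const params = new URLSearchParams({
    q: query,               // the keyphrase
    lang: "en",             // English tweets only
    result_type: "recent",  // most recent rather than "popular"
    rpp: "100",             // results per page (the maximum used in the post)
    // geocode is "latitude,longitude,radius" with a unit suffix (mi or km)
    geocode: lat + "," + lon + "," + distanceMiles + "mi",
  });
  return params.toString();
}

// Example: a 25-mile geofence around Rochester, N.Y. (coordinates approximate)
console.log(buildSearchQuery("snow day", 43.16, -77.61, 25));
```

Note that URLSearchParams percent-encodes the commas in the geocode value (as %2C); the server decodes them back to "43.16,-77.61,25mi".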

Geolocation

The rules for geolocating each tweet are encoded in this block:

		// if the tweet was geotagged, then nothing to do but queue the tweet
		if (tweet.geo) {
			lat = tweet.geo.coordinates[0];
			lon = tweet.geo.coordinates[1];
			geoCode = lat + ',' + lon;
			clusterTweet(geoCode, tweet.text);
		} else if (tweet.location) {
			// if the location field was really geocoordinates, then parse them and queue the tweet
			// (\s* also accepts a space after the comma, e.g. "43.15, -77.61")
			var geocoords = /(-?\d{1,2}\.\d+),\s*(-?\d{1,3}\.\d+)/.exec(tweet.location);
			if (geocoords) {
				lat = geocoords[1];
				lon = geocoords[2];
				geoCode = lat + ',' + lon;
				clusterTweet(geoCode, tweet.text);
			} else {
				// otherwise, we are going to look up the coordinates from the location name
				geoCodeLookup(tweet, _boundingBox);
			}
		} else {
			// no location information at all: skip this tweet but keep processing the rest
			setTimeout(processTweet, 10);
		}

Basically, if the tweet was geotagged, we use its coordinates. Otherwise, if the user's location property contains coordinates, we use those. Finally, if the location is a text name for a place, we translate that name into geocoordinates (see below). In every case where coordinates are found, the tweet is clustered by them via the clusterTweet call; for name lookups, that happens once the Nominatim response arrives.
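The same three-way decision can be written as a pure function, which makes it easy to test without any network calls. This is a sketch rather than the TweetMap code: resolveTweetLocation and its return format ({ kind, geoCode } or { kind, address }) are hypothetical, and the tweet shape (geo.coordinates as [lat, lon], plus a free-text location) follows the snippet above.

```javascript
// Hypothetical pure version of the geolocation rules above.
function resolveTweetLocation(tweet) {
  // 1. Geotagged tweets carry [lat, lon] in geo.coordinates
  if (tweet.geo && Array.isArray(tweet.geo.coordinates)) {
    const lat = tweet.geo.coordinates[0];
    const lon = tweet.geo.coordinates[1];
    return { kind: "coords", geoCode: lat + "," + lon };
  }

  if (tweet.location) {
    // 2. Some location strings embed raw coordinates, e.g. "iPhone: 43.152,-77.607"
    const m = /(-?\d{1,2}\.\d+),\s*(-?\d{1,3}\.\d+)/.exec(tweet.location);
    if (m) {
      return { kind: "coords", geoCode: m[1] + "," + m[2] };
    }
    // 3. Otherwise the free-text name has to be geocoded (e.g. with Nominatim)
    return { kind: "lookup", address: tweet.location };
  }

  // No usable location information at all
  return { kind: "none" };
}
```

The caller would then cluster "coords" results directly, send "lookup" results to the geocoder, and skip "none".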

	function clusterTweet(geoCode, text) {
		// skip tweets whose location could not be resolved (geoCode is null)
		if (geoCode) {
			// if we haven't seen this geocode yet, then create an array for it
			if (!_tweetClusters[geoCode]) {
				_tweetClusters[geoCode] = [];
			}
			// add the tweet text to the array of tweets for the geocoordinate
			_tweetClusters[geoCode].push(text);
		}
		// process the next tweet "asynchronously"
		setTimeout(processTweet, 10);
	}

This function (1) creates a bucket (cluster) for the coordinates if we do not have any tweets for them yet, (2) adds the tweet to the bucket, and (3) processes the next tweet "asynchronously". Why asynchronously? If the exact coordinates could not be determined from the geo or location properties, we have to look them up with the Nominatim service, and that call is asynchronous, so the next tweet can only be processed once it returns. Scheduling the next step with setTimeout also keeps the call stack from growing: without it, a run of geotagged tweets would make processTweet() and clusterTweet() call each other recursively, adding stack frames for every tweet.
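The "process one item, then schedule the next" pattern can be isolated in a small helper. This is a minimal sketch with a hypothetical name (processSequentially): each handler receives the item and a next() callback, which it calls when it is done, either immediately (geotagged tweets) or from an asynchronous callback (a geocoding lookup). Because every step is started by setTimeout, each one begins on a fresh call stack.

```javascript
// Hypothetical helper: handle items one at a time, each step on a fresh stack.
function processSequentially(items, handleItem, onDone, delayMs = 10) {
  let index = 0;

  function step() {
    if (index >= items.length) {
      onDone();
      return;
    }
    const item = items[index++];
    // handleItem must call next() exactly once, synchronously or asynchronously
    handleItem(item, () => setTimeout(step, delayMs));
  }

  // Even the first item is handled asynchronously, like the setTimeout in the post
  setTimeout(step, delayMs);
}

// Example usage:
// processSequentially(tweets, (tweet, next) => { /* cluster or look up */ next(); },
//                     () => console.log("all tweets processed"));
```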

Translating addresses to coordinates using Nominatim

Translating addresses to coordinates using Nominatim is a little tricky, even if you are used to making AJAX calls with jQuery. Nominatim does not use the standard JSONP callback parameter; it expects the callback name in a "json_callback" querystring parameter. To support this, we specify jsonp: false so that jQuery does not add its own "?callback=" string to the URL (the standard way of doing JSONP). We also name the callback explicitly with jsonpCallback: 'json_callback' + tweet.id_str. This generates a unique callback function name (via the tweet's unique identifier) and lets us pass the same name to Nominatim, as required, in the "json_callback" querystring parameter in the "data" block below.

		$.ajax({
			url: 'http://nominatim.openstreetmap.org/search',
			type: 'GET',
			dataType: 'jsonp',
			jsonp: false,
			jsonpCallback: 'json_callback' + tweet.id_str,
			data: {
				format: 'json',
				q: address,
				limit: 1,
				viewbox: boundingBox[1] + ',' + boundingBox[2] + ',' + boundingBox[3] + ',' + boundingBox[0],
				json_callback: 'json_callback' + tweet.id_str			
			},
			success: function(data, textStatus, xhr) {
				var coords = null;
				// if there was a response, then cache the [location name,geocoordinate] pair
				if (data && data.length && data.length > 0) {
					coords = data[0].lat + ',' + data[0].lon;
					addGeocode(address, coords);
				}
				clusterTweet(coords, tweet.text);
			}
		});
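To see exactly what goes over the wire, here is a sketch that assembles an equivalent Nominatim request URL by hand and turns the first match into the "lat,lon" key used for clustering. buildNominatimUrl and firstMatchToGeoCode are hypothetical names. The sketch assumes the caller already has the four viewbox numbers in the order Nominatim expects (longitude/latitude corner pairs); the post's own boundingBox layout is not shown, so the index shuffling in the snippet above is left out here. jQuery may add other parameters (for example a cache-busting "_" value for JSONP), so this is not a byte-for-byte copy of its request.

```javascript
// Hypothetical helper: the search URL, with the JSONP callback name passed
// in Nominatim's json_callback parameter (jQuery adds no callback= of its own
// when jsonp is false).
function buildNominatimUrl(address, viewbox, tweetId) {
  const callbackName = "json_callback" + tweetId; // must match jsonpCallback
  const params = new URLSearchParams({
    format: "json",
    q: address,
    limit: "1",
    viewbox: viewbox.join(","), // assumed already in Nominatim's lon/lat order
    json_callback: callbackName,
  });
  return "http://nominatim.openstreetmap.org/search?" + params.toString();
}

// Nominatim returns an array of matches whose lat/lon fields are strings.
function firstMatchToGeoCode(results) {
  if (Array.isArray(results) && results.length > 0) {
    return results[0].lat + "," + results[0].lon;
  }
  return null; // no match: the tweet cannot be placed on the map
}
```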

You may notice the addGeocode() function. It belongs to a step omitted from this example that happens before the Nominatim call is even made: if we have already looked up this exact address, we reuse the locally cached coordinates rather than calling the web service again unnecessarily. This cache uses the HTML5 Local Storage described above. The cache functions are very simple:

	// is the location name in the geocode cache?
	function lookupGeocode(address) {
		return _cache[address] || null;
	}

	// add the [location, geocoordinate] pair to the cache
	function addGeocode(address, value) {
		_cache[address] = value;
	}
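Put together, the cache-first flow looks roughly like this sketch. geocodeWithCache is a hypothetical name; cache is any object with get/set (a Map works for testing), and geocode stands in for the Nominatim $.ajax call, taking an address and a callback that receives "lat,lon" or null. Only successful lookups are cached, so a failed lookup is retried next time rather than remembered as "no location".

```javascript
// Hypothetical helper: consult the cache first, call the geocoder only on a miss.
function geocodeWithCache(address, cache, geocode, callback) {
  const cached = cache.get(address);
  if (cached !== null && cached !== undefined) {
    callback(cached); // cache hit: no web service call at all
    return;
  }
  geocode(address, (coords) => {
    if (coords) {
      cache.set(address, coords); // remember successful lookups only
    }
    callback(coords);
  });
}
```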

Map the Tweets

Once all the tweets are clustered, we need to place them on the map. In between these steps, we iterate through each geocoordinate, consolidate that cluster's tweets into one large string (headed by the number of tweets in the cluster), and send it to the addMarker() function, which creates the marker and its popup.
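The consolidation step might look like the following sketch. consolidateClusters and escapeHtml are hypothetical names, and the exact markup (a bold count followed by one paragraph per tweet) is an assumption; the post only says the text is joined into one string headed by the count. Tweet text is HTML-escaped because the result becomes popup HTML, and a tweet containing markup should not be injected into the page.

```javascript
// Escape characters that are significant in HTML
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// tweetClusters maps "lat,lon" keys to arrays of tweet text (like _tweetClusters).
// Returns one { geoCode, html } entry per cluster, ready for marker creation.
function consolidateClusters(tweetClusters) {
  return Object.keys(tweetClusters).map((geoCode) => {
    const tweets = tweetClusters[geoCode];
    const header = "<b>" + tweets.length + (tweets.length === 1 ? " tweet" : " tweets") + "</b>";
    const body = tweets.map((t) => "<p>" + escapeHtml(t) + "</p>").join("");
    return { geoCode: geoCode, html: header + body };
  });
}
```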

	if (geoCode) {
		// geoCode is a "lat,lon" string, but OpenLayers.LonLat takes (lon, lat)
		var lonLat = new OpenLayers.LonLat(geoCode.split(',')[1], geoCode.split(',')[0]).transform(
			new OpenLayers.Projection("EPSG:4326"), // transform from WGS 84
			map.getProjectionObject() // to Spherical Mercator Projection
		);
		var feature = new OpenLayers.Feature(_markerLayer, lonLat);
		feature.closeBox = true;
		feature.popupClass = AutoSizeFramedCloud;
		feature.data.popupContentHTML = tweetText;
		feature.data.overflow = 'auto';
		var marker = feature.createMarker();
		var markerClick = function (evt) {
			// "this" is the feature, as passed to register() below
			if (this.popup == null) {
				this.popup = this.createPopup(this.closeBox);
				map.addPopup(this.popup);
				this.popup.show();
			} else {
				this.popup.toggle();
			}
			OpenLayers.Event.stop(evt);
		};
		marker.events.register("mousedown", feature, markerClick);

		_markerLayer.addMarker(marker);
	}

This code creates an OpenLayers Feature (a marker and popup combination), assigns the consolidated tweet text as the popup content, and registers a handler that shows the popup (or toggles it if it already exists) when the marker is clicked (the "mousedown" event, to be specific). It then adds the marker to the OpenLayers marker layer (_markerLayer). I won't go into the OpenLayers mapping API much here, but it should be easy to follow if you look at the source code.
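The transform() call in the snippet converts WGS 84 longitude/latitude (EPSG:4326) into Spherical Mercator meters. For intuition, here is a sketch of the standard forward formulas, x = R·λ and y = R·ln(tan(π/4 + φ/2)) with R = 6378137 m; lonLatToSphericalMercator is a hypothetical name, and OpenLayers does this conversion itself, so the function is only for understanding what the map receives.

```javascript
// WGS 84 semi-major axis, used as the sphere radius by Spherical Mercator
const EARTH_RADIUS_M = 6378137;

// Forward Spherical Mercator: degrees in, meters out
function lonLatToSphericalMercator(lon, lat) {
  const toRad = Math.PI / 180;
  const x = EARTH_RADIUS_M * lon * toRad;
  const y = EARTH_RADIUS_M * Math.log(Math.tan(Math.PI / 4 + (lat * toRad) / 2));
  return { x: x, y: y };
}
```

At longitude 180 this gives x of about 20037508.34 m, the familiar edge of the Spherical Mercator extent; y grows without bound toward the poles, which is why web maps clip latitude at roughly ±85°.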

Conclusion

I hope this example is useful to you. I licensed the code under a Creative Commons license to make it easy to reuse while still recognizing my effort. The full example at http://code.brianmokeefe.com/TweetMap.html contains more code than I covered here, primarily OpenLayers configuration and helper functions for converting miles to kilometers and calculating distances to determine the bounding box for mapping and geolocation. I know I omitted a lot of details, so feel free to contact me via @brianmokeefe or by email and I'll be glad to help.
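For readers who want to reproduce the bounding box helpers without digging through the source, here is a sketch of one common approach; it is not necessarily what TweetMap.html does. milesToKm and boundingBoxAround are hypothetical names, the Earth is treated as a sphere with a mean radius of 6371 km (adequate for a search radius of tens of miles), and the antimeridian and polar edge cases are ignored.

```javascript
const KM_PER_MILE = 1.609344;
const EARTH_MEAN_RADIUS_KM = 6371;

function milesToKm(miles) {
  return miles * KM_PER_MILE;
}

// Approximate box around a center point, returned as { south, west, north, east } in degrees.
function boundingBoxAround(lat, lon, radiusMiles) {
  const radiusKm = milesToKm(radiusMiles);
  // Angular distance along a meridian, converted from radians to degrees
  const dLat = (radiusKm / EARTH_MEAN_RADIUS_KM) * (180 / Math.PI);
  // Meridians converge toward the poles, so a degree of longitude is shorter by cos(lat)
  const dLon = dLat / Math.cos((lat * Math.PI) / 180);
  return { south: lat - dLat, west: lon - dLon, north: lat + dLat, east: lon + dLon };
}
```

The box is slightly larger than the circular geofence sent to Twitter, which is what you want for a Nominatim viewbox: it covers every point within the radius.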
