2009-10-10

Watching the Watchers

One of my pet projects, which I mention from time to time in this blog (just to increase its PageRank), is my Earthquake Mashup. This application has been continuously available since October 2006, and googling for the right keywords will get you there. Another toy using Google Maps is this page visualizing the locations of visitors to my web server. I realized some time ago that I have actually created an earthquake detector, one that shows the impact earthquakes have on people on the Internet.

On April 6th this year, I noticed a lot of requests from Italy in the logs of my website (which I "tail -f" occasionally). The visitors page agreed: a lot of requests from all over Italy were coming in, more and more every hour. What had happened? The city of L'Aquila in central Italy had been hit by a magnitude 6.3 temblor. The main event had already taken place during the night, but it seemed that with the break of day and news covering the event, people started looking for information on the web.

I always wanted to analyze this rush on my mashup after large earthquakes, but somehow never got around to it. Then, two weeks ago, Sophia Liu from the University of Colorado contacted me. She works in a research group called "connectivIT" and specializes in research on information technology in crises. During a lively mail exchange, I decided to finally take a deeper look at the data I had collected. I learned long ago never to throw away log files :) and although I could have used all the data going back to 2006, I used logs from February 2008 onward, because that is when I gave my mashup a major overhaul and it took its present form.

First, I needed a speedy method for geocoding the IP addresses and hostnames found in the logs. For the visitors page I had used an online API by hostip.info. But this method seemed cumbersome for the amount of data, and a lot of addresses did not resolve to a location. MaxMind, which offers a commercial solution to this problem, distributes a lightweight version of its product for free, and the results looked much better. Plus: the command line version of their geocoder is part of pkgsrc, so I was able to use this tool within minutes. The visitors page now uses this method as well.
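For illustration, here is a minimal Python sketch of the batch geocoding step. It is not the tooling I actually used: it assumes the logs are in Common Log Format (host as the first field), and the `lookup` callable and the `FAKE_DB` table are hypothetical stand-ins for a real geolocation database, not MaxMind's API. Hosts are cached so each one is looked up only once, and unresolved hosts are simply skipped.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

# Assumption: Common Log Format, where the client host is the first field.
HOST_RE = re.compile(r"^(\S+) ")

# (country code, latitude, longitude)
Location = Tuple[str, float, float]


def geocode_hosts(
    lines: Iterable[str],
    lookup: Callable[[str], Optional[Location]],
) -> Iterator[Tuple[str, Location]]:
    """Yield (host, location) for every log line whose host can be resolved."""
    cache: Dict[str, Optional[Location]] = {}
    for line in lines:
        match = HOST_RE.match(line)
        if match is None:
            continue  # not a log line we understand
        host = match.group(1)
        if host not in cache:
            cache[host] = lookup(host)  # one lookup per distinct host
        location = cache[host]
        if location is not None:
            yield host, location


# Hypothetical stand-in for a geolocation database (documentation addresses,
# invented coordinates).
FAKE_DB: Dict[str, Location] = {
    "192.0.2.10": ("IT", 42.35, 13.40),
    "198.51.100.7": ("DE", 52.02, 8.53),
}

sample_log = [
    '192.0.2.10 - - [06/Apr/2009:08:15:02 +0200] "GET /page.html HTTP/1.1" 200 5120',
    '203.0.113.99 - - [06/Apr/2009:08:15:40 +0200] "GET /page.html HTTP/1.1" 200 5120',
    '198.51.100.7 - - [06/Apr/2009:08:16:11 +0200] "GET /page.html HTTP/1.1" 200 5120',
]

resolved = list(geocode_hosts(sample_log, FAKE_DB.get))
for host, (country, lat, lon) in resolved:
    print(host, country, lat, lon)
```

The second sample line is dropped because its address is not in the table, which mirrors the addresses that did not resolve to a location.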

Geocoding IP addresses is not pure science; it involves a lot of faith, hand waving and voodoo. For example: IP addresses from my university (which owns what used to be a Class B network) are correctly identified as being in Bielefeld. The IP address my ISP has assigned me right now is located in Hamburg, probably because that's the location of the company's headquarters. But addresses are assigned to certain regions in a defined way, so at least a rough location of the origin of a request can be determined.
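The idea of mapping address blocks to places can be sketched in a few lines of Python with the standard `ipaddress` module. The block table below is entirely made up (documentation address ranges paired with the two cities from my example), and real geolocation databases are far larger and organized differently. When several blocks match, the most specific one wins.

```python
import ipaddress
from typing import Optional

# Hypothetical block table: documentation ranges with invented locations.
BLOCKS = [
    (ipaddress.ip_network("198.51.100.0/24"), "Bielefeld"),
    (ipaddress.ip_network("203.0.113.0/24"), "Hamburg"),
    (ipaddress.ip_network("203.0.113.128/25"), "Bielefeld"),
]


def locate(address: str) -> Optional[str]:
    """Return the place of the most specific block containing the address."""
    addr = ipaddress.ip_address(address)
    matches = [(net.prefixlen, place) for net, place in BLOCKS if addr in net]
    if not matches:
        return None  # address is not covered by any known block
    return max(matches)[1]  # longest prefix is the most specific block


print(locate("198.51.100.42"))
print(locate("203.0.113.5"))
print(locate("203.0.113.200"))
print(locate("192.0.2.1"))
```

Every address in a block gets the same answer, which is why a whole ISP range can end up at the company's headquarters.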

First, I wanted to know where all the people who have visited the mashup since 2008 come from:


As you can see, the majority of requests come from Europe, North America and Japan. Other regions are present as well, but with fewer hits. Of course there are several possible explanations. Most requests may really come from these three areas, which is probably true. But I also counted only those requests the geocoder could resolve to a location, and my guess is that addresses in other regions are harder to locate because less information is available on their distribution and use. This unequal distribution has to be taken into account in all the subsequent deliberations.

Then I wanted to see the temporal distribution of these requests (i.e. only those I was able to geocode). Again, first on the global scale:


Technical note: I counted the daily requests to the HTML file of the mashup, not the other parts. This means the numbers shown correspond roughly to actual visits to the mashup. And keep in mind that the figures are relatively small, with 800 to 900 requests per day for the largest events. I guess the USGS website receives this many visitors within a few minutes on a calm day.
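The counting itself can be sketched as follows. Again this is only an illustration under assumptions: the log format is taken to be Common Log Format, and `PAGE_PATH` is a hypothetical placeholder for the path of the mashup's HTML file. Requests for anything else, such as the data feed, are ignored.

```python
import re
from collections import Counter
from datetime import date, datetime
from typing import Iterable

# Assumption: Common Log Format, e.g.
# host - - [06/Apr/2009:08:15:02 +0200] "GET /path HTTP/1.1" 200 5120
LOG_RE = re.compile(r'^\S+ \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)')

PAGE_PATH = "/mashup.html"  # hypothetical path of the HTML page


def daily_page_hits(lines: Iterable[str], page_path: str = PAGE_PATH) -> Counter:
    """Count GET requests for the HTML page per calendar day."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match is None:
            continue
        timestamp, method, path = match.groups()
        if method != "GET" or path != page_path:
            continue  # skip feeds, images, scripts and other methods
        # The day is taken in the timezone written in the log entry.
        day = datetime.strptime(timestamp, "%d/%b/%Y:%H:%M:%S %z").date()
        counts[day] += 1
    return counts


sample_log = [
    '192.0.2.10 - - [05/Apr/2009:23:58:02 +0200] "GET /mashup.html HTTP/1.1" 200 5120',
    '192.0.2.11 - - [06/Apr/2009:08:15:02 +0200] "GET /mashup.html HTTP/1.1" 200 5120',
    '192.0.2.11 - - [06/Apr/2009:08:20:02 +0200] "GET /feed.xml HTTP/1.1" 200 900',
    '192.0.2.12 - - [06/Apr/2009:09:01:45 +0200] "GET /mashup.html HTTP/1.1" 200 5120',
]

hits = daily_page_hits(sample_log)
for day in sorted(hits):
    print(day.isoformat(), hits[day])
```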

As you can see, there are some distinct peaks. Aligning their dates with this list, one can clearly identify the Valentine's Day earthquake in Greece (very first peak), or the Xinjiang-Xizang earthquake on March 20th, 2008. The very last peak stems from the two heavy quakes in Samoa and Sumatra that hit the edges of the Australian Plate within 24 hours on September 29th/30th. Some peaks seem not to be related to an earthquake, e.g. those on October 14th/15th and 26th, 2008. I haven't investigated further, but this might have been a rogue web browser that requested not only the data feed every five minutes, but the complete page itself.

One interesting fact is that the catastrophic magnitude 7.9 Sichuan earthquake on May 12th, 2008, is only barely visible in the graph. It caused less traffic than a 5.4 earthquake in Illinois one month earlier. I don't have a good explanation and can only speculate. If I remember correctly, my first thought when I heard about this earthquake (the high number of fatalities was not yet known) was: "What? Another quake in China? Didn't they just have one?" The coverage of the March event was still very present. I think another reason is that, following the earlier riots in Tibet, journalists were heavily restricted and had problems getting into the affected region and reporting on the event. This meant less news about the earthquake, or at least fewer TV images from the scene. Pictures appeared weeks or months later, which is very unusual these days.

As I mentioned above, the requests after the L'Aquila earthquake came mainly from Italy. This becomes visible when plotting the daily requests before and after the incident:



The first image shows all requests from April 4th. All the requests from Italy on the second image were made late in the evening of April 5th, i.e. directly after the earthquake hit. This can be seen in this visualization using Nick Rabinowitz's TimeMap library (whose apparent similarity to my earthquake mashup is no coincidence). If you zoom in on Italy and then move the timeline slowly to the left, the rush on my page is clearly visible.

These are the requests around the Xinjiang-Xizang earthquake. The first image shows all requests from March 18th:




The number of requests goes back to normal relatively quickly. And the number of requests from China itself is very small - though not totally blocked. Maybe some high-ranking politicians are allowed to bypass the Great Firewall.

Other events can be mined from the data by looking at the requests from individual countries. These are all requests from Great Britain:


As you can see, the number of visitors is usually very small. Something must have happened to change no visitors on February 26th


to this rush of visitors the next day


Indeed, many people were woken by a magnitude 5.2 earthquake, the strongest in the UK for 25 years.
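Such a jump can also be found without looking at a graph. The following Python sketch assumes the geocoded requests are already available as (day, country code) pairs; the function names and the sample numbers are invented for illustration and are not my real data. It counts the requests per day for one country and reports the day with the largest rise over the previous calendar day, treating days without requests as zero.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple


def daily_counts_for(records: Iterable[Tuple[date, str]], country: str) -> Counter:
    """Count requests per day for a single country code."""
    return Counter(day for day, code in records if code == country)


def biggest_jump(counts: Counter) -> Optional[Tuple[date, int]]:
    """Return (day, increase) for the largest rise over the previous day."""
    if not counts:
        return None
    best: Optional[Tuple[date, int]] = None
    day, last = min(counts), max(counts)
    while day <= last:
        # Counter returns 0 for days without any requests.
        increase = counts[day] - counts[day - timedelta(days=1)]
        if best is None or increase > best[1]:
            best = (day, increase)
        day += timedelta(days=1)
    return best


# Invented sample: one British visitor, a quiet day, then a rush.
records = (
    [(date(2008, 2, 25), "GB")]
    + [(date(2008, 2, 26), "DE")] * 3
    + [(date(2008, 2, 27), "GB")] * 5
    + [(date(2008, 2, 28), "GB")] * 2
)

gb = daily_counts_for(records, "GB")
print(biggest_jump(gb))
```

A day-over-day difference is a crude measure, but for a country whose baseline is close to zero it is enough to make a day like February 27th stand out.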

A similarly unexpected earthquake can be seen here, in the request history of Romania:



The cluster of requests is also visible on the map:


And again, this event made the local news. Note the second peak in the graph: this is the Andaman Islands earthquake of August 10th, which is clearly visible in the global graph as the strongest peak. It seems that many people bookmarked my page and returned to it. On the other hand, the recent Samoa/Sumatra quakes are barely visible.

There is probably more to be found in the logs. It would be interesting to correlate the requests with more earthquakes and plot them together on a map. While looking around the USGS web site, I found this database, which holds the data on past earthquakes for many years, not only the last 14 days like the RSS feeds used in the mashup. Mating this database and the earthquake mashup would be another nice project, of course.

Update: Seems the USGS had a similar idea (found via slashdot and this article). And they have very similar graphics...

1 comment:

Sebastian said…

I'm always delighted by your posts on this topic.

I know it doesn't quite fit here, but the other day I read about the Bielefeld conspiracy in the English Wikipedia, and only there is the poorer map material in Google Maps up to the end of 2006 mentioned. I fiddled with the entry a bit, and the only "source" I could think of at that moment was really your blog and the Google Maps/Yahoo Maps comparison from back then. The Internet is so damn forgetful, which makes it all the more fascinating that you don't throw away your logs, because you can always make something new out of them once you get other tools to work with.

Anyway. It was fun studying the maps. I love maps and graphs :-)