2007-01-30

Love It or Leave It

As I mentioned a few days ago, a new opportunity for creating a mashup based on geocoded data was practically forced on me: the U.S. Environmental Protection Agency (the guys who brought us monitors showing the Energy Star logo some decades ago) is distributing data on their projects to the public, for free. To quote from their website:
EPA is providing geospatial environmental information about facilities and sites that are regulated by the Agency in order to increase public access to environmental data and awareness of environmental activities.
So this is essentially a good idea, all the more so for a country whose current government's only reason to mention environmental issues is to distract from its even more catastrophic policies in Iraq and elsewhere. So I took the data, looked at it, massaged it a little bit, and created a first, rough version of a mashup. Yes, it uses frames (which will break if you follow links on EPA's pages) and is of limited value in its current form. The main reason for this article is to document my troubles with the data and EPA, which is also why I am writing it in English. You knew there was something different this time, didn't you? These are my main grievances with EPA's approach:
  1. The data is encoded as an XML file, which is a reasonable idea. Unfortunately, this XML file is distributed inside a ZIP archive. This prevents a mashup creator from reading the data directly from EPA's website, which would ensure that the latest data is always used. Instead, the file has to be downloaded, unpacked and placed on a different web server. So we might see several mashups, all serving data of varying staleness. Some sites may be actively maintained, some might be abandoned - who will know?
  2. The first version of the XML file was not well-formed: it contained an unescaped ampersand character. While this has hopefully been fixed in the meantime, it raises some questions about the expertise in handling XML data and about quality assurance. The use of XML Schema is another reason for concern, but enough has been said here, so I won't elaborate on this aspect.
  3. The geographical data ("coordinates", "longitudes and latitudes") in the file is given relative to different geodetic reference systems ("datums"): NAD27, NAD83 and WGS84. While NAD83 and WGS84 are essentially the same, the difference between a point in NAD27 and the point in WGS84 with the same longitude and latitude values is usually several tens of meters. All free mapping services (like Google Maps or Yahoo Maps) expect coordinates using the WGS84 datum. So one has to convert the NAD27 coordinates, which is not a trivial task, especially if you want to do it on the fly in a mashup. I did the conversion offline (the first two items on this list were another reason for that) and used a short Perl script, with the great PROJ.4 library and its Perl interface Geo::Proj4 doing the heavy lifting.
  4. The XML file contains a lot of repetitive data, which seems to be more boilerplate text than actual information. The unpacked XML file occupies a whopping 3 MB, which is a little too much to transfer as-is in a mashup. My Perl script broke this down to a plain text file of 190 KB, containing just the essential data.
  5. Every item in the XML file contains a link to a corresponding page on EPA's web site. Unfortunately, this page is full of more boilerplate text. You have to scroll down to the end of the page to find a link to the project's real page. While it is understandable that EPA wants to put their work into perspective, this detour is an annoyance for a user looking for information. And these pages sport a "Close window" button, probably because somewhere these pages are used inside popups. But guess what? I hate popup windows, and I hate websites dictating to users how and where to read content. So in my application, this button is unnecessary (and, thank God, non-functional).
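For anyone who wants to repeat the offline preparation from items 1 and 2, here is a minimal sketch in Python (not my Perl script). It pulls the first `.xml` member out of a downloaded ZIP archive and, if the parser rejects the text, escapes any bare ampersands and tries again. The function names are made up for this illustration, and the UTF-8 decoding is an assumption about the file. The repair is deliberately naive: it would also touch ampersands inside CDATA sections.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# An "&" that does not start an entity (&amp;) or a character
# reference (&#38; or &#x26;) is not allowed in well-formed XML.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def read_xml_from_zip(zip_bytes: bytes) -> str:
    """Return the text of the first .xml member in a ZIP archive.

    Assumption: the member is UTF-8 encoded.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xml"):
                return archive.read(name).decode("utf-8")
    raise ValueError("no .xml member in archive")


def escape_bare_ampersands(xml_text: str) -> str:
    """Replace every unescaped '&' with '&amp;'."""
    return BARE_AMPERSAND.sub("&amp;", xml_text)


def parse_lenient(xml_text: str) -> ET.Element:
    """Parse XML, retrying once after repairing bare ampersands."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return ET.fromstring(escape_bare_ampersands(xml_text))
```

The repair runs only after a strict parse has failed, so a file that is already well-formed passes through untouched.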
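To give an idea of what the datum conversion in item 3 involves, here is a rough Python sketch. It is not what PROJ.4 does for me. It uses the abridged Molodensky formulas with the commonly published mean shift parameters for the contiguous United States (dx = -8 m, dy = 160 m, dz = 176 m), so it is only an approximation, good to within some meters at best, and it should not be used for Alaska, Hawaii or other regions. The function name `nad27_to_wgs84` is made up for this illustration.

```python
import math
from typing import Tuple

# Clarke 1866 ellipsoid (the one NAD27 is based on) and WGS84 ellipsoid.
CLARKE_A = 6378206.4
CLARKE_F = 1 / 294.9786982
WGS84_A = 6378137.0
WGS84_F = 1 / 298.257223563

# Mean NAD27 -> WGS84 shift in meters for the contiguous U.S.
# Assumption: a single mean shift is good enough for a map marker.
DX, DY, DZ = -8.0, 160.0, 176.0


def nad27_to_wgs84(lat_deg: float, lon_deg: float) -> Tuple[float, float]:
    """Approximate a NAD27 position in WGS84 (abridged Molodensky)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)

    e2 = CLARKE_F * (2 - CLARKE_F)            # first eccentricity squared
    w = math.sqrt(1 - e2 * math.sin(lat) ** 2)
    m = CLARKE_A * (1 - e2) / w ** 3          # meridian radius of curvature
    n = CLARKE_A / w                          # prime vertical radius

    da = WGS84_A - CLARKE_A                   # difference in semi-major axis
    df = WGS84_F - CLARKE_F                   # difference in flattening

    dlat = (
        -DX * math.sin(lat) * math.cos(lon)
        - DY * math.sin(lat) * math.sin(lon)
        + DZ * math.cos(lat)
        + (CLARKE_A * df + CLARKE_F * da) * math.sin(2 * lat)
    ) / m
    dlon = (-DX * math.sin(lon) + DY * math.cos(lon)) / (n * math.cos(lat))

    return lat_deg + math.degrees(dlat), lon_deg + math.degrees(dlon)
```

Calling `nad27_to_wgs84(39.0, -98.0)` for a point in the middle of the country moves it a few tens of meters, mostly westward, which matches the order of magnitude mentioned in item 3. For anything beyond placing markers on a map, a proper library with grid-based shifts is the better choice.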
But the main reason for this article is another one: I wrote to the EPA staff members responsible for this XML file, detailing my experiences while creating the mashup. I am not an American citizen, and I couldn't care less how the U.S. of A. copes with its environmental sins. But being a responsible netizen, I want to help create valuable mashups. The reply I got?
I see that you have not signed up for the email alert service?
That's all. Just this one line, and the link to the EPA page my mashup shows on startup (so yes, I probably have seen it before). HELLO??? What does registering for an announcement (i.e. one-way) mailing list have to do with my findings? Does this solve any of the problems I described? I was so taken aback by this meagre response that I was unsure how to answer. Consider this article my reply by open letter. So this experience with American bureaucracy leaves me somewhat disappointed. I don't think I'll invest any more time in this project. But I wanted to document all this; maybe someone will find it useful, or has had similar encounters with the EPA.
