hexadiam / CASS_Geocoding_Eval

this project produce a test case of 150,000 geocoded addresses for evaluation of new geocoder's precision. The addresses are from the USPS CASS stage 1 data set. Lat/Lon are obtained from Yahoo, Bing, Google, MapQuest for more comprehensive evaluation.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

This is a project for developing a geocoded addresses, for the purpose of evaluating new geocoders. The CASS stage 1 test set is used (from USPS, CASS is a test set for evaluating address normalization software, however, it doesn't offer lat/lon). 150,000 address are included.

Author: Sen Xu
Website: senxu.net

Motivation:
Currently there are no geocoding evaluation test set out there. 


There are a selection of web geocoding services:

current web geocoding service provide APIs for geocoding (yahoo, bing, google, mapquest are major players). some examples calls are listed as below:

*Yahoo 

REST call: http://where.yahooapis.com/geocode?q=320+vairo+Blvd,+Apt+C,+State+College,+PA+16803
Response:
<ResultSet version="1.0">
 <Error>0</Error>
 <ErrorMessage>No error</ErrorMessage>
 <Locale>us_US</Locale>
 <Quality>87</Quality>
 <Found>1</Found>
 <Result>
  <quality>87</quality>
  <latitude>40.810964</latitude>
  <longitude>-77.888236</longitude>
  <offsetlat>40.810864</offsetlat>
  <offsetlon>-77.888137</offsetlon>
  <radius>500</radius>
  <name/>
  <line1>320 Vairo Blvd, Apt C</line1>
  <line2>State College, PA  16803-2826</line2>
  <line3/>
  <line4>United States</line4>
  <house>320</house>
  <street>Vairo Blvd</street>
  <xstreet/>
  <unittype>Apt</unittype>
  <unit>C</unit>
  <postal>16803-2826</postal>
  <neighborhood/>
  <city>State College</city>
  <county>Centre County</county>
  <state>Pennsylvania</state>
  <country>United States</country>
  <countrycode>US</countrycode>
  <statecode>PA</statecode>
  <countycode/>
  <uzip>16803</uzip>
  <hash>97E22C2D6AF571BC</hash>
  <woeid>12764455</woeid>
  <woetype>11</woetype>
 </Result>
</ResultSet>

*Google 

REST call: http://maps.googleapis.com/maps/api/geocode/xml?sensor=true&address=1600+Amphitheatre+Parkway,+Mountain+View,+CA

*Bing 

REST call: http://dev.virtualearth.net/REST/v1/Locations/US/WA/98052/Redmond/1%20Microsoft%20Way?o=xml&key=BingMapsKey

replace the BingMapsKey with your own key, generated here: go to: https://www.bingmapsportal.com/, go to my account -> Create or view keys, fill in the form and get the key. it should look something like this: ArHg1cMweFLBZD6p0TMspBnCfW7kG2_PdNd4xaE6CxgZbbqNFPj9OCglIpTLSpeL

Response:

*Mapquest 

REST call: http://where.yahooapis.com/geocode?q=320+vairo+Blvd,+Apt+C,+State+College,+PA+16803

Response:

Note that the response not only include lat/lon, but also "formatted address", "precision", and most geocoding service has a daily call cap. As of Jan 25h, 2012, the daily cap for the above mentioned geocoding services are listed in the following table: 

geocoding web service	geocoding call daily cap	require key
google	2,500	optional
yahoo	50,000	optional
bing	30,000	yes
mapquest	5,000	yes 

In essense, it will take google, bing and mapquest a long time to finish geocoding the 150,000 address from CASS (60 days for google, 30 days for mapquest) from one IP address. Yahoo and Bing are more flexible and more generous on API usage.

there are also some geocoding service provided by academic institution, such as the one from USC: https://webgis.usc.edu/Services/Geocode/WebService/GeocoderWebService.aspx 

About

this project produce a test case of 150,000 geocoded addresses for evaluation of new geocoder's precision. The addresses are from the USPS CASS stage 1 data set. Lat/Lon are obtained from Yahoo, Bing, Google, MapQuest for more comprehensive evaluation.


Languages

Language:HTML 49.1%Language:Java 40.2%Language:JavaScript 8.1%Language:CSS 2.5%