Looking for a free geocoding solution and frustrated by the lack of options?
Though it’s poorly publicized (and documented), the US Census Bureau maintains a RESTful API for real-time and batch geocoding that’s free, fast, and accurate. It doesn’t even require an API key. Essentially, it’s everything you’d get out of a roll-your-own PostGIS Census/TIGER solution, without the hassle of having to set it up.
The only limitation is that the API mainly provides geocoding for residential addresses. Because the Census Bureau’s primary task is, well, conducting the national census, they’re a lot more interested in where people live than where they work. So, if you need to map business addresses, you may have to go elsewhere. But if you’re mapping customers (or students, or volunteers, or mailing list recipients, or whatever), the service is pretty on-point. In my work, I’ve seen about a 90% match rate for customer addresses when submitted to Census. Not too shabby.
So, how do you hit the API? Well, it would be good to start by reading the documentation, even though it’s a bit lacking. You’ll get a high-level overview of what’s possible, including features I won’t talk about in this post.
The API has several different endpoints. Endpoints depend, first, on whether you want to code a single address or a batch of up to 1000. They also depend on whether you want to get only lat/longs, or if you also want Census geography information (e.g., what census block the address is in). For this guide, we’ll assume you just want to get the lat/long.
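To make the endpoint layout concrete, here's a small Python sketch that assembles the URLs. It assumes the base URL `https://geocoding.geo.census.gov/geocoder` and a path of the form `/{returntype}/{searchtype}`. The return type is `locations` (lat/long only) or `geographies` (lat/long plus census geography). The search type is `address`, `onelineaddress`, or `addressbatch`. Double-check these against the current documentation. The `geographies` endpoints also expect an extra `vintage` parameter that this post doesn't cover.

```python
BASE_URL = "https://geocoding.geo.census.gov/geocoder"

# Assumed path segments (see the lead-in); verify against the current docs.
RETURN_TYPES = ("locations", "geographies")
SEARCH_TYPES = ("address", "onelineaddress", "addressbatch")


def endpoint_url(return_type: str, search_type: str) -> str:
    """Build a geocoder endpoint URL such as .../locations/address.

    return_type: "locations" (lat/long only) or "geographies" (adds census geography)
    search_type: "address", "onelineaddress", or "addressbatch"
    """
    if return_type not in RETURN_TYPES:
        raise ValueError(f"unknown return type: {return_type!r}")
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unknown search type: {search_type!r}")
    return f"{BASE_URL}/{return_type}/{search_type}"


if __name__ == "__main__":
    # Print every combination (no network access needed).
    for rt in RETURN_TYPES:
        for st in SEARCH_TYPES:
            print(endpoint_url(rt, st))
```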
When making requests, you’ll also want to specify the “benchmark” to query. This is essentially what year of data you want to use for your mapping. We’ll be using the “Public_AR_Current” benchmark, which is the most up-to-date information, but there are some situations where it might make sense to use an outdated benchmark to code old addresses. Do what makes sense.
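If you're not sure which benchmarks exist, the geocoder also publishes a listing of them. The sketch below assumes two things: the listing lives at `https://geocoding.geo.census.gov/geocoder/benchmarks`, and it returns JSON shaped like `{"benchmarks": [{"id": ..., "benchmarkName": ...}, ...]}`. `parse_benchmarks` and `fetch_benchmarks` are hypothetical helper names, not part of any library.

```python
import json
import urllib.request

# Assumed URL of the benchmark listing; verify against the current docs.
BENCHMARKS_URL = "https://geocoding.geo.census.gov/geocoder/benchmarks"


def parse_benchmarks(payload: dict) -> dict:
    """Map benchmark name -> benchmark id from the decoded listing.

    Assumes a shape like {"benchmarks": [{"id": ..., "benchmarkName": ...}, ...]}.
    """
    return {b["benchmarkName"]: b["id"] for b in payload.get("benchmarks", [])}


def fetch_benchmarks(timeout: float = 30.0) -> dict:
    """Download the listing (network call) and return {name: id}."""
    with urllib.request.urlopen(BENCHMARKS_URL, timeout=timeout) as resp:
        return parse_benchmarks(json.load(resp))
```

Calling `fetch_benchmarks()` should give you a dict keyed by benchmark name, which is a quick way to confirm that `Public_AR_Current` is still the name you want.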
Hitting the endpoint for a single address is pretty easy. Simply use query string parameters to specify address parts (you can do one-line address as well), benchmark, and output format. That’s all there is to it!
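As a minimal sketch of what that query string looks like, here are Python helpers for both the structured and one-line variants. The parameter names (`street`, `city`, `state`, `zip`, `address`, `benchmark`, `format`) follow the Census documentation as far as I know. `address_url` and `oneline_url` are hypothetical names.

```python
from urllib.parse import urlencode

LOCATIONS_URL = "https://geocoding.geo.census.gov/geocoder/locations"


def address_url(street: str, city: str, state: str, zip_code: str,
                benchmark: str = "Public_AR_Current", fmt: str = "json") -> str:
    """GET URL for a structured address (street, city, state, ZIP as separate params)."""
    params = {
        "street": street,
        "city": city,
        "state": state,
        "zip": zip_code,
        "benchmark": benchmark,
        "format": fmt,
    }
    # urlencode percent-escapes the values and turns spaces into '+'
    return f"{LOCATIONS_URL}/address?{urlencode(params)}"


def oneline_url(address: str, benchmark: str = "Public_AR_Current", fmt: str = "json") -> str:
    """GET URL for a one-line address such as "4600 Silver Hill Rd, Washington, DC 20233"."""
    params = {"address": address, "benchmark": benchmark, "format": fmt}
    return f"{LOCATIONS_URL}/onelineaddress?{urlencode(params)}"
```

For example, `address_url("4600 Silver Hill Rd", "Washington", "DC", "20233")` produces `https://geocoding.geo.census.gov/geocoder/locations/address?street=4600+Silver+Hill+Rd&city=Washington&state=DC&zip=20233&benchmark=Public_AR_Current&format=json`.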
Here’s how to code a single address using PHP and cURL.
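If you're working in Python rather than PHP, a rough standard-library equivalent might look like this. It assumes the one-line endpoint with JSON output. `build_request` and `geocode_oneline` are hypothetical names, and only `geocode_oneline` touches the network.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import urlencode

ONELINE_URL = "https://geocoding.geo.census.gov/geocoder/locations/onelineaddress"


def build_request(address: str, benchmark: str = "Public_AR_Current") -> urllib.request.Request:
    """Prepare (but don't send) a GET request for one address, asking for JSON."""
    query = urlencode({"address": address, "benchmark": benchmark, "format": "json"})
    return urllib.request.Request(f"{ONELINE_URL}?{query}")


def geocode_oneline(address: str, benchmark: str = "Public_AR_Current",
                    timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON body (network call)."""
    try:
        with urllib.request.urlopen(build_request(address, benchmark), timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Surface the status code; the response body may contain a more detailed message.
        raise RuntimeError(f"Census geocoder returned HTTP {err.code}") from err
```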
And here’s the output!
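Whatever language you use, you'll want to pull the coordinates out of the JSON. This sketch assumes the response nests matches under `result.addressMatches`. It also assumes each match carries a `matchedAddress` string and a `coordinates` object, where `x` is longitude and `y` is latitude. Those two are easy to swap by accident, so it's worth checking against a real response.

```python
from __future__ import annotations


def extract_matches(response: dict) -> list[dict]:
    """Return [{"matched_address", "lat", "lon"}, ...] from a locations JSON response.

    Assumes matches live under result.addressMatches and that each match's
    coordinates use x = longitude, y = latitude.
    """
    matches = response.get("result", {}).get("addressMatches", [])
    return [
        {
            "matched_address": m.get("matchedAddress", ""),
            "lat": m["coordinates"]["y"],
            "lon": m["coordinates"]["x"],
        }
        for m in matches
    ]


def first_lat_lon(response: dict) -> tuple[float, float] | None:
    """Convenience: (lat, lon) of the first match, or None if nothing matched."""
    matches = extract_matches(response)
    return (matches[0]["lat"], matches[0]["lon"]) if matches else None
```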
Every once in a while, you may get more than one match. In my experience, the matches are usually close enough together to be virtually equivalent for most purposes, but your mileage may vary.
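If you'd rather check "close enough" than eyeball it, one option is to measure how far apart the candidate matches are. You can then accept the first one only when they all fall within some tolerance. The sketch below uses the haversine great-circle distance. The 100-meter default is an arbitrary, hypothetical threshold. Taking the first candidate is my own assumption; the API doesn't document its ordering as "best first".

```python
from __future__ import annotations

import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def match_spread_m(points: list[tuple[float, float]]) -> float:
    """Largest pairwise distance among candidate (lat, lon) matches; 0.0 for fewer than two."""
    return max(
        (haversine_m(*p, *q) for i, p in enumerate(points) for q in points[i + 1:]),
        default=0.0,
    )


def pick_match(points: list[tuple[float, float]], tolerance_m: float = 100.0):
    """First candidate if all candidates lie within tolerance_m of each other, else None."""
    if not points:
        return None
    return points[0] if match_spread_m(points) <= tolerance_m else None
```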
Of course, if you need to code, say, 800,000 addresses, going one at a time would be silly. Thankfully, there’s also an endpoint for batch geocoding. You simply fire off a POST request, specifying the benchmark and uploading a CSV file of up to 1000 addresses. Each row of the CSV should contain a unique ID (results come back in no particular order, so you’ll need it to match them up), followed by the street, city, state, and ZIP code.
We’ll use this for a test CSV.
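If you'd rather generate the batch file from code, here's a minimal Python sketch. It assumes the file has no header row and exactly five columns in the order described above. The 1000-row cap mirrors the limit given in this post, so adjust it if the documentation says otherwise. `build_batch_csv` is a hypothetical helper.

```python
from __future__ import annotations

import csv
import io

MAX_ROWS = 1000  # batch limit as described in this post; check the current docs


def build_batch_csv(records: list[tuple[str, str, str, str, str]]) -> str:
    """Serialize (id, street, city, state, zip) rows as header-less CSV text.

    Raises ValueError on malformed rows, duplicate IDs, or too many rows.
    """
    if len(records) > MAX_ROWS:
        raise ValueError(f"{len(records)} rows exceeds the {MAX_ROWS}-row batch limit")
    seen: set[str] = set()
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # quotes a field only when it needs it
    for rec in records:
        if len(rec) != 5:
            raise ValueError(f"expected 5 columns, got {len(rec)}: {rec!r}")
        if rec[0] in seen:
            raise ValueError(f"duplicate ID {rec[0]!r}; IDs must be unique to match results back")
        seen.add(rec[0])
        writer.writerow(rec)
    return buf.getvalue()
```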
And here’s some code for mapping a file of addresses, again using PHP and cURL.
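A standard-library-only Python version of that POST might look like the following. It assumes the endpoint `https://geocoding.geo.census.gov/geocoder/locations/addressbatch`. It also assumes the form field names `addressFile` (the CSV upload) and `benchmark`, which match the cURL example in the Census documentation as far as I know. Python's standard library has no multipart encoder, so the body is built by hand; a third-party HTTP client would make this shorter. `encode_multipart`, `build_batch_request`, and `geocode_batch` are hypothetical names.

```python
from __future__ import annotations

import urllib.request
import uuid

BATCH_URL = "https://geocoding.geo.census.gov/geocoder/locations/addressbatch"


def encode_multipart(fields: dict[str, str], file_field: str,
                     filename: str, file_bytes: bytes) -> tuple[bytes, str]:
    """Encode plain form fields plus one file upload as multipart/form-data.

    Returns (body, value for the Content-Type header).
    """
    boundary = uuid.uuid4().hex  # random hex string, very unlikely to occur in the CSV
    parts: list[bytes] = []
    for name, value in fields.items():
        header = (f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{name}"\r\n\r\n')
        parts.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    file_header = (f"--{boundary}\r\n"
                   f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
                   "Content-Type: text/csv\r\n\r\n")
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_batch_request(csv_text: str, benchmark: str = "Public_AR_Current") -> urllib.request.Request:
    """Prepare (but don't send) the batch POST request."""
    body, content_type = encode_multipart(
        {"benchmark": benchmark}, "addressFile", "addresses.csv", csv_text.encode("utf-8"))
    return urllib.request.Request(BATCH_URL, data=body,
                                  headers={"Content-Type": content_type}, method="POST")


def geocode_batch(csv_text: str, benchmark: str = "Public_AR_Current",
                  timeout: float = 300.0) -> str:
    """Send the batch (network call) and return the result CSV as text."""
    with urllib.request.urlopen(build_batch_request(csv_text, benchmark), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```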
And the output.
Note that each output row includes the unique ID we supplied, the supplied address, whether there was a match, the match type, the matched address, the lat/long (longitude first, then latitude), the TIGER/Line ID of the matched street segment, and whether the address is on the left or right side of the street.
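Because the rows come back in arbitrary order, it's handy to parse them into a dict keyed by your ID. This sketch assumes the eight-column layout described above for matched rows, with the coordinates packed into a single quoted `"longitude,latitude"` field. It also assumes that unmatched rows stop after the status column. `parse_batch_output` is a hypothetical name.

```python
from __future__ import annotations

import csv
import io


def parse_batch_output(text: str) -> dict[str, dict]:
    """Parse the batch result CSV into {id: record}.

    Assumed layout for matched rows:
      id, input address, status, match type, matched address, "lon,lat", TIGER/Line ID, side
    Unmatched rows are assumed to stop after the status column.
    """
    results: dict[str, dict] = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        record = {
            "id": row[0],
            "input_address": row[1] if len(row) > 1 else "",
            "status": row[2] if len(row) > 2 else "",
        }
        if record["status"] == "Match" and len(row) >= 8:
            lon_text, lat_text = row[5].split(",")  # longitude comes first
            record.update(
                match_type=row[3],
                matched_address=row[4],
                lat=float(lat_text),
                lon=float(lon_text),
                tiger_line_id=row[6],
                side=row[7],
            )
        results[record["id"]] = record
    return results
```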
Though it’s not specified in the documentation, there are two things to note here. First, you’ll only get one match per address – presumably, the best. Second, “exact” matches are those where everything in the address lines up perfectly with the Census data. A “non-exact” match means that something was off… for example, the zip code you supplied doesn’t match the zip code in their data, even though everything else lines up. In my experience, “non-exact” matches are still very reliable.
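To keep an eye on quality, for example to compute your own match rate, you can tally statuses and match types. This sketch takes `(status, match_type)` pairs such as `("Match", "Exact")` or `("No_Match", "")`. Those exact strings are assumptions based on the output format, so check them against your own results. `match_summary` is a hypothetical helper.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def match_summary(rows: Iterable[tuple[str, str]]) -> tuple[Counter, float]:
    """Tally (status, match_type) pairs and return (counts, match_rate).

    Matched rows are counted by match type (e.g. "Exact", "Non_Exact");
    everything else is counted by status (e.g. "No_Match", "Tie").
    """
    counts: Counter = Counter()
    matched = total = 0
    for status, match_type in rows:
        total += 1
        if status == "Match":
            matched += 1
            counts[match_type] += 1
        else:
            counts[status] += 1
    return counts, (matched / total if total else 0.0)
```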
Of course, if you wanted to geocode more than 1000 addresses, you’d need to chunk them into batches of 1000 and submit each batch separately. I’ll let you write your own code for that… it’s pretty straightforward.
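For what it's worth, the chunking really is short. A generator like this hypothetical `chunked` helper splits any iterable into lists of at most 1000 items. Each list can then be written to its own CSV and submitted as a separate POST. On Python 3.12 and later, `itertools.batched` does the same job, yielding tuples rather than lists.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int = 1000) -> Iterator[list[T]]:
    """Yield successive lists of at most `size` items; the last one may be shorter."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch
```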
The Census geocoding API isn’t perfect, especially if you need to map mostly non-residential addresses. But it’s one of the best free tools out there. I’ve been making pretty regular use of it in my work, and I’ve been consistently impressed by the speed and accuracy… not to mention the price.