Pretty much any language commonly used for data analysis (R, SAS, Python) can calculate the distance between two geographic coordinates with relative ease. But always having to pull your data out of your data warehouse any time you want to do some basic geographic analysis can be frustrating - sometimes it's nice to keep simple queries all in one system. If you've got a spatially enabled version of Postgres or SQL Server, you're in business. But if not, you'll need to roll your own SQL solution.
In today's post, we're going to write our own code in vanilla SQL to calculate the distance between two latitude and longitude coordinates.
I'm going to start out with a couple of areas of explanation. If you just want to know how to make this work, you can scroll down to the SQL at the bottom...
First area of explanation: Latitude and longitude are basically nothing more than angles. Latitude is measured as your degrees north or south of the equator. Longitude is your degrees east or west of the prime meridian. The combination of these two angles pinpoints an exact location on the surface of the earth.
Second bit of explanation: As shown in the image above, the quickest route between two points on the surface of the earth is a "great circle path" - in other words, a path that comprises a part of the longest circle you could draw around the globe that intersects the two points. (It's counter-intuitive, but it works if you think about it.) And, since this is a circular path on a sphere using coordinates expressed in angles, all of the properties of the distance will be given by trigonometric formulas. Fair enough.
Final bit of explanation: The shortest distance between two points on the globe can be calculated using the Haversine formula. All of the math behind it is beyond the scope of this post but (surprise, surprise), it's just a bunch of trig. The great circle distance can be calculated using the Haversine formula shown below.
Note that Phi 1 and Phi 2 are latitudes and Lambda 1 and Lambda 2 are longitudes, while R is the radius of the earth (which we'll call 3961 miles). All we need to do now is implement this in SQL!
The SQL code
Here's the code for implementing the Haversine formula in SQL, plus a dummy table for trying it out. This code is specifically tailored to Redshift (since this is an analytics blog), but it should be relatively easy to translate to other flavors of SQL...
The demo calculation puts these points about 88.7 miles away from each other, which just happens to be the same result I get from Andrew Hedges' great circle distance calculator. Awesome.
Implementing algorithms in SQL is always a good time. If you've got any comments about this one, be sure to leave them below! Also, feel free to check out my other posts on implementing various algorithms in standard SQL code:
Thanks for stopping by!