Pretty much any language commonly used for data analysis (R, SAS, Python) can calculate the distance between two geographic coordinates with relative ease. But always having to pull your data out of your data warehouse any time you want to do some basic geographic analysis can be frustrating - sometimes it's nice to keep simple queries all in one system. If you've got a spatially enabled version of Postgres or SQL Server, you're in business. But if not, you'll need to roll your own SQL solution.

In today's post, we're going to write our own code in vanilla SQL to calculate the distance between two latitude and longitude coordinates.

## Basic concepts

I'm going to start out with a couple of areas of explanation. If you just want to know how to make this work, you can scroll down to the SQL at the bottom...

First area of explanation: Latitude and longitude are basically nothing more than angles. Latitude is measured as your degrees north or south of the equator. Longitude is your degrees east or west of the prime meridian. The combination of these two angles pinpoints an exact location on the surface of the earth.

Second bit of explanation: As shown in the image above, the quickest route between two points on the surface of the earth is a "great circle path" - in other words, a path that comprises a part of the longest circle you could draw around the globe that intersects the two points. (It's counter-intuitive, but it works if you think about it.) And, since this is a circular path on a sphere using coordinates expressed in angles, all of the properties of the distance will be given by trigonometric formulas. Fair enough.

Final bit of explanation: The shortest distance between two points on the globe can be calculated using the Haversine formula. All of the math behind it is beyond the scope of this post but (surprise, surprise), it's just a bunch of trig. The great circle distance can be calculated using the Haversine formula shown below.

Note that Phi 1 and Phi 2 are latitudes and Lambda 1 and Lambda 2 are longitudes, while R is the radius of the earth (which we'll call 3961 miles). All we need to do now is implement this in SQL!

## The SQL code

Here's the code for implementing the Haversine formula in SQL, plus a dummy table for trying it out. This code is specifically tailored to Redshift (since this is an analytics blog), but it should be relatively easy to translate to other flavors of SQL...

The demo calculation puts these points about 88.7 miles away from each other, which just happens to be the same result I get from Andrew Hedges' great circle distance calculator. Awesome.

## Conclusion

Implementing algorithms in SQL is always a good time. If you've got any comments about this one, be sure to leave them below! Also, feel free to check out my other posts on implementing various algorithms in standard SQL code:

Thanks for stopping by!

Jude MarcApril 10, 2017 / 11:32 amHi Dayne,

I’m not seeing the SQL Code and table on this one. I think you meant to post a link but it didn’t go through.

Thanks,

daynebattenApril 10, 2017 / 1:05 pmThere’s a large GitHub Gist right in the middle. Do you have JavaScript disabled or anything?

bikashJuly 8, 2017 / 5:32 amgot follwing error:

Msg 402, Level 16, State 1, Line 2

The data types float and int are incompatible in the ‘^’ operator.

daynebattenJuly 10, 2017 / 9:18 amMaybe cast, then?

KurtJuly 11, 2017 / 10:44 amHey All,

I had the same error “The data types float and int are incompatible in the ‘^’ operator.”

This is the solution I found, if you use the Power func, https://docs.microsoft.com/en-us/sql/t-sql/functions/power-transact-sql.

select

2 * 3961 * asin(sqrt( power((sin(radians((lat2 – lat1) / 2))) , 2) + cos(radians(lat1)) * cos(radians(lat2)) * power((sin(radians((lon2 – lon1) / 2))) , 2) )) as distance

from

the_data;

Seems to work great.

@daynebatten This is Amazing. THANK YOU VERY MUCH you have saved me a lot of time.