Classifying Names by Gender with Naive Bayes in SQL

Naive Bayes is an extraordinarily diverse algorithm for categorizing things… it can separate fraudulent from non-fraudulent credit card purchases, spam from legitimate email, or dinosaurs from fictional monsters. It’s most frequent application is in “bag of words” models… statistical models that classify a block of text based on the words that appear within it.

Godzilla

This is not a dinosaur…

In this post, we’ll explore the basics of Naive Bayes classification, and we’ll illustrate using a relatively common problem – assigning genders to a group of people when all you have is their names. And, just for kicks and giggles, we’re going to do it in SQL. Because nobody trains models in SQL.

Continue reading