Recently, I wanted to make real-time, online predictions using a random forest classifier trained in R. Of course, there are many ways to make that happen: I could have used yhat’s ScienceOps product, for example. But, for project-specific reasons, I decided that the best route in this case was to get my hands dirty and build my own RESTful API for making predictions with my model.
In this post, we’ll walk through all of the code necessary to export a random forest classifier from R and use it to make real-time online predictions in a PHP script.
Exporting a randomForest model
Suppose you had a simple random forest classifier trained on the commonly-used iris example data using R’s randomForest package. Like so:
As the randomForest documentation describes, the package provides a function called “getTree”, which returns a matrix or data frame describing a single decision tree in the trained ensemble. The format of the results looks something like this:
Decision making starts at the split described by the first row – if the value of the variable listed in “split var” is less than or equal to the value of “split point”, the algorithm moves on to the split described by the “left daughter” row. Otherwise, it goes to the “right daughter” row. This process continues until the algorithm hits a row where “status” is -1, at which point the value of “prediction” is returned as the prediction. Easy peasy.
Note: as the above suggests, this post assumes a model trained on numeric features. It would be relatively simple to expand the code to work with categorical features, but that process is not explored here.
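To make the traversal rule above concrete, here is a minimal sketch in Python (used purely for illustration; it is not the R or PHP code from this post). It assumes a tree is held as a list of row dicts keyed by the getTree column names, that daughter values are 1-based row numbers as in R, and that “split var” and “prediction” already hold labels rather than numeric indices. The `predict_tree` name and the toy tree values are made up for the example.

```python
def predict_tree(rows, features):
    """Walk one decision tree and return the prediction at its terminal node.

    rows: list of dicts, one per getTree row, in row order.
    features: dict mapping feature name -> numeric value.
    """
    node = rows[0]  # decision making starts at the first row
    while node["status"] != -1:  # -1 marks a terminal node
        if features[node["split var"]] <= node["split point"]:
            next_row = node["left daughter"]
        else:
            next_row = node["right daughter"]
        node = rows[next_row - 1]  # daughter values are 1-based row numbers
    return node["prediction"]


# Hypothetical three-row tree: one split and two terminal nodes.
toy_tree = [
    {"left daughter": 2, "right daughter": 3, "split var": "Petal.Length",
     "split point": 2.45, "status": 1, "prediction": None},
    {"left daughter": 0, "right daughter": 0, "split var": None,
     "split point": 0.0, "status": -1, "prediction": "setosa"},
    {"left daughter": 0, "right daughter": 0, "split var": None,
     "split point": 0.0, "status": -1, "prediction": "versicolor"},
]

print(predict_tree(toy_tree, {"Petal.Length": 1.4}))  # prints "setosa"
```

A value exactly equal to the split point goes left, matching the “less than or equal to” rule described above.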
So, to export our forest in a way that we can consume in other applications, we can simply request each tree from the model, and bind them all together, with an additional index to number the trees. Something like this:
As you may have noticed above, the “prediction” column holds the numeric index of the predicted class, but we want our application to know the actual names of the categories. Let’s replace the numeric index with the actual name, then output to CSV.
That’s it! Our randomForest has been exported!
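For a sense of what a consuming application does with such an export, here is a hedged Python sketch (illustration only, not this post's code) that groups the rows back into one list per tree. The header names are assumptions: it uses the getTree column names plus a hypothetical `tree` index column, and the actual CSV header may differ depending on how it was written out. The sample data is made up, and rows are assumed to appear in their original order within each tree.

```python
import csv
import io
from collections import defaultdict


def load_forest(csv_file, tree_column="tree"):
    """Group exported rows into one list of node dicts per tree."""
    forest = defaultdict(list)
    for row in csv.DictReader(csv_file):
        forest[row[tree_column]].append({
            "left daughter": int(row["left daughter"]),
            "right daughter": int(row["right daughter"]),
            "split var": row["split var"],
            "split point": float(row["split point"]),
            "status": int(row["status"]),
            "prediction": row["prediction"],
        })
    # Dicts keep insertion order, so trees come back in file order.
    return list(forest.values())


# Made-up two-tree export, inlined so the sketch runs without a file.
sample = io.StringIO(
    "tree,left daughter,right daughter,split var,split point,status,prediction\n"
    "1,2,3,Petal.Length,2.45,1,NA\n"
    "1,0,0,NA,0,-1,setosa\n"
    "1,0,0,NA,0,-1,versicolor\n"
    "2,2,3,Petal.Width,0.8,1,NA\n"
    "2,0,0,NA,0,-1,setosa\n"
    "2,0,0,NA,0,-1,virginica\n"
)
forest = load_forest(sample)
print(len(forest), "trees;", len(forest[0]), "rows in the first tree")
```

Keeping each tree as its own ordered list means the daughter row numbers can be used directly as positions within that tree.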
Making decisions in PHP
Now, we simply need to read this information into PHP and implement the logic to make decisions using the forest. The logic here is actually pretty easy: when making a prediction, we loop through all the trees in the forest and follow each tree’s split rules from the first row down to a terminal node, which gives that tree’s prediction. Then, we add up the predictions, see which one got the most votes, and return that as the model result. We’ll break ties (more or less) at random, which is the same logic used by the original package.
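The voting step can be sketched in a few lines of Python (again, an illustration rather than the PHP classes themselves). It assumes each tree has already produced a class label, and it breaks ties uniformly at random among the tied labels, which is an assumption about what “more or less at random” means. The `majority_vote` name is hypothetical.

```python
import random
from collections import Counter


def majority_vote(predictions, rng=random):
    """Return the most common prediction, breaking ties at random."""
    counts = Counter(predictions)
    top = max(counts.values())
    winners = [label for label, n in counts.items() if n == top]
    return rng.choice(winners)  # with a single winner, this is deterministic


votes = ["versicolor", "virginica", "versicolor", "setosa", "versicolor"]
print(majority_vote(votes))  # prints "versicolor"
```

Passing a seeded `random.Random` instance as `rng` makes tie-breaking reproducible, which is handy when testing.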
I won’t go into all the gory details of how the PHP code works, but you’re certainly welcome to explore it yourself. Here are the classes you’d need to actually make some decisions in PHP:
Using this script to make decisions is extraordinarily simple. Here, we load the iris model and make a prediction with it:
It’s really that easy. If you run this script, you’ll see that it outputs “versicolor” as its result – which is good, since the test data is directly copied from a versicolor flower in the original iris dataset!
Of course, there are a few ways this could undoubtedly be improved. For example:
- This script only supports numeric features, not categorical features. However, it would be relatively easy to add support for categorical data.
- This script only supports classification, not regression. Again, an easy modification.
- In production, you may want to store the actual tree data in something like memcached – loading all of that data straight out of a file is by far the most time-consuming part of making predictions with large classifiers in PHP.
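The caching idea in the last point can be illustrated with a small Python sketch. It uses a plain in-process dict as a stand-in for an external cache such as memcached, so it only shows the pattern (parse once, reuse until the file changes), not a real memcached client. The `get_forest` and `counting_loader` names and the JSON sample file are hypothetical.

```python
import json
import os
import tempfile

_forest_cache = {}  # in-process stand-in for an external cache such as memcached


def get_forest(path, loader):
    """Return the parsed forest for `path`, re-parsing only when the file changes."""
    mtime = os.path.getmtime(path)
    cached = _forest_cache.get(path)
    if cached is None or cached[0] != mtime:
        cached = (mtime, loader(path))
        _forest_cache[path] = cached
    return cached[1]


# Usage with a throwaway file and a loader that records how often it runs.
calls = []


def counting_loader(path):
    calls.append(path)
    with open(path) as handle:
        return json.load(handle)


with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump([{"status": -1, "prediction": "setosa"}], tmp)

first = get_forest(tmp.name, counting_loader)
second = get_forest(tmp.name, counting_loader)
print(len(calls))  # prints 1: the second call was served from the cache
os.remove(tmp.name)
```

With a shared cache the same check-then-load pattern applies, but the parsed data would need to be serialized on the way in and out.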
Ultimately, this is a great way to get started leveraging your R classifiers in PHP – and a good code base for anybody interested in expanding with the features I’ve described above. I’d love to hear about any ways you’re using this script, or any updates and optimizations you make to it.