Random Forest Classifiers as a Web Service in PHP

Recently, I wanted to make real-time, online predictions using a random forest classifier trained in R. Of course, there are many ways to make that happen – I could have used yhat’s ScienceOps product, for example. But, for project-specific reasons, I decided that the best route in this case was to get my hands dirty and build my own RESTful API for making predictions with my model.

Apparently, back in 2011, Disney debuted a show called "So Random." Thankfully, it only ran for a single season...

In this post, we’ll walk through all of the code necessary to export a random forest classifier from R and use it to make real-time online predictions in a PHP script.

Exporting a randomForest model

Suppose you had a simple random forest classifier trained on the commonly used iris example data using R’s randomForest package. Like so:

As the randomForest documentation describes, the package provides a function called “getTree,” which returns a matrix or dataframe describing a single decision tree in the trained ensemble. The format of the results looks something like this:

Decision making starts at the split described by the first row – if the value of the variable listed in “split var” is less than or equal to the value of “split point”, the algorithm moves on to the split described by the “left daughter” row. Otherwise, it goes to the “right daughter” row. This process continues until the algorithm hits a row where “status” is -1 (a terminal node), at which point the value of “prediction” is returned as the prediction. Easy peasy.
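To make that walk concrete, here is a minimal PHP sketch of following a single tree. The array layout and key names (left, right, split_var, split_point, status, prediction) are my own assumptions for illustration, and the toy tree is made up rather than taken from a trained model.

```php
<?php
// Walk one decision tree. Each node is keyed by its row number and mirrors
// the getTree columns described above; the key names are assumptions.
function predictWithTree(array $tree, array $features): string
{
    $node = 1; // decision making starts at the first row
    while ($tree[$node]['status'] !== -1) {
        $row = $tree[$node];
        // "<=" sends the observation to the left daughter, otherwise right
        $node = $features[$row['split_var']] <= $row['split_point']
            ? $row['left']
            : $row['right'];
    }
    return $tree[$node]['prediction']; // status -1 marks a terminal node
}

// A toy three-row tree, invented for illustration only.
$tree = [
    1 => ['left' => 2, 'right' => 3, 'split_var' => 'Petal.Length',
          'split_point' => 2.5, 'status' => 1, 'prediction' => null],
    2 => ['left' => 0, 'right' => 0, 'split_var' => null,
          'split_point' => 0, 'status' => -1, 'prediction' => 'setosa'],
    3 => ['left' => 0, 'right' => 0, 'split_var' => null,
          'split_point' => 0, 'status' => -1, 'prediction' => 'versicolor'],
];

echo predictWithTree($tree, ['Petal.Length' => 1.4]), "\n"; // setosa
```

Because 1.4 is less than or equal to 2.5, the walk goes to row 2, which is terminal, so the toy tree answers “setosa”.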

Note: as the above suggests, this post assumes a model trained on numeric features. It would be relatively simple to expand the code to work with categorical features, but that process is not explored here.

So, to export our forest in a way that we can consume in other applications, we can simply request each tree from the model, and bind them all together, with an additional index to number the trees. Something like this:

Of course, as you may have noticed above, the “prediction” variable lists the numeric index of the prediction’s value… But we want our application to know about the actual names of the categories. Let’s replace the numeric index with the actual name, then output to CSV.

That’s it! Our randomForest has been exported!

Making decisions in PHP

Now, we simply need to read this information into PHP and implement the logic to make decisions using the forest. The logic here is actually pretty easy: when making a prediction, we loop through all the trees in the forest and follow the split rules defined by the model until we arrive at a prediction at the bottom of each tree. Then, we add up the predictions, see which class got the most votes, and return that as the model result. We’ll break ties (more or less) at random, which is the same logic used by the original package.
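The vote-counting step might look something like the sketch below. The function name is hypothetical, and the uniform random pick among tied labels is just one reasonable reading of “more or less at random”, not necessarily what the randomForest package does internally.

```php
<?php
// Tally one vote per tree and return the winning label. Ties are settled by
// a uniform random pick among the tied labels (an assumption on my part).
// Expects at least one prediction; max() fails on an empty array.
function majorityVote(array $predictions): string
{
    $votes = array_count_values($predictions);   // label => number of votes
    $winners = array_keys($votes, max($votes));  // every label with the top count
    return (string) $winners[array_rand($winners)];
}

echo majorityVote(['versicolor', 'virginica', 'versicolor']), "\n"; // versicolor
```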

I won’t go into all the gory details of how the PHP code works, but you’re certainly welcome to explore it yourself. Here are the classes you’d need to actually make some decisions in PHP:
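As a rough illustration of what such a class could look like, here is a minimal sketch. The class name and the CSV column names (tree, node, left, right, split_var, split_point, status, prediction) are assumptions chosen for this example and may not match the format of the exported file. The two-tree forest written to a temporary file is toy data.

```php
<?php
// Sketch only: column names and class name are assumed for illustration.
class RandomForestSketch
{
    /** Rows indexed by tree number, then node number. */
    private array $trees = [];

    public function __construct(string $csvPath)
    {
        $handle = fopen($csvPath, 'r');
        if ($handle === false) {
            throw new RuntimeException("Cannot open $csvPath");
        }
        $header = fgetcsv($handle, 0, ',', '"', '\\');
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if ($row === [null]) {
                continue; // skip blank lines
            }
            $node = array_combine($header, $row);
            $this->trees[(int) $node['tree']][(int) $node['node']] = $node;
        }
        fclose($handle);
    }

    public function predict(array $features): string
    {
        $votes = [];
        foreach ($this->trees as $tree) {
            $node = 1; // every tree starts at its first row
            while ((int) $tree[$node]['status'] !== -1) {
                $row = $tree[$node];
                $goLeft = (float) $features[$row['split_var']] <= (float) $row['split_point'];
                $node = (int) ($goLeft ? $row['left'] : $row['right']);
            }
            $label = $tree[$node]['prediction'];
            $votes[$label] = ($votes[$label] ?? 0) + 1;
        }
        // Most votes wins; ties are broken by a random pick among the leaders.
        $winners = array_keys($votes, max($votes));
        return (string) $winners[array_rand($winners)];
    }
}

// Toy two-tree forest, invented for illustration.
$csv = <<<CSV
tree,node,left,right,split_var,split_point,status,prediction
1,1,2,3,Petal.Length,2.5,1,
1,2,0,0,,0,-1,setosa
1,3,0,0,,0,-1,versicolor
2,1,2,3,Petal.Width,0.8,1,
2,2,0,0,,0,-1,setosa
2,3,0,0,,0,-1,versicolor
CSV;
$path = tempnam(sys_get_temp_dir(), 'rf');
file_put_contents($path, $csv);

$forest = new RandomForestSketch($path);
echo $forest->predict(['Petal.Length' => 4.7, 'Petal.Width' => 1.4]), "\n"; // versicolor
unlink($path);
```

Everything read from the CSV arrives as a string, which is why the sketch casts node numbers, status, and split points before comparing them.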

This script is very simple to use for making decisions. Here, we load the iris model, and make a prediction with it:

It’s really that easy. If you run this script, you’ll see that it outputs “versicolor” as its result – which is good, since the test data is directly copied from a versicolor flower in the original iris dataset!
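Since the end goal is a web service, a prediction call like this could be wrapped in a small request handler. The sketch below is hypothetical: the function name, the parameter names, and the JSON shape are all assumptions, and the predictor is passed in as a callable so a stub can stand in for the real forest. One thing to keep in mind is that PHP converts dots in incoming parameter names to underscores, so a feature like Petal.Length would arrive in $_GET as Petal_Length; the example sidesteps that by using underscore names.

```php
<?php
// Validate the expected numeric features, run the predictor, and return an
// HTTP status code plus a JSON body. Names and JSON shape are assumptions.
function handlePredictionRequest(array $query, array $featureNames, callable $predict): array
{
    $features = [];
    foreach ($featureNames as $name) {
        if (!isset($query[$name]) || !is_numeric($query[$name])) {
            return [400, json_encode(['error' => "Missing or non-numeric feature: $name"])];
        }
        $features[$name] = (float) $query[$name];
    }
    return [200, json_encode(['prediction' => $predict($features)])];
}

// Stub predictor standing in for a real forest.
$stub = fn(array $f): string => $f['petal_length'] <= 2.5 ? 'setosa' : 'versicolor';

[$status, $body] = handlePredictionRequest(['petal_length' => '4.7'], ['petal_length'], $stub);
echo $status, ' ', $body, "\n"; // 200 {"prediction":"versicolor"}
```

In an actual endpoint script, you would pass $_GET as the query, then send the result with http_response_code($status), a Content-Type: application/json header, and echo $body.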

Conclusion

Of course, there are a few ways this could be improved. For example:

  • This script only supports numeric features, not categorical features. However, it would be relatively easy to add support for categorical data.
  • This script only supports classification, not regression. Again, an easy modification.
  • In production, you may want to store the actual tree data in something like memcached – loading all of that data straight out of a file is by far the most time-consuming part of making predictions with large classifiers in PHP.
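The caching idea in the last bullet could be sketched as a simple cache-aside lookup. Everything here is an assumption for illustration: the function name, the cache key, and the $cacheGet/$cacheSet callables, which stand in for a real cache client. With the Memcached extension they could wrap Memcached::get() and Memcached::set(); an in-memory array is used below so the sketch runs without a server.

```php
<?php
// Cache-aside: try the cache first, fall back to the slow file load on a miss.
function loadForestCached(string $key, callable $loadFromFile, callable $cacheGet, callable $cacheSet): array
{
    $forest = $cacheGet($key);
    if ($forest === false) {        // false is treated as a miss, as Memcached::get() returns
        $forest = $loadFromFile();  // the slow path: read and parse the CSV
        $cacheSet($key, $forest);
    }
    return $forest;
}

// In-memory stand-in for the cache.
$store = [];
$get = function (string $k) use (&$store) { return $store[$k] ?? false; };
$set = function (string $k, $v) use (&$store): void { $store[$k] = $v; };

// Fake loader that counts how often the "file" is actually read.
$loads = 0;
$load = function () use (&$loads): array {
    $loads++;
    return [1 => [1 => ['status' => -1, 'prediction' => 'setosa']]];
};

loadForestCached('iris_forest', $load, $get, $set);
loadForestCached('iris_forest', $load, $get, $set);
echo $loads, "\n"; // 1: the second call is served from the cache
```

One caveat if you go the memcached route: its default item size limit is 1 MB, so a large forest may need to be split across keys or stored with a raised limit.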

Ultimately, this is a great way to get started using your R classifiers in PHP – and a good code base for anybody interested in extending it with the features I’ve described above. I’d love to hear about any ways you’re using this script, or any updates and optimizations you make to it.

7 Responses

  1. Fernando Henrique da Silva August 29, 2016 / 10:03 am

    I would like to find a CART (classification and regression tree) algorithm for PHP. Do you know where I could get one?

    • daynebatten August 30, 2016 / 9:48 am

      A Random Forest is just a group of decision trees… so I suppose you could train a single tree with randomForest in R if that’s all you wanted.

      It seems you might be looking for code to actually train the decision tree in PHP (instead of R), though? If so, I’m not sure such a thing exists.

      • Collin Paran January 16, 2017 / 9:12 am

        I know this is late, but you could use a Shiny server to get CART to run then put a button on PHP to run the Shiny server script.

        Or you could use the exec() or the system() functions in PHP to run your R script.

        Looks like an interesting problem, I may explore it.

        Great read Dayne!

        • daynebatten January 17, 2017 / 8:12 pm

          Yep, absolutely!

          I was looking for a PHP-only solution, but this would certainly work.

  2. Senghort Kheang October 19, 2017 / 5:55 am

    I tried to run your PHP script, but it has a problem. Can you give me an iris.csv data file for testing your PHP script? I’m afraid my data is in the wrong format.

    • daynebatten October 25, 2017 / 9:16 pm

      Hello… did you use my R code to generate the trained random forest model and output it as iris.csv? That should give you everything you need.
