

How to get started with a decision tree learning structure


To garner the highest return on your ad spend, knowing the best audience to target is essential.

Data gleaned from using a decision tree can help you choose the best attributes to use in modeling an audience of prospects.

A decision tree is a means of prediction that can be described as follows: You're deciding whether to play outside. The outlook has three possible values: sunny, overcast and rainy. A simple rule would say that if it is sunny, your answer is yes. A more complex model would bring in additional factors: if it is sunny and the temperature is above 75 degrees, your answer is yes. Decision trees let you consider these additional attributes and find which of them is the most predictive.
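To make this concrete, here is a minimal sketch in Python with scikit-learn. The tiny data set and column names are illustrative assumptions, not real campaign data.

```python
# Minimal sketch of the "play outside" example; the toy data below is made up.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# A few historical observations: outlook, temperature (F) and whether we played.
data = pd.DataFrame({
    "outlook":     ["sunny", "sunny", "overcast", "rainy", "rainy", "sunny"],
    "temperature": [80, 70, 65, 60, 72, 77],
    "played":      ["yes", "yes", "no", "no", "no", "yes"],
})

# One-hot encode the categorical outlook column so the tree can split on it.
X = pd.get_dummies(data[["outlook", "temperature"]], columns=["outlook"])
y = data["played"]

# Fit a small tree that chooses its splits by information gain (entropy).
tree = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
tree.fit(X, y)

# Print the learned rules, e.g. "if sunny and temperature > 75 then yes".
print(export_text(tree, feature_names=list(X.columns)))
```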

Decision tree learning: capitalizing on information gain


In order to demonstrate how decision tree learning can contribute to clustering your leads, let's walk through an example. Say we're selling home-remodeling services and have a data set with 200 attributes per lead, including household income, garage area and more. The decision tree algorithm runs through these data points to find the most predictive one among your leads. For example, household income may be the one attribute out of 200 that best separates the data into "yes" and "no" responses. If that's the case, you can select it as the first variable by which you separate your leads moving forward.
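As a rough sketch of that first step, the snippet below fits a depth-1 tree (a decision stump) across all columns and reports which one it splits on first. The leads DataFrame, its "purchased" label and the helper name are hypothetical.

```python
# Hedged sketch: identify the single most predictive attribute via a depth-1 tree.
# `leads` and its "purchased" label column are assumed, not from the article.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def most_predictive_attribute(leads: pd.DataFrame, label: str = "purchased") -> str:
    """Return the column the tree picks for its very first (root) split."""
    X = pd.get_dummies(leads.drop(columns=[label]))  # encode categorical columns
    y = leads[label]
    stump = DecisionTreeClassifier(criterion="entropy", max_depth=1, random_state=0)
    stump.fit(X, y)
    # With a single split, all feature importance sits on the chosen column.
    return str(X.columns[int(np.argmax(stump.feature_importances_))])

# Usage with made-up columns such as household_income, garage_area, ...:
# print(most_predictive_attribute(leads))
```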

In order to accomplish this, you have to have data at the ready. Historical data on your current customers and leads lets you know not only their attributes, but also whether they invested in your product or service (that "yes" or "no" we mentioned above). The improvement in how accurately you can predict whether a given person will buy once you split on that first data point (household income, for example) is called the information gain.
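Information gain is straightforward to compute directly: it is the entropy of the "yes"/"no" labels before a split minus the weighted entropy of the labels within each branch after splitting on an attribute. The sketch below assumes a DataFrame with a binary "purchased" column; the names are illustrative.

```python
# Hedged sketch of information gain for one attribute of a labeled lead file.
import numpy as np
import pandas as pd

def entropy(labels: pd.Series) -> float:
    """Shannon entropy (in bits) of a label column."""
    probs = labels.value_counts(normalize=True)
    return float(-(probs * np.log2(probs)).sum())

def information_gain(df: pd.DataFrame, attribute: str, label: str = "purchased") -> float:
    """Entropy before splitting minus the weighted entropy after splitting."""
    before = entropy(df[label])
    after = sum(
        (len(group) / len(df)) * entropy(group[label])
        for _, group in df.groupby(attribute)
    )
    return before - after

# The attribute with the largest value, e.g. information_gain(leads, "household_income"),
# is the one to split on first.
```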

From trees to forests

Once you have identified the attribute with the highest information gain, you can move on to the next branch: the next-most-predictive variables among those you entered into the algorithm. Repeating this process builds a decision tree, which in turn becomes part of a decision forest.
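The article doesn't name a specific tool, but as a rough illustration, scikit-learn's RandomForestClassifier builds many such trees and averages their votes; the synthetic data below simply stands in for a real lead file.

```python
# Hedged sketch: from a single tree to a forest, on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 1,000 fake leads with 20 attributes and a binary bought / didn't-buy label.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,     # number of trees in the forest
    criterion="entropy",  # split by information gain, as with the single tree
    random_state=0,
)
forest.fit(X, y)

# Each lead gets a propensity score: the estimated probability of a "yes".
scores = forest.predict_proba(X)[:, 1]
print(scores[:5])
```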

Red flags and special considerations

One of the most important things to remember when it comes to decision trees is that even if your product is very similar to a competitor's and you sell within the same area, your highest information gain is not necessarily the same as your competitor's. For example, the algorithm may show that one business's most predictive attributes were tax amount, garage area and building age, while another business's decision forest may pull zoning type and heating-system type instead.

In marketing, we have seen that it is never the case that a small number of attributes predicts 100 percent of the outcome. Instead, it is always a combination of a large number of attributes that yields the best result. This is a good indication that even though filters are very important and commonly used by marketers, adding propensity models, which bring more attributes and the nonlinear interactions among them into the decision-making process, can greatly improve the overall outcome.

Thus, in a perfect scenario, you would partner with a data solution that puts both filters and prediction models to use on your behalf. If you have the capacity to run these tests yourself, remember that attributes can be more than they seem. It may sound obvious to say that you should understand your data and what it represents before you feed it to the algorithm, but this can be harder than it looks. For one thing, you have to establish what your "yes" and "no" responses really mean. If you're looking at response rate, for example, you will not want to count someone who only asked to be removed from your marketing list as a positive response. Yes, they responded, but their response was an opt-out, so these individuals should be labeled accordingly.
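A minimal sketch of that labeling step, using a made-up responses table and response_type values, might look like this:

```python
# Hedged sketch: label opt-outs as "no" even though they technically responded.
import pandas as pd

responses = pd.DataFrame({
    "lead_id": [1, 2, 3, 4],
    "response_type": ["purchase", "no_response", "unsubscribe", "purchase"],
})

# Only genuine conversions count as a positive label.
responses["label"] = (responses["response_type"] == "purchase").astype(int)
print(responses)
```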

Beyond correct labeling, another issue you may run into if you're handling these calculations on your own is noisy data. Real data always has "noise": incorrect data, missing data and data presented in different formats (December 1, 2015 vs. 12/1/15 vs. 1/12/15, for example). Be prepared to handle those cases properly by collecting, cleaning and cross-referencing your data to verify its accuracy.
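As one small example of that cleaning work, the sketch below parses the three date formats mentioned above; the per-value parsing and the month-first assumption are illustrative only.

```python
# Hedged sketch: normalize dates that arrive in mixed formats.
import pandas as pd

raw = pd.Series(["December 1, 2015", "12/1/15", "1/12/15"])

# Parse each value individually so mixed formats don't trip a single parser.
# Note that "12/1/15" and "1/12/15" may be the same date under different
# regional conventions; that ambiguity has to be resolved by cross-referencing
# the source, not by parsing alone.
parsed = raw.apply(pd.to_datetime)
print(parsed)
```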
