The decision tree classifier — An overview
Article by: Olga Berezovsky
Supervised machine learning offers a variety of regression and classification algorithms. In my previous articles, I focused on describing the most popular classification algorithms, such as logistic regression and support vector machines. I went over each algorithm's advantages and common use cases, and walked through the steps of model implementation. In this article I want to introduce another popular classification algorithm commonly used in supervised machine learning: the decision tree.
Basics of the decision tree classifier
The decision tree model can be used for predicting both categorical and continuous variables. Like SVM, it can also be used for regression, so there are two types of trees: classification decision trees and regression decision trees. Here, I'm focusing only on classification and will walk you through a binary classification problem.
The structure of a decision tree is simple: it hierarchically splits the data into subsets, which are then split again into smaller partitions, or branches, until they become "pure", meaning all the records inside a branch belong to the same class. Such terminal nodes are called "leaves". Basically, a classification tree has the following structure:
- The root node sits at the top of the tree
- The internal nodes represent features (decision rules)
- The leaves represent the outcomes (class labels)
The tree's branches encode the decision rules: at each internal node, your data is split again based on a test on one of the input features, as shown in the sketch below.
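To make that flow concrete, here is a minimal sketch using scikit-learn (my assumption; the article doesn't prescribe a library). It fits a shallow classification tree on a built-in binary dataset and prints the tree as text, so you can see the root split, the internal nodes, and the leaves:

```python
# A minimal sketch of a binary classification tree, assuming scikit-learn
# is available. The dataset and parameters are illustrative, not from the
# article.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# max_depth keeps the printed tree small enough to read
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)

# The text dump shows the hierarchy: the first split is the root,
# nested splits are internal nodes, and "class: ..." lines are leaves.
print(export_text(clf, feature_names=list(X.columns)))
print("Test accuracy:", clf.score(X_test, y_test))
```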
[Figure: decision tree structure. Original source: Rajesh S. Brid, Medium]
The decision tree classifier is commonly used for image classification, decision analysis, strategy analysis, medical diagnosis, behavioral analysis in psychology, and more.
Advantages of decision trees
The biggest advantage of decision trees is that they make it easy to interpret and visualize nonlinear data patterns. They are also fast to train, which makes them handy for exploratory data analysis. If your dataset is small, decision trees can deliver a high accuracy score. They also cope well with messy, non-normalized data: because each split tests a single feature, decision trees are insensitive to feature scaling, relatively robust to outliers, and tend to ignore non-essential features, since uninformative features are simply never selected for a split. Another advantage of classification decision trees is that you can tune their accuracy by adjusting the logic for the branch splits, such as the impurity criterion and how deep the tree is allowed to grow; a sketch of this follows below.
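As an illustration of that last point, the sketch below grid-searches a few split settings with scikit-learn. The specific parameter values are my own illustrative assumptions, not recommendations from the article; reasonable values depend on your data:

```python
# A hedged sketch of tuning the split logic, assuming scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Each setting below changes how branches are split or when splitting
# stops; all values are illustrative assumptions.
param_grid = {
    "criterion": ["gini", "entropy"],  # impurity measure used at each split
    "max_depth": [3, 5, None],         # None lets branches grow until pure
    "min_samples_leaf": [1, 5, 10],    # larger values prune noisy leaves
}

search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid, cv=5)
search.fit(X, y)

print("Best split settings:", search.best_params_)
print("Cross-validated accuracy:", round(search.best_score_, 3))
```

Constraining growth (`max_depth`, `min_samples_leaf`) also acts as regularization, which helps the tree generalize instead of memorizing the training set.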
To learn more about predictive models with a decision tree (+ examples), view the full article here: https://www.logic2020.com/insight/tactical/decision-tree-classifier-overview