A Review on Classification of Various Types of Decision Trees with Merits and Demerits

Data mining comprises several important techniques such as classification, clustering, regression, etc. Among these, classification is the most mature and actively researched direction, enabling many successful applications. Classification can be applied to uncover useful information from the large amounts of data stored in fields such as hospitals, inventory management, banking, etc., using approaches including decision tree methods, neural network methods and statistical exploratory methods. In this paper, a number of decision tree algorithms used in different fields, such as Iterative Dichotomiser 3 (ID3), CART (Classification and Regression Tree), C4.5, SLIQ and SPRINT, are explained clearly with their merits and demerits, along with some applications. Most decision tree methods are developed from the ID3 method.


Introduction
Data mining is a step within the knowledge discovery process in databases: the application of data analysis and discovery algorithms that, within acceptable limits of computational efficiency, enumerate particular patterns over the data [1].
Data mining is a sequence of techniques for discovering the added value of a data set, in the form of knowledge that was not previously known and could not be found manually. The word "mining" itself refers to the effort of extracting something valuable from a large mass of raw material. Accordingly, data mining has long roots in fields such as artificial intelligence, machine learning, statistics and databases. Data mining is the process of applying these techniques to data in order to uncover hidden patterns; in other words, it is the process of extracting patterns from data, and it is becoming an increasingly important tool for converting raw data into information.
It is often utilized in diverse practices, such as marketing, surveillance, fraud detection and scientific discovery. It has been used for years by businesses, scientists and governments to sift through large volumes of data, including flight passenger travel records, census data and supermarket scanner data, to generate market research reports. The main reason for using data mining is to support the analysis of collections of behavioral observations. Such data is prone to collinearity due to known associations, as depicted in Figure 1.

Data Mining Functions
Data mining functions, or tasks, specify the types of patterns or knowledge to be discovered during data mining. Some of the main data mining functions are classification, clustering, outlier analysis, regression and prediction, and so forth [1].

a) Classification
A classification algorithm is used to predict the class of data items [6]. To date, a large collection of classification algorithms (or classifiers) has been proposed by researchers [2,7]. Some popular classification algorithms are summarized below.

b) Outlier analysis
Outliers are typically discarded by most data mining techniques as noise or exceptions. Sometimes, however, outliers may carry more information than the other data objects, so outlier analysis is very important for application areas such as intrusion detection, fraud detection, anomaly detection, etc.
Many data processing techniques simply use clustering to detect outliers as noise.
Outlier detection techniques are classified as classification-based methods, statistical methods, clustering-based methods, supervised, semi-supervised and unsupervised methods, deviation-based methods and proximity-based methods [2].

c) Regression
Regression predicts the value of an attribute using regression technique(s) over time. The future values of variables are forecast with the help of historical time-series plots. Evolution analysis discovers interesting patterns in the evolution history of data items. Identification of patterns in an item's evolution and matching of the items' changing trends are the two main aspects of evolution analysis. A trend of items whose behavior evolves over time can be described using evolution analysis and regression models. Evolution analysis exposes the time-varying trends of the data items within the dataset. Association analysis can also be used for evolution analysis [5].
Volume 5, Issue 3, May-June 2023

d) Prediction
Regression analysis is used to model the relationship between one or more independent (predictor) variables and a dependent (response) variable that is continuous-valued.

Classification by Decision Trees
Decision trees are the most commonly used classifiers because of their ease of implementation and simplicity of understanding compared to other classification algorithms. A decision tree classification algorithm can be implemented serially or in parallel, depending on the volume of data, the memory available on the computing resource and the scalability of the algorithm [8]. A decision tree algorithm is a data mining induction method that recursively partitions a data set using a depth-first greedy approach or a breadth-first approach until all the data items belong to a particular class [9]. A decision tree structure is made up of root, internal and leaf nodes.
The tree structure is used in classifying unknown data records. At each internal node of the tree, the best split is chosen using impurity measures. The leaves of the tree hold the class labels into which the data items have been grouped. A decision tree is a nonparametric supervised learning algorithm that can be used for both classification and regression tasks. It has a hierarchical tree structure consisting of a root node, branches, internal nodes and leaf nodes, as can be seen from the following diagram.

Types of decision trees
Hunt's algorithm, developed in the 1960s to model human learning in psychology, is the basis of many popular decision tree algorithms, including the following. ID3: Ross Quinlan is credited with developing ID3, which stands for "Iterative Dichotomiser 3". This algorithm uses entropy and information gain as metrics to evaluate candidate splits; much of Quinlan's research on this algorithm dates from 1986. C4.5: This algorithm is considered a later iteration of ID3, also developed by Quinlan. It can use either information gain or gain ratio to evaluate split points in the decision tree.
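The difference between ID3's information gain and C4.5's gain ratio can be illustrated with a minimal sketch in plain Python. The toy labels and split below are invented for illustration, not taken from the paper:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels (ID3's base measure)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(labels, groups):
    """C4.5's criterion: information gain normalized by split information.
    `groups` is the partition of `labels` induced by one candidate attribute."""
    n = len(labels)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)
    split_info = -sum((len(g) / n) * math.log2(len(g) / n) for g in groups)
    return gain / split_info

# Hypothetical 8-sample node split 4/4 by a binary attribute.
labels = ["yes"] * 4 + ["no"] * 4
groups = [["yes"] * 3 + ["no"], ["yes"] + ["no"] * 3]
print(round(gain_ratio(labels, groups), 3))  # 0.189
```

The normalization by split information penalizes attributes that fragment the data into many small groups, which plain information gain tends to favor.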

CART:
The term CART is an acronym for "Classification and Regression Tree" and was introduced by Leo Breiman. This algorithm typically uses Gini impurity to decide on the best attribute for splitting. The Gini impurity measures how often a randomly chosen element would be misclassified; when comparing splits by Gini impurity, a lower value is preferable. The following table shows a comparison of parameters between different decision tree algorithms. These algorithms are among the most influential data mining algorithms in the research community [4].

SLIQ:
SLIQ is a decision tree classifier that can handle both numeric and categorical attributes. It uses a novel pre-sorting technique in the tree-growth phase. This sorting method is integrated with a breadth-first tree-growing strategy to enable classification of disk-resident datasets.

SPRINT:
The SPRINT algorithm is a classical algorithm for building a decision tree, which is a widely used method of data classification. However, SPRINT has a high computational cost in the calculation of attribute splits.
The SPRINT algorithm has many advantages. It is not restricted by memory, and it is a scalable, parallelizable method of building decision trees. There are also some shortcomings. For instance, finding the best split point for discrete attributes requires a large amount of computation, and the partitioning of continuous attributes can be unreasonable.
ID3 builds a tree by repeatedly selecting the best attribute. The algorithm uses a greedy search: it picks the best attribute and never looks back to reconsider earlier choices.
The central principle of the ID3 algorithm is based on information theory.

ID3 ALGORITHM
Step 1: Begin by calculating the entropy of the class attribute.
Step 2: Select the attributes and, for each attribute, calculate the information gain.
Step 3: The attribute with the maximum information gain is selected.
Step 4: Remove that attribute from future calculations. Repeat steps 2-4 until all attributes have been used. T_A (target attribute) is the attribute whose value is to be predicted by the tree [5]. A (attributes) is the list of attributes that may be tested by the learned decision tree.

Merits of ID3
 It takes less memory for large program execution.
 It requires less model build time.
 It has a short search time.
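The steps above can be sketched in plain Python. The attribute names and toy records below are hypothetical, chosen only to illustrate how steps 1-3 pick the attribute with the largest information gain:

```python
import math
from collections import Counter

def entropy(labels):
    """Step 1: entropy of the class labels at a node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Step 2: class entropy minus the weighted entropy after
    partitioning the rows on `attr`."""
    n = len(labels)
    parts = {}
    for row, label in zip(rows, labels):
        parts.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    return entropy(labels) - remainder

# Toy weather-style data (invented for illustration).
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain",  "windy": "no"},
    {"outlook": "rain",  "windy": "yes"},
]
labels = ["no", "no", "yes", "yes"]

# Step 3: the attribute with the maximum information gain wins.
best = max(["outlook", "windy"], key=lambda a: information_gain(rows, labels, a))
print(best)  # outlook perfectly separates the classes
```

Step 4 would then remove the chosen attribute and recurse on each partition until the attribute list is exhausted or every partition is pure.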

CART Algorithm
CART is a predictive algorithm used in machine learning that explains how the target variable's values can be predicted from the other variables. It is a decision tree in which each fork is a split on a predictor variable and each leaf node holds a prediction for the target variable.
In the decision tree, nodes are split into sub-nodes on the basis of a threshold value of an attribute. The root node is taken as the training set and is split in two by considering the best attribute and threshold value. The resulting subsets are then split using the same logic. This continues until the last pure subset is found in the tree, or until the maximum possible number of leaves in the growing tree is reached.
The CART algorithm works as follows: the best split point for each input attribute is obtained, and the dataset is split into a decision tree using Gini impurity, searching for the greatest homogeneity of the sub-nodes with the help of the Gini index criterion.

Gini index/Gini impurity
The Gini index is a metric for classification tasks in CART. It is based on the sum of squared probabilities of each class, and it computes the likelihood that a particular element, selected at random, would be wrongly classified; it is a variant of the Gini coefficient. It works on categorical variables, gives results as either "success" or "failure", and therefore performs binary splitting only.
The value of the Gini index varies from 0 to 1:
 A value of 0 indicates that all the elements belong to a certain class, or that only one class exists there.
 A Gini index of value 1 signifies that the elements are randomly distributed across various classes.
 A value of 0.5 denotes that the elements are uniformly distributed over some classes.
Mathematically, we can write Gini impurity as follows:

Gini = 1 - Σi pi²

where pi is the probability of an object being classified to a particular class.
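The formula above can be written as a minimal, self-contained sketch in plain Python (the toy label lists are invented for illustration):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["a"] * 4))        # 0.0 -> pure node, a single class
print(gini(["a", "b"] * 2))   # 0.5 -> two classes, evenly mixed
```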

CART model illustration
CART models are built by choosing input variables and evaluating split points on those variables until a suitable tree is produced.
Steps to create a decision tree using the CART algorithm:
 Greedy algorithm: the input space is divided using a greedy method known as recursive binary splitting. This is a numerical procedure in which all the values are lined up and various split points are tried and assessed using a cost function.
 Stopping criterion: as it works its way down the tree with the training data, the recursive binary splitting method described above must know when to stop splitting. The most common stopping criterion is to require a minimum number of training instances assigned to each leaf node. If the count at a node is smaller than the specified threshold, the split is rejected and the node is taken as a final leaf node.
 Tree pruning: a decision tree's complexity is defined as the number of splits in the tree.
Trees with fewer branches are preferred, as they are easy to understand and less likely to overfit the data. Working through each leaf node in the tree and evaluating the effect of deleting it using a hold-out test set is the fastest and simplest pruning approach.
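The greedy splitting and minimum-count stopping steps above might be sketched as follows. This is a simplified illustration for a single continuous feature, not the full CART algorithm (no pruning, and the feature values are assumed distinct):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Greedy step: try each midpoint between sorted values and keep the
    threshold with the lowest weighted Gini impurity (the cost function)."""
    best = (None, float("inf"))
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        cost = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if cost < best[1]:
            best = (t, cost)
    return best

def grow(xs, ys, min_samples=2):
    """Recursive binary splitting with a minimum-count stopping rule."""
    if len(ys) <= min_samples or gini(ys) == 0.0:
        return Counter(ys).most_common(1)[0][0]  # leaf: majority class
    t, _ = best_split(xs, ys)
    left = [(x, y) for x, y in zip(xs, ys) if x < t]
    right = [(x, y) for x, y in zip(xs, ys) if x >= t]
    return {"threshold": t,
            "left": grow(*zip(*left), min_samples),
            "right": grow(*zip(*right), min_samples)}

# Invented toy data: one numeric feature, two well-separated classes.
tree = grow([1.0, 2.0, 10.0, 11.0], ["a", "a", "b", "b"])
print(tree)  # root split at 6.0 with two pure leaves
```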

Merits of CART
 Results are simple to interpret.
 Classification and regression trees are nonparametric and nonlinear.
 Classification and regression trees implicitly perform feature selection.
 Outliers have no meaningful effect on CART.
 It requires minimal supervision and produces easy-to-understand models.

Demerits of CART
 Overfitting.
 Low bias.
 The tree structure can be unstable.

Applications of the CART algorithm
 For quick data insights.
 In blood donor classification.
 For environmental and ecological data.
 In the financial sectors.

Supervised Learning In Quest (SLIQ) algorithm
SLIQ is a decision tree classifier that can handle both numerical and categorical attributes and builds compact, accurate trees. A pre-sorting technique is used in the tree-growing phase, together with an inexpensive pruning algorithm. It is suitable for the classification of large disk-resident datasets, independently of the number of classes, attributes and records [21].

Tree Building
MakeTree(training data T)
    Partition(T)

Partition(data S)
    if (all points in S are in the same class) then
        return;
    evaluate splits for each attribute A;
    use the best split to partition S into S1 and S2;
    Partition(S1);
    Partition(S2);

The Gini index is used to evaluate the "goodness" of the alternative splits for an attribute. If a data set T contains examples from n classes, gini(T) is given as

gini(T) = 1 - Σj pj²

where pj is the relative frequency of class j in T. After splitting T into subsets T1 and T2, the Gini index of the split data is defined as

gini_split(T) = (|T1|/|T|) gini(T1) + (|T2|/|T|) gini(T2)

The primary technique employed by SLIQ is a scheme that eliminates the need to sort the data at each node [20].
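SLIQ's one-time-sort scheme can be illustrated with a small sketch in plain Python. The ages and classes below are invented for illustration; real SLIQ maintains these attribute lists on disk together with a memory-resident class list:

```python
from collections import Counter

def gini(labels):
    """gini(T) = 1 - sum of squared relative class frequencies."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# Hypothetical records: one continuous attribute (age) plus a class label.
# SLIQ sorts each continuous attribute ONCE into an attribute list of
# (value, record index) pairs, so no re-sorting is needed at lower nodes.
classes = ["high", "high", "low", "low"]
ages = [17, 23, 43, 68]
attribute_list = sorted((v, i) for i, v in enumerate(ages))

# Scan the pre-sorted list, evaluating each candidate split value(A) < c
# with the weighted Gini index of the split.
best_c, best_cost = None, float("inf")
for k in range(1, len(attribute_list)):
    c = (attribute_list[k - 1][0] + attribute_list[k][0]) / 2
    left = [classes[i] for v, i in attribute_list[:k]]
    right = [classes[i] for v, i in attribute_list[k:]]
    cost = (len(left) * gini(left) + len(right) * gini(right)) / len(classes)
    if cost < best_cost:
        best_c, best_cost = c, cost
print(best_c)  # 33.0 separates the two classes exactly
```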
A decision tree classifier is built in two phases [3][2]: a growth phase and a prune phase. In the growth phase, the tree is built by recursively partitioning the data until each partition is either "pure" (all members belong to the same class) or small enough (a parameter set by the user).
This procedure is shown in the figure. The form of the split used to partition the data depends on the type of the attribute used in the split. Splits for a continuous attribute A are of the form value(A) < c, where c is a value in the domain of A. Splits for a categorical attribute A are of the form value(A) ∈ X, where X ⊆ domain(A). We consider only binary splits because they typically lead to more accurate trees; however, these techniques can be extended to handle multi-way splits. Once the tree has been fully grown, it is pruned in the second phase to generalize the tree by removing dependence on statistical noise or variation that may be specific only to the training set. The tree-growth phase is computationally much more expensive than pruning, since the data is scanned multiple times in this part of the computation. Pruning requires access only to the fully grown decision tree. Experience from previous work on SLIQ has been that the pruning phase typically takes less than 1% of the total time needed to build a classifier; we therefore focus only on the tree-growth phase. For pruning, the algorithm used in SLIQ is employed, which is based on the Minimum Description Length principle.
Consider, for example, the credit problem, in which a credit agency wants to classify customers based on a training database containing information about them. The classification tree is generated in a top-down fashion as follows: the data is recursively partitioned until either every partition is sufficiently 'pure' (parameterized by a user-specified confidence level) or is too small to yield statistically significant results. If neither of the above two criteria holds, the best possible split is chosen (for example, education level (e-level) at the root node) and the data is partitioned according to that split. The popular CART [18] and C4.5 [17] classifiers, for example, grow trees depth-first and repeatedly sort the data at every node of the tree to arrive at the best splits for numeric attributes. SLIQ, however, replaces this repeated sorting with a one-time sort by using separate lists for each attribute (see [19] for details; Figure 3 shows an example of attribute lists). SLIQ uses a data structure called a class list, which must remain memory-resident at all times. The size of this structure is proportional to the number of input records, and this is what limits the number of input records that SLIQ can handle. SPRINT addresses these two issues differently from previous algorithms: it has no restriction on the size of the input and yet is a fast algorithm. It shares with SLIQ the advantage of a one-time sort, but uses different data structures.
In particular, there is no structure like the class list that grows with the size of the input and needs to be memory-resident.

Merits
 Removes all memory restrictions.
 Fast, scalable and easily parallelized.

Conclusions
Decision trees, as a response to a discrimination problem, are one of the few data-processing methods that can be presented quickly to a non-specialist audience without getting lost in hard-to-understand mathematical formulations. In this paper, we discussed the various types of decision trees and the importance of their algorithms, along with their merits and demerits. Some applications of decision trees were also given.