ID3 Decision Tree
ID3 stands for Iterative Dichotomiser 3, so named because the algorithm iteratively (repeatedly) dichotomizes (divides) features into two or more groups at each step. ID3 is generally used only for classification problems with nominal features. In this project, entropy and information gain are calculated on several datasets to build an ID3 decision tree, which tells us what action should be taken under given conditions.
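The two quantities mentioned above can be sketched as follows. This is a minimal illustration using the standard Shannon entropy and information gain definitions, not necessarily the code used in this project; the function names, the `rows` layout (a list of dictionaries) and the toy values are assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target):
    """Entropy reduction obtained by splitting `rows` on `feature`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        # Weight each branch by the share of rows that fall into it.
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy rows, not taken from the project's datasets.
rows = [
    {"windy": "true", "play": "no"},
    {"windy": "true", "play": "no"},
    {"windy": "false", "play": "yes"},
    {"windy": "false", "play": "yes"},
]
print(entropy([r["play"] for r in rows]))
print(information_gain(rows, "windy", "play"))
```

In this toy case the classes are split evenly, so the entropy is 1 bit, and the feature separates them perfectly, so the gain is also 1 bit.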
Datasets
The datasets used in this project were two well-known ones, on weather and on playing golf, plus one on heart disease.
Data preparation
Class balance and missing values
Because the course was at an early stage, datasets without missing data and with balanced classes were sought. None of the three datasets had missing data; however, their classes were unbalanced. Synthetic data were not generated on this occasion because that topic had not yet been covered.
Heart Disease
For the heart disease dataset, numeric values had to be converted to nominal categories, since the decision tree works better with categorical variables. The other two datasets were used without modification.
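A numeric-to-nominal conversion of this kind can be sketched as simple binning. The helper `to_nominal`, the cut points and the labels below are hypothetical and chosen only for illustration; they are not the thresholds used for the heart disease dataset.

```python
from bisect import bisect_right


def to_nominal(value, cut_points, labels):
    """Map a number to the label of the interval it falls in.

    `cut_points` must be sorted and `labels` must have one more
    element than `cut_points`.
    """
    return labels[bisect_right(cut_points, value)]


# Hypothetical cut points and labels (assumptions for this example).
AGE_CUTS = [40, 60]
AGE_LABELS = ["young", "middle", "senior"]

ages = [29, 40, 57, 63]
print([to_nominal(a, AGE_CUTS, AGE_LABELS) for a in ages])
# ['young', 'middle', 'middle', 'senior']
```

With `bisect_right`, a value equal to a cut point falls into the upper interval, so 40 is labelled "middle" here.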
Results
After calculating the entropy and gain at each step of a recursive algorithm, a decision tree in the form of a nested dictionary was obtained. To check that the result is correct, a coverage test must be carried out to verify that the tree, given the conditions in each row of the dataset, produces the expected result.
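The recursive construction and the coverage test described above might look like the sketch below. It is an assumption-laden illustration rather than the project's actual code: the names `build_tree`, `predict` and `coverage`, the nested-dictionary layout `{feature: {value: subtree_or_label}}`, and the toy rows are all hypothetical. Coverage is taken here to mean the fraction of dataset rows for which the tree returns the recorded class.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain(rows, feature, target):
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def build_tree(rows, features, target):
    """Return a class label (leaf) or {feature: {value: subtree}}."""
    labels = [r[target] for r in rows]
    # Leaf: every row has the same class, or no features are left
    # (in which case the majority class is used).
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: gain(rows, f, target))
    rest = [f for f in features if f != best]
    return {
        best: {
            value: build_tree([r for r in rows if r[best] == value], rest, target)
            for value in {r[best] for r in rows}
        }
    }


def predict(tree, row):
    """Walk the nested dictionary; returns None for an unseen value."""
    while isinstance(tree, dict):
        feature = next(iter(tree))
        tree = tree[feature].get(row[feature])
    return tree


def coverage(tree, rows, target):
    """Fraction of rows whose recorded class the tree reproduces."""
    return sum(predict(tree, r) == r[target] for r in rows) / len(rows)


# Hypothetical toy rows, not the project's data.
rows = [
    {"outlook": "sunny", "windy": "false", "play": "no"},
    {"outlook": "sunny", "windy": "true", "play": "no"},
    {"outlook": "overcast", "windy": "false", "play": "yes"},
    {"outlook": "rainy", "windy": "false", "play": "yes"},
    {"outlook": "rainy", "windy": "true", "play": "no"},
]
tree = build_tree(rows, ["outlook", "windy"], "play")
print(tree)
print(coverage(tree, rows, "play"))
```

A coverage of 1.0 on the training rows only shows that the tree is consistent with the data it was built from; it says nothing about how the tree behaves on unseen rows.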