
ID3 Decision Tree

ID3 stands for Iterative Dichotomiser 3. It is so named because the algorithm iteratively (repeatedly) dichotomizes (divides) the data on a feature into two or more groups at each step. ID3 is generally used only for classification problems with nominal features. In this project, several datasets are used to calculate entropy and information gain and to build an ID3 decision tree, which indicates what action should be taken under given conditions.
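As a rough illustration of the two quantities mentioned above, entropy and information gain can be computed as in the sketch below. This is not the project's own code: the function names, the list-of-dicts row format, and the toy rows are assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target):
    """Entropy reduction obtained by splitting `rows` on `feature`.

    `rows` is assumed to be a list of dicts mapping column name to nominal value.
    """
    base = entropy([r[target] for r in rows])
    # Group the target labels by the value the feature takes in each row.
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    # Weighted average entropy of the groups after the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy rows, invented for illustration only.
toy_rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
]

base_entropy = entropy([r["play"] for r in toy_rows])
outlook_gain = information_gain(toy_rows, "outlook", "play")
```

In this toy case the classes are split evenly, so the entropy is 1 bit, and every outlook value leads to a single class, so the gain of splitting on outlook is also 1 bit.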

dataset.png

Datasets

The datasets used in this project were the well-known weather and play-golf datasets, plus a heart disease dataset.

Data preparation

Class balance and missing values

Because the project was done at an early stage of the course, datasets were selected for having no missing data or for having balanced classes. None of the three datasets had missing data; however, their classes were imbalanced. Synthetic data were not generated to compensate, because that topic had not yet been covered.
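The two checks described above could be done along the following lines. This is only a sketch: the helper names, the set of values treated as "missing", and the sample rows are assumptions, not taken from the project.

```python
from collections import Counter


def count_missing(rows, missing_markers=(None, "", "?")):
    """Count cells whose value is one of the assumed missing-value markers."""
    return sum(1 for r in rows for v in r.values() if v in missing_markers)


def class_distribution(rows, target):
    """Return the fraction of rows that belong to each class of `target`."""
    counts = Counter(r[target] for r in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical rows, invented for illustration only.
sample_rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
]

missing = count_missing(sample_rows)
balance = class_distribution(sample_rows, "play")
```

Here `missing` is 0 and `balance` shows a 75% / 25% split between the two classes, which is the kind of imbalance described above.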

bases de datos id3.png

Data preparation

preparacion-heart desease.png

Heart Disease

For the heart disease dataset, the numeric values had to be converted to nominal categories, since the decision tree works better with categorical variables. The other two datasets were used without modification.
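A numeric-to-nominal conversion of this kind can be sketched as a simple binning step. The cut points, labels, and column name below are hypothetical and chosen only for illustration; they are not the thresholds used in the project.

```python
from bisect import bisect_right


def to_nominal(value, cut_points, labels):
    """Map a numeric value to a nominal label.

    `cut_points` must be sorted in ascending order and `labels` must have
    exactly one more entry than `cut_points`. A value equal to a cut point
    falls into the upper bin.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("labels must have len(cut_points) + 1 entries")
    return labels[bisect_right(cut_points, value)]


# Hypothetical age bins: below 40, 40 to 59, 60 and above.
AGE_CUTS = [40, 60]
AGE_LABELS = ["young", "middle", "senior"]

ages = [29, 40, 52, 60, 71]
age_groups = [to_nominal(a, AGE_CUTS, AGE_LABELS) for a in ages]
```

After this step each numeric column holds a small set of category names, which is the form ID3 expects when it splits on a feature.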

Results

After calculating the entropy and gain at each step of a recursive algorithm, a decision tree in the form of a dictionary was obtained. To check that the result is correct, a coverage test must be carried out: it verifies that the tree, applied to the conditions in each record of the dataset, produces the expected result.
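The recursion and the coverage test described above might look roughly like the following. It is a minimal sketch, not the project's implementation: the function names, the nested-dictionary layout `{feature: {value: subtree_or_label}}`, and the toy rows are all assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain(rows, feature, target):
    """Information gain of splitting `rows` (a list of dicts) on `feature`."""
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - remainder


def id3(rows, features, target):
    """Build a nested-dict tree; leaves are class labels."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:          # pure node: stop and return the class
        return labels[0]
    if not features:                   # nothing left to split on: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: gain(rows, f, target))
    rest = [f for f in features if f != best]
    tree = {best: {}}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = id3(subset, rest, target)
    return tree


def predict(tree, row):
    """Follow the branches that match `row`; returns None for an unseen value."""
    while isinstance(tree, dict):
        feature = next(iter(tree))
        tree = tree[feature].get(row[feature])
    return tree


def coverage(tree, rows, target):
    """Fraction of rows for which the tree reproduces the recorded class."""
    return sum(predict(tree, r) == r[target] for r in rows) / len(rows)


# Hypothetical toy rows, invented for illustration only.
toy_rows = [
    {"outlook": "sunny", "windy": "false", "play": "no"},
    {"outlook": "sunny", "windy": "true", "play": "no"},
    {"outlook": "overcast", "windy": "false", "play": "yes"},
    {"outlook": "rainy", "windy": "false", "play": "yes"},
    {"outlook": "rainy", "windy": "true", "play": "no"},
]

tree = id3(toy_rows, ["outlook", "windy"], "play")
score = coverage(tree, toy_rows, "play")
```

On these toy rows the root split is on outlook, only the rainy branch needs a second split on windy, and the coverage is 1.0 because every training row is classified as recorded. A coverage below 1.0 on the training data would point to rows with identical conditions but different classes, or to a bug in the tree construction.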
