02 visualizing a classifier – a decision tree

We know that the key part of ML is training a classifier: creating a function that holds a box of rules learned from examples. We’ll visualize a classifier to show how it works under the hood:

how rules are created from examples (covered in subsequent episodes)
what the rules look like in the box (in the classifier)
how to classify a new input

The following example shows an ML program that trains a classifier on the Iris flower data set. It loads the data set, splits it into two parts (training data and testing data), trains the classifier, and then tests the classifier with the testing data. Finally, it exports the trained tree to a Graphviz dot file.

import numpy as np
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
test_idx = [0, 50, 100]

# training data
train_data = np.delete(iris.data, test_idx, axis=0)
train_target = np.delete(iris.target, test_idx)

# testing data
test_data = iris.data[test_idx]
test_target = iris.target[test_idx]

clf = tree.DecisionTreeClassifier()
clf.fit(train_data, train_target)

print("the meta data: features and labels")
print(iris.feature_names)
print(iris.target_names)
print("the testing data & target and the prediction from the classifier")
print(test_data, test_target)
print(clf.predict(test_data))
print("the testing data of index 1 to show mechanism manually")
print(test_data[1])
print(test_target[1])

# export the trained tree to a Graphviz dot file in the current directory
from sklearn.tree import export_graphviz
with open("tree.dot", 'w') as f:
    export_graphviz(clf,
                     out_file=f,
                     feature_names=iris.feature_names[:],
                     class_names=iris.target_names,
                     rounded=True,
                     filled=True)

### running the program
(base) D:\learning\machine learning\ML recipes\codes>python viz.py
the meta data: features and labels
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
['setosa' 'versicolor' 'virginica']
the testing data & target and the prediction from the classifier
[[5.1 3.5 1.4 0.2]
 [7.  3.2 4.7 1.4]
 [6.3 3.3 6.  2.5]] [0 1 2]
[0 1 2]
the testing data of index 1 to show mechanism manually
[7.  3.2 4.7 1.4]
1
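
The rules the classifier learned can also be printed as plain text, which needs no extra tools. The following is a minimal sketch, not part of the original program: it assumes scikit-learn 0.21 or later (where export_text is available), repeats the same train/test split, and fixes random_state only so that repeated runs print the same tree.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
test_idx = [0, 50, 100]

# same split as the program above: hold out one flower of each species
train_data = np.delete(iris.data, test_idx, axis=0)
train_target = np.delete(iris.target, test_idx)

# random_state=0 is an assumption added here for repeatable output
clf = DecisionTreeClassifier(random_state=0)
clf.fit(train_data, train_target)

# export_text returns the tree as an indented string, one rule per line
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each indentation level in the printed text corresponds to one level of the tree, and the lines starting with "class:" are the leaves.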

We can convert the dot file to a PNG image with “dot -Tpng tree.dot -o tree.png” (the dot command comes with Graphviz). The picture shows the structure of the classifier: each internal node contains a rule in the form of a yes-or-no question. The “predictor” takes an input, traverses the tree from the root, and stops at a leaf, which gives the result. For example, test_data[1] has petal length (pl) 4.7 and petal width (pw) 1.4. Let’s traverse the tree manually:

pl <= 2.45 ? No -> go right
pw <= 1.75 ? Yes -> go left
pl <= 4.95 ? Yes -> go left
pw <= 1.65 ? Yes -> go left, and we reach the result leaf

The leaf gives the label versicolor (class 1), the same result as the ML program. The predictor works the same way as the steps above.
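
The manual traversal can be reproduced in code by walking the fitted tree's internal arrays. This is a minimal sketch under stated assumptions: walk_tree is a hypothetical helper written for illustration (it is not a scikit-learn function), random_state is fixed only for repeatability, and the thresholds it prints may differ from the ones in the picture because a newly trained tree can choose different splits.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
test_idx = [0, 50, 100]
train_data = np.delete(iris.data, test_idx, axis=0)
train_target = np.delete(iris.target, test_idx)

# random_state=0 is an assumption added here for repeatable output
clf = DecisionTreeClassifier(random_state=0)
clf.fit(train_data, train_target)

def walk_tree(clf, sample, feature_names):
    """Hypothetical helper: follow one sample from the root to a leaf."""
    t = clf.tree_
    # scikit-learn converts inputs to float32 internally, so do the same
    sample = np.asarray(sample, dtype=np.float32)
    node = 0  # the root is always node 0
    # a leaf has no children; scikit-learn marks that with -1
    while t.children_left[node] != -1:
        f = t.feature[node]
        thr = t.threshold[node]
        go_left = sample[f] <= thr
        answer = "Yes -> go left" if go_left else "No -> go right"
        print(f"{feature_names[f]} <= {thr:.2f} ? {answer}")
        node = t.children_left[node] if go_left else t.children_right[node]
    # value[node] holds the class distribution at the leaf; take the largest
    return int(np.argmax(t.value[node]))

# the same flower as test_data[1] in the program above
label = walk_tree(clf, iris.data[50], iris.feature_names)
print(iris.target_names[label])
```

The label returned by the walk should match clf.predict for the same sample, since predict follows the same path through the nodes.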
