Randomized methods are a recent trend in practical machine learning [1]. They enable the high performance of complex non-linear methods without the high computational cost of their optimization. Current most prominent examples are randomized neural networks, in both feed-forward [2] and recurrent [3] forms. For the latter, the randomized approach provided an efficient training method for the first time, and enabled achieving state-of-the-art performance in multiple areas [4].
Random forest [5] is one of the best methods for Big Data processing due to its adaptive nearest neighbour behavior [6]. The forest predicts an output based only on local data samples. Such an approach works the better the more training data is available, thus making for a perfect supervised method for Big Data. Knearest neighbors algorithm benefits from more data as the data itself is the model, but Random Forest avoids the quadratic scaling of k-Nearest neighbors in terms of the data samples, that makes it prohibitively slow for large-scale problems.
Decision tree [7] is a building block of Random Forest. A deep decision tree has high variance but low bias. An ensemble of multiple such trees reduces variance, and improves the prediction performance. Additional measures are taken to make the trees in an ensemble as different as possible, including random subsets of features and boosting [8].
The paper proposes a merge between random methods and a decision tree, called an Extreme Learning Tree (ELT). The method builds a tree using expanded data features from an Extreme Learning Machine [9], by splitting nodes on a random feature at a random point. The result is an Extremely Randomized Tree [10]. Then a linear observer is added to the leaves of the tree, that learns a linear projection from the leaves to the target outputs. Each tree leaf is represented by its index, in the one-hot encoding format.
Extreme Learning Tree consists of three parts. First, it generates random data features using an Extreme Learning Machine (ELM) [11]. Second, it builds a random tree from these features, similar to Extremely Randomized Trees [10]. Each data sample is then represented by the index of its leaf from the tree, in one-hot encoding. Third, a linear regression is learned from the dataset in that one-hot encoding to the target outputs.
ELT follows the random methods paradigm as it has an untrained random part (the tree), and a learned linear observer (a linear regression model from leaves of the tree to the target outputs).
An ELT tree has two hyper parameters: the minimum node size, and the maximum thee depth. A node data is split by a random feature using a random split point. Split points that generates nodes under the minimum size are rejected. Nodes that reach the maximum depth or under twice the minimum size become leafs. Node splitting continues until there are non-leaf terminal nodes.
The Extreme Learning Tree is tested on well-known Iris flower dataset [12], in comparison with a Decision Tree, an L2 regularized ELM [13], and Ridge regression. Decision Tree implementation is from the Scikit-Learn library.
The random tree in the ELT method splits data samples into groups of similar ones. The resulting structure in the original data space is shown on Figure 1. The tree works as a adaptive nearest neighbour, combining together similar samples. Then the target variable information from these samples is used by a linear observer to make predictions.
A formal performance comparison is done on Iris dataset. The data is randomly split into 70% training and 30% test sets, and the test accuracy is calculated for all the methods. The whole experiment is repeated 100 times. Mean accuracy and its standard deviation are presented in Table 1.
Table 1: Average accuracy and its standard deviation on a test subset of Iris dataset. Method Accuracy std, %
In this experiment, an Extreme Learning Tree performs under ELM and Decision Tree methods. However, it outperforms a linear model (in the form
Figure 1: Leaf structure of an ELT, each color represents a different leaf. The random tree works as an approximated nearest neighbour method, joining together similar data samples.
of Ridge regression) by a significant margin. Outperforming a linear model is an achievement for a single ELT, as it represents each data sample by a single number – an index of its leaf in the tree.
Decision surface of ELT is visualized on Figure 2. The boundaries between classes have complex shape, but the classes are unbroken. Class boundaries of the original Decision Tree (shown on Figure 3) break into each other creating false predictions. They are always parallel to an axis, while ELT learns class boundaries of an arbitrary shape.
The paper proposes a new version of decision tree, that follows the random methods paradigm. It consists of an untrained random non-linear tree, and a learned linear observer. The method provides decision boundaries of a complex shape and with less noise than an original decision tree. It outperforms a purely linear model in accuracy despite representing the data samples only by a corresponding tree leaf index.
Future works will examine an application of Extreme Learning Tree to an ensemble method similar to Random Forest.
[1] Gallicchio, C., Martin-Guerrero, J.D., Micheli, A., Soria-Olivas, E.: Ran- domized machine learning approaches: Recent developments and challenges. In: ESANN 2017 Proceedings, European Symposium on Artifi-cial Neural Networks, Computational Intelligence and Machine Learning. d-side publi., Bruges, Belgium (26-28 April 2017) 77–86
[2] Huang, G.B.: What are Extreme Learning Machines? Filling the Gap Between Frank Rosenblatt’s Dream and John von Neumann’s Puzzle. Cognitive Computation 7(3) (2015) 263–278
1
2
3
4
0
1
2
3
4
5
6
Figure 2: Decision surface of an ELT on Iris dataset, using different pairs of features. Different colors correspond to the three different classes of Iris flowers.
1
2
3
4
0
1
2
3
4
5
6
Figure 3: Decision surface of a Decision Tree on Iris dataset, using different pairs of features. Note that all decision boundaries are parallel to axes.
[3] Lukoˇseviˇcius, M., Jaeger, H.: Reservoir computing approaches to recurrent neural network training. Computer Science Review 3(3) (August 2009) 127–149
[4] Jaeger, H., Haas, H.: Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication. Science 304(5667) (April 2004) 78
[5] Tin Kam Ho: The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(8) (August 1998) 832–844
[6] Lin, Y., Jeon, Y.: Random Forests and Adaptive Nearest Neighbors. Jour- nal of the American Statistical Association 101(474) (June 2006) 578–590
[8] Breiman, L.: Random Forests. Machine Learning 45(1) (2001) 5–32
[9] Huang, G.B., Zhou, H., Ding, X., Zhang, R.: Extreme learning machine for regression and multiclass classification. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on 42(2) (April 2012) 513–529
[10] Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Machine Learning 63(1) (2006) 3–42
[11] Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: Theory and applications. Neural Networks Selected Papers from the 7th Brazilian Symposium on Neural Networks (SBRN ’04)7th Brazilian Symposium on Neural Networks 70(1–3) (December 2006) 489–501
[12] FISHER, R.A.: THE USE OF MULTIPLE MEASUREMENTS IN TAX- ONOMIC PROBLEMS. Annals of Eugenics 7(2) (1936) 179–188
[13] Miche, Y., van Heeswijk, M., Bas, P., Simula, O., Lendasse, A.: TROP- ELM: A double-regularized ELM using LARS and Tikhonov regularization. Advances in Extreme Learning Machine: Theory and Applications Biological Inspired Systems. Computational and Ambient Intelligence Selected papers of the 10th International Work-Conference on Artificial Neural Networks (IWANN2009) 74(16) (September 2011) 2413–2421