Dynamic selection (DS) has become an active research topic in the multiple classifier systems literature in recent years. In this paradigm, one or more base classifiers are selected for each query instance to be classified. Such techniques have demonstrated improvements over traditional (static) combination approaches, such as majority voting and Boosting (Cruz et al., 2018). DS techniques work by estimating the competence level of each classifier in a pool of classifiers. Only the most competent classifier, or an ensemble of the most competent classifiers, is selected to predict the label of a specific test sample. The rationale is that not every classifier in the pool is an expert in classifying all unknown samples; rather, each base classifier is an expert in a different local region of the feature space.
In this paper, we introduce DESlib, a Python library for dynamic ensembles. The library implements the key dynamic selection techniques in the literature. It also provides static ensemble methods, which are often used as baselines for dynamic ensembles. The following sections present the project organization, the API design, the currently implemented methods and future directions for the API.
DESlib was developed with two objectives in mind: to make it easy to integrate dynamic selection algorithms into machine learning projects, and to facilitate research on this topic by providing implementations of the main dynamic ensemble selection (DES) and dynamic classifier selection (DCS) methods, as well as the commonly used baseline methods. Each algorithm implements the main methods of the scikit-learn API (Pedregosa et al., 2011): fit(X, y), predict(X), predict_proba(X) and score(X, y). Any classifier from scikit-learn (or from other libraries that follow this API) can be used as a base classifier, making the library easy to use and to integrate into other projects.
The implementation of the DS methods is modular, following the taxonomy defined in Cruz et al. (2018). This taxonomy organizes DS methods around three components: (1) the methodology used to define the local region in which the competence level of the base classifiers is estimated (the region of competence); (2) the source of information used to estimate the competence; and (3) the selection approach used to choose the best classifier (for DCS) or the best set of classifiers (for DES). This modular approach makes it easy for researchers to implement new DS methods, in many cases requiring only the implementation of the methods estimate_competence and select.
The library is written in pure Python, runs on any platform, and depends on the following Python packages: scikit-learn, numpy and scipy. The project follows these guidelines:
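To make the three components concrete, the following is a minimal pure-Python sketch, not DESlib's actual code. It assumes a k-nearest-neighbours region of competence, local accuracy as the source of information, and selection of the single best classifier. The class name LocalAccuracySelector, its attributes, and the toy rule-based classifiers are all hypothetical; only the method names estimate_competence and select come from the text above.

```python
import math


class LocalAccuracySelector:
    """Toy DS method (illustrative only, not DESlib's base class)."""

    def __init__(self, pool, k=3):
        self.pool = pool  # list of callables mapping a sample to a label
        self.k = k

    def fit(self, X_dsel, y_dsel):
        self.X_dsel = X_dsel
        # hits[i][j] is 1 when classifier j labels DSEL sample i correctly.
        self.hits = [[int(clf(x) == y) for clf in self.pool]
                     for x, y in zip(X_dsel, y_dsel)]
        return self

    def _region_of_competence(self, query):
        # Component 1: the k nearest DSEL samples define the local region.
        order = sorted(range(len(self.X_dsel)),
                       key=lambda i: math.dist(query, self.X_dsel[i]))
        return order[:self.k]

    def estimate_competence(self, query):
        # Component 2: competence is the accuracy inside the region.
        region = self._region_of_competence(query)
        return [sum(self.hits[i][j] for i in region) / len(region)
                for j in range(len(self.pool))]

    def select(self, competences):
        # Component 3: pick the single most competent classifier (DCS).
        return max(range(len(competences)), key=competences.__getitem__)

    def predict(self, X):
        return [self.pool[self.select(self.estimate_competence(x))](x)
                for x in X]


def clf_up(x):
    """Hypothetical rule that happens to be right where x[0] < 0."""
    return int(x[1] > 0)


def clf_down(x):
    """Hypothetical rule that happens to be right where x[0] > 0."""
    return int(x[1] < 0)


X_dsel = [(-2, 1), (-2, -1), (-3, 2), (2, 1), (2, -1), (3, -2)]
y_dsel = [1, 0, 1, 0, 1, 1]

selector = LocalAccuracySelector([clf_up, clf_down], k=3).fit(X_dsel, y_dsel)
left_competences = selector.estimate_competence((-2.5, 1.5))  # -> [1.0, 0.0]
predictions = selector.predict([(-2.5, 1.5), (2.5, -1.5), (2.5, 1.5)])  # -> [1, 1, 0]
```

In this toy problem each rule is an expert on one side of the feature space, so the selector switches classifier depending on where the query falls. Swapping in a different competence measure or selection rule only requires overriding estimate_competence or select.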
• Development: All development is performed collaboratively using GitHub and Gitter, which facilitates code integration, communication between collaborators and issue tracking. External contributions are encouraged.
• Code quality: The code was written following the PEP 8 standards. We use Codacy to measure and track code quality. The library is also covered by unit tests (py.test), which are run on Travis CI. Moreover, Codacy and Travis CI are used to automatically check each new contribution for code quality and test coverage.
• Documentation: The code of DESlib is fully documented, including detailed instructions and examples for using the API. The documentation is built with numpydoc and Sphinx and is automatically updated with new developments. It is available online at http://deslib.readthedocs.io/en/latest/
• Bugs and new features: Bugs and new feature requests are tracked through the project’s GitHub page: https://github.com/scikit-learn-contrib/DESlib/issues. This environment allows a discussion between the collaborators to find the best solution for the problem. New users can check whether the problems they found or new requests are already being addressed.
• Project relevance: At the time of writing, the library is in its third release (v0.3), has 7 contributors (2 main and 5 external), and attracts about 500 new visitors weekly. Moreover, it is part of the scikit-learn-contrib supported projects.
The library is divided into three modules:
• Dynamic Classifier Selection (DCS): This module contains the implementation of techniques in which only the base classifier that attained the highest competence level is selected for the classification of the query.
• Dynamic Ensemble Selection (DES): Dynamic ensemble selection strategies refer to techniques that select an ensemble of classifiers rather than a single one. All base classifiers that attain a minimum competence level are selected to compose the ensemble of classifiers.
• Static Ensembles: This module provides the implementation of static ensemble techniques that are usually used as a baseline for the comparison of DS methods: Single Best (SB), Static Selection (SS), Oracle and Stacked Generalization.
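The difference between the three modules can be illustrated with a small pure-Python sketch. It is an illustration under stated assumptions, not DESlib's implementation: competence scores are taken as given, the DES ensemble is combined by majority vote, and DES falls back to the whole pool when no classifier reaches the threshold. All function names and the 0.5 threshold are hypothetical.

```python
from collections import Counter


def dcs_predict(competences, predictions):
    """DCS: use only the prediction of the most competent classifier."""
    best = max(range(len(competences)), key=competences.__getitem__)
    return predictions[best]


def des_predict(competences, predictions, min_competence=0.5):
    """DES: majority vote among classifiers that reach min_competence."""
    selected = [p for c, p in zip(competences, predictions)
                if c >= min_competence]
    if not selected:
        # Assumption of this sketch: fall back to the whole pool.
        selected = list(predictions)
    return Counter(selected).most_common(1)[0][0]


def single_best(hits):
    """Static baseline: index of the classifier with the best accuracy.

    hits[i][j] is 1 when classifier j is correct on validation sample i.
    """
    n_classifiers = len(hits[0])
    accuracy = [sum(row[j] for row in hits) / len(hits)
                for j in range(n_classifiers)]
    return max(range(n_classifiers), key=accuracy.__getitem__)


def oracle_accuracy(hits):
    """Oracle: a sample counts as correct if any classifier gets it right."""
    return sum(1 for row in hits if any(row)) / len(hits)


# One query, four classifiers with their competences and predicted labels.
competences = [0.9, 0.6, 0.7, 0.2]
predictions = ["a", "b", "b", "a"]
dcs_label = dcs_predict(competences, predictions)  # -> "a"
des_label = des_predict(competences, predictions)  # -> "b"

# Five validation samples, three classifiers.
hits = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]]
best_index = single_best(hits)      # -> 0
upper_bound = oracle_accuracy(hits)  # -> 0.8
```

The query shows how DCS and DES can disagree: the single most competent classifier votes "a", while the ensemble of the three classifiers above the threshold votes "b". The Oracle value is an upper bound on what any selection scheme over this pool could reach on these samples.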
Tables 1 and 2 list the implemented DS and baseline methods, respectively.
Table 1: Implemented DES and DCS methods
Table 2: Implemented baseline methods
The library also provides several state-of-the-art improvements to DS techniques, such as the online Dynamic Frienemy Pruning (DFP) algorithm used in the FIRE-DES framework (Oliveira et al., 2017; Cruz et al., 2019), as well as dynamic weighting and hybrid (selection plus weighting) versions of DES techniques (Cruz et al., 2015b).
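As a rough illustration of the idea behind DFP, the sketch below follows our reading of Oliveira et al. (2017): when the region of competence contains samples from more than one class, only classifiers that correctly label at least one pair of samples from different classes ("frienemies") are kept, and the whole pool is kept if no classifier qualifies. The function name and data layout are hypothetical, and this is not the library's implementation.

```python
from itertools import combinations


def frienemy_pruning(region_labels, region_hits):
    """Return the indices of the classifiers kept for one query.

    region_labels[i]  : true label of neighbour i in the region of competence
    region_hits[i][j] : truthy when classifier j labels neighbour i correctly
    """
    everyone = list(range(len(region_hits[0])))

    # Pairs of neighbours that belong to different classes.
    pairs = [(a, b)
             for a, b in combinations(range(len(region_labels)), 2)
             if region_labels[a] != region_labels[b]]
    if not pairs:
        # Single-class region: nothing to prune.
        return everyone

    kept = [j for j in everyone
            if any(region_hits[a][j] and region_hits[b][j] for a, b in pairs)]
    # Keep the full pool when no classifier handles any such pair.
    return kept or everyone


# Three neighbours (two of class 0, one of class 1) and three classifiers.
labels = [0, 0, 1]
hits = [[1, 1, 0],
        [1, 0, 0],
        [0, 1, 1]]
kept = frienemy_pruning(labels, hits)  # -> [1]
```

Here only the second classifier is right on both sides of the class boundary inside the region, so it is the only one passed on to the subsequent DS technique.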
The latest stable version of the library can be installed using pip (Python package manager): pip install deslib. Alternatively, the master branch, which contains features that will be included in future releases, can be installed directly from the GitHub address: pip install git+https://github.com/scikit-learn-contrib/deslib. New features are only merged to the master branch after code review and the creation of unit tests.
4.1 Usage
Each implemented method receives a list of classifiers as input. This list can be either homogeneous (i.e., all base classifiers are of the same type) or heterogeneous (base classifiers of different types). The library supports any type of base classifier from scikit-learn.
After instantiation, the method fit(X, y) is used to fit the dynamic selection method. Predictions for new examples can then be obtained with predict(X) and predict_proba(X). In the example below, we show how to use the library, given training (X_train, y_train) and testing (X_test, y_test) datasets. The META-DES (Cruz et al., 2015a) technique is used in this example:
As of version 0.3, each implemented method comes with default values for all of its parameters and does not require a trained list of classifiers as input. In that case, the pool of classifiers is trained together with the DS algorithm inside the fit method. More examples covering different aspects of the library can be found at https://deslib.readthedocs.io/en/latest/auto_examples/index.html.
In this paper, we introduced DESlib, a Python library implementing state-of-the-art dynamic classifier and ensemble selection techniques. The project is fully compatible with the scikit-learn API and is part of the scikit-learn-contrib supported projects. Future work on this library includes the implementation of dynamic selection methods in different contexts, such as One-Class Classification (OCC) and regression.
B. Antosik and M. Kurzynski. New measures of classifier competence – heuristics and application to the design of multiple classifier systems. In Computer Recognition Systems 4, pages 197–206. 2011.
P. R. Cavalin, R. Sabourin, and C. Y. Suen. Dynamic selection approaches for multiple classifier systems. Neural Computing and Applications, 22(3-4):673–688, 2013.
R. M. O. Cruz, R. Sabourin, G. D. C. Cavalcanti, and T. I. Ren. META-DES: A dynamic ensemble selection framework using meta-learning. Pattern Recognition, 48(5):1925–1935, 2015a.
R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti. Dynamic classifier selection: Recent advances and perspectives. Information Fusion, 41:195–216, 2018.
R. M. O. Cruz, R. Sabourin, and G. D. C. Cavalcanti. META-DES.H: A dynamic ensemble selection technique using meta-learning and a dynamic weighting approach. In International Joint Conference on Neural Networks, pages 1–8, 2015b.
R. M. O. Cruz, D. V. R. Oliveira, G. D. C. Cavalcanti, and R. Sabourin. FIRE-DES++: Enhanced online pruning of base classifiers for dynamic ensemble selection. Pattern Recognition, 85:149–160, 2019.
L. Didaci et al. A study on the performances of dynamic classifier selection based on local accuracy estimation. Pattern Recognition, 38(11):2188–2191, 2005.
S. García, Z.-L. Zhang, A. Altalhi, S. Alshomrani, and F. Herrera. Dynamic ensemble selection for multi-class imbalanced datasets. Information Sciences, 445:22–37, 2018.
G. Giacinto and F. Roli. Dynamic classifier selection based on multiple classifier behaviour. Pattern Recognition, 34:1879–1881, 2001.
A. H. R. Ko, R. Sabourin, and A. S. Britto Jr. From dynamic classifier selection to dynamic ensemble selection. Pattern Recognition, 41:1735–1748, 2008.
L. I. Kuncheva. A theoretical study on six classifier fusion strategies. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(2):281–286, 2002.
D. V. R. Oliveira, G. D. C. Cavalcanti, and R. Sabourin. Online pruning of base classifiers for dynamic ensemble selection. Pattern Recognition, 72:44–58, 2017.
F. Pedregosa et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
D. Ruta and B. Gabrys. Classifier selection for majority voting. Information Fusion, 6(1):63–81, 2005.
M. Sabourin, A. Mitiche, D. Thomas, and G. Nagy. Classifier combination for handprinted digit recognition. Intl. Conf. on Document Analysis and Recognition, pages 163–166, 1993.
P. C. Smits. Multiple classifier systems for supervised remote sensing image classification based on dynamic classifier selection. IEEE Trans. on Geoscience and Remote Sensing, 40(4):801–813, 2002.
M. C. P. Souto et al. Empirical comparison of dynamic classifier selection methods based on diversity and accuracy for building ensembles. In International Joint Conference on Neural Networks, pages 1480–1487, 2008.
M. A. Souza, G. D. C. Cavalcanti, R. M. O. Cruz, and R. Sabourin. Online local pool generation for dynamic classifier selection. Pattern Recognition, 85:132–148, 2019.
T. Woloszynski and M. Kurzynski. On a new measure of classifier competence applied to the design of multiclassifier systems. In International Conference on Image Analysis and Processing (ICIAP), pages 995–1004, 2009.
T. Woloszynski and M. Kurzynski. A probabilistic model of classifier competence for dynamic ensemble selection. Pattern Recognition, 44:2656–2668, October 2011.
T. Woloszynski et al. A measure of competence based on random classification for dynamic ensemble selection. Information Fusion, 13(3):207–213, 2012.
D. H. Wolpert. Stacked generalization. Neural Networks, 5(2):241–259, 1992.
K. Woods, W. P. Kegelmeyer, and K. Bowyer. Combination of multiple classifiers using local accuracy estimates. IEEE Trans. on PAMI, 19:405–410, April 1997.