Multiple Classifier System for Handling Imbalanced and Overlapping Datasets on Multiclass Classification


  • Dessy Siahaan IPB University
  • Anwar Fitrianto IPB University
  • Khairil Anwar Notodiputro IPB University



Multiple Classifier System (MCS), imbalanced datasets, overlapping datasets, multiclass classification


The performance of classification models suffers when a dataset contains imbalanced and overlapping classes. Each condition is challenging on its own, and the problem becomes even more complex when both occur together. In this research, an ensemble method called a Multiple Classifier System (MCS) was proposed to address these issues by combining K-Nearest Neighbour and Logistic Regression. The Synthetic Minority Oversampling Technique (SMOTE) was also applied to balance the dataset, and the One-Versus-One (OVO) decomposition technique was used to handle the multiclass classification. A simulation with 18 scenarios shows that the MCS-SMOTE model handles these problems and performs well. The model's performance was also tested on empirical data on poverty in West Java in 2021, where the proposed method also performed well, with an accuracy of 80.09%, an F1 score of 0.782, and a G-Mean of 0.242. The areas with the highest poverty rates are Bogor, Bekasi City, Bandung City, Bekasi Regency, and Depok City, located near DKI Jakarta, the capital city. Based on the available predictor variables, the factors associated with a higher likelihood of a household in West Java being poor are no access to credit, more than three household members, multiple families living in one building, and a household head who has not completed elementary school.
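The abstract names three ingredients: SMOTE to rebalance the classes, OVO decomposition to split the multiclass problem into pairwise binary problems, and an MCS that combines K-Nearest Neighbour and Logistic Regression. The NumPy sketch below wires these together. It is not the authors' implementation. The abstract does not say how the two base classifiers are combined, so two choices are assumptions: averaging their predicted probabilities within each class pair (soft voting), then majority voting across pairs. The hyperparameters (`k`, learning rate, epochs), the choice to oversample every class up to the majority count, and the helper names (`smote`, `knn_proba`, `logreg_proba`, `ovo_mcs_predict`) are also hypothetical.

```python
from itertools import combinations

import numpy as np


def smote(X, y, k=5, rng=None):
    """SMOTE (Chawla et al., 2002): oversample each smaller class up to the
    majority-class count by interpolating between a minority sample and one of
    its k nearest same-class neighbours. Assumes every class has >= 2 samples."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for c, n in zip(classes, counts):
        need = target - n
        if need == 0:
            continue
        Xc = X[y == c]
        # Pairwise distances within the class; a point is not its own neighbour.
        d = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)
        kk = min(k, len(Xc) - 1)
        nn = np.argsort(d, axis=1)[:, :kk]
        base = rng.integers(0, len(Xc), size=need)        # random seed samples
        neigh = nn[base, rng.integers(0, kk, size=need)]  # random neighbour of each
        gap = rng.random((need, 1))                       # interpolation factor in [0, 1)
        X_parts.append(Xc[base] + gap * (Xc[neigh] - Xc[base]))
        y_parts.append(np.full(need, c))
    return np.vstack(X_parts), np.concatenate(y_parts)


def knn_proba(X_train, y01, X_test, k=5):
    """P(class 1) as the share of the k nearest training points labelled 1."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    return y01[nn].mean(axis=1)


def logreg_proba(X_train, y01, X_test, lr=0.1, epochs=500):
    """Binary logistic regression fitted by batch gradient descent on the log-loss."""
    A = np.hstack([X_train, np.ones((len(X_train), 1))])  # add intercept column
    w = np.zeros(A.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-A @ w))
        w -= lr * A.T @ (p - y01) / len(y01)
    B = np.hstack([X_test, np.ones((len(X_test), 1))])
    return 1.0 / (1.0 + np.exp(-B @ w))


def ovo_mcs_predict(X_train, y_train, X_test, k=5):
    """One-vs-one: for each class pair, average the KNN and LR probabilities
    (assumed soft-voting MCS), then pick the class with the most pairwise wins."""
    classes = np.unique(y_train)
    votes = np.zeros((len(X_test), len(classes)))
    for i, j in combinations(range(len(classes)), 2):
        mask = (y_train == classes[i]) | (y_train == classes[j])
        Xp = X_train[mask]
        y01 = (y_train[mask] == classes[j]).astype(float)  # 1 means class j
        p_j = 0.5 * (knn_proba(Xp, y01, X_test, k) + logreg_proba(Xp, y01, X_test))
        wins_j = p_j >= 0.5
        votes[wins_j, j] += 1
        votes[~wins_j, i] += 1
    return classes[votes.argmax(axis=1)]  # ties go to the lower class index


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three overlapping, imbalanced Gaussian classes (200 / 40 / 15 samples).
    X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
                   rng.normal(1.5, 1.0, (40, 2)),
                   rng.normal(-1.5, 1.0, (15, 2))])
    y = np.array([0] * 200 + [1] * 40 + [2] * 15)
    Xb, yb = smote(X, y, k=5, rng=1)
    print(np.bincount(yb))  # → [200 200 200]
    pred = ovo_mcs_predict(Xb, yb, X)
    print("resubstitution accuracy:", (pred == y).mean())
```

In practice SMOTE should be applied only to the training split, and features should be standardized before the distance-based and gradient-based steps. The resubstitution accuracy printed above is an optimistic smoke test, not an evaluation.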
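The abstract reports accuracy, an F1 score, and a G-Mean but does not give their formulas. The minimal pure-Python sketch below makes two assumptions: F1 is macro-averaged (the mean of per-class F1 scores), and G-Mean is the common multiclass definition, the geometric mean of per-class recalls. The paper may use a different variant, such as a weighted F1. The function name `evaluate` and the toy labels are hypothetical.

```python
from math import prod


def evaluate(y_true, y_pred):
    """Accuracy, macro-F1 and G-Mean (geometric mean of per-class recall)."""
    labels = sorted(set(y_true))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recalls, f1s = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        recall = tp / (tp + fn)  # denominator > 0 because c occurs in y_true
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1s.append(f1)
    return {
        "accuracy": accuracy,
        "macro_f1": sum(f1s) / len(f1s),
        "g_mean": prod(recalls) ** (1 / len(recalls)),
        "recall": dict(zip(labels, recalls)),
    }


if __name__ == "__main__":
    y_true = [0, 0, 0, 0, 1, 1, 2, 2]
    m = evaluate(y_true, [0, 0, 0, 1, 1, 1, 2, 0])
    print(m["accuracy"])            # → 0.75
    print(round(m["macro_f1"], 4))  # → 0.7389
    print(round(m["g_mean"], 4))    # → 0.7211
    # Class 2 is never predicted: accuracy is unchanged, G-Mean collapses.
    m2 = evaluate(y_true, [0, 0, 0, 0, 1, 1, 0, 0])
    print(m2["accuracy"], m2["g_mean"])  # → 0.75 0.0
```

The second call shows why G-Mean complements accuracy on imbalanced data. One class with zero recall drives G-Mean to 0 while accuracy stays at 0.75. A gap of this kind is one possible explanation for high accuracy alongside a low G-Mean, though the abstract alone does not confirm the cause.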



Author Biographies

Dessy Siahaan, IPB University

Statistics and Data Science, Department of Statistics, Faculty of Mathematics and Natural Sciences

Anwar Fitrianto, IPB University

Statistics and Data Science, Department of Statistics, Faculty of Mathematics and Natural Sciences

Khairil Anwar Notodiputro, IPB University

Statistics and Data Science, Department of Statistics, Faculty of Mathematics and Natural Sciences







