Question Categorization using Lexical Feature in Opini.id
Keywords: text classification, Bag of Concept, Naïve Bayes, Support Vector Machine (SVM), J48 Tree
This research aimed to categorize questions posted on Opini.id. N-grams and Bag of Concept (BOC) were used as the lexical features, combined with Naïve Bayes, Support Vector Machine (SVM), and J48 Tree as the classification methods. The experiments used data from an online media portal to categorize questions posted by users. The best accuracy, 96.5%, was obtained by combining Bigram Trigram Keyword (BTK) features with the J48 Tree classifier. Meanwhile, the combination of Unigram Bigram (UB) and Unigram Bigram Keyword (UBK) features with attribute selection in WEKA achieved an accuracy of 95.94% using SVM as the classifier.
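As a rough illustration of how combined n-gram features such as Unigram Bigram (UB) might be built, the sketch below counts word n-grams of several orders for one question. It is not the authors' implementation: the function names `ngrams` and `combined_features`, the whitespace tokenizer, and the sample question are all assumptions made for this example, and the keyword and Bag of Concept features are not covered.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the word n-grams of a token list, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def combined_features(text, orders=(1, 2)):
    """Count n-grams of every requested order in one feature dictionary.

    orders=(1, 2) gives a Unigram Bigram style feature set;
    orders=(2, 3) gives a Bigram Trigram style feature set.
    """
    # Assumption: lowercasing and whitespace splitting stand in for
    # whatever preprocessing the study actually used.
    tokens = text.lower().split()
    counts = Counter()
    for n in orders:
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical sample question, used only for illustration.
features = combined_features("Apa pendapat anda tentang polling", orders=(1, 2))
```

The resulting counts could then be exported as attribute vectors for a classifier such as Naïve Bayes, SVM, or J48; the export step is left out here because the abstract does not describe it.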