Cross-Prompt Based Automatic Short Answer Grading System
DOI: https://doi.org/10.21512/commit.v19i2.13423
Keywords: Cross-Prompt, Automatic Short Answer Grading (ASAG), Prompt-Specific
Abstract
Research on Automatic Short Answer Grading (ASAG) has shown promising results in recent years. However, several important research gaps remain. Based on the literature review, the researchers identify two critical issues. First, the majority of ASAG models are trained and tested on responses to the same prompt, which raises concerns about their robustness across different prompts. Second, many existing approaches treat the grading task as a binary classification problem. This research aims to bridge these gaps by developing an ASAG system that closely reflects real-world assessment scenarios through a multiclass classification approach and cross-prompt evaluation. The proposed models are trained on 1,505 responses across 9 prompts and tested on 175 responses from 3 distinct prompts. The grading task is addressed using regression and classification techniques, including Linear Regression, Logistic Regression, Extreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), and K-Nearest Neighbors (as a baseline). The grades are categorized into five classes, represented by grades A to E. Both manual and algorithmic data augmentation techniques, including the Synthetic Minority Oversampling Technique (SMOTE), are employed to address class imbalance in the sample data. Across multiple testing scenarios, all five models demonstrate consistent performance, with Linear Regression outperforming the others. During validation, it achieves a high accuracy of 0.93, indicating its ability to classify the responses correctly. In the testing phase, it achieves a weighted F1-Score of 0.79, a macro-averaged F1-Score of 0.75, and an RMSE of 0.45, suggesting a relatively low prediction error.
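
The cross-prompt pipeline described in the abstract can be sketched in Python: SMOTE-balanced training on responses from nine prompts, five learners, and evaluation on three unseen prompts using weighted F1, macro F1, and RMSE. This is a minimal illustration assuming scikit-learn, imbalanced-learn, and xgboost are available; the random feature matrices, the 0-4 integer encoding of grades E through A, and the rounding of Linear Regression outputs to the nearest grade class are placeholder assumptions for illustration, not details stated in the paper.

import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import f1_score, mean_squared_error
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier

rng = np.random.default_rng(42)
# Placeholder features standing in for vectorized responses:
# 1,505 training responses (9 prompts) and 175 test responses
# (3 distinct, unseen prompts); grades encoded as 0 (E) to 4 (A).
X_train, y_train = rng.normal(size=(1505, 300)), rng.integers(0, 5, 1505)
X_test, y_test = rng.normal(size=(175, 300)), rng.integers(0, 5, 175)

# SMOTE synthesizes minority-class samples so that all five grade
# classes are balanced before the models are fitted.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)

models = {
    "LinearRegression": LinearRegression(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "XGBoost": XGBClassifier(eval_metric="mlogloss"),
    "AdaBoost": AdaBoostClassifier(),
    "KNN (baseline)": KNeighborsClassifier(n_neighbors=5),
}

for name, model in models.items():
    model.fit(X_bal, y_bal)
    pred = model.predict(X_test)
    if name == "LinearRegression":
        # Assumption: map the regressor's continuous output back onto
        # the five discrete grade classes by rounding and clipping.
        pred = np.clip(np.rint(pred), 0, 4).astype(int)
    print(
        f"{name}: weighted F1={f1_score(y_test, pred, average='weighted'):.2f}, "
        f"macro F1={f1_score(y_test, pred, average='macro'):.2f}, "
        f"RMSE={np.sqrt(mean_squared_error(y_test, pred)):.2f}"
    )

Because the test prompts never appear in training, the loop above measures exactly the cross-prompt robustness the paper targets, rather than the prompt-specific performance that a random train/test split over a single prompt would report.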
License
Copyright (c) 2025 Lucia Dwi Krisnawati, Aditya Wikan Mahastama, Su Cheng Haw

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
a. Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
USER RIGHTS
All articles published Open Access will be immediately and permanently free for everyone to read and download. We are continuously working with our author communities to select the best choice of license options, currently being defined for this journal as follows: Creative Commons Attribution-Share Alike (CC BY-SA)