Single Document Automatic Text Summarization using Term Frequency-Inverse Document Frequency (TF-IDF)

Authors

  • Hans Christian, Bina Nusantara University
  • Mikhael Pramodana Agus, Bina Nusantara University
  • Derwin Suhartono, Bina Nusantara University

DOI:

https://doi.org/10.21512/comtech.v7i4.3746

Keywords:

automatic text summarization, natural language processing, TF-IDF

Abstract

The increasing availability of online information has triggered intensive research in automatic text summarization within Natural Language Processing (NLP). Text summarization shortens a text by removing less useful information, which helps the reader find the required information quickly. Many algorithms can be used to summarize text; one of them is Term Frequency-Inverse Document Frequency (TF-IDF). This research aimed to produce an automatic text summarizer implemented with the TF-IDF algorithm and to compare it with various other online automatic text summarizers. To evaluate the summary produced by each summarizer, the F-Measure was used as the standard comparison value. The summarizer achieved 67% accuracy on three data samples, which is higher than that of the other online summarizers.
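
The approach described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact weighting formula, so the sketch assumes that each sentence is treated as a "document" when computing IDF, that a sentence's score is the sum of the TF-IDF weights of its terms, and that the F-Measure is computed over sets of selected sentences. The function names (split_sentences, tokenize, summarize, f_measure) and the parameter k are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase alphanumeric tokens; no stop-word removal or stemming in this sketch.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def summarize(text, k=2):
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Sentence score = sum over its terms of (relative TF) * log(n / df).
        score = sum(
            (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        )
        scores.append(score)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def f_measure(system, reference):
    """Harmonic mean of precision and recall over two collections of sentences."""
    system, reference = set(system), set(reference)
    overlap = len(system & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(system)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a call such as f_measure(summarize(text, k=3), reference_sentences) compares an extracted summary with a human-selected one; stop-word removal, stemming, or a different IDF base would change the scores.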

Author Biographies

Hans Christian, Bina Nusantara University

School of Computer Science

Mikhael Pramodana Agus, Bina Nusantara University

School of Computer Science

Derwin Suhartono, Bina Nusantara University

School of Computer Science

References

Al-Hashemi, R. (2010). Text Summarization Extraction System (TSES) Using Extracted Keywords. International Arab Journal of E-Technology, 1(4), 164-168.

Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python. United States: O'Reilly Media.

Das, D., & Martins, A. F. (2007). A survey on automatic text summarization. Literature Survey for the Language and Statistics, 3(3), 1-12.

Kao, A., & Poteet, S. R. (2007). Natural Language Processing and Text Mining. United States: Springer Media.

Kulkarni, A. R., & Apte, S. S. (2013). A domain-specific automatic text summarization using Fuzzy Logic. International Journal of Computer Engineering and Technology (IJCET), 4(4), 449-461.

Lahari, E., Kumar, D. S., & Prasad, S. (2014). Automatic text summarization with Statistical and Linguistic Features using Successive Thresholds. In IEEE International Conference on Advanced Communications, Control and Computing Technologies 2014.

Munot, N., & Govilkar, S. S. (2014). Comparative study of text summarization methods. International Journal of Computer Applications, 102(12), 33-37.

ComTech Vol. 7 No. 4 December 2016: 285-294

Nedunchelian, R., Muthucumarasamy, R., & Saranathan, E. (2011). Comparison of multi document summarization techniques. International Journal of Computer Applications, 11(3), 155-160.

Petrov, S., Das, D., & McDonald, R. (2012). A universal part-of-speech tagset. arXiv preprint arXiv:1104.2086.

Radev, D. R., Hovy, E., & McKeown, K. (2002). Introduction to the special issue on summarization. Computational Linguistics, 28(4), 399–408.

Salton, G., & Buckley, C. (1988). Term-Weighting approaches in Automatic Text Retrieval. Information Processing and Management, 24(5), 513–523.

Steinberger, J., & Ježek, K. (2008). Automatic Text Summarization (The state of the art 2007 and new challenges). Znalosti, 30(2), 1-12.

Yohei, S. (2002). Sentence extraction by TF/IDF and Position Weighting from newspaper articles. In Proceedings of the Third NTCIR Workshop.

Published

2016-12-31

Issue

Section

Articles