Single Document Automatic Text Summarization using Term Frequency-Inverse Document Frequency (TF-IDF)

Hans Christian; Mikhael Pramodana Agus; Derwin Suhartono

doi:10.21512/comtech.v7i4.3746

Authors

Hans Christian Bina Nusantara University
Mikhael Pramodana Agus Bina Nusantara University
Derwin Suhartono Bina Nusantara University

DOI:

https://doi.org/10.21512/comtech.v7i4.3746

Keywords:

automatic text summarization, natural language processing, TF-IDF

Abstract

The increasing availability of online information has triggered intensive research in automatic text summarization within Natural Language Processing (NLP). Text summarization shortens a text by removing less useful information, which helps the reader find the required information quickly. Many algorithms can be used to summarize text; one of them is TF-IDF (Term Frequency-Inverse Document Frequency). This research aimed to produce an automatic text summarizer implemented with the TF-IDF algorithm and to compare it with various other online automatic text summarizers. The F-measure was used as the standard comparison value to evaluate the summary produced by each summarizer. On three data samples, the summarizer achieved 67% accuracy, which is higher than that of the other online summarizers.
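
As a rough illustration of the two ideas named in the abstract, the sketch below scores sentences by TF-IDF and compares a produced summary with a reference summary using the F-measure. It is not the authors' implementation: the abstract does not describe their preprocessing or weighting details, so the sketch assumes that each sentence is treated as a "document" when computing IDF, that a sentence's score is the sum of its term weights, and that the F-measure is computed over sets of selected sentences. The function names (split_sentences, tokenize, summarize, f_measure) are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase alphanumeric tokens; no stop-word removal or stemming here.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def summarize(text, n_sentences=2):
    """Return the n_sentences highest-scoring sentences in original order."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document frequency: number of sentences that contain each term.
    # Assumption: each sentence plays the role of a "document".
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # Sentence score = sum over terms of (relative TF) * log(N / DF).
        score = sum(
            (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        )
        scores.append(score)

    # Pick the top-scoring sentence indices, then restore document order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]


def f_measure(system_sentences, reference_sentences):
    """Harmonic mean of precision and recall over selected sentences."""
    system = set(system_sentences)
    reference = set(reference_sentences)
    overlap = len(system & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(system)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)
```

With this scoring, sentences made up of terms that are rare elsewhere in the text rank highest, and calling f_measure(summarize(text), reference) gives a single number for comparing summarizers against a human-written reference. A fuller system would likely add stop-word removal and stemming before scoring.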

Author Biographies

Hans Christian, Bina Nusantara University

School of Computer Science

Mikhael Pramodana Agus, Bina Nusantara University

School of Computer Science

Derwin Suhartono, Bina Nusantara University

School of Computer Science

References

Al-Hashemi, R. (2010). Text Summarization Extraction System (TSES) Using Extracted Keywords. International Arab Journal of E-Technology, 1(4), 164-168.

Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python. United States: O'Reilly Media.

Das, D., & Martins, A. F. (2007). A survey on automatic text summarization. Literature Survey for the Language and Statistics, 3(3), 1-12.

Kao, A., & Poteet, S. R. (2007). Natural Language Processing and Text Mining. United States: Springer Media.

Kulkarni, A. R., & Apte, S. S. (2013). A domain-specific automatic text summarization using Fuzzy Logic. International Journal of Computer Engineering and Technology (IJCET), 4(4), 449-461.

Lahari, E., Kumar, D. S., & Prasad, S. (2014). Automatic text summarization with Statistical and Linguistic Features using Successive Thresholds. In IEEE International Conference on Advanced Communications, Control and Computing Technologies 2014.

Munot, N., & Govilkar, S. S. (2014). Comparative study of text summarization methods. International Journal of Computer Applications, 102(12), 33-37.

Nedunchelian, R., Muthucumarasamy, R., & Saranathan, E. (2011). Comparison of multi document summarization techniques. International Journal of Computer Applications, 11(3), 155-160.

Petrov, S., Das, D., & McDonald, R. (2012). A universal part-of-speech tagset. arXiv preprint arXiv:1104.2086.

Radev, D. R., Hovy, E., & McKeown, K. (2002). Introduction to the special issue on summarization. Computational Linguistics, 28(4), 399–408.

Salton, G., & Buckley, C. (1988). Term-Weighting approaches in Automatic Text Retrieval. Information Processing and Management, 24(5), 513–523.

Steinberger, J., & Ježek, K. (2008). Automatic Text Summarization (The state of the art 2007 and new challenges). Znalosti, 30(2), 1-12.

Yohei, S. (2002). Sentence extraction by TF/IDF and Position Weighting from newspaper articles. In Proceedings of the Third NTCIR Workshop.

Published

2016-12-31

Issue

Vol. 7 No. 4 (2016): ComTech, pp. 285-294

Section

Articles

License

Authors who publish with this journal agree to the following terms:
a. Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.

b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.

c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.

USER RIGHTS

All articles published Open Access will be immediately and permanently free for everyone to read and download. We are continuously working with our author communities to select the best choice of license options, currently being defined for this journal as follows:

• Creative Commons Attribution-ShareAlike (CC BY-SA)
