Single Document Automatic Text Summarization using Term Frequency-Inverse Document Frequency (TF-IDF)
DOI:
https://doi.org/10.21512/comtech.v7i4.3746Keywords:
automatic text summarization, natural language processing, TF-IDFAbstract
The increasing availability of online information has triggered an intensive research in the area of automatic text summarization within the Natural Language Processing (NLP). Text summarization reduces the text by removing the less useful information which helps the reader to find the required information quickly. There are many kinds of algorithms that can be used to summarize the text. One of them is TF-IDF (TermFrequency-Inverse Document Frequency). This research aimed to produce an automatic text summarizer implemented with TF-IDF algorithm and to compare it with other various online source of automatic text summarizer. To evaluate the summary produced from each summarizer, The F-Measure as the standard comparison value had been used. The result of this research produces 67% of accuracy with three data samples which are higher compared to the other online summarizers.
Plum Analytics
References
Al-Hashemi, R. (2010), Text Summarization Extraction System (TSES) Using Extracted Keywords, International Arab Journal of E-Technology, 1(4), 164- 168.
Bird, S., Klein, E., & Loper, E. (2009) Natural language processing with Python. United States: O'Reilly Media.
Das, D., & Martins, A. F. (2007). A survey on automatic text summarization. Literature Survey for the Language and Statistics, 3(3), 1-12.
Kao, A., & Poteet, S. R. (2007). Natural Language Processing and Text Mining. United States:
Springer Media.
Kulkarni, A. R., & Apte, S. S. (2013). A domain-specific automatic text summarization using Fuzzy Logic. International Journal of Computer Engineering and Technology (IJCET), 4(4), 449-461.
Lahari, E., Kumar, D. S., & Prasad, S. (2014). Automatic text summarization with Statistical and Linguistic Features using Successive Thresholds. In IEEE International Conference on
Advanced Communications, Control and Computing Technologies 2014.
Munot, N., & Govilkar, S. S. (2014). Comparative study of text summarization methods. International Journal of Computer Applications, 102(12), 33-37. 294 ComTech Vol. 7 No. 4 December 2016: 285-294
Nedunchelian, R., Muthucumarasamy, R., & Saranathan, E. (2011). Comparison of multi document summarization techniques. International Journal of Computer Applications. 11(3), 155-160.
Petrov, S., Das, D., & McDonald R. (2012). A universal part-of-speech tagset. arXiv preprint arXiv:1104.2086.
Radev, D. R., Hovy, E., & McKeown, K. (2002). Introduction to the special issue on summarization. Computational Linguistics, 28(4), 399–408
Salton, G., & Buckley, C. (1988). Term-Weighting approaches in Automatic Text Retrieval. Information Processing and Management, 24(5), 513–523.
Steinberger, J., & Ježek, K. (2008). Automatic Text Summarization (The state of the art 2007 and new challenges). Znalosti, 30(2), 1-12.
Yohei, S. (2002). Sentence extraction by TF/IDF and Position Weighting from newspaper articles. In Proceedings of the Third NTCIR Workshop.
Downloads
Published
Issue
Section
License
Authors who publish with this journal agree to the following terms:
a. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License - Share Alike that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
USER RIGHTS
All articles published Open Access will be immediately and permanently free for everyone to read and download. We are continuously working with our author communities to select the best choice of license options, currently being defined for this journal as follows: