The Performance of Boolean Retrieval and Vector Space Model in Textual Information Retrieval
DOI:
https://doi.org/10.21512/commit.v11i1.2108
Keywords:
Boolean Retrieval, Vector Space Model, Information Retrieval, Inverted Index, Querying Index, Corpus
Abstract
Boolean Retrieval (BR) and the Vector Space Model (VSM) are popular information retrieval methods for building an inverted index and querying terms. BR returns the exact matches for a textual query without ranking them, whereas VSM both retrieves and ranks the results. This study empirically compares the two methods using a sample of corpus data obtained from Reuters. The experimental results show that the time required to build an inverted index is nearly the same for the two methods. However, the two methods differ in querying the index. The results also show that the number of generated indexes, the sizes of the generated files, and the time needed to read and search an index are proportional to the number of files in the corpus and to the file size.
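The contrast between the two methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy corpus, the function names (`build_index`, `boolean_and`, `vsm_rank`), the whitespace tokenizer, and the TF-IDF weighting with cosine similarity are all assumptions chosen for illustration. Both query functions read the same inverted index; only the VSM function produces an ordering.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; the study itself used a sample of Reuters data.
docs = {
    1: "oil prices rise as opec cuts output",
    2: "opec meeting ends without agreement on oil output",
    3: "gold prices fall",
}

def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.split()).items():
            index[term][doc_id] = tf
    return index

def boolean_and(index, query):
    """Boolean retrieval (AND): documents containing every query term, unranked."""
    postings = [set(index.get(term, {})) for term in query.split()]
    return set.intersection(*postings) if postings else set()

def vsm_rank(index, query, n_docs):
    """Vector space model: rank documents by cosine similarity of TF-IDF vectors."""
    idf = {term: math.log(n_docs / len(post)) for term, post in index.items()}

    # Squared length of every document vector.
    doc_norm_sq = defaultdict(float)
    for term, post in index.items():
        for doc_id, tf in post.items():
            doc_norm_sq[doc_id] += (tf * idf[term]) ** 2

    # Dot product between the query vector and each matching document vector.
    dot = defaultdict(float)
    query_norm_sq = 0.0
    for term, q_tf in Counter(query.split()).items():
        if term not in index:
            continue  # unknown terms contribute nothing
        q_weight = q_tf * idf[term]
        query_norm_sq += q_weight ** 2
        for doc_id, tf in index[term].items():
            dot[doc_id] += q_weight * tf * idf[term]

    ranked = [
        (doc_id, score / math.sqrt(doc_norm_sq[doc_id] * query_norm_sq))
        for doc_id, score in dot.items()
        if doc_norm_sq[doc_id] > 0 and query_norm_sq > 0
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

index = build_index(docs)
exact = boolean_and(index, "oil prices")         # set of matching doc ids, no order
ranked = vsm_rank(index, "oil prices", len(docs))  # list of (doc_id, score), best first
```

In this sketch the Boolean query returns only the documents that contain both terms, while the VSM query also returns partial matches and orders them by score, which is one plausible reason for the two methods to behave differently at query time even though they share the same index.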
License
Authors who publish with this journal agree to the following terms:
a. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License - Share Alike that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
USER RIGHTS
All articles published Open Access will be immediately and permanently free for everyone to read and download. We are continuously working with our author communities to select the best choice of license options, currently defined for this journal as follows: Creative Commons Attribution-Share Alike (CC BY-SA).