INDONESIAN EAP STUDENTS’ VOCABULARY LEVEL AND SIZE: AN EMPIRICAL INVESTIGATION

The research aimed to determine to what extent Indonesian English for Academic Purposes (EAP) students have mastered high-frequency words and mid-frequency words (4,000-5,000). It also aimed to measure the vocabulary size of Indonesian EAP students. The research examined 128 Indonesian EAP students from two private universities in Indonesia. To gather its data, the research employed the Vocabulary Level Test of Webb, Sasao, & Ballance, and the Vocabulary Size Test of Nation and Beglar. The findings indicate that the participants have not yet mastered the high-frequency words or the mid-frequency words from the 4,000 to 5,000 word-families. The findings also reveal that the mean scores of the students' vocabulary size range between 6,000 and 10,000. This implies that the participants' previous learning has not yet helped them to learn important vocabulary from the 1,000 to 5,000 word-families. Thus, although they have a large vocabulary size, they might face problems when trying to understand some texts. The research findings are expected to increase the awareness of English teachers in general, and EAP teachers specifically, of the importance of helping their students to learn high-frequency words.


INTRODUCTION
English for Academic Purposes (EAP) programs in Indonesia are context-dependent. For example, the learning aim of English for Economics might be students' mastery of English grammar, whereas the goal of an English for Chemistry program might be a high TOEFL score (Kusni, 2013). In the research of Poedjiastutie and Oliver (2017), some employers and teachers believe that reading is an important skill to develop because students need to be able to read English journals and books to support their studies and to write their thesis at the end of their studies. Although the objectives of these EAP programs and the beliefs of the stakeholders differ, increasing students' vocabulary knowledge seems to be the way to make sure that the different goals are attained. Previous studies have found that vocabulary knowledge correlates with reading comprehension; therefore, it is a significant predictor of reading comprehension (Laufer & Aviad-Levitzky, 2017; Li & Kirby, 2015; Schmitt et al., 2017) and a good predictor of L2 proficiency (Miralpeix & Muñoz, 2018). Vocabulary knowledge can also be used to predict students' performance in productive language skills (Kilic, 2019).
Although it is useful to know students' vocabulary knowledge to predict their language ability, research on EAP students' vocabulary size and on their vocabulary level with a focus on high-frequency words is noticeably limited. Regarding the vocabulary size of EAP learners, an example is the research of Khodabakhshi, Daroonshad, and Moini (2014), which investigates the vocabulary size of Iranian EAP students from three faculties (Engineering, Sciences, and Humanities) at the University of Kashan. It is found that the students of the Engineering Faculty obtain the highest mean score, which is 4,593.75. The mean scores of the students from the Sciences Faculty and the Humanities Faculty are 3,188 and 3,432, respectively. In addition, the findings of previous studies indicate that the high-frequency word knowledge of EAP students is inadequate. For example, Akbarian (2010) has investigated 112 Iranian EAP learners by measuring their receptive vocabulary knowledge. He has found that only 24% of the participants have acquired the first 2,000 word-families. In other words, more than three-quarters of the students fail to master the words. In a similar vein, the research of Cheng and Matthews (2018) examines 167 Chinese EAP students and finds that they only know about 77% of the most frequent 2,000 word-families. Recently, Dang (2020) has investigated the proportion of high-frequency words in academic spoken and written English and has explored 66 Vietnamese EAP students' knowledge of these words. The findings show that although high-frequency words play a significant role in academic spoken English, most participants in the research have not yet mastered them.
In the Indonesian context, studies on EAP students' vocabulary knowledge seem to be limited. Only a few previous research projects examine the vocabulary knowledge of Indonesian English as a Foreign Language (EFL) learners who major in English. Also, most of them investigate either students' high-frequency word knowledge or their vocabulary size and do not examine both in a single study. The studies that examine EFL students' knowledge of high-frequency words reveal that most participants have not mastered these words. For example, Kurniawan's (2017) research has examined 290 EFL undergraduates at UIN Raden Intan. It reveals that 11 of the participants have not yet mastered the 1,000 word-level. Sudarman and Chinokul (2018) have examined EFL students at Kutai Kartanegara University and also find that the participants have not yet mastered the 2,000 and 3,000 word-levels. Thus, the findings of these studies are similar to those of other studies with EAP students outside Indonesia.
Regarding previous studies that examine Indonesian EFL students' vocabulary size, the findings show that, on average, the students' mean scores are between 5,000 and 8,700. For example, the average vocabulary size of the EFL students in the research of Umam (2016) is 5,873 word-families. The highest and the lowest scores of the research participants are 8,800 and 2,800 word-families, respectively. Another study, by Kusumarasdyati and Ramadhani (2018), which examines 216 EFL students from the first to the fourth years, has found that the mean vocabulary sizes of the first- to fourth-year participants are 5,425, 5,641.8, 5,987.8, and 6,141.3 word-families, respectively. Research by Romadloni (2019) on the vocabulary size of 242 EFL students has found that the average vocabulary sizes for the 2015-2018 batches are 6,519.78, 7,028.13, 7,040.91, and 8,202.33 word-families, respectively. In other words, the previous studies have found that, on average, the students have a fairly large vocabulary size. Although having a large vocabulary size is important, Clark and Ishida (2005) have argued that it is important to pay attention to high-frequency words, and that students cannot learn 'any random 5,000 words'.
Several previous studies have found that high-frequency words are important. Recent research of Noreillie et al. (2018) has revealed that knowing the first 1,000 and the second 1,000 most frequent word-families is crucial for L2 learners because they equal 91% and 97% coverage of a text. Peters and Webb (2018) have stated that a viewer who wants to understand 90% of the running words in a documentary needs to have 90% coverage of the most frequent 2,000 words. The research of Dang, Coxhead, and Webb (2017) has found that 70% of the most frequent words in academic spoken English are high-frequency words. The finding of Nurmukhamedov (2017) corroborates that of Dang, Coxhead, and Webb (2017). Nurmukhamedov (2017) has explained that before teachers use TED Talks presentations, they need to ensure that their students have mastered the first 2,000 word-families because these words, together with proper nouns and marginal words, account for 92.17% coverage of the TED Corpus that he examines. Moreover, in Masrai's (2019) research, high and mid-frequency words are also found to be important elements for L2 reading comprehension. Liu and Chen's (2019) research has also found that students need to master 3,000 word-families to reach 95% coverage of TED talks and 6,000 word-families to reach 98% coverage. Their findings indicate that to understand TED talks well, learners need to know high and mid-frequency vocabulary.
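The notion of coverage used in these studies, the share of running words in a text that belong to a given word list, can be illustrated with a minimal sketch. The function name `lexical_coverage`, the sample sentence, and the toy word list are hypothetical and are not taken from any of the cited studies. The sketch also matches surface word forms only, whereas the cited studies count word-families drawn from corpus-based lists.

```python
import re


def lexical_coverage(text, known_words):
    """Return the proportion of running words (tokens) found in known_words."""
    # Simplifying assumption: lower-case the text and treat runs of letters
    # and apostrophes as tokens; no lemmatisation or word-family grouping.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


# Hypothetical sample text and toy "known word" list, for illustration only.
sample = "The students read the journal and the students wrote the thesis"
toy_list = {"the", "students", "read", "and", "wrote"}

coverage = lexical_coverage(sample, toy_list)
print(f"{coverage:.1%}")  # 9 of the 11 running words are known -> 81.8%
```

In this toy case, two of the eleven running words ("journal" and "thesis") fall outside the list, so a reader who knows only the listed words meets an unknown word roughly once in every five or six words, which is far below the 95% to 98% coverage figures discussed above.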
Taken together, the findings of the studies reviewed above suggest that to comprehend texts well, EAP students need not only a large vocabulary size but also a good knowledge of high and mid-frequency words. Thus, having a large vocabulary size without having mastered high-frequency words will be ineffective. Also, as mentioned earlier, no studies have attempted to measure Indonesian EAP students' vocabulary level and size in the same study. Therefore, the present research project aims to fill this gap. The research investigates the students' vocabulary level as well as their vocabulary size. The former shows which frequency bands require the most attention in the students' later learning, and the latter identifies the learners' lexical readiness. Specifically, the research examines the vocabulary level and size of Indonesian learners enrolled in EAP programs at two private universities in Indonesia. The research questions are: (1) To what extent do Indonesian EAP students master high and mid-frequency words (4,000-5,000)? (2) What is the vocabulary size of Indonesian EAP students?

METHODS
In total, 128 students participate in the research. They are second-semester students at two private universities in Indonesia. Of these, 54 students are from A University (pseudonym), majoring in Management, and 74 students are from B University (pseudonym), majoring in Business Administration.
In the research project, two vocabulary tests are employed as instruments for collecting data. The first instrument is the Vocabulary Level Test (VLT) of Webb, Sasao, & Ballance (2017). Nation and Waring (2019) have suggested that the test is appropriate for assessing students' vocabulary level. This test is employed to get information about the students' vocabulary level (1,000-5,000). When creating the VLT, Webb, Sasao, & Ballance (2017) have used the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA). In the test, each level (1,000-5,000) has ten clusters. In each cluster, the students have to match three given definitions with the correct words (see Table 1), which gives 30 items per level. The test can be accessed at the following link: https://vuw.qualtrics.com/jfe/form/SV_6Wrb5aUvXjIAs6h?Q_JFE=qdg.
The test result is measured using the cutting points that Webb, Sasao, & Ballance (2017) recommend. Thus, the cutting point for mastering the 1,000 to 3,000 word-levels is set at 97%, which corresponds to 29 correct answers out of 30 questions. The cutting point for mastering the 4,000 and 5,000 word-levels is set at 80%, which corresponds to 24 correct answers out of 30 questions.
The second test is the Vocabulary Size Test (VST) of Nation and Beglar (2007). This test is widely used and has many bilingual versions. However, there is no bilingual version in Indonesian. Thus, the research uses the English monolingual version. The test has two versions: 14,000 (A) and 20,000 (B). Unlike the VLT, which contains words from COCA and BNC, the VST only consists of word lists from BNC. The VST format is a four-option multiple-choice item with an additional "I don't know" choice that can be chosen if the test takers have never seen the word before. An example item is "Write: Please write it here." The word has to be matched with one of these choices: make words on paper; cut into pieces; make something better; move to a new place; and I don't know. The A and B tests contain 140 and 100 questions, respectively. The tests can be accessed at the following link: https://my.vocabularysize.com/. When the results are counted, the number of correct answers in the former is multiplied by 100, and the number of correct answers in the latter is multiplied by 200. Thus, 60 correct answers equal 6,000 word-families in the A test but 12,000 word-families in the B test.
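The scoring rules described above can be sketched in a few lines of code. This is an illustrative sketch only, not the scoring procedure used by the test authors or by the test websites. The names `vlt_mastered` and `vst_size` are hypothetical, and comparing the percentage rounded to the nearest whole number against the cutting point is an assumption made here so that cutting points stated as 97% and 80% can be applied to levels of 30 items.

```python
# Cutting points as stated in the text: 97% for the 1,000-3,000 word-levels
# and 80% for the 4,000-5,000 word-levels; each level has 30 items.
VLT_ITEMS_PER_LEVEL = 30
VLT_CUTOFF_PERCENT = {1000: 97, 2000: 97, 3000: 97, 4000: 80, 5000: 80}

# VST versions as stated in the text: A has 140 items (x100 per correct
# answer), B has 100 items (x200 per correct answer).
VST_ITEMS = {"A": 140, "B": 100}
VST_MULTIPLIER = {"A": 100, "B": 200}


def vlt_mastered(correct, level):
    """Return True if a VLT level score reaches the cutting point."""
    if not 0 <= correct <= VLT_ITEMS_PER_LEVEL:
        raise ValueError("correct must be between 0 and 30")
    # Assumption: the percentage is rounded to a whole number before comparison.
    percent = round(100 * correct / VLT_ITEMS_PER_LEVEL)
    return percent >= VLT_CUTOFF_PERCENT[level]


def vst_size(correct, version):
    """Return the estimated vocabulary size in word-families."""
    if not 0 <= correct <= VST_ITEMS[version]:
        raise ValueError("correct exceeds the number of items in this version")
    return correct * VST_MULTIPLIER[version]


print(vlt_mastered(24, 4000))  # True: 24/30 is exactly 80%
print(vlt_mastered(23, 4000))  # False: 23/30 rounds to 77%
print(vst_size(47, "A"))       # 4700
print(vst_size(82, "B"))       # 16400
```

The same number of correct answers therefore leads to different size estimates depending on the test version, which is why scores from the 140-question and 100-question groups need to be read separately.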
The teachers of the courses assist the research by administering the VLT and then the VST to the students. When doing the VST, the students are asked to count how many times they choose the "I don't know" option and how many guesses they make. After doing the tests, the students enter this information into a short demographic questionnaire. The information is valuable for interpreting the data.

RESULTS AND DISCUSSIONS
In the research project, 128 EAP students from two private universities in Indonesia have completed two vocabulary tests: the Vocabulary Level Test (VLT) and the Vocabulary Size Test (VST). Tables 2 and 3 present the results of the vocabulary level test at A University (AU) and B University (BU), respectively. They answer the first research question about EAP learners' receptive vocabulary knowledge. Overall, the findings from both universities show that the students' mean scores at the 1,000-5,000 word-levels have not reached the cutting points (97%-100% for the first 3,000 word-levels, and 80%-100% for the next 2,000 word-levels). It can also be noticed that the higher the word level is, the larger the standard deviation of the mean scores at AU and BU is. In other words, the higher the word level is, the wider the range of the students' vocabulary knowledge is.
Also, it can be seen that only one of BU's students has mastered the 1,000-5,000 word-levels, and none of AU's students has mastered all the levels. More students have mastered each level (1,000-5,000 word-levels) at BU than at AU. Regarding the high-frequency words in the 1,000-2,000 word-levels, the cutting points for passing these levels are from 97% to 100%. The findings show that about 16% of AU's students have mastered the first 1,000 word-level, and less than 2% of AU's students have acquired the second 1,000 word-level. The results of BU's students are better. Almost 60% of BU's students have mastered the first 1,000 word-families, and about 16% of them have mastered the second 1,000 word-families. None of AU's students has mastered the 3,000 word-families, and only about 4% of BU's students have mastered the level. It means that most of the students of both universities fail to master this level. At AU, the higher the word-level is, the lower the mean VLT score is. However, the pattern at BU is different: the lowest mean score is at the 3,000 word-families.
Tables 2 and 3 also show the results for the mid-frequency words that AU and BU students have and have not mastered. The cutting points for the 4th 1,000 word-families and the 5th 1,000 word-families are from 80% to 100%. At AU, more students have mastered the 5th 1,000 word-families (about 14%) than the 4th 1,000 word-families (about 9%). At BU, the percentage of students who have mastered each of the two levels is the same (50%).
Table 4 presents the vocabulary size of AU's and BU's students. Overall, the students' mean score is above 6,000. The highest mean score is 10,707.3. Although the mean score is high, the standard deviation (SD) is also high. It means that the range of the students' vocabulary knowledge is wide.
The large difference in vocabulary size can be seen clearly in the highest and the lowest scores in each group. The highest and the lowest scores in AU's groups are 12,400 and 1,000 for students who answer 100 questions, and 9,400 and 1,600 for students who answer 140 questions. The highest and the lowest scores in BU's groups are 16,400 and 2,297 for students who answer 100 questions, and 12,400 and 4,700 for students who answer 140 questions. The percentages of the students' guesses and of their "I don't know" answers are relatively high. The highest percentage of guesses is 36.83%, and the highest percentage of "I don't know" answers is 23.75%.
Regarding the vocabulary level, the findings of the research clearly indicate that only one of BU's students has mastered the 1,000-5,000 word-levels, and none of AU's students has mastered all the levels. Most of the participants fail to master the high-frequency words (the 1st 1,000 word-families and the 2nd 1,000 word-families). These findings are similar to the findings of previous studies with EAP students from other countries (Akbarian, 2010; Cheng & Matthews, 2018; Dang, 2020) as well as with EFL students in Indonesia in the studies of Kurniawan (2017) and Sudarman and Chinokul (2018). Also, the fact that none of AU's students has mastered the 3,000 word-families, and that only about 4% of BU's students have mastered the level, is alarming. Knowing only a limited number of words from the first to the third 1,000 most frequent word-families will cause the students to have comprehension problems. As Noreillie et al. (2018) have found, knowing the 1st 1,000 and the 2nd 1,000 most frequent word-families is crucial for L2 learners because they equal 91% and 97% coverage of a text. Along the same lines, Nation (2006) has stated that 86% of the running words in texts are from the 1st 1,000 word-families and the 2nd 1,000 word-families, and that students need to reach a 98% threshold to be able to read a wide range of texts. The findings of the research corroborate the argument of Akbarian (2010) that the low vocabulary proficiency level of ESP/EAP learners raises great concern for their academic future and poses a formidable challenge for language instructors.
With respect to the second research question about the vocabulary size of the Indonesian EAP students in this research, the students' mean vocabulary size is above 6,000, and the largest mean score is about 10,000. The former is close to the mean score of the third-year students in the research of Kusumarasdyati and Ramadhani (2018). The latter is higher than the mean scores found in the previous studies (Kusumarasdyati & Ramadhani, 2018; Romadloni, 2019; Umam, 2016).
Based on the results of the two vocabulary tests, it is noticeable that despite the high mean score of the students' vocabulary size, only one student has mastered the first 2,000 high-frequency words and the first 3,000 mid-frequency words. It suggests that although the students have a large vocabulary size, they might still have problems in comprehending texts. Thus, it is important to make sure that students are able to learn frequent vocabulary in their learning. As argued by Sun and Dang (2020), learners who have excellent coverage of high-frequency words would recognize a considerable percentage of words in various kinds of discourse (e.g., movies, television programs, newspapers, and general conversation) and improve their comprehension quickly. Also, Clark and Ishida (2005) have argued that it is important to pay attention to high-frequency words, and that people cannot learn any random 5,000 words. In other words, it is crucial to learn words sequentially from the most frequent word list to the least frequent one. Dang (2020) has also observed that a number of EAP courses tend to neglect the learning of words from high-frequency word lists and focus more on vocabulary from academic word lists. This should not be the case because, as the research of Dang (2020) reveals, high-frequency words are essential for comprehending academic spoken English.
Students who have not yet mastered the most frequent 3,000 words can learn the words from graded readers. After they have mastered the words, they can take advantage of English language television programs for their vocabulary input (Sun & Dang, 2020). Feng and Webb's (2020) research has revealed that extensive viewing might result in vocabulary growth. Using graded readers with audio-assisted material can also enlarge vocabulary learning gains to some degree (Webb & Chang, 2015). Previous studies in Japan (Hagley, 2017) and in America (Ro, 2016) have shown that EAP students can benefit from reading graded readers. Recent research in Indonesia has shown that graded readers are beneficial for increasing Indonesian students' vocabulary (Hadiyanto, 2019). Besides that, the research of Dang, Webb, and Coxhead (2020) has suggested that it is more useful for second language (L2) learners to learn words from the British National Corpus/Corpus of Contemporary American English 2,000 list than from other high-frequency word lists. Except for the one student who has mastered all the words from the 1st to the 5th 1,000 word-levels, the participants are not yet ready to learn English from TED talks. As Nurmukhamedov (2017) explains, students have to master the first 2,000 word-families plus proper nouns and marginal words before learning from TED Talks presentations. In the same vein, Liu and Chen (2019) have argued that students need to master 3,000 word-families to reach 95% coverage of TED talks. In addition, the participants of the current research have made many guesses when completing the VST. Thus, their high mean vocabulary size seems to reflect partial knowledge of low-frequency words. As Nguyen and Nation (2011) explain, learners might be able to guess the meaning of less frequently used words in the VST correctly when they have partial knowledge of the words.

CONCLUSIONS
To conclude, the findings of the current research show that most of the EAP students have not yet mastered the high-frequency words and the mid-frequency words from the 4,000 to 5,000 word-families in the Vocabulary Level Test. The mean score of the students' vocabulary size is high; however, the students also report that they make many guesses when completing the test. Taken together, the findings indicate that the students' previous learning has not yet helped them to learn important vocabulary from the 1,000 to 5,000 word-families. The students' large vocabulary sizes might be due to vocabulary learning that focuses on low-frequency word lists and to their partial knowledge of the low-frequency words. This partial knowledge enables them to make correct guesses in the VST. Consequently, despite their large vocabulary size, the students might have difficulties in understanding texts.
The current research has some limitations. Although the research involves participants from two universities, both are private universities, and the students belong to a similar field, which is Economics. As mentioned previously, EAP programs in Indonesian universities are context-dependent. Thus, future studies should involve participants from different faculties at private and public universities to yield rich information for EAP stakeholders. Also, this research only tests the students' receptive vocabulary knowledge. Future research projects can include both receptive and productive vocabulary tests to get a complete picture of the students' vocabulary knowledge. Future research can also use interviews to get more information regarding the students' decision-making process when answering the vocabulary test items. As argued by Michel and Plumb (2019), vocabulary assessment is complex; therefore, it is crucial to investigate it from multiple perspectives and with multiple modalities. Despite these limitations, the research findings are expected to make EAP teachers aware of the importance of helping their students to learn high-frequency words and to encourage them to inform their students that it is ineffective to learn words randomly. Therefore, when learning new words, it is crucial for students to pay attention to the frequency level of the words.