K-Nearest Neighbors Method for Recommendation System in Bangkalan’s Tourism


Abstract - The more tourist objects an area has, the more challenging it is for local governments to increase the selling value of these attractions. The government continually strives to develop tourist areas by prioritizing the beauty of the attractions. However, visitors often have difficulty identifying tourist objects that match their criteria because of the many choices. The research developed a tourist attraction recommendation system for visitors by applying a machine learning technique, the K-Nearest Neighbor (KNN) method. Several trials were conducted with a dataset of 315 records, consisting of 11 attributes and 21 tourist attractions. In the preprocessing stage, the data format was improved by selecting and separating the data based on existing criteria. Then, the closest distance was calculated, and the value of k in the KNN method was determined. The results are divided into five folds for each classification method. The highest system accuracy obtained with KNN is 78% at k = 1. It shows that the KNN method can provide recommendations for three tourist attraction classes in Bangkalan. Applying the KNN method in the recommendation system yields several alternative natural, cultural, and religious tourist objects that tourists can visit according to their criteria.

I. INTRODUCTION
Tourism is the largest industry in the world that contributes to the development of the world economy. Tourism development in Indonesia is directed at increasing the role of tourism to increase the number of visitors. Specifically, the tourism industry in Bangkalan has grown rapidly, as evidenced by the large number of tourists visiting and many new tourist attractions (Peng et al., 2010).
Bangkalan is the center of tourism on Madura Island because the place is easy to reach through the Suramadu bridge and has some beautiful landscapes that can be used as a tourist destination (Peng et al., 2010). There are many interesting tourist spots that are not inferior to other areas. Some tourist attractions in Bangkalan that can be recommended are Arosbaya Limestone Hill, Bangkalan Geger Hill, Jaddih Hill Bangkalan, Bangkalan Nine Beach, Labuhan Mangrove Education Park, Siring Kemuning Beach, and Bangkalan's religious tourism.
New tourist attractions usually become a trend, so they often appear on social media and printed media. However, it causes the old tourist attractions to be abandoned. Conditions like this make visitors prioritize trending tourist objects, while tourist objects that match visitor characteristics are ignored. Tourist objects that are in accordance with the visitors' characteristics greatly determine the feasibility of these tourist objects being visited. The benefit is that visitors can develop inspiration, and existing facilities at tourist attractions can be adjusted to visitors' needs. In addition, it can help the management and tourism office to continue to develop existing facilities to meet the needs of visitors and beautify the nuances of these attractions so that they can improve the economy of the people of the area. Therefore, in the research, it is important to develop a system to provide recommendations for tourist attractions as an alternative to visitors according to their characteristics.
Recommendations can be used to estimate the destination of the tourist attraction chosen by the tourists or verify some of the possible attractions as the destination chosen by the tourist (Hu & Zhou, 2017). So, the recommendation system can be used to analyze data about tourism objects or interactions between visitors and attractions to find the relationship between visitors and these attractions (Adikara, Sari, Adinugroho, & Setiawan, 2021). The results obtained are displayed as recommendations. Then, the results can increase the promotion of interest and build visitors' loyalty to the tourist attraction (Zhu, Cao, & Weng, 2018).
The selection of the right tourism object also has an effect in this case. A system in the tourism sector is needed to obtain information and support effective decisions on the selection of the right tourism object (Parmawati, Imaniyah, Rokani, Rajaguni, & Kurnianto, 2018; Xie & Zhang, 2021). Bangkalan is one of the areas on the island of Madura. It has many tourist attractions that have become a destination for tourists who want to enjoy the beauty of nature and culture on their vacation (Parmawati et al., 2018). However, many tourists do not know about interesting tourist objects, such as natural, religious, and cultural attractions. The information comes only from travel agents and usually does not include accurate data. This situation leaves tourists confused about which destination matches their wishes. Several previous studies have applied various methods but have not provided a suitable decision-making solution.
One of the computational methods developed to retrieve alternatives as recommendation results is machine learning. Machine learning is very helpful in solving problems and making tasks easier (Satvika, Nasution, & Nugrahaeni, 2018). In information technology, the recommendation system is part of a computer-based information system that is used to support decision-making (Cui, Huang, & Wang, 2019). Previous research on recommendation systems has often used methods such as the ant colony optimization algorithm, collaborative filtering, and content-based filtering as a basis for determining recommendations (Jia, Yang, Gao, & Chen, 2015; Cui et al., 2019; Alharbe, Rakrouki, & Aljohani, 2023). However, the performance of these methods in the classification process is still less competitive in many domains (Yang & Shi, 2019). For example, in previous research, the K-Nearest Neighbor (KNN) method can determine the ranking of tourist attractions (Soares, Suyoto, & Santoso, 2017). This system is very effective and efficient in helping tourists to choose their destination plan in Timor Leste. However, if there are new attributes, it is necessary to calculate the similarity value in the old case to find the level of similarity with the new case. Thus, it requires a lot of training data to provide a recommendation model. In addition, in another previous research, KNN is a data mining classification method that can predict the accuracy of a student's graduation (Salim, Laksitowening, & Asror, 2020). So, KNN has advantages that can improve the quality of students' thesis services. The test results show that the resulting accuracy value is 78.20%.
The research designs a recommendation system using the KNN method. The KNN method has several advantages: it is simple, effective, and intuitive, and it offers competitive classification performance in many domains (Kbaier, Masri, & Krichen, 2017). Besides requiring methods that can solve problems for a tourist attraction, the design must take the characteristics of visitors into account (He, Ma, & Yang, 2015). According to Ding and Ma (2018), understanding visitors' characteristics is useful for planning and developing tourist attractions. It is influenced by several factors, such as people's higher motivation and needs to travel. Circumstances like these leave visitors increasingly confused in choosing tourist destinations according to their characteristics or desired criteria.
The contribution of the research is the design of a tourist attraction recommendation system that uses KNN to offer visitors alternative tourist objects, in three groups of tourist objects, according to their characteristics. In addition, the system can assist the managers of tourist attractions in analyzing visitors' interest for the development of these attractions, which are scattered across Bangkalan, Madura.

II. METHODS
The recommendation system is built in several stages, as shown in Figure 1. Designing and building a recommendation system for decision-making considers several alternative tourist destinations according to the characteristics of visitors. The first stage in building the system is conducting a literature review on the method and design of the recommendation system (Ding & Ma, 2018). The literature review involves several steps, including collecting various data sources from journals or library books related to the research, evaluating the literature sources, identifying themes and gaps between theory and field conditions (if any), creating an outline structure, and compiling the literature review (Pavlidis, 2018). Therefore, in the initial stage, a literature review is carried out to gather information from various previous studies to assist the validation testing process after the data are obtained and before they are processed by the KNN method.
In the recommendation stage, there are several basic types of recommendation system design, such as content-based recommendations that match something previously liked (Peng et al., 2010), collaborative filtering recommendations that combine information about user preferences (Chen, Xia, & Shi, 2012), and item-based collaborative filtering based on user-supplied item similarity scores (Gupta & Katarya, 2019; Smirnov et al., 2013; Utama et al., 2017; Alrasheed, Alzeer, Alhowimel, & Althyabi, 2020). Researchers increasingly use machine learning methods to model prediction and recommendation systems. The classification model in the research predicts the data, and the value of the results has been found from different data. The research recommends several alternative tourist destinations according to the visitors' characteristics using samples from the dataset obtained from the Bangkalan Tourism Office (Linasari, Sumarah, & Andayani, 2016), such as the number of visitors, ticket types, ticket prices, day of the visit, and 21 tourist attractions (Badan Pusat Statistik Kabupaten Bangkalan, 2021). In addition, there are three types of tourism (religious, natural, and cultural), three classes of management services (low, medium, and high levels), and several other measurements (facilities, gender, age, occupation, educational background, and marital status). Hence, the results can recommend several alternative tourist destinations according to the visitors' characteristics.
Machine learning methods continue to be developed by several studies. The machine learning algorithm in the research is the KNN method, whose general flow is shown in Figure 2. Data preprocessing improves the data format through data selection, in which the data are separated based on existing criteria.
The preprocessing criteria comprise 11 indicators and 21 attractions located in the Kwanyar, Konang, Galis, Socah, Bangkalan, Arosbaya, Geger, Kokop, Tanjung Bumi, and Sepulu Districts. A weighting value is obtained for each indicator to simplify the calculation phase of the system, as seen in Figure 3. Data preprocessing is done on the testing data, which are new data from visitors' input to tourist objects. Data cleaning removes erroneous data and resolves data inconsistencies. Data integration combines data from various data storage sources into one coherent data unit. Data transformation normalizes the data, while data reduction produces a reduced-volume representation that yields a similar analysis. Next is the calculation with the KNN algorithm to make recommendations for tourist objects according to the visitors' characteristics.
The KNN method is a supervised learning method that divides the data into two sets (Sun & Huang, 2010; Muliono, Lubis, & Khairina, 2020): training data, which form the basis for predictions, and testing data, which are classified by calculating the Euclidean distance between vectors (Okfalisa, Gazalba, Mustakim, & Reza, 2017). The KNN method for recommendations has four steps (Satvika et al., 2018). First, it determines the value of k, that is, the number of nearest neighbors. Second, it calculates the distance from the training data to all testing data using the Euclidean distance in Equation (1). Third, it sorts the distances in ascending order and keeps the k smallest. Last, it determines the group of the testing data based on the majority label among the k nearest neighbors (Hidayati & Hermawan, 2021).
$$d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2} \qquad (1)$$
In Equation (1), a_i is the i-th variable of record a, b_i is the i-th variable of record b, and n is the number of variables. The accuracy of the tourist attraction type obtained with the KNN method is strongly influenced by the value of k.
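The four KNN steps and the Euclidean distance of Equation (1) can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the function names, the three-attribute records, and their weight values are hypothetical and are not taken from the research dataset.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Equation (1): square root of the summed squared attribute differences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_X, train_y, query, k):
    """Classify one testing record by majority vote among its k nearest training records."""
    # Step 2: distance from the testing record to every training record.
    distances = [
        (euclidean_distance(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Step 3: sort in ascending order and keep the k smallest distances.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: the majority label among the k neighbors is the predicted class.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, already-weighted records (assumption: three attributes each).
train_X = [[1, 1, 2], [1, 2, 2], [3, 3, 1], [3, 4, 1], [5, 5, 3]]
train_y = ["natural", "natural", "cultural", "cultural", "religious"]

# Step 1: choose k, then classify a new visitor record.
prediction = knn_predict(train_X, train_y, query=[1, 1, 1], k=3)
print(prediction)
```

Sorting on the distance only (through `key`) avoids comparing labels when two records are equally distant.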
The value of k states how many neighbors or data are closest to an object. The number of different neighbors will certainly affect the classification results of one object. The best value of k for KNN depends on the data.
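Because the best value of k depends on the data, one common way to choose it is to compare the accuracy of several k values with cross-validation. The sketch below assumes a simple five-fold split by record index. The research reports five folds but does not describe how they were formed, so the split, the function names, and the two-attribute records are illustrative assumptions only.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training records nearest to the query."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validated_accuracy(records, k, folds=5):
    """Fraction of records classified correctly when each fold is held out once."""
    correct = 0
    for fold in range(folds):
        # Assumption: records are assigned to folds by their index.
        test = [r for i, r in enumerate(records) if i % folds == fold]
        train = [r for i, r in enumerate(records) if i % folds != fold]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(records)


# Hypothetical records: three well-separated groups of five records each.
records = (
    [((1, 1), "natural"), ((1, 2), "natural"), ((2, 1), "natural"),
     ((2, 2), "natural"), ((1.5, 1.5), "natural")]
    + [((8, 8), "cultural"), ((8, 9), "cultural"), ((9, 8), "cultural"),
       ((9, 9), "cultural"), ((8.5, 8.5), "cultural")]
    + [((1, 8), "religious"), ((1, 9), "religious"), ((2, 8), "religious"),
       ((2, 9), "religious"), ((1.5, 8.5), "religious")]
)

scores = {k: cross_validated_accuracy(records, k) for k in range(1, 6)}
best_k = max(scores, key=scores.get)  # the smallest k wins when accuracies tie
print(scores, best_k)
```

On real visitor data the classes overlap, so the scores would differ across k values rather than coincide as they do for these artificial groups.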
Next, the implementation stage realizes the design that has been made previously. It aims to produce an information system that fits the needs. The result of this stage is an application built according to the data needs that have been previously analyzed and designed. The KNN method is applied in a web-based application that recommends tourism objects in Bangkalan, Madura, to visitors. The web is a responsive technique that is effective and supports human activities to optimize time better (Ningrum, Suherman, Aryanti, Prasetya, & Saifudin, 2019).
In the application testing stage, a black-box test of the system is carried out to find errors, based on several questions from the questionnaire results, in aspects such as the interface, data structure, external database access, performance, and termination (Ningrum et al., 2019; Jaya, 2018). This testing improves the quality of the software. The first stage in this testing is identifying and testing the input to find errors. This test complements the previous test, which measures the accuracy of the KNN method in recommending destinations according to the tourists' characteristics. Hence, this test only observes the results of execution on the testing data and checks the functionality of the system without examining the detailed internal process (only the inputs and outputs are known).
The performance of the KNN method in recommending tourist objects is tested with an accuracy calculation. The accuracy test is one of the validation tests and measures the closeness between the results of the analysis and the actual Reference Standard Material (RSM) (Aprilia & Fachrurrozi, 2016; Salim et al., 2020). The performance of a classification algorithm is determined by testing the model formed with the testing data. The calculation compares the predicted value with the actual value provided by the user for the item and expresses the correct data as a percentage of the overall data, as shown in Equation (2).
$$\text{Accuracy} = \frac{\text{number of correct predictions}}{\text{total number of data}} \times 100\% \qquad (2)$$
The number of correct predictions is 1-((k-x)/(y-x)), where k is the Euclidean distance to k, x is the initial Euclidean distance, and y is the last Euclidean distance.
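The percentage approach of Equation (2) can be illustrated with a short sketch. The function name and the five example labels are hypothetical and are not part of the research data.

```python
def accuracy(predicted, actual):
    """Equation (2): correct predictions as a percentage of all testing data."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Hypothetical labels for five testing records; four of the five predictions match.
actual = ["natural", "cultural", "religious", "natural", "cultural"]
predicted = ["natural", "cultural", "natural", "natural", "cultural"]

result = accuracy(predicted, actual)
print(f"{result:.0f}%")
```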

III. RESULTS AND DISCUSSIONS
The research is conducted on 21 tourist objects in Bangkalan, Madura. The data collected are visitor data consisting of 315 records. In the first stage, data preprocessing, the indicator data shown in Table 1 (see Appendices) are used. The table lists the 11 criteria, their sub-criteria, and the weight value of each sub-criterion. The weight value shows the value of a tourist object densely packed with visitors from several sub-criteria as indicators in tourist object recommendations. There are three subcategories in the tourist object type feature: natural, cultural, and religious. Natural tourism contains natural elements, such as beaches and mountains. Cultural tourism has cultural features in Madura, such as the Palace and Karapan Sapi. Meanwhile, religious tourism refers to places of worship and the tombs of role models of the Madurese community and their surroundings. The same rule applies to the other criteria, each of which has several weight values.
Next, there are several types of tourist objects in Table 2 (see Appendices). Of the four regencies in Madura, Bangkalan has the most popular tourist objects, followed by Sumenep. Table 2 (see Appendices) shows the names of tourist objects in Madura, especially Bangkalan. The local community frequently visits the 21 tourist objects in Bangkalan almost every holiday. These tourist spots are packed with visitors from various backgrounds and cover several types of tourist objects. For example, Jaddih Hill is a tourist attraction of limestone mountains, which are similar to Cappadocia, a popular tourist area in Turkey. The brown color of the Jaddih limestone hill is also similar to historic stones in the Cappadocia region. Moreover, the Jaddih Hill area is surrounded by trees, giving an excellent impression of this tourist area. On the other hand, the Cakraningrat Museum is a cultural tourism object located on Jalan Soekarno-Hatta, where people can learn some of the history of this island.
Data preprocessing converts raw data collected from various sources into cleaner information for further processing. The results of data preprocessing can be seen in Table 3 (see Appendices), which shows the data after the cleaning process. The analysis at this stage shows that many data still require special handling for missing values and outliers. Missing values are empty entries in the dataset. Meanwhile, outliers are data with characteristics or values that differ significantly from others and appear extreme. Next, the researchers transform qualitative values into quantitative values so that the distance can be calculated. Then, the recommendation process is carried out using the KNN method by calculating the Euclidean distance to the k nearest neighbors.
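The cleaning and transformation steps described above can be sketched as follows. The field names and the weight table are hypothetical. The actual weights are those of Table 1, which is not reproduced here, and outlier handling is omitted for brevity.

```python
# Hypothetical weight table (assumption): maps each qualitative value to a number.
WEIGHTS = {
    "tourism_type": {"natural": 1, "cultural": 2, "religious": 3},
    "service_level": {"low": 1, "medium": 2, "high": 3},
}


def clean(records):
    """Data cleaning: drop records that contain a missing (None or empty) value."""
    return [r for r in records if all(v not in (None, "") for v in r.values())]


def encode(record):
    """Transformation: replace qualitative values with weights, keep numbers as floats."""
    return [
        WEIGHTS[field][value] if field in WEIGHTS else float(value)
        for field, value in record.items()
    ]


# Hypothetical raw visitor records; the second one has a missing value.
raw = [
    {"tourism_type": "natural", "service_level": "high", "ticket_price": 10000},
    {"tourism_type": "cultural", "service_level": None, "ticket_price": 5000},
    {"tourism_type": "religious", "service_level": "low", "ticket_price": 0},
]

vectors = [encode(r) for r in clean(raw)]
print(vectors)
```

Dropping incomplete records is only one option. Imputing a default value would keep more data, at the cost of introducing an assumption about the missing entry.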
Next, the implementation of the recommendation system for Bangkalan's tourism can be seen in Figure 4 (see Appendices). It is the interface display of the training data input before processing with the KNN model for tourist object recommendations. Training data are used to train or build a model. Then, the validation dataset is used to optimize the model's training with the preprocessing process. It aims to see the model's ability during training to recognize patterns in general. The validation dataset can also be used to see the accuracy of the model created.
The result of the recommendation can be seen in Figure 5 (see Appendices). It displays the results of the KNN classification for a request, consisting of several alternative tourist destinations that visitors can visit. The first alternative is the recommendation with a higher degree of certainty than the subsequent alternatives. The confidence level is obtained from the nearest-neighbor distance produced by a model that has been adequately trained and can recognize patterns in general, as indicated by a high accuracy score. The next step is to calculate the distance of each data object. About 247 data are identified as recommendations for natural tourism objects that suit the visitors' characteristics. Meanwhile, 130 data are identified as other types of tourist objects, with 68 data classified incorrectly at a value of k = 1. The results can be seen in Figure 6 (see Appendices).
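One way to produce ranked alternatives from nearest-neighbor distances is sketched below. The research does not specify how the alternatives are ordered beyond the use of the nearest-neighbor distance, so the ranking rule (each attraction scored by its closest training record), the function name, and the attribute values are assumptions. The attraction names are taken from the text.

```python
import math


def recommend(train, visitor, top_n=3):
    """Rank attractions by the distance of their closest training record to the visitor."""
    best = {}
    for features, attraction in train:
        d = math.dist(features, visitor)
        # Keep only the smallest distance seen so far for each attraction.
        if attraction not in best or d < best[attraction]:
            best[attraction] = d
    # A smaller distance means a closer match, so it is ranked first.
    return sorted(best.items(), key=lambda item: item[1])[:top_n]


# Hypothetical weighted visitor records labeled with the attraction visited.
train = [
    ([1, 3, 2], "Jaddih Hill"),
    ([1, 2, 2], "Jaddih Hill"),
    ([2, 2, 1], "Cakraningrat Museum"),
    ([1, 1, 4], "Siring Kemuning Beach"),
    ([3, 1, 1], "Geger Hill"),
]

alternatives = recommend(train, visitor=[1, 2, 2], top_n=3)
for rank, (name, distance) in enumerate(alternatives, start=1):
    print(rank, name, round(distance, 3))
```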
The accuracy for k = 1 is ((315-68)/315)×100% = 78%. For k = 2, the accuracy is ((315-76)/315)×100% = 76%. For k = 3, it is ((315-80)/315)×100% = 75%. For k = 4, it is ((315-105)/315)×100% = 66%. For k = 5, it is ((315-133)/315)×100% = 57%. These results can be seen in Table 4. So, the best value of k in this test is k = 1, with an accuracy of 78%. The results show that the higher the k value is, the lower the accuracy will be. It is because the greater the value of k is, the more data are used to produce the classification, so more irrelevant data are included and cause prediction errors. However, there are conditions where certain k values increase the accuracy.
In addition to measuring the accuracy, which tests the KNN method's ability to recommend tourist objects based on visitors' characteristics, the research also tests the web-based recommendation application to see whether it runs well. Therefore, it is necessary to measure usability. Applications with low usability test results make users unwilling to reuse them. The test is carried out by having a number of respondents use the application. They fill out a questionnaire to measure their level of satisfaction in operating the application. Usability testing is the best way to test directly with users, identify problems in the design or service, and study users' behavior and preferences. The questionnaire adapted from (Satvika et al., 2018; Shah, Patel, Sanghvi, & Shah, 2020) can be seen in Table 5.
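The accuracy values reported above for k = 1 to k = 5 can be recomputed from the stated error counts with a short check. The variable names are illustrative. One decimal place is kept here, which shows that the whole-number figures for k = 4 and k = 5 appear to be truncated (66.7% and 57.8% before truncation).

```python
TOTAL_RECORDS = 315

# Number of incorrectly classified records for each k, as stated in the text.
errors_by_k = {1: 68, 2: 76, 3: 80, 4: 105, 5: 133}

# Accuracy = (total - errors) / total x 100%, rounded to one decimal place.
accuracy_by_k = {
    k: round(100 * (TOTAL_RECORDS - errors) / TOTAL_RECORDS, 1)
    for k, errors in errors_by_k.items()
}
best_k = max(accuracy_by_k, key=accuracy_by_k.get)

for k, acc in accuracy_by_k.items():
    print(f"k={k}: {acc}%")
print("best k:", best_k)
```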

| Item | Question |
|---|---|
| Ease of Use | 1. Is this application easy to understand and use? |
| | 2. Can the application avoid errors in its use quickly and easily? |
| | 3. Users do not notice any inconsistencies during the use. |
| | 4. Is the menu display in the application easy to recognize? |
| Ease of Learning | 5. Is the application easy to learn? |
| | 6. Is this application easy to remember while using it? |
| Satisfaction | 7. Does the app work as expected? |
| | 8. Is the application comfortable to use? |
| Usefulness | 9. Is the application useful for users? |
| | 10. Does the application have the capabilities and functions as expected? |

Then, the questionnaires are distributed to 63 respondents from 21 tourist attractions in Bangkalan, with three types of tourism (natural, religious, or cultural). In addition, the questionnaire results are measured using a Likert scale ranging from 1 to 5, as seen in Table 6. The Likert scale measures a person's opinions and perceptions of items that have been specifically determined by the research. The answers to each instrument item (questionnaire), ranging from very positive to very negative, are expressed in words and given a score. If the end user is satisfied, the software can be used effectively by the user.
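A Likert-based usability percentage is commonly computed as the total score divided by the maximum possible score. The sketch below assumes that formula, since the research does not state its exact calculation. The function name and the three respondents' answers to the ten questions are hypothetical.

```python
def likert_percentage(responses, max_score=5):
    """Total score as a percentage of the maximum possible score."""
    scores = [score for respondent in responses for score in respondent]
    return 100.0 * sum(scores) / (max_score * len(scores))


# Hypothetical answers (1 to 5) of three respondents to the ten questions.
responses = [
    [4, 4, 5, 4, 4, 3, 4, 4, 4, 4],
    [4, 3, 4, 4, 4, 4, 4, 5, 4, 4],
    [5, 4, 4, 3, 4, 4, 4, 4, 4, 4],
]

usability = likert_percentage(responses)
in_feasible_band = 61 <= usability <= 80  # the 61-80% category used in the research
print(usability, in_feasible_band)
```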
The usability level of the tourist attraction recommendation system in Bangkalan based on visitors' characteristics is 79%, which falls in the satisfied category. The feasibility category covers values between 61% and 80%. It means that the overall tourist attraction recommendation system is suitable for visitors. This application is easy to use, easy to learn, satisfying, and useful for tourists. The results of the questionnaire data analysis to measure the level of effectiveness can be seen in Table 7 (see Appendices). From 56 respondents, the average of all statement items is 4, which is in the agree category. Hence, the design of the segmentation system in the research is developed very effectively to help the tourism office and tourist object managers.

IV. CONCLUSIONS
From the results of the analysis, it can be concluded that the methods implemented in the tourist attraction recommendation system help visitors obtain recommendations for tourist objects according to their characteristics. Using the KNN method, the recommendation results are accurate and follow the rating values given by the visitors to the tourist attractions. With the system accuracy calculation formula, the predicted value of visitor ratings for tourist objects is obtained with an accuracy of around 78% at k = 1. Increasing k in each distance calculation reduces the accuracy of the classifier because the greater k is, the more neighbors from classes other than the correct class are included.
In the future, the researchers will develop the research by applying other classification methods.