Journal of Artificial Intelligence and Digital Business

Dice Similarity and TF-IDF for New Student Admissions Chatbot

Customer service (CS) is one of the most important functions of any client-facing organization, whether a business or a school. For the committee responsible for student selection in particular, however, the capacity of human CS staff is very limited, which can reduce satisfaction with the university. Technological assistance is therefore required, which in this case takes the form of an AI-based chatbot. The objective of this study is to design and develop a chatbot system that uses natural language processing (NLP) to help the CS staff of the new student admissions committee at Pahlawan Tuanku Tambusai University answer questions from prospective new students. The method employed is Dice similarity weighted by TF-IDF. Testing yielded a recall of 100 percent and a precision of 76.92 percent. These evaluation results indicate that the chatbot can respond effectively to questions from prospective students.


Introduction
As a result of the rapid advancement of information technology in this age of globalization, numerous fields of work, including education, are required to adapt to the changes these developments bring. A growing number of educational institutions across the globe use technology to raise the standard of the services they provide, with the aim of attracting more students. In addition to the quality of the education it offers, a university needs to make enrollment easier for prospective students by providing information services that are both prompt and accurate. This is necessary to satisfy prospective students [1].
A number of measures are in place to speed up the delivery of information to prospective new students, including telephone service and live chat support from customer service. Timely dissemination of this information is crucial so that it reaches prospective students as soon as possible, allowing them to prepare promptly everything needed to enroll. This service requires customer service to be able to respond to questions from prospective students within 24 hours [2]. However, the capacity of customer service is limited, and its staff need sufficient time to respond to each inquiry [1]. In addition, a rise in the number of prospective students during the admissions period will almost certainly increase both the number of questions asked and the time spent answering them, which in turn may reduce overall satisfaction with the university [3].
One step that can make it easier for customer service staff to respond to these questions is the use of an artificial intelligence program known as a chatbot. A chatbot is a piece of software that uses Natural Language Processing (NLP), a subfield of Artificial Intelligence (AI). The chatbot model derives from Human-Computer Interaction (HCI), in which computers communicate with humans through text [4]. Because the chatbot answers questions around the clock, prospective students looking for information can find what they need much more easily [3].
In this study, a chatbot was developed that responds to messages from prospective new students in everyday human language. It acts as an agent that assists customer service staff in answering questions from prospective students 24 hours a day, seven days a week. The chatbot processes messages using NLP, which includes preprocessing and a weighting step called TF-IDF that produces the values used in the Dice similarity step. Dice similarity lets the chatbot find an answer by comparing the user input with all documents stored in the knowledge base, so that the chatbot application can help customer service serve prospective students automatically and flexibly. Figure 1 shows the stages carried out in this study.

Collecting Data
The data collected for this study consist of questions and responses exchanged with the customer service of Pahlawan Tuanku Tambusai University since 2017. A total of 42 questions were compiled, each asked repeatedly by prospective students of the Computer Science program at Pahlawan Tuanku Tambusai University. The questions asked by prospective students and the responses they received are presented in Table 1.

Preprocessing
In text mining, raw data cannot be used directly. The raw data must be processed before a computer can work with it [5]. This processing is called preprocessing [6]. The preprocessing stage of this research consists of the following four processes:

Case Folding
Case folding is the process of converting all uppercase letters in the message to lowercase [7]. Table 2 shows an example of the case folding performed in this research.

Tokenizing
Tokenizing is the process of separating a piece of text into its component words (tokens) [5]. Table 3 shows an example of the tokenizing performed in this research.

Filtering (Stopword Removal)
Filtering removes words deemed unimportant (stopwords) from the output of the preceding step [6]. Table 4 contains examples of the filtering used in this study.
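The first three preprocessing steps can be illustrated with a minimal Python sketch. The stopword list, the English sample sentence, and the function names below are assumptions made for illustration only; the study's own stopword list and data are not reproduced in the text.

```python
import re

# Hypothetical stopword list for illustration; not the list used in the study.
STOPWORDS = {"the", "is", "a", "an", "for", "of", "to", "what", "are"}


def case_fold(text: str) -> str:
    """Case folding: convert every uppercase letter to lowercase."""
    return text.lower()


def tokenize(text: str) -> list[str]:
    """Tokenizing: split case-folded text into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text)


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Filtering: drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def preprocess(text: str) -> list[str]:
    """Apply case folding, tokenizing, and filtering in order."""
    return remove_stopwords(tokenize(case_fold(text)))


print(preprocess("What are the Requirements for Registration?"))
# ['requirements', 'registration']
```

Keeping each step as its own function makes it straightforward to append a further step, such as stemming, at the end of the chain.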

Stemming
Stemming is the final preprocessing step in this research. Stemming is the process of removing affixes from a word to obtain its root form. The use of stemming in this study is shown in Table 5.

TF-IDF
TF-IDF (Term Frequency-Inverse Document Frequency) is a document weighting step used to extract information from a document. The algorithm is frequently used to convert text into a meaningful numerical value [8]. The TF-IDF weight, in its standard form, is given in equation (1).
$$W_{i,j} = tf_{i,j} \times \log\frac{N}{df_i} \quad (1)$$
where $tf_{i,j}$ is the number of occurrences of the i-th term in document j, $df_i$ is the number of documents that contain the i-th term, and $N$ is the total number of documents.
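The weighting described above can be sketched in Python as follows. This is a minimal illustration and not the study's code: the function name `tf_idf`, the toy documents, and the choice of a base-10 logarithm are all assumptions, since the text does not state the logarithm base.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document, with weight = tf * log10(N / df)."""
    n_docs = len(docs)
    # df[t]: number of documents that contain term t at least once
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # tf[t]: occurrences of term t in this document
        # Base-10 logarithm is an assumption; the paper does not specify the base.
        weights.append({t: tf[t] * math.log10(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical preprocessed documents (token lists), for illustration only.
docs = [
    ["registration", "fee"],
    ["registration", "schedule"],
    ["scholarship", "requirements"],
]
for doc_weights in tf_idf(docs):
    print(doc_weights)
```

In this toy corpus, "registration" appears in two of the three documents and therefore receives a lower weight than "fee", which appears in only one. A term that occurs in every document would receive a weight of zero.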

Dice Similarity
Dice similarity is one way to measure the degree of similarity between two items. To compare documents with Dice similarity, their k-grams are computed and used to determine how similar the documents are. The query is measured against each document, and the document returned is the one selected by that measurement [6]. The Dice similarity, in its standard form, is shown in equation (2).
$$s = \frac{2\,|A \cap B|}{|A| + |B|} \quad (2)$$
where $s$ is the similarity value and $A$ and $B$ are the two inputs being compared.
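As an illustration of the measure, the sketch below computes the standard set form of the Dice coefficient, 2|A ∩ B| / (|A| + |B|), over character k-grams. It also shows one common way to apply the coefficient to weighted term vectors. The weighted variant is an assumption about how TF-IDF weights might enter the calculation; the text does not give the exact formula used in the study, and all function names are hypothetical.

```python
def kgrams(text: str, k: int = 2) -> set[str]:
    """Return the set of character k-grams of a string."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def dice_set(a: set[str], b: set[str]) -> float:
    """Standard Dice coefficient between two sets."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def dice_weighted(wa: dict[str, float], wb: dict[str, float]) -> float:
    """Dice coefficient for weighted term vectors (assumed variant):
    2 * sum(wa*wb) / (sum(wa^2) + sum(wb^2))."""
    shared = wa.keys() & wb.keys()
    numerator = 2 * sum(wa[t] * wb[t] for t in shared)
    denominator = sum(v * v for v in wa.values()) + sum(v * v for v in wb.values())
    return numerator / denominator if denominator else 0.0


# "night" and "nacht" share one bigram ("ht") out of four each.
print(dice_set(kgrams("night"), kgrams("nacht")))  # 0.25
print(dice_weighted({"fee": 2.0}, {"fee": 2.0}))   # 1.0
```

Both functions return a value between 0 and 1 when the weights are non-negative, with 1 indicating identical inputs.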

Evaluation
Evaluation is an essential part of this research, as it allows the researchers to check whether the developed bot system functions appropriately and in line with its aims. The test scenario consisted of posing 42 questions to the chatbot. Recall and precision are used as the measures. Recall is the level of success a system has in finding information, and precision is the degree to which the returned data match the information needed [1].
$$Recall = \frac{TP}{TP + FN}$$
$$Precision = \frac{TP}{TP + FP}$$
In this study, True Positive (TP) is the number of correct answers provided by the chatbot, False Negative (FN) is the number of questions the chatbot is unable to answer, and False Positive (FP) is the number of answers provided by the chatbot that do not match the expected answer.
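The two measures can be computed directly from the TP, FN, and FP counts, as in the short sketch below. The counts used here are illustrative values chosen for the arithmetic; they are not taken from the study's confusion matrix, which is not reproduced in the text.

```python
def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN); returns 0.0 when there are no relevant items."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP); returns 0.0 when nothing was returned."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Illustrative counts only (an assumption, not the study's data).
tp, fp, fn = 10, 3, 0
print(f"recall    = {recall(tp, fn):.2%}")     # recall    = 100.00%
print(f"precision = {precision(tp, fp):.2%}")  # precision = 76.92%
```

With no false negatives, recall is 100 percent regardless of the number of false positives, which only lower precision.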

Results and Discussions
This research uses Python version 3.9 as its programming language, together with the Telegram application and BotFather, which supplies bots for use in Telegram. Python functions are used for preprocessing, TF-IDF weighting, Dice similarity, and for the bot to respond to messages. Calculating the TF-IDF requires the following code:
return doc_ans
Once complete, all of these functions are saved in one file. When the file is executed, the previously created bot, named @penerimaan_pahlawan, replies to incoming messages automatically.
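How such a file might select the answer it returns can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's implementation: it uses plain set-based Dice similarity on word tokens without TF-IDF weighting, the knowledge base entries are placeholders, and the `min_score` threshold and all names other than `doc_ans` are hypothetical. The Telegram messaging layer is omitted.

```python
import re
from typing import Optional

# Placeholder knowledge base of (question, answer) pairs; not the study's data.
KNOWLEDGE_BASE = [
    ("when does registration open", "ANSWER_SCHEDULE"),
    ("how much is the registration fee", "ANSWER_FEE"),
]


def tokens(text: str) -> set[str]:
    """Case-fold the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def dice(a: set[str], b: set[str]) -> float:
    """Set-based Dice coefficient: 2|A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def find_answer(query: str, kb=KNOWLEDGE_BASE, min_score: float = 0.2) -> Optional[str]:
    """Return the answer whose stored question is most similar to the query.

    min_score is a hypothetical cut-off: below it, no answer is returned.
    """
    query_tokens = tokens(query)
    best_score, doc_ans = 0.0, None
    for question, answer in kb:
        score = dice(query_tokens, tokens(question))
        if score > best_score:
            best_score, doc_ans = score, answer
    if best_score < min_score:
        return None
    return doc_ans


print(find_answer("How much is the fee?"))  # ANSWER_FEE
print(find_answer("campus parking"))        # None
```

Returning `None` for low-scoring queries gives the bot a way to decline to answer instead of sending an unrelated response.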

Weighting Results with TF-IDF and Dice Similarity
After the text provided by the prospective student has been preprocessed, the program looks for the stored sentence that is most similar to the query, where the stored sentences are the data entered previously. The test was run 42 times, and the example table lists 15 of the responses. Table 6 presents the test results, including the measured degree of text similarity and the weights used in the Dice similarity calculation.
The evaluation was carried out using the confusion matrix method, with precision and recall as the parameters examined. Based on the questions posed to the chatbot, the evaluation found a recall of 100 percent and a precision of 76.92 percent.

Conclusion
In this study, a chatbot application was developed on Telegram for the new student admissions process at Pahlawan Tuanku Tambusai University.
The tests conducted with the chatbot and the responses obtained from it show that recall reaches 100% while precision is approximately 76.92%. In light of these findings, the chatbot can be used in the new student admissions process at Pahlawan Tuanku Tambusai University to answer questions posed by prospective new students.