Journal of Artificial Intelligence and Digital Business (RIGGS)
N-Gram Feature for Comparison of Machine Learning Methods on Sentiment in Financial News Headlines

Sentiment analysis is widely used in natural language processing and information retrieval applications. Applied to financial news headlines, it can provide information about circulating news and give input to the companies concerned. Positive sentiment supports a company's development, whereas negative sentiment damages its reputation and can thereby hinder that development. This study compares machine learning methods for classifying the sentiment of financial news headlines using n-gram feature extraction, with the aim of finding the best method for this task. The methods compared are Multinomial Naïve Bayes, Logistic Regression, Support Vector Machine, Multi-Layer Perceptron (MLP), Stochastic Gradient Descent, and Decision Tree. The best method is logistic regression, with an F1-measure, precision, and recall of 73.94%, 73.94%, and 74.63%, respectively. These results show that n-gram features combined with machine learning methods can carry out sentiment analysis successfully.


Introduction
Information and communication technology is currently developing very rapidly, especially with the arrival of artificial intelligence. One application is sentiment analysis of news headlines, which can be performed on text using text mining.
Sentiment analysis is the process of categorizing the opinions or topics expressed in a text as positive, negative, or neutral [1]. Sentiment analysis research has been carried out in various sectors, such as finance [2], politics [3], and companies [4]. The sentiment of financial and company news affects a company's credibility, and it has a particularly strong influence on companies that are publicly listed on a stock exchange.
This study compares machine learning methods for sentiment analysis with n-gram features on financial and corporate news headlines. It uses the dataset from [5], which contains 4846 headlines: 2879 neutral, 1363 positive, and 604 negative. The data are first preprocessed to remove unneeded text, the preprocessed text is weighted with n-gram features, and these features are then classified with several machine learning methods: Multinomial Naïve Bayes, Logistic Regression, Support Vector Machine, Multi-Layer Perceptron (MLP), Stochastic Gradient Descent, and Decision Tree. Performance is measured using precision, recall, and F1-measure.
In related work, [6] conducted lexicon-based sentiment analysis on news articles, using BBC News data from 2004 to 2005 (http://mlg.ucd.ie/datasets/bbc.html). The study concluded that articles on business and sports topics had positive sentiment, while topics related to entertainment and sports had negative sentiment.
Research [7] investigated metrics for measuring sentiment in news articles. It weighted terms with term frequency-inverse document frequency (TF-IDF) and classified them with a Gaussian Naive Bayes classifier and a linear support vector machine (linear SVM). The best result, 62 percent, was obtained with the linear SVM, while Gaussian Naive Bayes reached 61 percent.
Research [9] compared sentiment analysis methods on news headlines in the Netherlands: manual annotation, crowd coding, several dictionaries, machine learning, and deep learning. It concluded that the best results are obtained by humans, through manual annotation or crowd coding. The dictionaries showed an acceptable level of validity, and machine learning and deep learning performed substantially better than the dictionaries but remained far from human performance.

Research Methods
The research process includes data collection, preprocessing, weighting using n-grams, sentiment classification, and performance measurement. Figure 1 shows the stages of the research carried out.

Data
The data consist of financial news and company press release texts. Sentiment labels were assigned by a group of 16 annotators with adequate business education backgrounds. The dataset is taken from [5] and contains 4846 news headlines: 2879 neutral, 1363 positive, and 604 negative.

Pre-processing
This stage prepares the data for the subsequent stages. Preprocessing removes common words (tokens) that carry little meaning, which reduces noise in the text data. Figure 1 shows the preprocessing steps: case folding, tokenization, stop word removal, and stemming.
Case folding is the first step; it makes letters uniform by converting all uppercase letters to lowercase. For example, "According to Gran, the company has no plans to move all production to Russia, although that is where the company is growing" becomes "according to gran, the company has no plans to move all production to russia, although that is where the company is growing".
Tokenization cuts sentences into smaller tokens (terms). For example, "according to gran, the company has no plans to move all production to russia, although that is where the company is growing." becomes ["according", "to", "gran", "the", "company", "has", "no", "plans", "to", "move", "all", "production", "to", "russia", "although", "that", "is", "where", "the", "company", "is", "growing"].
Stop word removal deletes common terms that have no effect on the meaning, such as 'its', 'again', 'for', 'myself', and 'his' [10]. Stemming reduces terms to their root forms [11]; for example, "plans" becomes "plan" and "growing" becomes "grow".
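The four preprocessing steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the study's implementation: `STOP_WORDS` is a small hypothetical subset (the study cites [10] for its actual list), and `toy_stem` is a deliberately simple suffix-stripping rule standing in for a real stemmer, which the paper does not name.

```python
import re
from typing import List

# Hypothetical, deliberately small stop-word set for illustration only.
STOP_WORDS = {
    "a", "all", "an", "and", "again", "for", "has", "his", "in", "is",
    "its", "myself", "no", "of", "that", "the", "to", "where",
}


def case_fold(text: str) -> str:
    # Convert all uppercase letters to lowercase.
    return text.lower()


def tokenize(text: str) -> List[str]:
    # Expects case-folded text: runs of lowercase letters or digits become
    # tokens, and punctuation such as "," and "." is discarded.
    return re.findall(r"[a-z0-9]+", text)


def remove_stop_words(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in STOP_WORDS]


def toy_stem(token: str) -> str:
    # Toy suffix stripping, NOT the Porter algorithm: drop "-ing" from
    # longer words and a plural "-s" (but not "-ss").
    if token.endswith("ing") and len(token) > 6:
        return token[:-3]
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def preprocess(text: str) -> List[str]:
    # Case folding -> tokenization -> stop word removal -> stemming.
    return [toy_stem(t) for t in remove_stop_words(tokenize(case_fold(text)))]


headline = ("According to Gran, the company has no plans to move all production "
            "to Russia, although that is where the company is growing.")
print(preprocess(headline))
# ['accord', 'gran', 'company', 'plan', 'move', 'production', 'russia', 'although', 'company', 'grow']
```

With this toy rule "according" becomes "accord", as in the Table 1 example; a real stemmer treats some words differently (the Porter stemmer, for instance, maps "company" to "compani"), so the output need not match Table 4.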

Term Weighting
An n-gram is a contiguous piece of a sentence or string containing a specified number N of characters; a gram is defined as a subsequence of N characters taken from the string [12]. The n-gram method captures the order of words or characters in a sentence by taking contiguous pieces of n characters from the beginning to the end of the document, and n-grams are named by the value of n. In general, additional blanks are added at the beginning and at the end of the string before the n-grams are taken [13]. For example, processing the string "accord gran" into n-grams, with the blank symbolized by "_", gives the n-grams in Table 1.
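A minimal sketch of character n-gram extraction with blank padding follows. It assumes one "_" is added at each end and that internal spaces are also written as "_"; the exact padding used to build Table 1 is not stated, so its rows may differ.

```python
def char_ngrams(text, n, blank="_"):
    """Return the contiguous character n-grams of `text`, padded with one blank at each end."""
    # Assumption: internal spaces are also written as the blank symbol.
    padded = blank + text.replace(" ", blank) + blank
    # Slide a window of width n from the beginning to the end of the padded string.
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


for n in (1, 2, 3):
    print(n, char_ngrams("accord gran", n))
```

A string of length L padded to L + 2 characters yields L + 3 - n n-grams, so "accord gran" (11 characters) gives 12 bigrams and 11 trigrams.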
An advantage of n-grams is that matching still works for strings that are not ordinary words, such as postal codes. In natural language processing applications, such numbers and codes carry no textual meaning and are usually treated as "noise"; because n-grams decompose a string into small parts that still carry information, these strings can nevertheless be matched.
In this study, n-grams are taken at the word level rather than the character level, so sentences are broken into sequences of words. For example, the preprocessed sentence "according gran company plan move production russia although company grow" is split into the word n-grams shown in Table 2. Term weighting uses the number of times each n-gram appears, better known as term frequency (TF).
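Word-level n-grams with term-frequency weighting can be sketched as below. The n orders combined (unigrams and bigrams by default in `tf_features`) are an assumption, since the text does not state which orders were used together; in practice, scikit-learn's `CountVectorizer` with its `ngram_range` parameter performs the same kind of counting.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Contiguous sequences of n tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_features(tokens, n_values=(1, 2)):
    """Term frequency of every n-gram for each n in n_values (assumed orders)."""
    counts = Counter()
    for n in n_values:
        counts.update(word_ngrams(tokens, n))
    return counts


tokens = "according gran company plan move production russia although company grow".split()
print(word_ngrams(tokens, 3))
print(tf_features(tokens))
```

For these 10 tokens the sketch yields 9 bigrams, 8 trigrams, and 7 four-grams, matching the rows of Table 2, and "company" receives a TF of 2 because it occurs twice.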

Sentiment Classification
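This study compares six classification methods: Multinomial Naïve Bayes, Logistic Regression, Support Vector Machine, Multi-Layer Perceptron (MLP), Stochastic Gradient Descent, and Decision Tree.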
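To make one of the compared classifiers concrete, here is a from-scratch sketch of Multinomial Naïve Bayes over TF features with add-alpha (Laplace) smoothing. The training headlines and labels are hypothetical toy data, not drawn from [5], and `alpha=1.0` is an assumed setting. The study does not say which implementation it used; scikit-learn, for example, provides `MultinomialNB`, `LogisticRegression`, `LinearSVC`, `MLPClassifier`, `SGDClassifier`, and `DecisionTreeClassifier`.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Fit Multinomial Naive Bayes on documents given as Counters of term frequencies."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)  # label -> summed term frequencies
    vocab = set()
    for feats, label in zip(docs, labels):
        term_counts[label].update(feats)
        vocab.update(feats)
    n_docs = len(labels)
    log_prior = {c: math.log(k / n_docs) for c, k in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        # Add-alpha smoothing keeps unseen (term, class) pairs at non-zero probability.
        denom = sum(term_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((term_counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_lik


def predict_mnb(model, feats):
    """Return the class with the highest log posterior for a Counter of term frequencies."""
    log_prior, log_lik = model
    scores = {}
    for c in log_prior:
        # Each term contributes its log-likelihood once per occurrence (its TF);
        # terms never seen in training are ignored.
        scores[c] = log_prior[c] + sum(
            tf * log_lik[c][t] for t, tf in feats.items() if t in log_lik[c]
        )
    return max(scores, key=scores.get)


train = [  # hypothetical toy headlines
    ("profit rise strong", "positive"),
    ("sales grow record", "positive"),
    ("loss widen weak", "negative"),
    ("profit fall loss", "negative"),
    ("company hold meeting", "neutral"),
    ("board meet annual", "neutral"),
]
docs = [Counter(text.split()) for text, _ in train]
labels = [label for _, label in train]
model = train_mnb(docs, labels)
print(predict_mnb(model, Counter("loss weak".split())))  # negative
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.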

Performance Measurement
Performance in this study is evaluated with a confusion matrix, from which precision, recall, and F1-measure are computed [11] [29].
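The metrics can be computed directly from a confusion matrix, as in the sketch below: for each class, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 × precision × recall / (precision + recall). The labels and predictions are hypothetical, and the macro average across classes is an assumption; the text does not say whether per-class scores were macro- or weighted-averaged.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(y_true, y_pred))


def per_class_metrics(y_true, y_pred, labels):
    """Precision, recall, and F1 for each class, read off the confusion matrix."""
    cm = confusion_matrix(y_true, y_pred)
    metrics = {}
    for c in labels:
        tp = cm[(c, c)]
        fp = sum(cm[(a, c)] for a in labels if a != c)  # predicted c, actually another class
        fn = sum(cm[(c, p)] for p in labels if p != c)  # actually c, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[c] = (precision, recall, f1)
    return metrics


def macro_average(metrics):
    """Unweighted mean of the per-class (precision, recall, f1) tuples."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics.values()) / n for i in range(3))


labels = ["positive", "negative", "neutral"]
y_true = ["positive", "positive", "negative", "neutral", "neutral", "neutral"]
y_pred = ["positive", "neutral", "negative", "neutral", "neutral", "positive"]
per_class = per_class_metrics(y_true, y_pred, labels)
print(per_class)
print(macro_average(per_class))
```

A weighted average (each class weighted by its number of true instances) would instead favor the large neutral class, which matters with a 2879:1363:604 class distribution.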

Results and Discussions
The data used in this study consist of 4846 financial news headlines: 2879 neutral, 1363 positive, and 604 negative. The data were divided into training and test sets with ratios of 70:30, 80:20, and 90:10. Table 3 shows the resulting distribution of training and test data.
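One way to produce the three splits is sketched below, assuming each class is split in the same ratio (a stratified split). The study does not state whether its split was stratified or how sizes were rounded, so the sizes produced here need not match Table 3. The integer sample indices are placeholders for the headlines; only the class counts come from the text.

```python
import random
from collections import Counter, defaultdict


def stratified_split(samples, labels, train_ratio, seed=0):
    """Shuffle each class separately and cut it at train_ratio."""
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for label, items in by_label.items():
        rng.shuffle(items)
        cut = round(len(items) * train_ratio)
        train.extend((s, label) for s in items[:cut])
        test.extend((s, label) for s in items[cut:])
    return train, test


labels = ["neutral"] * 2879 + ["positive"] * 1363 + ["negative"] * 604
samples = list(range(len(labels)))  # placeholders for the 4846 headlines
for ratio in (0.7, 0.8, 0.9):
    train, test = stratified_split(samples, labels, ratio)
    print(ratio, len(train), len(test), Counter(label for _, label in test))
```

Stratifying keeps the minority negative class (about 12% of the data) represented in every test set, which a plain random split does not guarantee.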
The data were preprocessed to clean the text, using case folding, tokenization, stop word removal, and stemming. Table 4 shows the results of preprocessing.
The preprocessed data were then classified with the machine learning methods. Tables 5, 6, and 7 show the classification results.
Multinomial Naïve Bayes obtained its best result with the 70:30 split, an F1-measure of 69.79 percent. Logistic regression obtained its best result with the 80:20 split, an F1-measure of 73.94 percent. The support vector machine also obtained its best result with the 80:20 split, an F1-measure of 70.07 percent.
The multi-layer perceptron obtained its best result with the 90:10 split, an F1-measure of 72.76 percent. This is likely because the multi-layer perceptron needs a large amount of training data to fit its hidden layer; with less training data, the hidden layer does not train optimally. Future work using multi-layer perceptrons should therefore increase the training data, either by annotating more data manually or by applying data augmentation methods.
Stochastic gradient descent obtained its best result with the 90:10 split, an F1-measure of 71.16 percent, and the decision tree obtained its best result with the 70:30 split, an F1-measure of 67.99 percent. Overall, the best results were obtained by logistic regression. The small differences between the results suggest that the procedure is appropriate and that there is no underfitting or overfitting. These results could be improved further with better feature extraction methods, such as fastText or GloVe weighting, combined with transfer learning from models that have already been trained.
Table 2. Word n-grams of "according gran company plan move production russia although company grow"
Bigram: 'according gran', 'gran company', 'company plan', 'plan move', 'move production', 'production russia', 'russia although', 'although company', 'company grow'
Trigram: 'according gran company', 'gran company plan', 'company plan move', 'plan move production', 'move production russia', 'production russia although', 'russia although company', 'although company grow'
Quadgram: 'according gran company plan', 'gran company plan move', 'company plan move production', 'plan move production russia', 'move production russia although', 'production russia although company', 'russia although company grow'

Conclusion
This study compares machine learning methods for sentiment classification with n-gram feature extraction on a dataset of 4846 news headlines. N-gram weighting combined with machine learning classification showed good results: the best F1-measure, precision, and recall, obtained with logistic regression, were 73.94%, 73.94%, and 74.63%, respectively. Based on these tests, n-grams are able to extract useful features from the data.