N-Gram Feature for Comparison of Machine Learning Methods on Sentiment in Financial News Headlines

Arif Mudi Priyatno; Fahmi Iqbal Firmananda

doi:10.31004/riggs.v1i1.4

Authors

Arif Mudi Priyatno, Digital Business, Faculty of Economics and Business, Universitas Pahlawan Tuanku Tambusai, https://orcid.org/0000-0003-3500-3511
Fahmi Iqbal Firmananda, Digital Business, Faculty of Economics and Business, Universitas Pahlawan Tuanku Tambusai

Keywords:

N-Gram, Multinomial Naïve Bayes, Logistic Regression, Support Vector Machine, Multi-Layer Perceptron, Stochastic Gradient Descent, Decision Trees, Sentiment Analysis

Abstract

Sentiment analysis is widely used in natural language processing and information retrieval applications. Applied to financial news headlines, it can show how a company is being portrayed in circulating news and provide input to the company: positive sentiment supports the company's development, whereas negative sentiment damages its reputation and can hinder that development. This study compares machine learning methods for classifying the sentiment of financial news headlines using n-gram feature extraction, with the aim of identifying the best-performing method. The methods compared are Multinomial Naïve Bayes, Logistic Regression, Support Vector Machine, Multi-Layer Perceptron (MLP), Stochastic Gradient Descent, and Decision Trees. Logistic regression performs best, with an F1-measure, precision, and recall of 73.94%, 73.94%, and 74.63%, respectively. These results show that n-gram features combined with machine learning methods can perform sentiment analysis of financial news headlines.
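The two building blocks named in the abstract, n-gram feature extraction and the precision/recall/F1 evaluation, can be illustrated with a minimal standard-library sketch. It assumes whitespace tokenisation, word unigrams plus bigrams, and macro averaging over classes; the abstract does not state the tokenisation, the n-gram range, or the averaging scheme, so all three are assumptions. The helper names `word_ngrams`, `ngram_counts`, and `macro_scores` are hypothetical, and the headline and labels in the usage example are illustrative, not taken from the study's dataset. The resulting bag-of-n-grams counts are the kind of features that the compared classifiers (for example, logistic regression) would be trained on.

```python
from collections import Counter


def word_ngrams(text, n_min=1, n_max=2):
    """Return word n-grams of length n_min..n_max (assumption: lower-case, whitespace tokens)."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens across the headline.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def ngram_counts(text, n_min=1, n_max=2):
    """Bag-of-n-grams feature vector, stored sparsely as a Counter."""
    return Counter(word_ngrams(text, n_min, n_max))


def macro_scores(y_true, y_pred):
    """Macro-averaged precision, recall and F1 (F1 is the mean of per-class F1 scores)."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(labels)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


if __name__ == "__main__":
    # Illustrative headline and labels only.
    print(word_ngrams("profit rose sharply"))
    # → ['profit', 'rose', 'sharply', 'profit rose', 'rose sharply']
    y_true = ["positive", "negative", "neutral", "positive"]
    y_pred = ["positive", "neutral", "neutral", "negative"]
    print(macro_scores(y_true, y_pred))  # precision 0.5, recall 0.5, F1 ≈ 0.444
```

In practice the same features are commonly produced with scikit-learn's `CountVectorizer(ngram_range=(1, 2))`, and the metrics with `precision_recall_fscore_support(..., average="macro")`; the hand-written version above only makes the computation explicit.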
