Classification Of Tomato Maturity Levels Based on RGB And HSV Colors Using KNN Algorithm

Tomatoes (Lycopersicon esculentum Mill.) are vegetables that are widely produced in tropical and subtropical areas. According to Harllee, tomatoes are grouped into six levels of maturity: green, breakers, turning, pink, light red, and red. In informatics, one way to classify tomato maturity is to use digital image processing techniques. This study classifies the maturity of tomatoes using K-Nearest Neighbor (KNN) based on Red Green Blue (RGB) and Hue Saturation Value (HSV) color features. KNN was chosen as the classification algorithm because it is simple and achieves good accuracy by assigning classes according to the minimum Euclidean distance. The highest accuracy obtained was 91.25%, at K = 7 with 80 test images. This shows that the KNN algorithm successfully classified tomato maturity using RGB and HSV color features.


Tomatoes (Lycopersicon esculentum Mill.) are vegetables that are widely produced in tropical and subtropical areas [1]. Tomato plants are a horticultural commodity that is in demand by the community and has become a basic need in Indonesia. However, in industry and among tomato farmers, tomato maturity is still determined manually, by direct visual observation of the fruit [2].
Manual observation produces maturity grading that is inconsistent and unsatisfactory [3]. The process depends heavily on the subjectivity of the workers sorting the tomatoes. It is also time-consuming, and the sorted products vary widely. This is due to the limits of human vision, fatigue during work, and differences of opinion about the quality of the fruit [4].
Because the manual method has many weaknesses, a method is needed that can sort and classify the maturity level of tomatoes reliably. Such a process reduces the risk of tomatoes rotting [5]. Several factors can serve as guidelines for judging tomato maturity, including size, shape, texture, and color. Color is the easiest characteristic to use [3]. According to Harllee, tomatoes are grouped into six levels of maturity: Green, Breakers, Turning, Pink, Light Red, and Red [5]. In informatics, one way to classify tomato maturity is to use digital image processing techniques. These techniques are used because digital images make it possible to sort agricultural products automatically [4], which in turn reduces the risk of tomatoes rotting.
Research on the classification of tomato maturity levels has been carried out in [6]. That study used HSV color features and the LVQ algorithm for classification. It used a data set of tomato images taken from one side and achieved an accuracy of 83.75%. Because not all parts of a tomato have the same color, further research is needed that uses images of all four sides of each tomato.
Research [7] compared the Hue Saturation Intensity (HSI) and Hue Saturation Value (HSV) color features for detecting roses; the HSV color features achieved better accuracy than HSI. Research [8] classified images of beef and pork using KNN and obtained an accuracy of 93.33%.
Based on the problems and related research described above, this study classifies the maturity level of tomatoes using K-Nearest Neighbor (KNN) based on RGB and HSV color features. KNN was chosen as the classification algorithm because it is simple and achieves good accuracy by assigning classes according to the minimum Euclidean distance. The algorithm is expected to classify tomato maturity well enough to reduce tomato spoilage and to give better results than previous studies.

Research Methods
This study proposes a strategy for determining the maturity level of tomatoes.

Figure 1 Research Stages
The maturity level of the tomatoes is classified with the K-Nearest Neighbor (KNN) algorithm, based on their red, green, and blue (RGB) and Hue Saturation Value (HSV) color features.
The classification process consists of training and testing. The training process builds and trains a model from the image data, and the testing process measures the success rate of the resulting model.
The research consists of preprocessing, feature extraction, modeling, and evaluation stages. The preprocessing stage prepares the image data by removing the background and making the pixel dimensions of the images uniform. The preprocessing steps in this study are cropping and resizing.
The feature extraction stage uses color features, namely RGB and HSV. Feature extraction obtains the required features from an image, and the resulting feature values are used as input to the classification process. Classification in this research uses a machine learning technique, the K-Nearest Neighbor (KNN) algorithm.
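The feature extraction step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each image is summarized by the per-channel mean of R, G, B and of H, S, V (the paper does not state which statistic is used), and the function name `color_features` and the sample pixels are hypothetical.

```python
import colorsys


def color_features(pixels):
    """Return [mean R, mean G, mean B, mean H, mean S, mean V].

    `pixels` is a list of (r, g, b) tuples with 8-bit channels (0..255).
    H, S and V are on the 0..1 scale used by the standard colorsys module.
    Using the per-channel mean is an assumption made for this sketch.
    """
    n = len(pixels)
    sums = [0.0] * 6
    for r, g, b in pixels:
        # colorsys expects channel values scaled to 0..1
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        for i, value in enumerate((r, g, b, h, s, v)):
            sums[i] += value
    return [total / n for total in sums]


# Two hypothetical reddish pixels standing in for a tomato image
features = color_features([(200, 30, 30), (220, 50, 50)])
```

The resulting six-element vector is the kind of input a KNN classifier can consume. Note that hue is an angle, so a plain arithmetic mean can be misleading for reds that straddle the 0°/360° boundary; a circular mean may be preferable in practice.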

Image data
The data used in this study are images of tomatoes classified into 5 classes representing 5 levels of maturity: Green, Turning, Pink, Light Red, and Red (Figure 2). The Breakers maturity level in Figure 2 is merged into the Green class because Breakers tomatoes are predominantly dark green, with only 10% of the surface showing a brownish-yellow color [10]. The data were taken from research [9], which used plum tomatoes photographed with a 24.3-megapixel DSLR camera. The images are in PNG format.
The images were collected against a white background with the object positioned in the center. A total of 400 images were used in this study.

Preprocessing
After the tomato image data are obtained, they are preprocessed to prepare them for the needs of the research. The first preprocessing step is cropping, as shown in Figure 2. Cropping makes the image easier for the system to process by keeping the required object and removing the background. The cropped images in this study are square. Cropping is done manually using the Photoshop CS6 application.

Figure 4 Resize Process
The next step is resizing, as shown in Figure 4. Resizing changes the pixel dimensions of the image to the size required by the study, which is 400x400 pixels.
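The two preprocessing steps can be illustrated with a small pure-Python sketch. It rests on assumptions: the paper crops manually in Photoshop, so the automatic center crop here is only a stand-in, and nearest-neighbor sampling is an assumed interpolation method. The function names and the toy image are hypothetical.

```python
def center_crop_square(image):
    """Crop a row-major image (list of rows of pixels) to its central square."""
    height, width = len(image), len(image[0])
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return [row[left:left + side] for row in image[top:top + side]]


def resize_nearest(image, new_height, new_width):
    """Resize by nearest-neighbor sampling of the source pixels."""
    height, width = len(image), len(image[0])
    return [
        [image[y * height // new_height][x * width // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]


# Toy 4x6 "image" whose pixels encode their own (x, y) position
toy = [[(x, y, 0) for x in range(6)] for y in range(4)]
square = center_crop_square(toy)             # 4x4 square
resized = resize_nearest(square, 400, 400)   # 400x400, as used in the study
```

In practice an image library would normally handle both steps; the sketch only shows what "square crop, then resize to 400x400" means at the pixel level.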

RGB color space
RGB is a color space resulting from the acquisition of color frequencies by an electronic sensor in the form of analog signals. The RGB color space consists of 3 basic colors: red, green, and blue (Figure 4) [11]. From these three basic colors, 2^24 or 16,777,216 colors can be formed [12]. Combining red and green produces yellow, combining red and blue produces magenta (purple), and combining blue and green produces cyan. Combining red, green, and blue at the same maximum intensity, 255, produces white. As the three equal intensity values decrease, the result is a gray level that runs from bright to dark, reaching black when all three values are zero [12].
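The color count and the additive combinations above can be checked with a few lines of Python. This is an illustrative sketch assuming 8 bits per channel; the helper names `pack_rgb` and `add_colors` are hypothetical.

```python
def pack_rgb(r, g, b):
    """Pack three 8-bit channels into a single 24-bit integer."""
    return (r << 16) | (g << 8) | b


def add_colors(*colors):
    """Additive mixing: sum each channel and clip the result at 255."""
    return tuple(min(255, sum(c[i] for c in colors)) for i in range(3))


RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)

total_colors = 256 ** 3                 # 256 levels per channel, 3 channels
yellow = add_colors(RED, GREEN)         # (255, 255, 0)
magenta = add_colors(RED, BLUE)         # (255, 0, 255)
cyan = add_colors(GREEN, BLUE)          # (0, 255, 255)
white = add_colors(RED, GREEN, BLUE)    # (255, 255, 255)
```

Since each channel takes 8 bits, `total_colors` equals 2^24, and the largest packed value, `pack_rgb(255, 255, 255)`, is one less than that.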

Hue Saturation Value (HSV)
The HSV color model is derived from the RGB color model [14], but it is considered better than the RGB color space for describing color because it can express the shade, hue, degree, and contrast of a color [15]. The HSV color model has 3 main components [16], [17], which are shown in Figure 6 and described below.
1. Hue represents the basic color and ranges from 0° to 360° (Figure 4). Starting at 0°, the color varies from red through yellow, green, cyan, blue, and magenta, and then returns to red.
2. Saturation represents the purity or strength of a color and ranges from 0 to 1. A value of 0 is a shade of gray, and the value increases until the color has no white component.
$V = \max(R, G, B)$, where V is the value, taken as the maximum of R, G, and B, S is the saturation, and H is the hue.
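For illustration, the widely used hexcone RGB-to-HSV conversion can be sketched as below. Only the V equation is confirmed by the text; the S and H formulas here are the common textbook ones and are an assumption about what the paper uses. The function name `rgb_to_hsv_degrees` and the sample color are hypothetical.

```python
import colorsys


def rgb_to_hsv_degrees(r, g, b):
    """Convert 8-bit RGB to (H in degrees [0, 360), S in [0, 1], V in [0, 1])."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    v = max(r, g, b)                      # V = max(R, G, B)
    delta = v - min(r, g, b)
    s = 0.0 if v == 0 else delta / v      # saturation; 0 for pure black
    if delta == 0:
        h = 0.0                           # hue is undefined for grays; 0 by convention
    elif v == r:
        h = 60.0 * (((g - b) / delta) % 6)
    elif v == g:
        h = 60.0 * ((b - r) / delta + 2)
    else:
        h = 60.0 * ((r - g) / delta + 4)
    return h, s, v


# Cross-check against the standard library, which reports hue on a 0..1 scale
sample = (200, 60, 40)  # hypothetical reddish pixel
h, s, v = rgb_to_hsv_degrees(*sample)
ref_h, ref_s, ref_v = colorsys.rgb_to_hsv(*(c / 255.0 for c in sample))
```

With this scaling, pure red maps to a hue of 0°, pure green to 120°, and pure blue to 240°, matching the hue ordering described in the list above.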

K-Nearest Neighbor
K-Nearest Neighbor (KNN) is an algorithm that is often used for classification. KNN is a supervised learning algorithm that stores the training data and compares unclassified data against it [19]. The KNN algorithm is one of the non-parametric methods in pattern recognition. It groups objects based on the closest features by finding the smallest distances between the new data and its K nearest neighbors [20].

2. Calculate the Euclidean distance from each object to the new data according to equation (11) [22]:
$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$ (11)
where p_i is the training data, q_i is the testing data, i is the index of the data variable, and n is the dimension of the data.
3. Sort the objects by distance and take the k objects with the smallest distances.
4. Collect the class labels Y of these nearest neighbors.
5. Count the classes among the nearest neighbors and use the majority class as the class of the new data.
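The distance, sorting, and voting steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the toy RGB feature vectors are hypothetical, and ties in the vote are broken in favor of the class encountered first among the nearest neighbors.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_predict(train_x, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Distance from every training sample to the query
    distances = [(euclidean(x, query), label) for x, label in zip(train_x, train_y)]
    # Sort by distance and keep the k closest samples
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Collect the neighbor labels and pick the most frequent one
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical mean-RGB features for a handful of training images
train_x = [(60, 160, 50), (70, 150, 60), (210, 40, 40), (220, 50, 45), (200, 60, 50)]
train_y = ["Green", "Green", "Red", "Red", "Red"]

prediction = knn_predict(train_x, train_y, query=(205, 55, 45), k=3)
```

Here the three closest samples to the query are all labeled "Red", so that class wins the vote. An odd k, such as the K = 7 reported in this study, reduces the chance of ties when two classes compete.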

Results and Discussions
The experiments in this study follow the scenarios in Table 2. The value of K is the parameter that determines how many nearest neighbors the KNN algorithm considers.
The data set used in this research consists of 400 images taken from 100 tomatoes of the plum type. The maturity level of the tomatoes can be determined from extracted color features, because color is a very important factor in tomato maturity. The color features extracted are RGB and HSV. The extracted RGB color features can be seen in Table 3. After feature extraction, testing is carried out using the tomato image data based on HSV color extraction. The data are divided into training data and test data according to the percentages in Table 2. The KNN algorithm is tested under several scenarios that vary the data split and the value of K. The test results are evaluated with a confusion matrix to calculate accuracy [23]. The results of testing the model for all data splits and K values can be seen in Figure 7.
Figure 8 shows the results of testing the KNN model on the color features for classifying tomato maturity levels. The highest accuracy, 91.25%, was obtained at K = 7 with 80 test images. The confusion matrix for this best-performing scenario is given in Table 5.
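As a worked illustration of how accuracy is read from a confusion matrix, the sketch below builds a matrix from actual and predicted labels and divides the diagonal sum by the total. The labels and predictions are hypothetical toy values, not the contents of Table 5, and the function names are assumptions. The last line checks the reported figure arithmetically: 91.25% of 80 test images corresponds to 73 correct classifications.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


labels = ["Green", "Turning", "Pink", "Light Red", "Red"]

# Hypothetical toy predictions: 8 samples, one Pink mistaken for Light Red
actual = ["Green", "Green", "Turning", "Pink", "Pink", "Light Red", "Red", "Red"]
predicted = ["Green", "Green", "Turning", "Pink", "Light Red", "Light Red", "Red", "Red"]

matrix = confusion_matrix(actual, predicted, labels)
toy_accuracy = accuracy(matrix)       # 7 of 8 correct

reported_accuracy = 73 / 80           # 73 correct out of 80 gives 0.9125
```

Off-diagonal cells show which maturity levels are confused with each other, which is often more informative than the single accuracy figure.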