The Influence of Transformational Education Prediction on Softskills of Madrasa Student using Data Mining

Currently, transformative education is oriented towards students' independence in solving the problems they face. In other words, transformative education has its own influence on students' soft skills. This research aims to predict the effect of transformative education on the soft skills that students possess. The subjects in this study were students of Madrasah


I. INTRODUCTION
Transformative education is a new perspective on education, pioneered as an alternative solution to educational problems that have not been fully resolved [1], [2]. Educational reform therefore aims to solve existing problems in the field of education, adjust to the direction of its development, and bring more hope for future progress [3], [4]. Education must thus be dynamic and act as a torch in the race and struggle for social change. The need for transformative education stems from the state of existing education: various references and guidelines are still needed so that education in Indonesia can function properly and is not swayed by its surroundings.
VOLUME 1 ISSUE 1 (2023) DOI: https://doi.org/10.15575/kjrt.v1i1.157
Education is often described as an effort to improve academic abilities and other abilities that are measurable and can be learned, in this case hard skills. However, through transformative education, solutions to these problems have begun to emerge. This is reflected in its orientation towards students independently solving problems encountered in class and in their environment, their habits of studying in groups, their independence, initiative, creativity and productivity, and their planning for the future. In other words, transformative education has its own role in, and impact on, personality and character, namely soft skills [1].
In addition, madrasas, as formal educational institutions, are distinguished from other formal education by prioritizing character education through sound religious education (aqidah, akhlaq, fiqh) [5]-[9]. The influence of transformative education on the soft skills of madrasah students is therefore an interesting subject for analysis. Data mining technology offers a way to determine this effect. Accordingly, this study aims to determine the effect of transformative education on students' soft skills using a supervised learning approach with the Naive Bayes and K-Nearest Neighbour algorithms.

A. Research Activities
This research begins with problem identification, followed by data collection, data mining implementation with the Naïve Bayes and KNN algorithms, and model evaluation using a confusion matrix. In the problem identification stage, the problem formulation is determined, in this case by observing problems related to predicting the effect of transformative education on soft skills in madrasahs. The locus of this research is the students of Madrasah Aliyah Negeri (MAN) 2 Bandung City. Existing problems are then analysed to find out how to solve them and to determine the scope of the study. Basic theory on the application of the Naïve Bayes and K-Nearest Neighbour methods is studied from journals and other literature as a foundation for the research.
Data were collected systematically by asking respondents questions. A Google Form questionnaire was distributed to the research sample, namely MAN 2 Bandung City students. The data collected consist of student identity data and a scale rating how well each student's behaviour matches the statements presented. The questionnaire results serve as the material analysed with the Naïve Bayes and K-Nearest Neighbour methods. Once collected, the data are analysed and prepared so that they can be processed with these methods.
Next is the implementation process, which uses Microsoft Excel and Google Colab with the Python programming language to build the Naive Bayes and KNN models. This study applies a data mining process consisting of the following steps [25], [26]:
1. Data cleaning: removing noise and inconsistent data.
2. Data integration: combining data from various data sources. This step is carried out only when more than one data source is used.
3. Data selection: taking the data to be used in the data mining process and leaving out unused data.
4. Data transformation: converting data into a form that an algorithm can compute on.
5. Mining: finding patterns in the dataset to serve as a knowledge base, in this case using the Naïve Bayes and K-Nearest Neighbor algorithms.
6. Pattern evaluation: analyzing the results of the mining process using a unit of measure.
7. Knowledge presentation: displaying the results of the mining process.

B. Naïve Bayes Algorithm
Naïve Bayes is a classification algorithm that uses Bayes' theorem and assumes that the variables are independent of one another given the output value [27]-[30]. In other words, the presence or absence of a particular variable does not affect, and is not related to, the presence or absence of other variables. This method, also known as the Naive Bayes Classifier, is a supervised classification technique that assigns class labels to future instances/records using conditional probabilities. The term supervised refers to training data that has already been labelled with a class. Equation (1) describes Bayes' theorem. In simpler terms, Bayes' theorem is a mathematical formula for finding a probability when certain other probabilities are known.
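For reference, the standard textbook statement of Bayes' theorem, together with the independence assumption that makes the classifier "naive", can be written as follows. The symbols are assumed notation rather than the paper's own: H is a class hypothesis, X = (x_1, ..., x_n) is the vector of observed attribute values, and the second line is the factorization the independence assumption permits.

```latex
P(H \mid X) = \frac{P(X \mid H)\, P(H)}{P(X)}
\qquad\text{with}\qquad
P(X \mid H) = \prod_{i=1}^{n} P(x_i \mid H)
```

Under this reading, the classifier picks the class H that maximizes P(H) multiplied by the product of the P(x_i | H) terms, since P(X) is the same for every class.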

C. K-Nearest Neighbor Algorithm
The K-Nearest Neighbor (K-NN) algorithm is a lazy learning technique. K-NN works by looking for the group of objects in the training data that are closest (most similar) to an object in the new data or testing data [31]-[33]. K-NN is a supervised learning method. In general, the KNN algorithm works as follows [34]:
1. Determine the number of neighbours (K) to be considered when determining the class.
2. Calculate the distance from the new data to each data point in the dataset.
3. Take the K data points with the shortest distance, then determine the class of the new data from them.
To calculate the distance between two points in the KNN algorithm, the Euclidean distance is used, which applies in 1-dimensional, 2-dimensional, or multi-dimensional space. A 1-dimensional space means that the distance calculation uses only one independent variable, a 2-dimensional space means that there are two independent variables, and a multi-dimensional space means that there are more than two. The Euclidean distance formula in 1-dimensional space is given in equation (2), which can be used when there is only one independent variable. If there is more than one, formula (3) is used.
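The three steps above can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the study: the function names, the toy feature vectors, and the choice of K = 3 are all hypothetical. The distance function implements the usual Euclidean form, the square root of the sum of squared differences over all variables, which reduces to the absolute difference when there is a single variable.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2: distance from the new point to every training point.
    distances = sorted(
        (euclidean(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Step 3: keep the k closest neighbours and take the most common label.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two numeric features per record, class 1 or 2.
train_X = [(4.2, 1), (3.9, 2), (4.5, 1), (2.8, 2), (3.1, 1)]
train_y = [1, 1, 1, 2, 2]

# Step 1: choose K, then classify a new record.
prediction = knn_predict(train_X, train_y, (4.0, 1), k=3)
print(prediction)
```

An odd K is a common choice for two-class problems because it avoids tied votes.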

D. Confusion Matrix
A Confusion Matrix represents the predicted and actual conditions of data produced by a Machine Learning algorithm. From the Confusion Matrix, Accuracy, Precision, Recall and Specificity can be determined [35]. Figure 1 describes the parameters that must be considered in the Confusion Matrix. True Positives (TP) are cases where the positive prediction is correct: the actual class is "YES" and the predicted class is also "YES". True Negatives (TN) are cases where the negative prediction is correct: the actual class is "NO" and the predicted class is also "NO". False Positives (FP) are cases where the actual class is "NO" but the predicted class is "YES". False Negatives (FN) are cases where the actual class is "YES" but the predicted class is "NO".
After understanding these four parameters, Accuracy, Precision, Recall and F1-Score can be calculated. Accuracy is the ratio of correct predictions (positive and negative) to all data, as given in formula (4). Precision is the ratio of correct positive predictions to all positive predictions, as given in formula (5). Recall is the ratio of correct positive predictions to all actually positive data, as given in formula (6). Finally, the F1 Score is the harmonic mean of Precision and Recall, so it takes both false positives and false negatives into account; formula (7) shows its calculation.
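The four measures can be computed directly from the TP, TN, FP and FN counts. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the study's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=8, tn=9, fp=2, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which a simple arithmetic average would hide.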

A. Data Preparation Result
The initial step in the data mining process is to clean the data, in other words to ensure that the data obtained do not contain empty values or values inconsistent with what they should be. The cleaning process was done manually on the 100 data records collected. None of the 100 records contained null or inconsistent values.
Data transformation is the process of converting data into a form that can be processed by an algorithm. Three steps are carried out in the data transformation process for classification with the Naïve Bayes algorithm:
1. Initializing data that is not yet in numerical form. In this step, the Name data, which serves as identity, is encoded as the numbers 1-100. The Gender data is also encoded, with "Female" given the code 1 and "Male" the code 2.
2. Calculating the average of all scale results on the items T1, T2, T3, T4, T5, PS1, PS2, PS3, PS4, PS5, C1, C2, C3, C4, C5, CT1, CT2, CT3, CT4, and CT5. This value is called the "Average Scoring" data.
3. Performing discretization, that is, changing continuous data into discrete form. Data with an Average Scoring > 3.5 is given the code 1 in a new field named "Output", which means "has influence", whereas data with an Average Scoring ≤ 3.5 is given the code 2, which means "has no influence".
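The three transformation steps can be illustrated with plain Python. This is a sketch under stated assumptions: the item names, gender codes, and the 3.5 threshold come from the text above, while the record layout, the function name `transform`, and the sample respondent are hypothetical.

```python
from statistics import mean

# The 20 questionnaire items named in the text: T1-T5, PS1-PS5, C1-C5, CT1-CT5.
ITEMS = [f"{prefix}{i}" for prefix in ("T", "PS", "C", "CT") for i in range(1, 6)]
GENDER_CODES = {"Female": 1, "Male": 2}
THRESHOLD = 3.5


def transform(record, student_id):
    """Encode identity and gender, average the item scores, and discretize."""
    average = mean(record[item] for item in ITEMS)
    return {
        "Name": student_id,                        # step 1: name -> running number
        "Gender": GENDER_CODES[record["Gender"]],  # step 1: gender -> 1 or 2
        "Average Scoring": average,                # step 2: mean of the 20 items
        "Output": 1 if average > THRESHOLD else 2, # step 3: 1 = has influence
    }


# Hypothetical respondent: every item scored 4 except T1, which is scored 3.
raw = {"Name": "Student A", "Gender": "Female", **{item: 4 for item in ITEMS}}
raw["T1"] = 3

row = transform(raw, student_id=1)
print(row)
```

Note that a record averaging exactly 3.5 falls into class 2, because the rule uses a strict "greater than" comparison for class 1.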

B. Implementation of Naïve Bayes Algorithm
The next stage is the implementation of the method described earlier. First, the data are cleaned manually, then selected, and then converted both into numeric form and into discrete form so that they can be used in the computational process. This stage can also be called the training process: the Naïve Bayes model is built using the 100 data records and then directly tested using the Confusion Matrix to obtain its performance. The result of this stage is a Naïve Bayes model that can be used to classify and predict results, in line with the purpose of this study. As shown in Figure 1, the model built with the Naïve Bayes algorithm, which is fully listed on Google Colab, predicts that transformative education has an influence on students' soft skills, in this case teamwork, problem solving, communication skills, and critical thinking. The model is then tested. Testing the model with the Confusion Matrix yields the values shown in Figure 2: the accuracy is 0.98, the precision is 0.96 (macro avg) and 0.98 (weighted avg), the recall is 0.99 (macro avg) and 0.98 (weighted avg), and the f1-score is 0.97 (macro avg) and 0.98 (weighted avg). These high levels of accuracy, precision, recall, and f1-score indicate that the system can be considered good.
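The difference between the "macro avg" and "weighted avg" figures can be made concrete with a small sketch. It follows the usual convention of classification reports such as scikit-learn's `classification_report`, where the macro average is the unweighted mean of the per-class scores and the weighted average weights each class by its number of true instances. The source does not state which tool produced Figure 2, so this is an assumption, and the labels below are hypothetical.

```python
def per_class_precision(y_true, y_pred, label):
    """Precision for one class: correct predictions of `label` / all predictions of it."""
    actual_for_predicted = [t for t, p in zip(y_true, y_pred) if p == label]
    if not actual_for_predicted:
        return 0.0
    return sum(1 for t in actual_for_predicted if t == label) / len(actual_for_predicted)


def macro_and_weighted(y_true, y_pred):
    """Return (macro average, support-weighted average) of per-class precision."""
    labels = sorted(set(y_true))
    scores = {c: per_class_precision(y_true, y_pred, c) for c in labels}
    support = {c: y_true.count(c) for c in labels}  # true instances per class
    macro = sum(scores.values()) / len(labels)
    weighted = sum(scores[c] * support[c] for c in labels) / len(y_true)
    return macro, weighted


# Hypothetical, imbalanced labels: class 1 = "has influence", class 2 = "has no influence".
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2]
y_pred = [1, 1, 1, 1, 1, 1, 1, 2, 2, 2]

macro, weighted = macro_and_weighted(y_true, y_pred)
print(f"macro avg: {macro:.4f}, weighted avg: {weighted:.4f}")
```

When one class dominates the data, the weighted average is pulled towards that class's score, which is one plausible reason for a weighted value exceeding the macro value.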

C. Implementation of K-Nearest Neighbor Algorithm
This model uses the same data preparation as the model above; the difference lies in the algorithm used. A total of 100 data records are used and tested using the Confusion Matrix. Of these, 70% are used as training data and 30% as test data. The result of this stage is a KNN model that can be used and a set of prediction results, in line with the purpose of this research. As shown in Figure 3, in contrast to the prediction results given by the Naive Bayes model in Figure 5, this model provides a prediction for each row of the 30 test data. In accordance with the data preparation, the code 1 means "has influence", while the code 2 means "has no influence". A performance test was then carried out using the Confusion Matrix, with the results given in Figure 4.
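A 70/30 split of 100 records can be sketched as follows. The source does not say how the split was drawn, so the shuffling, the fixed seed, and the function name are assumptions made only to keep the example reproducible.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.3, seed=0):
    """Shuffle record indices and split them into training and test index lists."""
    indices = list(range(n_samples))
    # A seeded generator makes the (assumed) random split repeatable.
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(100)
print(len(train_idx), len(test_idx))
```

The two index lists never overlap, so no record used for training is also used to evaluate the model.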

Based on Figure 4, the accuracy obtained is 76.67%, with a precision of 78.33% and a recall of 96.67%. Based on the Confusion Matrix parameters, the model with the K-NN algorithm yields the following information:
a. Of all students, 76.67% were correctly predicted, either as affected or as not affected by transformative education in their soft skills.
b. Of all students predicted to be affected by transformative education in their soft skills, 78.33% were correctly predicted.
c. Of all students who were actually affected by transformative education in their soft skills, 96.67% were correctly predicted to be affected.

D. Comparison Result between Naïve Bayes and K-Nearest Neighbor Algorithm
Based on the data that has been tested, the accuracy, precision, and recall of each algorithm are obtained. The test results for each algorithm can be seen in Table 1. The performance of the Naive Bayes method appears better than that of the K-Nearest Neighbour (KNN) method. However, this comparison is not valid because the amount of test data differs between the two methods. Therefore, another study that compares the two methods on the same data was consulted. That study concluded that, with 30 test data for both methods, the Naive Bayes algorithm achieves fairly good accuracy, owing to its ability to classify even with little training data for parameter estimation, whereas the K-Nearest Neighbour method produces low accuracy because it is not effective when the training data is small [36].

V. CONCLUSION
In this study, a data mining method in the form of classification was applied to predict the effect of transformative education on students' soft skills. Two methods were used, namely Naïve Bayes and K-Nearest Neighbor. The data used to test the classification methods came from students of MAN 2 Bandung City. Based on the trials, it can be concluded that the system is able to predict the effect of transformative education on soft skills with fairly high accuracy. The trials showed that the Naïve Bayes method achieved a higher accuracy than K-Nearest Neighbor, namely 98%. Further research can expand the dataset and use other machine learning algorithms.