Analysis of Performances of Pupils in the High School of Kimwenza from a Data Warehouse based on Data mining from 2015 to 2021

In this article, we analyze the performance of students in the eighth year of basic education at Kimwenza High School before, during, and after COVID-19, covering 2015 to 2021. The machine learning model used is decision tree classification with the ID3 algorithm. The confusion matrix helps us predict these results in the short term for the coming years.


Introduction
The sustainable development of a country necessarily depends on education, and the Democratic Republic of Congo (DRC) is no exception; it has a ministry dedicated to it. In this article, we analyze the performance of pupils in the eighth year of basic education at the High School of Kimwenza before, during, and after COVID-19. The High School of Kimwenza is a Catholic school of the Diocese of Kisantu in the DRC, located in the Commune of Mont-Ngafula.
The objective is to assess the level of performance of students in the eighth year of basic education at the Lycée de Kimwenza and to predict it in the short term for the coming years.
The methodology consisted of visiting the Kimwenza high school to collect the results obtained over the period indicated above. For each student, we recorded: age, the method of payment of school fees, the school year, the application (results), and the type of education given at the Lycée. We did not use the gender of these pupils because there are no boys in this school.
The contribution of this article is not only to analyze the performance of pupils in the eighth year of basic education but also to predict it. We therefore also propose a short-term prediction, based on the confusion matrix, for the coming years.

Data mining objectives
The objectives of data mining can be grouped into three important areas, including:
− Discovery of hidden rules: discover associative rules between different events.
− Confirmation of hypotheses: confirm or refute the hypotheses proposed by analysts and decision-makers, and provide them with a degree of confidence.

Data mining techniques
Data mining refers to a set of techniques for exploring and analyzing, by automatic or semi-automatic means, a large mass of data with the aim of discovering hidden trends or significant rules (non-trivial, implicit and potentially useful) [1]. Data mining tools generally rely on statistics, classification or the extraction of associative rules.

Types of Data Mining Techniques
There are two main types of approaches: predictive techniques and descriptive techniques.

Predictive or supervised techniques
They aim to extrapolate new information from present information and they explain the data. They are used in two types of problems [2]:

Descriptive or unsupervised techniques
They aim to highlight information that is present but hidden by the volume of data; they reduce, summarize and synthesize the data, and there is no target variable. They are used in:
− Factor analysis (PCA, MCA, etc.);
− Segmentation or clustering (K-Means, dynamic clusters, hierarchical agglomerative clustering, etc.);
− Association search.

Classification by decision tree
We are given a set X of n examples x_i whose P attributes are quantitative or qualitative. Each example is labelled, i.e. it is associated with a "class" or "target attribute" that we denote y ∈ Y [2].
From these examples, we construct a so-called "decision" tree such that:
− Each node corresponds to a test on the value of one or more attributes;
− Each branch starting from a node corresponds to one or more values of the test carried out;
− Each leaf is associated with a value of the target attribute.
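The node, branch and leaf structure described above can be sketched as a nested dictionary together with a function that walks it. This is only an illustration: the attribute names (`payment`, `age_group`) and their values are hypothetical and are not taken from the Kimwenza data.

```python
# Hypothetical tree: attribute names and values are illustrative only.
# An internal node holds the attribute it tests and one branch per value;
# a leaf holds a value of the target attribute.
tree = {
    "attribute": "payment",
    "branches": {
        "on_time": {"leaf": "good"},
        "late": {
            "attribute": "age_group",
            "branches": {
                "12-13": {"leaf": "good"},
                "14+": {"leaf": "weak"},
            },
        },
    },
}


def classify(node, example):
    """Walk from the root to a leaf, following the branch matching each test."""
    while "leaf" not in node:
        value = example[node["attribute"]]
        node = node["branches"][value]
    return node["leaf"]


print(classify(tree, {"payment": "late", "age_group": "14+"}))
```

Classifying an example therefore costs one attribute lookup per level of the tree.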
There are several algorithms for building decision trees, namely [3]:

Presentation of the ID3 algorithm [4]
Algorithm: ID3 Algorithm
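As a minimal sketch of how ID3 is commonly formulated (not a reproduction of the listing from [4], and not the SIPINA implementation), the tree can be grown by repeatedly choosing the attribute with the highest information gain, where gain is the reduction in Shannon entropy of the target. The toy rows and attribute names below are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def id3(rows, attributes, target):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:          # pure node: stop
        return labels[0]
    if not attributes:                 # nothing left to test: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    return {best: {
        value: id3([r for r in rows if r[best] == value], remaining, target)
        for value in sorted({r[best] for r in rows})
    }}


# Hypothetical toy data, for illustration only.
rows = [
    {"payment": "on_time", "age_group": "young", "result": "good"},
    {"payment": "on_time", "age_group": "old", "result": "good"},
    {"payment": "on_time", "age_group": "old", "result": "good"},
    {"payment": "late", "age_group": "young", "result": "good"},
    {"payment": "late", "age_group": "old", "result": "weak"},
    {"payment": "late", "age_group": "old", "result": "weak"},
]
toy_tree = id3(rows, ["payment", "age_group"], "result")
print(toy_tree)
```

On this toy data the split on `payment` has the larger gain, so it becomes the root; the `late` branch is then split on `age_group`.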

Presentation of ETL with Power BI (Business Intelligence tool)
At this stage, we prepare the data from our Excel data source collected at the Lycée de Kimwenza. The data is extracted using Microsoft Power BI. After the extraction phase, we transform the data to homogenize it before loading it into the cube. We extracted, transformed and loaded all the dimensions and the fact table, as shown in the figure below [6]:
Figure 1: Overview of ETL/Power BI Tool
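The extract, transform and load steps described above were carried out in Power BI; the following is only a rough Python analogue using the standard library, to make the three phases concrete. The column names, the two sample rows and the choice of four dimensions (school year, age, payment method, type of teaching) are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: two rows with inconsistent spacing and casing.
RAW = """school_year,age,payment,teaching,application
2015-2016, 13 ,Cash,classic,57.5
2017-2018,14,bank ,Situation,61.0
"""

DIMENSIONS = ("school_year", "age", "payment", "teaching")


def extract(text):
    """Extract: read the raw rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, lowercase labels, cast numeric fields."""
    return [{
        "school_year": r["school_year"].strip(),
        "age": int(r["age"]),
        "payment": r["payment"].strip().lower(),
        "teaching": r["teaching"].strip().lower(),
        "application": float(r["application"]),
    } for r in rows]


def load(rows):
    """Load: fill one table per dimension, then the fact table that references them."""
    db = sqlite3.connect(":memory:")
    for dim in DIMENSIONS:
        db.execute(f"CREATE TABLE dim_{dim} (id INTEGER PRIMARY KEY, value UNIQUE)")
    db.execute(
        "CREATE TABLE fact_result "
        "(school_year_id, age_id, payment_id, teaching_id, application REAL)"
    )
    for r in rows:
        keys = []
        for dim in DIMENSIONS:
            db.execute(f"INSERT OR IGNORE INTO dim_{dim} (value) VALUES (?)", (r[dim],))
            row = db.execute(f"SELECT id FROM dim_{dim} WHERE value = ?", (r[dim],)).fetchone()
            keys.append(row[0])
        db.execute("INSERT INTO fact_result VALUES (?, ?, ?, ?, ?)", (*keys, r["application"]))
    return db


db = load(transform(extract(RAW)))
print(db.execute("SELECT COUNT(*) FROM fact_result").fetchone()[0])
```

Keeping the descriptive values in dimension tables and only keys plus the measure in the fact table is what makes the later per-year and per-class aggregations straightforward.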

Overview of Data Warehouse Structure
The structure of our data warehouse is represented by a star schema with four dimension tables and one fact table.
In 2015-2016, the performance of eighth-grade students across all 2nd-year secondary classes at the Lycée de Kimwenza was as follows: 143 students obtained a combined total of 7705.9, an average of 57.5% per student for all the second-year classes of the Orientation Cycle (IIème A, IIème B, IIème C). In 2016-2017, 122 students obtained a total of 7279.2, an average of 59.6% per pupil. There is a slight increase in performance for all students in these classes.
The year 2017-2018 marks the beginning of teaching by situation in Basic Education. With 123 students across all eighth-year classes (the second-year classes of the Orientation Cycle, renamed 8th year), the marks obtained total 7002.3, i.e. an average of 56.9% per pupil. Here, the curve drops. This can be explained by the lack of mastery of the situation-based approach by learners and facilitators alike, as well as by the lack of adequate tools.
In 2018-2019, the 144 students enrolled obtained a combined total of 6644, an average of 46.1%: a remarkable drop compared to previous years. This drop can be explained by the introduction of free education into the basic education system. Facilitators once again lose their bonuses and are unmotivated. The 2019-2020 school year is the year in which COVID-19 reached the country, and it did not spare the education sector. 119 students were enrolled and obtained a combined total of 6854, an average of 57.5% per student. A new hope is reborn despite free education and COVID-19. We also think that, given the limited number of days of class actually attended by the students, the teachers did not have many subjects on which to evaluate them objectively. The situational approach is beginning to show its effects.
In the 2020-2021 school year, 131 students were enrolled and obtained a combined total of 7232.2, an average of 55.2%. In this period, the second wave of COVID-19 is raging, alongside free education and the situation-based approach. The gains of the previous year were not carried over to this year, whose performance does not reflect the objective. The year of the outbreak of COVID-19 is at the top of the ranking. Fewer school days, frustrated students, not many subjects to assess, and teachers following society's motto of "save the year". From our point of view, despite the teaching by situation, this year was not well evaluated.
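The yearly averages quoted above are each a combined total divided by the number of enrolled students, which can be recomputed with a short sketch using the figures as printed. Two observations, offered cautiously: the published values for 2016-2017 and 2019-2020 appear to be truncated rather than rounded, and the printed 2015-2016 pair (7705.9 over 143 students) gives about 53.9 rather than the stated 57.5, which suggests one of those three figures may have been mistranscribed. The sketch does not try to resolve that.

```python
# (school year, enrolled students, combined total) exactly as printed in the text.
results = [
    ("2015-2016", 143, 7705.9),
    ("2016-2017", 122, 7279.2),
    ("2017-2018", 123, 7002.3),
    ("2018-2019", 144, 6644.0),
    ("2019-2020", 119, 6854.0),
    ("2020-2021", 131, 7232.2),
]

# Average per student = combined total / number of enrolled students.
averages = {year: total / students for year, students, total in results}

for year, value in averages.items():
    print(f"{year}: {value:.1f}")

# Year-over-year change, to make the rises and drops discussed in the text explicit.
years = [year for year, _, _ in results]
changes = {
    later: averages[later] - averages[earlier]
    for earlier, later in zip(years, years[1:])
}
```

The `changes` dictionary shows the sharp fall in 2018-2019 and the recovery in 2019-2020 that the text comments on.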

Details on performances achieved by class
2019-2020 comes in second position: it inherits all the achievements of 2018-2019 as well as free education, and a slight decrease is recorded. Likewise, 2020-2021 comes last, with a dizzying drop. The awareness of teachers is gradually being reborn. It must be said here that teaching by situation is not yet anchored in the eighth-year classes of the Lycée de Kimwenza.
According to class: in view of the above, performance varies from class to class and from year to year. This can be explained by the selection, the learners' families, and especially the organization of the class as a whole. The results obtained in 2018-2019 and 2019-2020 are overwhelming and astonishing: either eighth graders work well during times of distress, or it is simply the effect of the slogan mentioned above, to save the school year. The method of teaching by situation has not yet shown its effects. The prediction decision tree indicates that in six years, the excellent and very good students will represent a proportion higher than 59.95%, while the weak will represent a proportion lower than 59.95%.
Volume 5, Issue 3, May-June 2023

Presentation of performance measures
a) Confusion matrix
The confusion matrix, in supervised learning terminology, is a tool used to measure the quality of a classification system. Each column of the matrix represents the number of occurrences of an estimated class, while each row represents the number of occurrences of an actual (or reference) class [5]. Finally, the confusion matrix below gives the overall occurrences, i.e. the matrix computed on the test data set. The table of indicators above shows that the prediction error rate is 33.33%. Sensitivity measures the ability of a test to return a positive result when the case is actually positive; in our table the sensitivity is 100%, that is to say, the model detects every positive case in the test set. The accuracy of our prediction is 66.43%.
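To make the indicators concrete, here is a minimal sketch of how error rate, sensitivity and related measures are usually derived from a two-class confusion matrix. The counts are invented for illustration and are not the matrix produced by SIPINA; they were merely chosen so that the sensitivity is 100% while the error rate is one third.

```python
def indicators(tp, fn, fp, tn):
    """Standard indicators from a 2x2 confusion matrix.

    tp: actual positive, predicted positive   fn: actual positive, predicted negative
    fp: actual negative, predicted positive   tn: actual negative, predicted negative
    """
    total = tp + fn + fp + tn
    return {
        "error_rate": (fp + fn) / total,   # share of misclassified examples
        "accuracy": (tp + tn) / total,     # share of correct predictions (1 - error rate)
        "sensitivity": tp / (tp + fn),     # share of actual positives that were detected
        "precision": tp / (tp + fp),       # share of predicted positives that are correct
    }


# Hypothetical counts, for illustration only.
m = indicators(tp=4, fn=0, fp=3, tn=2)
for name, value in m.items():
    print(f"{name}: {value:.2%}")
```

A sensitivity of 100% only means that no actual positive was missed (`fn` is zero); the errors then come entirely from false positives, which is why precision is worth reading alongside it.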

c) Confusion Matrix (With 2020-2021 School Year Test data)
In this confusion matrix, we predict the application performance that will be achieved in the 2021-2022 school year; the details are presented in the indicator table below. The table of indicators shows that the prediction error rate is 45%, the sensitivity is 100%, and the accuracy of our prediction is 63.2%, i.e. the 2021-2022 school year will be one in which students achieve a higher application performance than in the 2020-2021 year.

Conclusion and Perspectives
The work presented in this article consists of the analysis of pupils' application performance from a data warehouse based on data mining. The multidimensional tools allowed us to analyze the performance of the "application" variable from the 2015-2016 school year to the 2020-2021 school year. We used Power BI for data analysis and SIPINA, a learning tool, for application performance prediction. This work can help the decision-makers of this school and the authorities to make decisions on the quality and performance of education in our country, particularly in the city of Kinshasa. The results obtained at the Lycée de Kimwenza can also be obtained in other eighth-grade classes in the DRC.
As a perspective, this work opens the door to research on the analysis of teaching performance in the DRC. Future research will continue to use multidimensional tools and the decision tree data mining technique for data analysis.