Towards Safer Roads: A Deep Learning Approach to Driver Distraction Detection in Four-Wheeler Cars

The fact that more than 50 million cars are sold annually and more than 1.3 million fatal motor vehicle accidents occur each year indicates the urgent need for stronger road safety regulations. Driver behaviour needs to be addressed, especially in emerging nations like India, which accounts for 11% of all road fatalities worldwide. Distraction has been identified as the leading factor in the 78% of accidents attributed to drivers. Distractions can take many forms, from using a phone to interacting with others, and they greatly hinder road safety. This work addresses this important problem by creating a highly effective machine learning (ML) model that employs computer vision techniques to classify various driver distractions in real time. Utilising cutting-edge convolutional neural networks (CNNs) such as ResNet50, as well as CNN ensembles, our goal is to effectively identify and categorise distractions through deep learning and image recognition, allowing for proactive intervention to avert mishaps. Beyond classification accuracy, this study evaluates the model's overall speed and scalability, which are critical for deployment on edge devices. We assess the practical viability and potential for widespread adoption of our approach by analysing performance parameters such as inference time and resource utilisation.


I. INTRODUCTION
The issue at hand is related to the automotive industry, where worrying data showing more than 50 million cars sold yearly and more than 1.3 million fatalities from motor vehicle accidents highlight how urgent it is to solve issues related to road safety [6]. Notably, 11% of road accident fatalities worldwide occur in India, underscoring the seriousness of the issue. The financial cost is high; in FY 18-19, vehicle insurance claims totalled Rs. 58,456.932 crore, a sizeable percentage of the nation's annual GDP.
A startling 78% of accidents are deemed to be the fault of the driver, and human behaviour is the main cause of road safety problems, especially in developing nations [7]. Notwithstanding these obstacles, the automotive sector is currently observing a paradigm change in favour of technology-based solutions, most notably with the introduction of fully automated and networked automobiles. This technical development offers a special chance to successfully solve issues related to road safety [8].
Technology advancements are pushing the automotive industry towards safer roads, but it is also critical to recognise the socioeconomic implications of road safety. In addition to the horrific death toll, automobile crashes entail serious consequences such as significant financial losses and strain on the healthcare system.
In addition to the direct casualties, road accidents can have an impact on the victims' families and communities. Many of these occurrences result in long-term disabilities, which diminish quality of life and increase dependency on social support systems. Moreover, there are serious financial ramifications, because lost productivity, medical expenses, and rehabilitation costs can take a sizeable portion of national budgets.
Addressing these difficulties becomes essential for sustainable growth in rising economies like India, where rapid urbanisation and infrastructural challenges increase road safety concerns. It is the responsibility of civil society, business, and government organisations to work together to develop creative solutions that reduce hazards and encourage safe driving practices.
Therefore, incorporating AI-driven technology is a workable solution for proactive risk management and accident prevention. By utilising real-time data analytics and prediction algorithms, stakeholders may efficiently identify high-risk areas, implement targeted interventions, and promote safer driving practices across a broad population. The use of technology is supplemented by initiatives aimed at fostering a culture of road safety through awareness and education campaigns. Encouraging behavioural changes, such as putting down gadgets and following traffic laws, necessitates a multifaceted approach that combines technological innovation, community involvement, and policy enforcement. In other words, solving the problems associated with road safety necessitates a comprehensive strategy that goes beyond technological advancement to include social, economic, and regulatory aspects. By combining the efforts of stakeholders from all sectors, we may aim to construct roads that are not only effective and technologically sophisticated, but also safe and inclusive for all users.
By utilising artificial intelligence (AI), a real-time alert system can be created that reminds drivers to maintain concentration, lowering the likelihood of accidents and minimising the loss of life and property. Distracted driving, which can result from using a phone, drinking, or interacting with others, is a major cause of accidents. Our proposal to address this problem is to create a real-time distraction detection system that can notify drivers immediately to avoid unfavourable consequences [9]. Our method involves installing an edge device configuration inside the car to evaluate data and provide insights asynchronously, enabling rapid alarms and interactions over IoT. This project's main goal is to create a highly effective machine learning (ML) model that uses computer vision techniques to categorise different types of driver distraction at runtime. We also evaluate the scalability and speed of the model to ensure seamless integration into edge device settings.

II. RELATED WORK
In measuring driver distraction, various approaches are employed, focusing on visual, manual, or cognitive aspects. In the manual and cognitive domains, researchers have explored methodologies such as monitoring lane maintenance, speed performance, and the duration of lane departures to infer the driver's state [13]. Castignani et al. classified driving events as risky or not by analyzing acceleration, braking, and steering activities through the SenseFleet system [16]. Pavlidis et al. conducted statistical analyses to understand the relationship between driver distractions and various driving parameters [17]. A forward collision warning algorithm based on the driver's braking activity was proposed by Wang et al. [18].
Visual measurements, including eye gaze, pupil diameter, head pose, facial expressions, and driving posture, offer rich data for detecting driver distractions. These visual cues have been leveraged in various methodologies, including mathematical models, rule-based models, and models based on machine learning (ML) algorithms [19]. Among ML-based approaches, Baheti et al. developed a Convolutional Neural Network (CNN) system specifically tailored for detecting different driver actions, showcasing the potential of deep learning in driver distraction detection [6][7]. In another study, Huang et al. proposed a CNN model augmented with cooperative pre-trained models such as ResNet, Inception, and Xception, along with a novel dropout layer to mitigate overfitting. Their approach achieved impressive classification accuracy on the AUC dataset, outperforming other CNN classifiers [26].
Furthermore, Jamsheed V. et al. introduced a novel structure incorporating vanilla CNNs, vanilla augmented CNNs, and deeper CNNs based on transfer learning for driver distraction detection. Their evaluation on the AUC dataset demonstrated significant accuracy improvements, especially when leveraging transfer learning techniques [28].
These studies collectively underscore the importance of leveraging advanced computational techniques, particularly deep learning and transfer learning, to enhance the accuracy and robustness of driver distraction detection systems, contributing to the ongoing efforts to improve road safety.
Expanding on the existing literature, recent advancements in driver distraction detection have seen an increasing focus on leveraging multimodal data fusion techniques. Researchers have recognized the potential of combining information from various sensors, including cameras, accelerometers, gyroscopes, and biometric sensors, to capture a comprehensive understanding of driver behavior [20].
Furthermore, advancements in deep learning architectures have paved the way for more sophisticated feature extraction and representation learning. Models such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks have shown promise in capturing temporal dependencies in driver behavior data, enabling more accurate and robust detection of distractions over time [22].
Moreover, the integration of attention mechanisms within deep learning architectures has emerged as a promising avenue for enhancing the interpretability and performance of driver distraction detection systems. Attention mechanisms enable models to dynamically weigh the importance of different input features, allowing for more nuanced and context-aware analysis of driver behavior [25].
Additionally, research efforts have increasingly focused on addressing the challenges of dataset imbalance and domain adaptation in driver distraction detection. Techniques such as data augmentation, transfer learning, and domain adaptation algorithms have been explored to improve model generalization and performance across diverse driving environments and conditions [26].
Furthermore, the advent of edge computing technologies has enabled real-time processing of sensor data directly within the vehicle, reducing latency and facilitating faster response times for driver distraction detection systems. Edge-based solutions offer the potential for more efficient and scalable deployment of intelligent transportation systems, ultimately contributing to enhanced road safety [28].
In summary, recent developments in driver distraction detection have witnessed significant progress across various fronts, including multimodal data fusion, deep learning architectures, attention mechanisms, dataset imbalance, domain adaptation, and edge computing technologies. By leveraging these advancements, researchers aim to develop more accurate, robust, and real-time systems capable of mitigating the risks associated with distracted driving and improving overall road safety.

III. DATASET DESCRIPTION
We have selected the State Farm distracted driver detection dataset for our capstone project. The dataset was released for a 2016 competition on Kaggle [5]. It is the most widely used dataset for identifying driver distraction, having been utilised in many studies. The State Farm dataset consists of ten classes, covering safe driving and distracted behaviours such as texting, talking on the phone, operating the radio, drinking, reaching behind, doing hair and makeup, and talking to a passenger. The photos have had their metadata, including creation dates, erased. Because State Farm carefully staged these tests, with a truck towing the car through the streets, the "drivers" were not actually operating a vehicle. A driver can appear in only one of the train or test sets, since the drivers are divided between the two. Only left-hand-drive cars are included in the photo collection. There are around 2,300 photographs in each class; the distribution of images by class is provided below.

A. Visualizing and Preparing Data
In this stage of our technical approach, we concentrate on visualising the dataset and preparing it for training our deep learning models. This investigation uses the State Farm Distracted Driver Detection dataset, which consists of photos divided into ten classes representing various driving behaviours.
1) Data Visualization: We begin by investigating the class distribution within the dataset. By enumerating the classes and the labels that go with them, we learn about the diversity and balance of behaviours included in the dataset. Visualisation techniques such as Seaborn's countplot give a general idea of the dataset's makeup.
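Before plotting, the per-class counts have to be tallied; a minimal plain-Python sketch (the helper name and example labels are ours, mimicking the dataset's c0-c9 naming, not taken from the paper):

```python
from collections import Counter

def class_distribution(labels):
    """Count how many images fall into each class label."""
    return Counter(labels)

# Hypothetical labels following the c0..c9 naming scheme.
labels = ["c0", "c1", "c0", "c2", "c1", "c0"]
print(class_distribution(labels)["c0"])  # prints 3
```

The resulting counts can then be passed to a bar-plot routine such as Seaborn's countplot for visual inspection.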
2) Data Preparation: Next, we apply data transformations and divide the dataset into training and testing subsets. To enhance the dataset and boost model generalisation, data augmentation techniques are used, such as resizing photos to a standard size of 400x400 pixels and applying random rotations. The dataset is then divided into training and testing subsets using the random split approach, keeping the class distribution similar in both subsets.
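Note that a plain random split does not by itself guarantee equal class proportions; keeping the distribution similar in both subsets, as described above, amounts to a stratified split. A stdlib-only sketch (the function name and fractions are illustrative, not the paper's code):

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.2, seed=42):
    """Return (train_indices, test_indices) so that each class keeps
    roughly the same proportion in both subsets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lbl in enumerate(labels):
        by_class[lbl].append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)                    # shuffle within each class
        cut = int(len(idxs) * test_frac)     # per-class test share
        test_idx.extend(idxs[:cut])
        train_idx.extend(idxs[cut:])
    return train_idx, test_idx
```

The returned index lists can be fed to a Subset-style wrapper around the image dataset.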
3) Data Loading: Our deep learning models require rapid training and assessment, so we use PyTorch's DataLoader module to generate data loaders. Data loaders enable batch-wise loading of photos during training and testing, improving memory usage and computational efficiency. To guarantee randomness and avoid model overfitting, the training and testing data loaders are configured with appropriate batch sizes and shuffle parameters.
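The core behaviour a DataLoader provides, batch-wise iteration with optional shuffling, can be sketched framework-free (a simplified stand-in for illustration, not PyTorch's actual implementation):

```python
import random

def batch_iterator(items, batch_size, shuffle=True, seed=0):
    """Yield successive batches of items; shuffling each epoch
    reduces ordering bias during training."""
    order = list(range(len(items)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [items[i] for i in order[start:start + batch_size]]
```

In the actual pipeline, PyTorch's DataLoader additionally handles tensor collation and parallel workers on top of this batching logic.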
4) Data Visualisation (Sample): As a visual validation step, we provide an example image (Fig. 2) from the dataset along with its label. In addition to ensuring that the preprocessing and data loading steps have been carried out appropriately, this stage offers a qualitative insight into the dataset's contents.

Batch Normalisation:
Activations are normalised within each batch as

ẑ(l) = (z(l) − μ) / √(σ² + ε)

where μ and σ² are the mean and variance of z(l) respectively, and ε is a small constant to avoid division by zero.
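The normalisation formula can be checked numerically; a plain-Python sketch of the same computation (a didactic helper of ours, not the framework's batch-norm layer, which also learns scale and shift parameters):

```python
import math

def batch_norm(z, eps=1e-5):
    """Normalise a list of activations: (z - mu) / sqrt(var + eps)."""
    mu = sum(z) / len(z)
    var = sum((x - mu) ** 2 for x in z) / len(z)
    return [(x - mu) / math.sqrt(var + eps) for x in z]
```

The output has approximately zero mean and unit variance, which stabilises training of deep networks such as ResNet50.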

Residual Connection:
The output of the residual block is computed as the sum of the input x(l) and the transformation F(x(l)):

x(l+1) = x(l) + F(x(l))

where F(x) represents the transformation performed by the residual block, and x is the input to the block. This equation signifies the main idea behind residual learning, where the identity mapping (represented by x) is added to the learned features F(x).
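The identity-plus-residual structure can be illustrated with a toy numerical sketch (element-wise on plain lists; the real blocks apply convolutions and batch normalisation):

```python
def residual_block(x, F):
    """Compute x + F(x): the identity mapping plus the learned residual."""
    return [xi + fi for xi, fi in zip(x, F(x))]

# If the learned residual F is zero, the block reduces to the identity,
# which is what makes very deep networks easier to optimise.
out = residual_block([1.0, 2.0], lambda v: [0.0] * len(v))  # [1.0, 2.0]
```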
The goal is to learn the residual F(x) so that the optimisation process becomes easier. The last fully connected layer is altered to produce predictions for the ten driver behaviour classes. The model is optimised with stochastic gradient descent with momentum, and a learning rate scheduler dynamically adjusts the learning rate based on the model's performance on the validation set. Fig. 4 shows the architecture of the ResNet50 model.

Table II provides a summary of the achieved recall, precision, and F1-score for each class in the SFDDD. Notably, the ResNet50 model exhibits exceptional performance across all classes, with precision ranging from 98.74% to 100%, recall ranging from 98.71% to 100%, and F1-score ranging from 99.65% to 100%. These results underscore the effectiveness of the ResNet50 model in accurately detecting various distracted driving behaviors.

The confusion matrix shows the true positives, true negatives, false positives, and false negatives for each class of distracted driving behaviour, providing a visual representation of the classification results. Examining its diagonal elements, we find that the model successfully recognises the majority of examples in each class, producing the high recall and precision scores shown in Table II. The off-diagonal components draw attention to regions where the model might occasionally mislabel particular behaviours, providing insight that can be used to improve the model's accuracy and performance. All things considered, the confusion matrix offers a thorough summary of the model's classification performance, supporting the previously described metrics of precision, recall, and F1-score and demonstrating the effectiveness of the ResNet50 model in tackling driver distraction detection.

In addition to achieving a high accuracy rate of 99% on the test dataset, the research findings underscore the robustness and reliability of the ResNet50 model in detecting distracted driving behaviors. The thorough exploration of the model's architecture and training process has provided valuable insights into its capabilities and limitations, paving the way for further refinement and optimization. Moreover, the successful application of deep learning techniques and GPU acceleration highlights the potential of advanced technologies in addressing complex real-world challenges.
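The optimiser update described above, stochastic gradient descent with momentum, can be sketched in plain Python (the helper name and hyperparameter values are illustrative; this is one common formulation of the momentum rule, not the paper's exact training code):

```python
def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update: v <- m*v - lr*g, then w <- w + v.

    The velocity term accumulates past gradients, smoothing the
    descent direction and speeding convergence in ravines.
    """
    velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
    w = [wi + vi for wi, vi in zip(w, velocity)]
    return w, velocity
```

A learning rate scheduler, as used here, would shrink `lr` between epochs when the validation metric plateaus.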
Looking ahead, future research endeavors could focus on enhancing the model's performance through fine-tuning strategies, such as data augmentation and regularization techniques. Additionally, the exploration of ensemble methods and transfer learning approaches could offer opportunities to leverage the knowledge gained from related tasks and domains, thereby improving the model's generalization ability. Furthermore, the integration of attention mechanisms and real-time deployment considerations could enhance the model's responsiveness and applicability in dynamic driving environments.
By advancing the field of computer vision in the context of road safety, this research contributes to the ongoing efforts aimed at reducing the incidence of distracted driving-related accidents and promoting safer driving behaviors. The insights gained from this study serve as a foundation for future research endeavors and practical applications, ultimately benefitting society as a whole.

REFERENCES
A study of the relationship between economic development and road traffic injuries and fatalities in Thailand, employing spatial panel data analysis.
[7] Apostoloff, N., & Zelinsky, A. (2004). Vision in and out of vehicles: Integrated driver and road scene monitoring. Apostoloff and Zelinsky discussed an integrated system for monitoring both drivers and the road scene, utilizing vision technologies.
[9] Saito, Y., Itoh, M., & Inagaki, T. (2016). Driver assistance system with a dual control scheme: Effectiveness of identifying driver drowsiness and preventing lane departure accidents. Saito et al. developed a driver assistance system incorporating a dual control scheme to identify driver drowsiness and prevent lane departure accidents.

Fig. 9. Result Generated By ResNet50 Model On Test Image

TABLE II. ACHIEVED RECALL, PRECISION, AND F1-SCORE FOR EACH CLASS

TABLE IV. RESNET50 MODEL PROPERTIES AND ACCURACY