Video Based Fire and Smoke Detection Using Deep Learning

The objective of this work is to design models for computer vision-based fire and smoke detection. CCTV cameras are installed at major public locations, homes, and buildings in every city. These cameras are used primarily for surveillance and forensic purposes. As a side benefit, they can also be used to detect fire and smoke within the camera's field of view. IoT-based sensors are the traditional means of detecting fire and smoke; additionally, video streams from cameras can be processed to detect fire and smoke events. This provides an extra layer of safety in cases where IoT sensors fail to detect a real fire or smoke event. We propose to use advances in AI/ML for computer vision-based fire and smoke detection.


Introduction
Fire is one of the major disasters causing loss of life and property around the globe. Therefore, it is crucial to develop a robust and reliable system for early detection of fire. Detecting fire at an early stage increases the chances of survival and reduces property loss. A report from the National Fire Protection Association (NFPA) shows that in the United States of America (USA) there are approximately 1.5 million fires per year, with a high cost in terms of lives (more than 3,000 civilian fire deaths) and economic losses estimated at about 55 billion US Dollars (USD) per year REF. Hence, with the development of the Internet of Things (IoT) domain along with more recent Artificial Intelligence (AI), and the increasing need for safety in public places, an early fire and smoke detection system needs to be implemented for the benefit of all citizens. In this work, a video- or vision-based approach to the recognition of fire and smoke is pursued as a promising method. A video feed provides a wide view of the area under surveillance, and Closed-Circuit Television (CCTV) systems are often already installed for surveillance purposes in buildings, in public places in various cities, and sometimes onboard passenger vehicles of public transport systems. Exploiting an already existing video infrastructure reduces the purchase and installation costs of additional add-on products, increasing only the complexity of the algorithm used to detect smoke and fire.

Toreyin et al. (2006) [16] proposed a novel method to detect fire and/or flames in real time by processing video frames generated by an ordinary CCTV camera monitoring a scene. Along with ordinary motion and color clues, flames and fire flicker were detected by analyzing the video input in the wavelet domain. Periodic behavior in flame boundaries was detected using a temporal wavelet transform, and color variations in flame regions were detected by computing the spatial wavelet transform of moving fire-colored regions. A further clue used in the fire detection algorithm was the irregularity of the boundary of the fire-colored region. All of these clues were combined to reach a final decision. Experimental results showed that the proposed method successfully detected fire and/or flames and, in comparison with methods using only motion and color clues, greatly reduced false alarms on ordinary fire-colored moving objects.

Venugopal (2012) [17] proposed a novel approach for forest fire detection using image processing. A rule-based color model for fire-pixel classification was used. The proposed algorithm used the RGB and YCbCr color spaces. One advantage of the YCbCr color space is that it separates chrominance from luminance more effectively than RGB. The performance of the proposed algorithm was tested on two sets of images, one containing fire and the other containing fire-like regions (but no fire). Standard methods were used to calculate the performance of the algorithm. The proposed method achieved both a higher detection rate and a lower false-alarm rate. Since the algorithm is computationally light, in addition to city applications it could also be used for real-time forest fire detection.

Kim and Lee (2019) [14] proposed a deep learning-based fire detection method using a video sequence, which imitates the human fire-detection process. The proposed method used a Faster Region-based Convolutional Neural Network (R-CNN) to detect suspected regions of fire (SRoFs) and non-fire regions based on their spatial features. The features summarized within the bounding boxes in successive frames were then accumulated by a Long Short-Term Memory (LSTM) network to classify whether or not there is a fire in a short-term period. The decisions for successive short-term periods were then combined by majority voting into the final decision for the long-term period. In addition, the areas of both flame and smoke were calculated and their temporal changes reported, to interpret the dynamic fire behavior together with the final fire decision. The experiments showed that the proposed long-term video-based method can improve fire detection accuracy compared with still-image-based or short-term video-based methods by reducing both false detections and misdetections.

Wang et al. (2022) [18] addressed the problem of the limited number and modality of fire-detection training datasets. They constructed a 100,000-level Flame and Smoke Detection Dataset (FASDD) based on multi-source heterogeneous flame and smoke images. According to the authors, FASDD was at the time the most versatile and comprehensive dataset for fire detection, providing a challenging benchmark to drive the continuous evolution of fire detection models. Additionally, they formulated a unified workflow for preprocessing, annotation, and quality control of fire samples. Their extensive performance evaluations based on classical methods showed that most models trained on FASDD achieve satisfactory fire detection results; in particular, YOLOv5x achieves nearly 80% mAP@0.5 on heterogeneous images spanning the two domains of computer vision and remote sensing. An application to wildfire localization demonstrated that deep learning models trained on their dataset can be used to recognize and monitor forest fires. They proposed that their solution can be deployed simultaneously on watchtowers, drones, and optical satellites to build a satellite-ground cooperative observation network, providing an important reference for large-scale fire suppression, victim escape, firefighter rescue, and government decision-making.

Toreyin et al. (2008) [19] and Aleksic (2004) [1] describe systems that trigger an alarm within several minutes when combustion produces flames and raises the temperature of the surrounding environment. In Amer and Daoud (2007) [3] and Amer and Daoud (2005) [2], a photoelectric smoke detector with a real smoke chamber was combined with smoke-temperature sensing. Current standards such as EN 50155 for onboard train safety recommend a delay of no more than 60 seconds between the start of a fire and its detection. Commercial solutions use point-based optical and temperature smoke detectors, as in Corp (2020a) [5], for onboard train anti-fire systems. These sensors aim to signal the presence of a fire within 60 seconds. The system in Corp (2020a) [5] works on the principle that hot air produced by a fire moves from the bottom to the top of the train coach, so the smoke rises toward a sensor placed on the roof of the coach.
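The rule-based color-model approach summarized above (Venugopal, 2012 [17]) can be illustrated with a short sketch. The RGB-to-YCbCr conversion below uses the standard ITU-R BT.601 coefficients, but the two threshold rules are simplified, hypothetical stand-ins for the paper's actual rule set:

```python
def is_fire_pixel(r, g, b):
    """Illustrative rule-based fire-pixel test in the YCbCr color space.
    The BT.601 conversion is standard; the rules Y > Cb and Cr > Cb are a
    simplified assumption, not the exact rules of the cited paper."""
    y  = 0.299 * r + 0.587 * g + 0.114 * b            # luminance
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b  # blue chrominance
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b  # red chrominance
    # Fire pixels tend to be bright (high Y) and strongly red (high Cr).
    return y > cb and cr > cb
```

In this color space chrominance and luminance are separated, so the same two comparisons work across a wide range of illumination levels.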

Smoke Detection
Gagliardi and Saponara (2020a) [10] proposed AdViSED, a novel video smoke detection algorithm for anti-fire surveillance systems, covering both outdoor and indoor application scenarios. To optimize installation costs, they considered a fixed single camera, working in the visible spectral range, already installed in a closed-circuit television system for surveillance purposes. They combined multiple techniques: Kalman-based motion detection, image segmentation, color analysis, blob labeling, time- and edge-based blob analysis, geometrical feature analysis, and an M-of-N decisor function. Based on these, the system generates alarm signals with improved estimation performance compared with state-of-the-art techniques.
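The M-of-N decisor function mentioned above can be sketched as a simple persistence filter over per-frame detections; the implementation below is an illustrative assumption, not the authors' exact decisor:

```python
from collections import deque

def m_of_n_decisor(flags, m, n):
    """Raise an alarm when at least m of the last n per-frame smoke flags
    are positive: a persistence filter that suppresses spurious one-frame
    detections. flags is an iterable of 0/1 (or bool) values."""
    window = deque(maxlen=n)   # sliding window of the last n flags
    alarms = []
    for f in flags:
        window.append(f)
        alarms.append(sum(window) >= m)
    return alarms
```

With m = 3 and n = 4, for example, an isolated positive frame never fires the alarm, but three positives within any four consecutive frames do.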
In Corp (2020b) [6], fire flames are identified by measuring absolute temperature and its gradient with an intelligent Infra-Red (IR) sensor. However, the detection systems presented have the drawback of reacting slowly. Moreover, the approach needs active fans or air conditioning to speed up smoke/fire detection and thus avoid very high measurement latency. The purpose of this work is to develop an innovative video-based smoke detection technique able to trigger an alarm within a few seconds. Such an algorithm has already been implemented on several IoT embedded devices to build a distributed anti-fire system accessible via web browser and able to signal a fire alarm from different camera nodes, as discussed in Gagliardi and Saponara (2020b) [11]. With this paper, the authors extend that discussion by focusing on the smoke detection algorithm itself. [9] presented a hybrid approach for fast and accurate identification of smoke in a video input. The algorithm combined a traditional feature detector based on Kalman filtering and motion detection with a lightweight convolutional neural network. The technique automatically selects specific regions of interest within the image by generating bounding boxes for black- and white-colored moving objects. In the final step, the convolutional neural network verifies the true presence of smoke in the proposed regions of interest. The algorithm also provides an alarm generator that triggers an alarm signal if the smoke persists over a time window of 3000 ms (about 3 seconds). The proposed technique was compared with state-of-the-art methods from the literature on several videos from public and non-public datasets, showing an improvement in the metrics. Finally, the authors developed a portable embedded solution and evaluated its performance on the Raspberry Pi 3 board as well as the Nvidia Jetson Nano board.

[20] proposed a video-based spatial-temporal convolutional neural network for fire smoke recognition. The model concatenates motion features and appearance features, followed by a convolution layer that fuses the spatial and temporal features. To reduce the influence of non-smoke background, the authors used an attention module to capture salient features from the input image. Experiments on their own self-created dataset showed that the presented method is valid, achieving a detection rate of 97.5% and an accuracy of 96.8%.
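The 3000 ms persistence rule described in [9] above can be sketched as follows; the frame duration and the return convention are assumptions made for illustration:

```python
def alarm_after_persistence(frames, frame_ms, hold_ms=3000):
    """Trigger an alarm once smoke has been detected continuously for
    hold_ms milliseconds. frames is a per-frame boolean detection list,
    frame_ms the duration of one frame. Returns the index of the frame
    where the alarm fires, or None if it never does."""
    run = 0  # length in ms of the current uninterrupted smoke run
    for i, smoke in enumerate(frames):
        run = run + frame_ms if smoke else 0  # a miss resets the run
        if run >= hold_ms:
            return i
    return None
```

At 25 fps (40 ms per frame), 75 consecutive smoke frames are needed before the alarm fires; any single non-smoke frame resets the counter.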

Proposed Methodology
3.1 Data Collection
Fire Data Collection: Images of fire were downloaded from the Internet. These images were collected from the Google Images search engine. The images consist of various fire incidents and accidents that happened all over the world. The following are the types of collected fire scenarios:
1. Indoor Fire

• YOLO for Object Detection: YOLO stands for "You Only Look Once" and is a popular object detection algorithm, known for its real-time processing speed and high accuracy. YOLO frames object detection as a regression problem and detects multiple objects within an image in a single pass. The YOLO approach divides the input image into a grid of cells and predicts bounding boxes and respective class probabilities for each cell. Each grid cell is responsible for predicting a fixed number of bounding boxes, regardless of the number of objects present in that cell. The first 20 convolution layers of the model are pretrained on ImageNet by adding a temporary average-pooling layer and a fully connected layer. This pretrained model is then converted to perform detection, based on previous research showing that adding convolution and connected layers to a pretrained network improves performance. YOLO's fully connected final layer predicts both class probabilities and bounding box coordinates.
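The grid-based prediction scheme described above can be made concrete with a small sketch that converts one cell-relative box prediction into absolute pixel coordinates. The parameterization varies between YOLO versions; this follows the original YOLOv1 convention, assuming center offsets in [0, 1] within the cell and box sizes as fractions of the whole image:

```python
def decode_box(cell_x, cell_y, pred, grid_size, img_w, img_h):
    """Convert a cell-relative YOLO prediction to absolute pixel coordinates.
    pred = (bx, by, bw, bh): bx, by are the box-centre offsets within the
    grid cell (each in [0, 1]); bw, bh are width/height as fractions of the
    full image (YOLOv1-style convention, assumed for illustration)."""
    bx, by, bw, bh = pred
    cx = (cell_x + bx) / grid_size * img_w   # absolute box centre x
    cy = (cell_y + by) / grid_size * img_h   # absolute box centre y
    return cx, cy, bw * img_w, bh * img_h
```

For a 7×7 grid over a 448×448 image, a prediction of (0.5, 0.5, 0.25, 0.25) in cell (3, 2) decodes to a 112×112 box centred at (224, 160).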
• YOLO v5: YOLO v5 was introduced in 2020 by the Ultralytics team. It builds upon the success of previous versions and adds several new features and improvements. Compared with the original YOLO, YOLO v5 uses a more advanced architecture, a CSPDarknet-based backbone combined with a PANet feature-aggregation neck (architecture shown in Figure 3.2.5). This architecture allows it to achieve higher accuracy and better generalization to a wider range of object categories. YOLOv1 was trained on the PASCAL VOC dataset, which consists of 20 object categories, whereas YOLO v5 was trained on the larger and more diverse MS COCO dataset. Additionally, YOLO v5 uses a new method for generating anchor boxes for detection, called "dynamic anchor boxes." It uses a clustering algorithm to group the ground-truth bounding boxes into clusters and then uses the centroids of the clusters as the anchor boxes. This allows the anchor boxes to align more closely with the sizes and shapes of the detected objects. Moreover, YOLO v5 also uses "spatial pyramid pooling" (SPP), a pooling layer that aggregates feature maps at multiple scales. SPP improves detection performance on small objects, which was a weakness of older YOLO versions, as it allows the model to see objects at multiple scales. The older YOLO v4 also uses SPP, but YOLO v5 includes several improvements to the SPP block that allow it to achieve overall better results.

[8] proposed a method which was tested on a large dataset of fire videos acquired both in real environments and from the web. The dataset is composed of two main parts: the first 14 videos contain fire, and the last 17 videos do not; in particular, the second part contains objects or situations that can be wrongly classified as fire. A scene containing red objects may be misclassified by color-based approaches, while a mountain with smoke, fog, or clouds may be misclassified by motion-based approaches. Such a composition allows us to stress the system and to test it under several conditions that may occur in real environments. The dataset of Chino et al. (2015) [4] consists of 226 images with various resolutions, divided into two categories: 119 images containing fire and 107 images without fire. The fire images cover emergency situations with different fire incidents, such as buildings on fire, industrial fires, car accidents, and riots; these images were manually cropped by human experts. The remaining images show emergency situations with no visible fire and images with fire-like regions, such as sunsets and red or yellow objects. To evaluate the efficiency of the proposed approach, we also tested it with Datasetv2 of Jadon et al. (2019) [12]. This test bench consists of 160 non-fire images, 46 fire videos, and 16 non-fire videos. The dataset is limited but challenging, e.g., it contains videos with neither fire nor smoke.
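The dynamic anchor-box idea described in the YOLO v5 discussion above, clustering ground-truth box sizes and using the centroids as anchors, can be sketched as below. Plain Euclidean k-means is used for simplicity; YOLO implementations typically cluster with an IoU-based distance instead:

```python
import random

def kmeans_anchors(boxes, k, iters=50, seed=0):
    """Cluster (width, height) pairs of ground-truth boxes; the centroids
    become the anchor boxes. Plain Euclidean k-means for illustration only;
    real YOLO auto-anchor code uses an IoU-based distance metric."""
    random.seed(seed)
    centroids = random.sample(boxes, k)           # initialise from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:                        # assign to nearest centroid
            i = min(range(k),
                    key=lambda j: (w - centroids[j][0]) ** 2
                                + (h - centroids[j][1]) ** 2)
            clusters[i].append((w, h))
        for j, members in enumerate(clusters):    # recompute centroids
            if members:
                centroids[j] = (sum(w for w, _ in members) / len(members),
                                sum(h for _, h in members) / len(members))
    return sorted(centroids)
```

Given boxes drawn from two well-separated size groups, the two returned centroids settle at the mean size of each group, which is exactly the behaviour wanted from anchor priors.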

EXPERIMENTAL ANALYSIS
Considering the success of YOLO in object detection, we used the YOLO model and trained it for fire and smoke detection. We used the base implementation of the YOLOv5 model (Jocher (2020) [13]). We re-used the implementation from https://github.com/ultralytics/yolov5 for training and testing fire and smoke detection. The above repository provides many variants of YOLO models. Figure 6.1 shows the variants of YOLOv5 models along with their size, inference time, and accuracy.

Training
For batch-enabled training on the dataset, we used a batch size of 16 images due to resource limitations. Images were rescaled/stretched to 640-pixel resolution. We used SGD as the optimizer. We chose an initial learning rate (lr) of 10^-2 and stopped training whenever the lr reached 10^-5. We used a patience value of 100 and a weight decay of 0.1. We trained for 300 epochs. A total of 1500 fire and smoke images, with an average of 10 bounding boxes per image (fire and smoke combined), were used for training. We re-used YOLOv5s and YOLOv5m models pretrained on the COCO128 dataset. We used the PyTorch framework for model training and testing. As stated earlier, we used pretrained weights from the YOLOv5s and YOLOv5m models and further fine-tuned and adapted them for fire and smoke detection. We used the dataset.yaml listed above as the label configuration for the YOLO-formatted dataset.
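The dataset.yaml mentioned above follows the standard YOLOv5 data-configuration format. The paths below are hypothetical placeholders; only the two class names reflect the fire/smoke task described in this work:

```yaml
# Hypothetical YOLOv5 dataset configuration for the fire/smoke task.
# Paths are placeholders and must point at the actual dataset layout.
path: ../datasets/fire_smoke   # dataset root directory (placeholder)
train: images/train            # training images, relative to path
val: images/val                # validation images, relative to path

nc: 2                          # number of classes
names: ['fire', 'smoke']       # class index 0 = fire, 1 = smoke
```

Each image has a matching `.txt` label file in YOLO format: one line per box, `class x_center y_center width height`, all normalized to [0, 1].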
For training, we used an NVIDIA GTX 1080 GPU to accelerate training of the YOLO deep learning model.

NETWORK ARCHITECTURES & EXPERIMENTAL RESULTS
We have re-used pretrained YOLOv5m and YOLOv5s models and further trained them for our fire and smoke detection problem, following the transfer-learning methodology. In domain adaptation, pretrained models suited to a similar domain can be adapted to a new domain by retraining a limited number of layers. For testing fire detection, we used FireNetV2 (Jadon et al. (2019) [12]) as the fire test set; the resulting accuracies are displayed in Table 7. For testing smoke detection, we used the Fire-Flame-Dataset (Olayemi Abimbola (2019) [15]) as the fire, smoke, and neutral test set, evaluating accuracies with a detection threshold of 0.25; the results are tabulated. Additionally, we calculated detailed metrics in the form of accuracy, precision, recall, and F1-score on the Fire-Flame-Dataset for both fire and smoke using the model "yolov5m2" with a revised detection threshold of 0.3, as tabulated.
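The effect of the detection threshold used above (0.25, later revised to 0.3) can be illustrated with a minimal confidence filter over hypothetical detections:

```python
def filter_detections(dets, conf_thresh=0.25):
    """Keep only detections whose confidence is at or above the threshold.
    Each detection is a dict with (at least) a 'conf' key; raising the
    threshold (e.g. to 0.3) generally trades recall for precision."""
    return [d for d in dets if d["conf"] >= conf_thresh]
```

A detection scored 0.27, for instance, survives the 0.25 threshold but is dropped at 0.3, which is why the reported precision/recall figures depend on the chosen threshold.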

Detection Results on Videos
We finally used the model for detection on videos. A video is simply a collection of images, called frames in video-domain terminology. We tested on a few selected videos and achieved the detection results shown in the corresponding figures.
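Since a video is just a sequence of frames, the detector need not be run on every frame. A sketch of the kind of frame-sampling used when processing video (the sampling interval is an illustrative assumption, not a parameter stated in this work):

```python
def sample_frames(total_frames, fps, every_seconds=1.0):
    """Return the indices of frames to run detection on, e.g. one frame
    per second, instead of running the detector on every single frame."""
    step = max(1, int(fps * every_seconds))  # frames between detections
    return list(range(0, total_frames, step))
```

At 25 fps, sampling one frame per second cuts detector invocations 25-fold while still catching any fire or smoke event that persists for more than a second.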

CONCLUSIONS
We have developed and trained advanced YOLO-based deep learning (DL) models for smoke and fire detection. These models were trained on suitable datasets and evaluation metrics were calculated. Data preprocessing was also used to improve the overall accuracy of the models.

FUTURE ENHANCEMENTS
The following activities will be part of the future scope of work:
• To further train deep learning models for fire and smoke detection and obtain improved accuracies
• To achieve more than 95% accuracy, with correspondingly high precision and recall values, for the models on the FireNet dataset

• To deploy the models on edge devices, for example Raspberry Pi and NVIDIA Jetson Nano.
2. Outdoor Fire

Smoke Data Collection: Images of fire and smoke were also downloaded from the Internet. These images were collected from the Google Images search engine and are mostly part of fire images. The smoke scenarios are mostly outdoor and are mainly of two colors:
1. White Smoke
2. Black Smoke

Figure 3.1.2: Sample smoke images

3.2 Training Deep Learning Models for Fire and Smoke
• Deep learning-based Approach: We use a deep learning-based approach for fire and smoke detection from video/image input, as it has the following advantages:
1. It does not require any additional hardware sensor.
2. Raw data in the form of fire and smoke images is used; no feature engineering is required.
3. It may reuse the existing video feeds available through CCTV cameras.
4. It can work over longer distances for fire and smoke detection.
5. Video data from CCTV cameras can be effectively utilized.

For example, if there are five bounding-box predictions per cell, YOLO will predict five potential objects. Each bounding-box prediction consists of four parameters: the x and y coordinates of the center of the box, and the width and height of the box. These values are predicted relative to the size of the grid cell. The class probabilities indicate the likelihood of each object class being present in the box. During training, YOLO uses a labeled dataset to learn the prediction of bounding boxes and class probabilities. The model is optimized by minimizing the sum of squared differences between the predicted and ground-truth bounding-box coordinates and class probabilities. In terms of architecture, YOLO typically uses a deep Convolutional Neural Network (CNN) as its backbone. This CNN processes the entire image to extract relevant features. The feature maps are then fed into additional layers to produce the final predictions for each grid cell. One of the advantages of YOLO algorithms is speed. As it processes the entire image in one pass, it can achieve real-time object detection on a standard Graphical Processing Unit (GPU). But this speed brings a limitation: smaller objects become harder to detect accurately. Over the years, YOLO has undergone several architectural improvements, resulting in versions such as YOLOv2, YOLOv3, and YOLOv4, with YOLOv8 being the latest. These versions introduced various enhancements to improve detection accuracy, handle different object scales, and address other limitations of the original YOLO algorithm.
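The sum-of-squared-differences objective described above can be written in miniature. The real YOLO loss additionally weights coordinate, objectness, and class terms differently; this is a simplified illustration of the core idea only:

```python
def squared_error_loss(pred, target):
    """Sum of squared differences between predicted and ground-truth values,
    with box coordinates and class probabilities flattened into one vector.
    Simplified: the actual YOLO loss applies different weights per term."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))
```

A prediction that matches the ground truth exactly yields zero loss; each coordinate or probability error contributes its squared deviation.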

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)   (5.4)

ROC AUC captures the area under the curve relating the True Positive Rate (TPR) and the False Positive Rate (FPR). The true positive rate (TPR) is also called Recall or sensitivity (SEN). The false positive rate (FPR) is also called the probability of false alarm. TPR is defined in equation 5.5 and FPR in equation 5.6. A ROC AUC example is shown in Figure 5.1. ROC AUC is used for classification problems where the result probabilities act as decision makers. AUC can compare two models as well as evaluate the same model's performance across different thresholds. ROC AUC is calculated by plotting TPR versus FPR at different thresholds.
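The metrics above, the F1-score of equation 5.4 together with the TPR and FPR definitions, can all be computed from raw confusion-matrix counts; a minimal sketch:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall (TPR / sensitivity), F1-score, and FPR computed
    from confusion-matrix counts; F1 follows equation 5.4."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                            # TPR, a.k.a. sensitivity
    f1 = 2 * precision * recall / (precision + recall)  # equation 5.4
    fpr = fp / (fp + tn)                               # probability of false alarm
    return precision, recall, f1, fpr
```

Sweeping the detection threshold and re-computing (recall, fpr) at each setting produces the points of the ROC curve whose area is the ROC AUC.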