International Journal For Multidisciplinary Research

E-ISSN: 2582-2160     Impact Factor: 9.24

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal

A Novel Hybrid Model for Spatio-Temporal Human Activity Recognition in Video Streams

Author(s) Prof. Rashidkhan R Pathan, Prof. Hetal S Chavda
Country India
Abstract Human Activity Recognition (HAR) from video streams is an important research area in computer vision, with applications in surveillance, health monitoring, sports analysis, and human-machine interaction. Conventional approaches that rely on spatial or temporal features alone often fail to classify actions accurately in challenging real-world conditions. In this work, we introduce a hybrid deep learning architecture that combines Convolutional Neural Networks (CNNs) for spatial feature extraction with Gated Recurrent Units (GRUs) for temporal sequence modeling. The model captures both the appearance and the motion dynamics of human actions across consecutive video frames. We evaluate it on three benchmark video datasets, UCF11, KTH, and UCF Sports Action, which contain varied human actions and adverse conditions such as camera motion, occlusion, and viewpoint changes. The proposed CNN-GRU model attains accuracies of 97.37% on UCF11, 97.4% on KTH, and 97.5% on UCF Sports, higher than several current deep learning baselines. We also apply data augmentation, dropout regularization, and learning rate scheduling to improve generalization and reduce overfitting. The results demonstrate the effectiveness of this compact yet robust hybrid model for spatio-temporal video classification and suggest a scalable approach for real-world HAR systems. Future work includes extending the model to real-time recognition and investigating transformer-based optimization for better context modeling.
Keywords Human Activity Recognition (HAR), Spatio-Temporal Features, Video Classification, Convolutional Neural Network (CNN), Gated Recurrent Unit (GRU), Deep Learning, Action Recognition, Video Streams, Temporal Modeling, Computer Vision
Field Computer Applications
Published In Volume 7, Issue 5, September-October 2025
Published On 2025-10-22
DOI https://doi.org/10.36948/ijfmr.2025.v07i05.57511
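
The abstract describes feeding per-frame CNN features into a GRU to model temporal dynamics. The following is a minimal sketch of that temporal stage only, not the authors' implementation: the CNN is replaced by precomputed per-frame feature vectors, and the names `GRUCell` and `encode_clip`, the weight initialization, and the layer sizes are all illustrative assumptions. It uses the standard GRU recurrence, with the reset gate applied to the previous hidden state before the candidate is computed.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_matrix(rows, cols, rng):
    # Small random weights; a trained model would learn these instead.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Hypothetical single-layer GRU cell (illustrative, untrained)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One (W, U, b) triple per gate: update z, reset r, candidate n.
        self.params = {
            gate: (
                init_matrix(hidden_size, input_size, rng),
                init_matrix(hidden_size, hidden_size, rng),
                [0.0] * hidden_size,
            )
            for gate in ("z", "r", "n")
        }

    def step(self, x, h):
        Wz, Uz, bz = self.params["z"]
        Wr, Ur, br = self.params["r"]
        Wn, Un, bn = self.params["n"]
        # Update gate: how much of the previous state to keep.
        z = [sigmoid(a + b + c)
             for a, b, c in zip(matvec(Wz, x), matvec(Uz, h), bz)]
        # Reset gate: how much of the previous state feeds the candidate.
        r = [sigmoid(a + b + c)
             for a, b, c in zip(matvec(Wr, x), matvec(Ur, h), br)]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        # Candidate state from the current frame and the reset history.
        n = [math.tanh(a + b + c)
             for a, b, c in zip(matvec(Wn, x), matvec(Un, reset_h), bn)]
        # Blend candidate and previous state using the update gate.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def encode_clip(frame_features, cell):
    """Run the GRU over per-frame feature vectors; return the final state."""
    h = [0.0] * cell.hidden_size
    for x in frame_features:
        h = cell.step(x, h)
    return h


if __name__ == "__main__":
    # Five frames, each a 4-dimensional stand-in for a CNN feature vector.
    frames = [[0.1 * t, 0.2, -0.3, 0.05 * t] for t in range(5)]
    clip_embedding = encode_clip(frames, GRUCell(input_size=4, hidden_size=3))
    print(clip_embedding)
```

In a full pipeline of the kind the abstract outlines, the final hidden state would typically be passed to a classification layer over the activity labels; that layer, the CNN backbone, and training are omitted here.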
