1 Electrical and Computer Engineering, Ayatollah Boroujerdi University, Boroujerd, Iran
2 Faculty of Electrical and Computer Engineering, Artificial Intelligence Department, Kashan, Iran
Abstract
Human action recognition is a central problem in video analysis, and deep neural network architectures have become its standard tools because they can capture intricate spatiotemporal features. Recent work shows that combining generative and recurrent models is particularly effective for modeling the complexity of human activities. Building on these developments, this paper proposes a deep learning architecture that combines two-dimensional (2D) Restricted Boltzmann Machines (RBMs) with Long Short-Term Memory (LSTM) networks to address persistent challenges in action recognition. The architecture makes three contributions. First, it operates directly on raw 2D video frames, which retains the spatial detail of the visual data and removes the need for complex, resource-intensive preprocessing. Second, the 2D RBMs extract and represent spatial features efficiently, capturing the spatial structure within individual frames. Third, the LSTM units learn temporal dependencies across sequential frames, which strengthens the model's ability to interpret the spatiotemporal context needed to recognize complex actions. In experiments on benchmark datasets, the proposed model achieves accuracies of 95.3% on UCF101, 93.4% on KTH, and 70.8% on the more challenging HMDB51. These results improve on existing state-of-the-art methods and indicate that the integrated RBM-LSTM architecture is effective for robust and accurate human action recognition across diverse video analysis scenarios.
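The data flow described in the abstract (per-frame spatial features from an RBM, followed by an LSTM over the frame sequence) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a standard binary RBM applied to a flattened frame rather than the paper's 2D formulation, uses random untrained weights, and all function names, layer sizes, and the 16x16 frame size are hypothetical. It only shows how shapes move through the pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_hidden_probs(frame, W, c):
    """P(h = 1 | v) of a binary RBM. Assumption: the 2D frame is flattened,
    which is a simplification of the 2D RBM named in the abstract."""
    return sigmoid(frame.reshape(-1) @ W + c)

def lstm_step(x, h, c, Wx, Wh, b):
    """One step of a standard LSTM cell; gates stacked as [input, forget, cell, output]."""
    n = h.shape[0]
    z = x @ Wx + h @ Wh + b
    i = sigmoid(z[:n])
    f = sigmoid(z[n:2 * n])
    g = np.tanh(z[2 * n:3 * n])
    o = sigmoid(z[3 * n:])
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def init_params(height, width, n_rbm_hidden, n_lstm, n_classes, scale=0.01):
    """Random, untrained weights (hypothetical sizes, for illustration only)."""
    return {
        "W": scale * rng.standard_normal((height * width, n_rbm_hidden)),
        "c": np.zeros(n_rbm_hidden),
        "Wx": scale * rng.standard_normal((n_rbm_hidden, 4 * n_lstm)),
        "Wh": scale * rng.standard_normal((n_lstm, 4 * n_lstm)),
        "b": np.zeros(4 * n_lstm),
        "Wout": scale * rng.standard_normal((n_lstm, n_classes)),
        "bout": np.zeros(n_classes),
    }

def classify_clip(clip, p):
    """Run RBM features frame by frame through the LSTM, then apply softmax."""
    n_lstm = p["Wh"].shape[0]
    h = np.zeros(n_lstm)
    c = np.zeros(n_lstm)
    for frame in clip:
        feat = rbm_hidden_probs(frame, p["W"], p["c"])
        h, c = lstm_step(feat, h, c, p["Wx"], p["Wh"], p["b"])
    logits = h @ p["Wout"] + p["bout"]
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

# Toy clip: 10 grayscale 16x16 frames with values in [0, 1); 6 classes, as in KTH.
params = init_params(16, 16, n_rbm_hidden=32, n_lstm=24, n_classes=6)
clip = rng.random((10, 16, 16))
probs = classify_clip(clip, params)
print(probs.shape, float(probs.sum()))
```

In a trained system the RBM weights would typically be learned first (for example with contrastive divergence) and the LSTM and classifier would then be trained on labeled clips; neither training step is shown here.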
Joudaki, M., & Ebrahimpour Komleh, H. (2024). Spatiotemporal Modeling of Human Activity in Videos Using 2D Restricted Boltzmann Machines and LSTM Networks. Computing and distributed systems, 7(1), 86-97 (In Persian).