Spatiotemporal Modeling of Human Activity in Videos Using 2D Restricted Boltzmann Machines and LSTM Networks

Joudaki, Majid; Ebrahimpour Komleh, Hossein

Document Type: Original Article

Authors

Majid Joudaki ¹

Hossein Ebrahimpour Komleh ²

¹ Electrical and Computer Engineering, Ayatollah Boroujerdi University, Boroujerd, Iran

² Faculty of Electrical and Computer Engineering, Artificial Intelligence Department, Kashan, Iran

Abstract

Human action recognition in video depends on models that capture both the spatial structure within frames and the temporal dynamics across them, and recent work indicates that combining generative and recurrent models is an effective way to do so. Building on these developments, this paper proposes a deep learning architecture that combines two-dimensional (2D) Restricted Boltzmann Machines (RBMs) with Long Short-Term Memory (LSTM) networks to address persistent challenges in action recognition. The architecture makes three contributions. First, it operates directly on raw 2D video frames, which preserves the spatial detail of the visual data and removes the need for complex, resource-intensive preprocessing. Second, the 2D RBMs extract an efficient representation of the spatial structure within each frame. Third, the LSTM units learn temporal dependencies across sequential frames, giving the model the spatiotemporal context needed to recognize complex actions. In experiments on benchmark datasets, the proposed model achieves accuracies of 95.3% on UCF101, 93.4% on KTH, and 70.8% on the more challenging HMDB51. These results improve on existing state-of-the-art methods and indicate that the integrated RBM-LSTM architecture is applicable to robust and accurate human action recognition across diverse video analysis scenarios.
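The data flow described in the abstract (a 2D RBM applied to each raw frame, followed by an LSTM over the frame sequence) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption: the 2D RBM is taken to be a bilinear (matrix-variate) form with hidden activations sigmoid(U X Vᵀ + C), the weights are random rather than trained, and all names (`rbm_2d_hidden`, `lstm_step`, `classify_clip`, `make_params`) and sizes are hypothetical. It shows shapes and forward computation only, not the authors' implementation, and uses only the standard library.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def rbm_2d_hidden(frame, U, V, C):
    """Hidden probabilities of an assumed bilinear 2D RBM: sigmoid(U X V^T + C).
    The frame is kept as a matrix; it is not flattened before the RBM."""
    pre = matmul(matmul(U, frame), transpose(V))
    return [[sigmoid(p + c) for p, c in zip(prow, crow)] for prow, crow in zip(pre, C)]

def lstm_step(x, h, c, W, b):
    """One LSTM step. W maps [x, h] to the gate pre-activations, stacked in
    the order input, forget, output, candidate."""
    n = len(h)
    z = [sum(w * v for w, v in zip(row, x + h)) + bias for row, bias in zip(W, b)]
    i = [sigmoid(v) for v in z[0:n]]
    f = [sigmoid(v) for v in z[n:2 * n]]
    o = [sigmoid(v) for v in z[2 * n:3 * n]]
    g = [math.tanh(v) for v in z[3 * n:4 * n]]
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def make_params(height, width, k, l, n_hidden, n_classes, seed=0):
    """Random (untrained) weights; sizes are illustrative assumptions."""
    rng = random.Random(seed)
    U = rand_matrix(k, height, rng)           # row-side RBM weights
    V = rand_matrix(l, width, rng)            # column-side RBM weights
    C = rand_matrix(k, l, rng)                # hidden bias matrix
    W = rand_matrix(4 * n_hidden, k * l + n_hidden, rng)
    b = [0.0] * (4 * n_hidden)
    W_out = rand_matrix(n_classes, n_hidden, rng)
    return U, V, C, W, b, W_out

def classify_clip(frames, params):
    """Spatial features per frame (2D RBM), then temporal modeling (LSTM)."""
    U, V, C, W, b, W_out = params
    n_hidden = len(W) // 4
    h = [0.0] * n_hidden
    c = [0.0] * n_hidden
    for frame in frames:
        H = rbm_2d_hidden(frame, U, V, C)
        x = [v for row in H for v in row]     # flatten hidden matrix for the LSTM
        h, c = lstm_step(x, h, c, W, b)
    logits = [sum(w * v for w, v in zip(row, h)) for row in W_out]
    return softmax(logits)

# Toy clip: 5 frames of 8x6 "pixels" in [0, 1), 3 hypothetical action classes.
rng = random.Random(1)
frames = [[[rng.random() for _ in range(6)] for _ in range(8)] for _ in range(5)]
params = make_params(height=8, width=6, k=4, l=3, n_hidden=5, n_classes=3)
probs = classify_clip(frames, params)
print(len(probs), round(sum(probs), 6))
```

In a trained system the RBM weights would typically be learned first (for example with contrastive divergence) and the LSTM and output layer fitted on labeled clips; both steps are omitted here, so the class probabilities carry no meaning beyond confirming that the shapes compose.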

Keywords

Deep learning

2D Restricted Boltzmann Machines

LSTM networks

Human activity recognition

Recurrent neural networks

Subjects

Big Data
