Tennis serve recognition based on bidirectional long- and short-term memory neural networks

  • Jianhong Ni Modern Education Technology Center, Hebei Institute of Physical Education, Shijiazhuang 050063, China
  • Jing Wang Modern Education Technology Center, Hebei Institute of Physical Education, Shijiazhuang 050063, China
Keywords: BiLSTM; CNN; spatial feature information; tennis serve; self-attentive weighting
Article ID: 1546

Abstract

The serve is a crucial technique in tennis, giving players the opportunity to control and organize their attacks during competition. Current tennis serve training often lacks sophisticated, data-driven tools, and existing recognition techniques rely on single-feature extraction methods that focus on isolated attributes such as trajectory or speed. These methods do not fully exploit the comprehensive spatio-temporal information present in video data, resulting in limited accuracy and robustness when recognizing and analyzing different serving techniques. To address these limitations, this paper proposes a tennis serve recognition method based on a bidirectional long short-term memory neural network (BiLSTM). Our approach first employs a modified convolutional neural network (CNN) to extract spatial features from images, enhanced by self-attentive weighting to improve feature extraction. It then uses a BiLSTM to capture and represent the important temporal features, thereby enhancing the model’s ability to recognize and evaluate serving actions. Experimental results demonstrate that our method outperforms existing neural network models on serve recognition tasks, effectively addressing the limitations of previous approaches.
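The abstract's pipeline — per-frame spatial features weighted by self-attention, then a bidirectional LSTM over the frame sequence — can be illustrated with a minimal numpy sketch. The paper's exact architecture, layer sizes, and attention form are not specified here, so the scoring vector `w`, the hidden size, and the single-gate-matrix LSTM parameterization below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_pool(feats, w):
    """Weight CNN spatial feature vectors by learned attention scores.

    feats: (N, D) array of N spatial feature vectors for one frame.
    w:     (D,) scoring vector (a stand-in for the attention sub-network).
    Returns a single (D,) attention-weighted spatial feature.
    """
    scores = softmax(feats @ w)          # (N,) weights summing to 1
    return scores @ feats                # (D,) weighted combination

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell update with gates stacked as [input, forget, output, cell]."""
    d = h.shape[0]
    z = W @ x + U @ h + b                # (4*d,) pre-activations
    i, f, o = (1.0 / (1.0 + np.exp(-z[k*d:(k+1)*d])) for k in range(3))
    g = np.tanh(z[3*d:])
    c = f * c + i * g                    # new cell state
    h = o * np.tanh(c)                   # new hidden state
    return h, c

def bilstm(seq, params_fwd, params_bwd):
    """Run an LSTM over the frame sequence in both directions and
    concatenate the per-step hidden states, as in a BiLSTM layer."""
    d = params_fwd[2].shape[0] // 4
    h, c = np.zeros(d), np.zeros(d)
    fwd = []
    for x in seq:                        # forward pass over time
        h, c = lstm_step(x, h, c, *params_fwd)
        fwd.append(h)
    h, c = np.zeros(d), np.zeros(d)
    bwd = []
    for x in reversed(seq):              # backward pass over time
        h, c = lstm_step(x, h, c, *params_bwd)
        bwd.append(h)
    return [np.concatenate([f, b]) for f, b in zip(fwd, reversed(bwd))]
```

In a full model, `feats` would come from the modified CNN, the pooled frame features would form `seq`, and a classifier head over the BiLSTM outputs would predict the serve category; in practice this is expressed with a deep-learning framework rather than raw numpy.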

Published
2025-03-17
How to Cite
Ni, J., & Wang, J. (2025). Tennis serve recognition based on bidirectional long- and short-term memory neural networks. Molecular & Cellular Biomechanics, 22(4), 1546. https://doi.org/10.62617/mcb1546