Application of deep learning in biomechanical image recognition: Based on transformer architecture
Abstract
Biomechanical image recognition has important applications in clinical diagnosis and biomedical engineering, but traditional convolutional neural networks (CNNs) have limitations in capturing global features. In this paper, a biomechanical image recognition method based on the Vision Transformer (ViT) is proposed to improve classification performance on complex images. A biomechanical image dataset containing five categories is constructed, and the ViT input features are prepared through standardization, data augmentation, and patch segmentation. Performance is evaluated using accuracy, precision, recall, F1 score, and the confusion matrix, and the model is compared with ResNet-50 and DenseNet-121. The experimental results show that the ViT model achieves an accuracy of 92.3%, performs best on the "normal bone" and "soft tissue lesion" categories, and outperforms the traditional CNN models on the remaining metrics. By modeling global features through its self-attention mechanism, ViT significantly improves recognition accuracy and robustness, providing efficient and accurate technical support for clinical diagnosis, disease screening, and surgical planning, and demonstrating its application potential in biomechanical image recognition.
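The abstract outlines the pipeline (standardization, augmentation, patch segmentation, ViT classification, metric-based evaluation) without implementation details. The following is a minimal sketch of what such a pipeline could look like, assuming a PyTorch/torchvision implementation with a pretrained ViT-B/16 backbone fine-tuned for five classes; the framework, backbone variant, augmentations, and hyperparameters are assumptions, not the paper's reported settings.

```python
# Hypothetical sketch: fine-tuning a ViT for 5-class biomechanical image
# classification and computing the metrics reported in the abstract.
# Framework, backbone variant, and hyperparameters are assumptions.
import torch
import torch.nn as nn
from torchvision import models, transforms
from sklearn.metrics import (accuracy_score, precision_recall_fscore_support,
                             confusion_matrix)

NUM_CLASSES = 5  # the five biomechanical image categories described in the abstract

# Standardization and data augmentation applied before patch segmentation.
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),          # ViT-B/16 expects 224x224 inputs
    transforms.RandomHorizontalFlip(),      # simple augmentation (assumed)
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

# Pretrained ViT-B/16; its conv_proj layer performs the 16x16 patch
# segmentation and linear projection into token embeddings.
model = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
model.heads.head = nn.Linear(model.heads.head.in_features, NUM_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def evaluate(model, loader, device="cpu"):
    """Compute accuracy, macro precision/recall/F1, and the confusion matrix."""
    model.eval()
    y_true, y_pred = [], []
    with torch.no_grad():
        for images, labels in loader:
            logits = model(images.to(device))
            y_pred.extend(logits.argmax(dim=1).cpu().tolist())
            y_true.extend(labels.tolist())
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    cm = confusion_matrix(y_true, y_pred)
    return acc, prec, rec, f1, cm
```

A training loop would iterate over a DataLoader built from the five-class dataset with `train_tf`, optimize `criterion` with `optimizer`, and call `evaluate` on a held-out split; the ResNet-50 and DenseNet-121 baselines mentioned in the abstract could be compared by swapping in `models.resnet50` or `models.densenet121` under the same data and metrics.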