Development of multimodal English translation system based on biomechanics

  • Yaodan Liang, School of Foreign Languages, Yulin Normal University, Yulin 537000, China
Keywords: biomechanical information; multimodal information; English translation system; transformer model; oral opening angle
Article ID: 1136

Abstract

With the increasing demand for translation quality, improving the output quality of English translation systems has become a serious challenge. Traditional English translation systems mostly translate from single-modal text information, ignore multimodal information such as speech, and fail to incorporate biomechanical movement data of the mouth, throat, and tongue, resulting in poor English translation quality. This paper develops an English translation system based on text, speech, and biomechanical motion data, combining the translation modeling capability of the Transformer with the temporal feature processing advantages of BiLSTM (Bidirectional Long Short-Term Memory). The study first used MFCC to extract speech features and the spectral features of the audio, and used oral pressure sensors, optical sensors, and other sensors to synchronously collect biomechanical motion features such as the students' oral opening angle, lip curvature, vocal cord vibration frequency, glottal opening and closing degree, tongue tip position, and tongue back curvature. These features were then input into a BiLSTM model to perform time-series modeling and capture the temporal relationships between speech and biomechanical features. Next, a multimodal Transformer framework was constructed; text, speech, and biomechanical data were integrated through weighted fusion, and a multi-head attention mechanism was applied for translation. Finally, the BiLSTM model and the multimodal Transformer framework were integrated in series, and the resulting module was designed and integrated into the English translation system. The experiment was based on the LibriSpeech ASR corpus.
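The weighted fusion of modality features and the attention step described above can be sketched as follows. This is a minimal NumPy illustration under assumed shapes, modality names, and scalar modality weights (a single attention head is shown for brevity); it is a sketch of the general technique, not the paper's exact configuration.

```python
import numpy as np

def weighted_fusion(text_feat, audio_feat, bio_feat, weights):
    """Weighted fusion of per-token feature matrices (seq_len x d).

    `weights` are scalar modality weights, normalized to sum to 1.
    The three modality names and shapes here are illustrative assumptions.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w[0] * text_feat + w[1] * audio_feat + w[2] * bio_feat

def scaled_dot_attention(q, k, v):
    """One attention head: softmax(Q K^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v

rng = np.random.default_rng(0)
seq, d = 5, 8
fused = weighted_fusion(rng.normal(size=(seq, d)),   # text features
                        rng.normal(size=(seq, d)),   # MFCC audio features
                        rng.normal(size=(seq, d)),   # biomechanical features
                        weights=[0.4, 0.3, 0.3])
out = scaled_dot_attention(fused, fused, fused)      # self-attention over fused features
print(out.shape)  # (5, 8)
```

In a full multi-head Transformer layer, the fused features would be projected into several query/key/value subspaces and the per-head outputs concatenated; the fusion weights themselves could be learned rather than fixed.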
The results showed that the multimodal combination of text + audio + biomechanical information performed best, reaching 0.82, 0.76, and 0.79 on the BLEU, METEOR, and ROUGE-L metrics respectively, 0.06 higher than the multimodal combination without biomechanical information, with an emotion-transfer loss of only 0.01. These results show that incorporating biomechanical information can significantly improve the translation quality of a multimodal English translation system while preserving the original emotion of the language.
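For reference, the BLEU score reported above is, in its standard sentence-level form, a brevity-penalized geometric mean of modified n-gram precisions. The sketch below is a simplified single-reference implementation without smoothing, shown only to make the metric concrete; it is not the exact scorer used in the experiments.

```python
from collections import Counter
import math

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty. Single reference,
    no smoothing -- an illustrative sketch of the metric."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        total = max(sum(cand.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # geometric mean collapses without smoothing
    log_p = sum(math.log(p) for p in precisions) / max_n
    bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(log_p)

cand = "the cat sat on the mat".split()
ref = "the cat sat on the mat".split()
print(bleu(cand, ref))  # 1.0 for an exact match
```

METEOR additionally rewards stem and synonym matches and word-order alignment, while ROUGE-L scores the longest common subsequence, which is why the three metrics give slightly different values for the same output.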


Published
2025-02-24
How to Cite
Liang, Y. (2025). Development of multimodal English translation system based on biomechanics. Molecular & Cellular Biomechanics, 22(3), 1136. https://doi.org/10.62617/mcb1136
Section
Article