CALF-GAN: Multi-scale convolutional attention for latent feature-guided cross-modality MR image synthesis
Abstract
Multimodal medical image synthesis plays a crucial supporting role in biomechanics research, providing high-precision data and analytical methods for studies of anatomical structure, tissue characteristics, and mechanical modeling. In practice, however, certain imaging modalities can be difficult to acquire, which hinders model training and high-accuracy biomechanical analysis. Existing methods employ convolutional neural network (CNN)-based generative adversarial models to synthesize the missing modality from available ones, but CNNs are limited in their ability to model long-range dependencies. Transformers offer a paradigm that addresses this limitation, yet their high computational and memory demands remain a significant drawback. To tackle these challenges, we propose a novel generative adversarial model, the Convolutional Attention Latent Feature GAN (CALF-GAN), which leverages multi-scale convolutional attention for cross-modality medical image synthesis. A dedicated latent attribute separation module disentangles modality-specific features between source- and target-modality images, improving the synthesis of medical semantics such as pixel intensity. In addition, to strengthen long-range dependency modeling while reducing computational overhead, we design a generation module based on multi-scale convolutional attention that captures long-range dependencies using convolutional operations alone. Extensive experiments on multiple medical image datasets demonstrate that CALF-GAN achieves strong generalizability and excellent overall performance with a low memory footprint, making it well suited to high-precision biomechanics research.
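The core architectural idea above, capturing long-range dependencies with convolutions alone, can be made concrete with a short sketch. Below is a minimal PyTorch illustration of a multi-scale convolutional attention block in the SegNeXt style, where large receptive fields are factorized into depthwise strip convolutions at several scales. The kernel sizes, layer names, and gating scheme here are assumptions for illustration, not the authors' exact CALF-GAN module.

# Minimal PyTorch sketch of multi-scale convolutional attention (SegNeXt-style).
# Illustrative only; NOT the exact CALF-GAN generation module. Scales and names
# are assumptions.
import torch
import torch.nn as nn


class MultiScaleConvAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Depthwise 5x5 convolution aggregates local context.
        self.local = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        # Multi-scale branches: each kxk receptive field is factorized into a
        # 1xk followed by a kx1 depthwise convolution, keeping cost linear in k.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2), groups=channels),
                nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0), groups=channels),
            )
            for k in (7, 11, 21)  # assumed scales
        )
        # 1x1 convolution mixes channels and produces the attention map.
        self.mix = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.local(x)
        attn = attn + sum(branch(attn) for branch in self.branches)
        attn = self.mix(attn)
        return attn * x  # attention map gates the input features element-wise


if __name__ == "__main__":
    feats = torch.randn(1, 64, 128, 128)  # e.g., encoder features of an MR slice
    out = MultiScaleConvAttention(64)(feats)
    print(out.shape)  # torch.Size([1, 64, 128, 128])

Because a k x k kernel is factorized into 1 x k and k x 1 depthwise convolutions, the cost grows linearly rather than quadratically in k, which is how large effective receptive fields stay far cheaper than self-attention in both computation and memory.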