Machine learning-based diagnosis of Type 2 Diabetes Mellitus using Social Determinants of Health

  • Guopeng Hu College of Physical Education, Huaqiao University, Quanzhou 362021, China
  • Lihan Lin College of Physical Education, Huaqiao University, Quanzhou 362021, China
  • Xiangju Hu School of Public Health, Fujian Medical University, Fuzhou 350005, China; Department for Chronic and Noncommunicable Disease Control and Prevention, Fujian Provincial Center for Disease Control and Prevention, Fuzhou 350001, China
  • Yikun Zheng College of Physical Education, Huaqiao University, Quanzhou 362021, China
  • Xiaoyang Liu College of Physical Education, Huaqiao University, Quanzhou 362021, China
  • Zhenduo Xu Institute of Advanced Manufacturing, Shantou Polytechnic, Shantou 515078, China
  • Yuqi He College of Physical Education, Huaqiao University, Quanzhou 362021, China
  • Yinghui Zhang College of Physical Education, Huaqiao University, Quanzhou 362021, China
Keywords: Type 2 Diabetes prediction; risk factors; predictive medicine; Noncommunicable Disease; public health
Article ID: 1461

Abstract

Background: In China, half of Type 2 Diabetes Mellitus (T2DM) cases remain undiagnosed, which worsens patient health and increases complication risks and socioeconomic burdens. This study aims to develop a T2DM prediction model by integrating machine learning (ML) methods with Social Determinants of Health (SDoH) data from Fujian Province, China.

Methods: This cross-sectional study used multi-stage cluster random sampling to assess SDoH and T2DM prevalence in 26,298 participants in Fujian, China, from April 2019 to April 2020. Five ML algorithms were used to predict T2DM: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), with the Synthetic Minority Over-sampling Technique (SMOTE) used to balance the classes. Hyperparameters were tuned with RandomizedSearchCV and GridSearchCV. Model evaluation metrics included accuracy, recall, precision, area under the curve (AUC), and F1 score. SHapley Additive exPlanations (SHAP) analysis was used to quantify the impact of specific SDoH variables on T2DM risk prediction.

Results: Among the 26,298 participants, the mean (SD) age was 53.77 (14.41) years, and 13.99% (N = 3680) had T2DM. All ML models had AUC values above 0.70, with LightGBM performing best (AUC 0.723, accuracy 0.659, recall 0.709, precision 0.641). SHAP analysis showed that older age and higher Body Mass Index (BMI) significantly increase diabetes risk, as do hypertension, poor self-rated health, and dyslipidemia.

Conclusion: The predictive model, combined with SDoH data, provides a non-invasive, efficient, and low-cost tool for T2DM prediction, targeting China's large undiagnosed diabetic population. Key factors influencing the model include older age, higher BMI, hypertension, dyslipidemia, and urban residency, which are critical T2DM risk factors. This model supports early detection and targeted interventions, helping to reduce healthcare burdens in resource-limited settings.
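
The class-balancing and evaluation steps named in the Methods can be illustrated with a small sketch. The Python below is a simplified, standard-library-only version of SMOTE-style interpolation and of the accuracy, precision, recall, and F1 formulas. It is not the authors' code, which presumably relied on library implementations. The function names, the (age, BMI) toy records, and the confusion-matrix counts are hypothetical values chosen only for illustration.

```python
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy minority-class records: (age in years, BMI in kg/m^2).
toy_minority = [(60.0, 28.0), (65.0, 30.0), (58.0, 27.5), (70.0, 31.0)]
new_points = smote_oversample(toy_minority, n_new=6, k=3, seed=42)
print(len(new_points), "synthetic samples created")

# Hypothetical confusion-matrix counts, not taken from the study.
print(classification_metrics(tp=70, fp=40, tn=60, fn=30))
```

Oversampling of this kind is normally applied to the training split only, so that evaluation data contain no synthetic records. As a worked instance of the F1 formula, the LightGBM precision (0.641) and recall (0.709) reported above would give an F1 score of roughly 0.67, assuming both values refer to the same class and decision threshold; this figure is derived here and is not quoted from the abstract.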

Published: 2025-02-27
How to Cite: Hu, G., Lin, L., Hu, X., Zheng, Y., Liu, X., Xu, Z., He, Y., & Zhang, Y. (2025). Machine learning-based diagnosis of Type 2 Diabetes Mellitus using Social Determinants of Health. Molecular & Cellular Biomechanics, 22(3), 1461. https://doi.org/10.62617/mcb1461
Section: Article