
Journal of Tuberculosis and Lung Disease, 2023, Vol. 4, Issue (5): 364-369. doi: 10.19983/j.issn.2096-8493.20230087

Original Articles

Random forest algorithm-based study of risk factors for tuberculosis incidence in an elderly mobile population

Ma Jianjun1, Zhang Tiejuan2, Zhao Qinglong2, Yu Shihui1, Mei Yang3

  1 Clinical Quality Evaluation Institute, Jilin Provincial Tuberculosis Prevention and Treatment Institute, Changchun 130062, China
  2 Jilin Provincial Center of Disease Control and Prevention, Changchun 130062, China
  3 Chinese Center for Disease Control and Prevention, Beijing 102206, China
  • Received: 2023-08-12  Online: 2023-10-20  Published: 2023-10-16
  • Contact: Mei Yang, Email: meiyang@chinacdc.cn
  • Supported by:
    Jilin Province Health and Wellness Management Model Innovation Project (2020G007)

Abstract:

Objective: To establish a model of tuberculosis incidence risk among the elderly mobile population in Jilin Province using the random forest machine learning algorithm, and thereby provide a reference for developing prevention and treatment strategies for key tuberculosis populations.

Methods: A 1∶1 matched case-control design was used. The case group comprised 281 tuberculosis patients aged ≥60 years from the migrant population registered in Jilin Province in 2021; the control group comprised 281 sex-matched healthy individuals with non-local household registration. The 562 participants were randomly divided into a training set (70%, 393 participants) and a test set (30%, 169 participants), and a random forest model of tuberculosis incidence risk was built in R version 4.2.1.

Results: The five most important risk factors were a history of contact with tuberculosis patients, change of job, poor personal protection, smoking, and low intake of meat, eggs and milk, with mean decreases in Gini of 44.344, 29.007, 21.859, 19.703 and 15.242, respectively. The optimal number of trees was 281, and the out-of-bag (OOB) error rate was 6.44%. The area under the ROC curve was 0.967. Ten-fold cross-validation with the caret package gave an accuracy of 93.5% and a Kappa value of 0.870.

Conclusion: Members of the elderly mobile population with a history of contact with tuberculosis patients were at the highest risk of infection. Routine tuberculosis prevention and control should therefore emphasize isolating patients with infectious tuberculosis and strengthening personal protection and nutritional intake.
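Two of the reported metrics, the mean decrease in Gini used to rank risk factors and the Kappa value from cross-validation, can be illustrated in a few lines. The Python sketch below is illustrative only and is not the authors' R workflow: `gini_impurity`, `gini_decrease` and `cohens_kappa` are hypothetical helpers, and the node counts and confusion matrix are invented toy numbers, not study data.

```python
# Illustrative sketch only: hypothetical helpers, not the authors' R workflow.
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease when `parent` is split into `left` and `right`.

    Child impurities are weighted by their share of the parent's samples.
    A variable's mean decrease in Gini accumulates such decreases over every
    node split on that variable and averages them across trees (R's
    randomForest may use a different scaling convention)."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


def cohens_kappa(tp, fn, fp, tn):
    """Cohen's kappa for a 2x2 confusion matrix (rows = truth, columns = prediction)."""
    n = tp + fn + fp + tn
    p_observed = (tp + tn) / n  # accuracy
    # Chance agreement: product of true and predicted marginals for each class
    p_chance_case = ((tp + fn) / n) * ((tp + fp) / n)
    p_chance_control = ((fp + tn) / n) * ((fn + tn) / n)
    p_expected = p_chance_case + p_chance_control
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Toy node (invented numbers): 10 cases and 10 controls split on contact history
    parent = ["case"] * 10 + ["control"] * 10
    left = ["case"] * 8 + ["control"] * 1
    right = ["case"] * 2 + ["control"] * 9
    print(round(gini_decrease(parent, left, right), 4))  # 0.2475

    # Toy balanced test set (invented): 100 cases, 100 controls, accuracy 187/200 = 0.935
    print(round(cohens_kappa(tp=94, fn=6, fp=7, tn=93), 3))  # 0.87
```

When the true classes are exactly balanced, as a 1∶1 design aims for, the chance agreement equals 0.5 whatever the predicted marginals, so Kappa = 2 × accuracy − 1. For the reported accuracy, 2 × 0.935 − 1 = 0.870, which agrees with the reported Kappa value. Cross-validation folds need not be perfectly balanced, so this is a consistency check rather than a reproduction of the published figure.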

Key words: Tuberculosis, Aged, Floating population, Machine learning, Risk factors
