Prediction of tuberculosis using an automated machine learning platform for models trained on synthetic data

Hooman Rashidi, Imran Khan, Luke Dang, Samer Albahra, Ujjwal Ratan, Nihir Chadderwala, Wilson To, Prathima Srinivas, Jeffery Wajda, Nam Tran

Research output: Contribution to journal › Article › peer-review


High-quality medical data is critical to the development and implementation of machine learning (ML) algorithms in healthcare; however, security and privacy concerns continue to limit access. We sought to determine the utility of 'synthetic data' in training ML algorithms for the detection of tuberculosis (TB) from inflammatory biomarker profiles. A retrospective dataset (A) comprising 278 patients was used to generate synthetic datasets (B, C, and D) for training models prior to secondary validation on a generalization dataset. ML models trained and validated on the real Dataset A demonstrated an accuracy of 90%, a sensitivity of 89% (95% CI, 83-94%), and a specificity of 100% (95% CI, 81-100%). Models trained using the optimal synthetic dataset B showed an accuracy of 91%, a sensitivity of 93% (95% CI, 87-96%), and a specificity of 77% (95% CI, 50-93%). Models trained on synthetic datasets C and D performed worse (accuracies of 71% and 54%, respectively). This pilot study highlights the promise of synthetic data as an expedited means for ML algorithm development.
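For readers unfamiliar with the performance measures quoted in the abstract, the following is a minimal Python sketch of how accuracy, sensitivity, and specificity follow from a confusion matrix, together with a 95% confidence interval for a proportion. The abstract does not say which interval method the authors used; the Wilson score interval here is an assumption chosen for illustration. The function names and the counts in the example are hypothetical and are not taken from the study.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%).

    Assumption: the study's interval method is not stated in the abstract;
    Wilson is used here purely as an illustration.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        # Fraction of all cases classified correctly.
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        # Fraction of true TB cases that the model flags as positive.
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        # Fraction of non-TB cases that the model flags as negative.
        "specificity": tn / (tn + fp),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not the study's data.
    metrics = diagnostic_metrics(tp=90, fn=10, tn=18, fp=2)
    for name, value in metrics.items():
        print(name, value)
```

The sketch also shows why a specificity estimated from a small number of negative cases carries a wide interval: the interval width shrinks roughly with the square root of the number of cases in the denominator.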

Original language: English (US)
Pages (from-to): 10
Number of pages: 1
Journal: Journal of Pathology Informatics
Issue number: 1
State: Published - 2022


  • Artificial intelligence
  • biomarkers
  • data accessibility
  • electronic medical record
  • privacy
  • simulation

ASJC Scopus subject areas

  • Pathology and Forensic Medicine
  • Health Informatics
  • Computer Science Applications

