Comparison of imputation and imputation-free methods for statistical analysis of mass spectrometry data with missing data

Sandra Taylor, Matthew Ponzini, Machelle Wilson, Kyoungmi Kim

Research output: Contribution to journal › Article › peer-review

Abstract

Missing values are common in high-throughput mass spectrometry data. Two strategies are available to address them: (i) eliminate or impute the missing values, then apply statistical methods that require complete data; and (ii) use statistical methods that specifically account for missing values without imputation (imputation-free methods). This study reviews the effect of sample size and percentage of missing values on statistical inference for multiple methods under these two strategies. With increasing missingness, the ability of both imputation and imputation-free methods to identify differentially and non-differentially regulated compounds in a two-group comparison study declined. Random forest and k-nearest neighbor imputation combined with a Wilcoxon test performed well in statistical testing for up to 50% missingness, with little bias in estimating the effect size. Quantile regression imputation accompanied by a Wilcoxon test also had good statistical testing outcomes but substantially distorted the difference in means between groups. None of the imputation-free methods performed consistently better for statistical testing than the imputation methods.
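The "impute, then test" strategy described above can be sketched in a few lines. What follows is a minimal Python illustration, not the authors' implementation. It assumes a samples × compounds table with `None` marking missing values, a simple k-nearest-neighbor imputer over samples (distance = root-mean-square difference over compounds observed in both samples), and a two-sided Wilcoxon rank-sum test using the normal approximation with tie correction. The function names `knn_impute` and `rank_sum_test`, the simulated log-intensities, and the missing-completely-at-random masking are all hypothetical. Real mass spectrometry missingness is often tied to low abundance rather than occurring at random.

```python
import math
import random
from statistics import mean


def knn_impute(data, k=3):
    """Fill None entries with the mean of that compound in the k nearest samples.

    Assumption: rows are samples, columns are compounds. Distance between two
    samples is the RMS difference over compounds observed in both; samples with
    no co-observed compound are skipped. Only original (non-imputed) values are used.
    """
    imputed = [row[:] for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for r, other in enumerate(data):
                if r == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other) if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            if nearest:
                imputed[i][j] = mean(v for _, v in nearest)
            else:
                # Fallback (assumption): column mean of observed values.
                observed = [r[j] for r in data if r[j] is not None]
                if not observed:
                    raise ValueError(f"compound {j} has no observed values")
                imputed[i][j] = mean(observed)
    return imputed


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (W, p) where W is the rank sum of x.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank for a block of ties
        for t in range(i, j + 1):
            ranks[t] = avg_rank
        size = j - i + 1
        tie_term += size ** 3 - size
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mu = n1 * (n + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:  # every value tied: no evidence of a difference
        return w, 1.0
    z = (w - mu) / math.sqrt(var)
    return w, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    rng = random.Random(1)
    n_per_group, n_compounds = 8, 4
    # Hypothetical log-intensities; compound 0 is shifted by +2 in group B.
    samples = [
        [rng.gauss(10 + (2 if g == 1 and c == 0 else 0), 1) for c in range(n_compounds)]
        for g in (0, 1)
        for _ in range(n_per_group)
    ]
    # Remove roughly 20% of values completely at random (a simplifying assumption).
    for row in samples:
        for c in range(n_compounds):
            if rng.random() < 0.2:
                row[c] = None
    completed = knn_impute(samples, k=3)
    for c in range(n_compounds):
        a = [completed[r][c] for r in range(n_per_group)]
        b = [completed[r][c] for r in range(n_per_group, 2 * n_per_group)]
        _, p = rank_sum_test(a, b)
        print(f"compound {c}: mean diff = {mean(b) - mean(a):.2f}, p = {p:.3g}")
```

The script prints, for each compound, the post-imputation difference in group means alongside the rank-sum p-value. Comparing that difference with the simulated shift is one informal way to look at the effect-size bias the abstract discusses.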

Original language: English (US)
Journal: Briefings in Bioinformatics
Volume: 23
Issue number: 1
State: Published - Jan 17 2022

Keywords

  • imputation
  • mass spectrometry
  • metabolomics
  • missing data
  • sample size

ASJC Scopus subject areas

  • Information Systems
  • Molecular Biology
