Discriminant models for high-throughput proteomics mass spectrometer data

Parul V. Purohit, David M. Rocke

Research output: Contribution to journal › Article › peer-review

42 Scopus citations


We use several different multivariate analysis methods to discriminate between diseased and healthy patients using protein mass spectrometer data provided by Duke University. The university presented two problems: one in which the patients' responses (diseased or healthy) were not known, and a second in which the responses were known. In the latter case, the data can be used as a 'training' set. We attempted both problems. In particular, we used principal component analysis along with clustering methods for the first problem, and partial least squares coupled with logistic and discriminant methods for the second, in which the responses were known. In addition, we were able to detect regions of interest in the spectrum where the protein patterns differed between healthy and diseased patients. Considerable effort went into preprocessing the data. We used a binning approach, rather than peak heights or peak areas, to reduce the number of variables. We performed a square root transformation on the data to help stabilize the variance; this in turn significantly improved the clustering results.
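The two preprocessing steps named in the abstract, binning and the square root transformation, can be illustrated with a minimal sketch. The abstract does not give the bin width, how intensities were combined within a bin, or how negative values were handled, so the choices below (summing within fixed-width bins, clipping negatives to zero) and the function names `bin_spectrum`, `sqrt_transform`, and `preprocess` are assumptions made for illustration, not the authors' procedure.

```python
import math


def bin_spectrum(intensities, bin_width):
    """Collapse a spectrum into fixed-width bins by summing intensities.

    Assumption: bins are consecutive, non-overlapping runs of `bin_width`
    points; the last bin may be shorter. The paper's scheme may differ.
    """
    if bin_width < 1:
        raise ValueError("bin_width must be a positive integer")
    return [
        sum(intensities[start:start + bin_width])
        for start in range(0, len(intensities), bin_width)
    ]


def sqrt_transform(values):
    """Apply a square root transformation to each binned value.

    Assumption: negative values (e.g. after baseline subtraction) are
    clipped to zero before taking the root.
    """
    return [math.sqrt(max(v, 0.0)) for v in values]


def preprocess(intensities, bin_width):
    """Bin first, then transform, giving one feature per bin."""
    return sqrt_transform(bin_spectrum(intensities, bin_width))


# Hypothetical toy spectrum: 7 intensity readings reduced to 4 features.
spectrum = [1.0, 3.0, 0.0, 16.0, 4.0, 5.0, 2.0]
features = preprocess(spectrum, bin_width=2)
print(features)
```

For count-like signals whose variance grows roughly in proportion to the mean, the square root makes the variance approximately constant across intensity levels, which may be why the transformation helped distance-based methods such as clustering. Whether this noise model holds for the spectra in the paper is not stated in the abstract.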

Original language: English (US)
Pages (from-to): 1699-1703
Number of pages: 5
Issue number: 9
State: Published - Sep 1 2003


  • Discriminant
  • Mass spectrometry
  • Multivariate

ASJC Scopus subject areas

  • Molecular Biology
  • Genetics

