Machine Vision Methods, Natural Language Processing, and Machine Learning Algorithms for Automated Dispersion Plot Analysis and Chemical Identification from Complex Mixtures

Danny Yeap, Paul T. Hichwa, Maneeshin Y. Rajapakse, Daniel J. Peirano, Mitchell M. McCartney, Nicholas Kenyon, Cristina E. Davis

Research output: Contribution to journal › Article


Gas-phase trace chemical detection techniques such as ion mobility spectrometry (IMS) and differential mobility spectrometry (DMS) can be used in many settings, such as evaluating the health condition of patients or detecting explosives at airports. These devices separate chemical compounds in a mixture and provide information to identify specific chemical species of interest. Further, these types of devices operate well in both controlled lab environments and in-field applications. Frequently, the commercial versions of these devices are highly tailored for niche applications (e.g., explosives detection) because of the difficulty involved in reconfiguring instrumentation hardware and data analysis software algorithms. For researchers to quickly adapt these tools for new purposes and broader panels of chemical targets, it is critical to develop new algorithms and methods for generating libraries of these sensor responses. Microelectromechanical system (MEMS) technology has been used to fabricate DMS devices that miniaturize the platforms for easier deployment; however, advances in the accompanying data analytics have lagged. DMS generates complex three-dimensional dispersion plots for both positive and negative ions in a mixture. Although simple spectra of single chemicals are straightforward to interpret (both visually and via algorithms), it is exceedingly challenging to interpret dispersion plots from complex mixtures with many chemical constituents. This study uses image processing and computer vision steps to automatically identify features from DMS dispersion plots. We used the bag-of-words approach adapted from natural language processing and information retrieval to cluster and organize these features. Finally, a support vector machine (SVM) learning algorithm was trained using these features in order to detect and classify specific compounds in these conceptualized data representations. Using this approach, we successfully maintain a high level of correct chemical identification, even when a gas mixture increases in complexity with interfering chemicals present.
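
The bag-of-words step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which feature detector, codebook size, or SVM settings were used, so the function names (`build_codebook`, `encode`), the plain k-means clustering, and the toy two-dimensional descriptors below are all assumptions made for illustration. The idea shown is only the general one: cluster local feature descriptors into a codebook of "visual words", then represent each dispersion plot as a normalized histogram of word counts.

```python
import math

def nearest(descriptor, codebook):
    """Index of the codebook centroid closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda k: math.dist(descriptor, codebook[k]))

def build_codebook(descriptors, k, iterations=10):
    """Plain Lloyd's k-means (an assumed choice of clustering method).

    Seeds are evenly spaced descriptors so the result is deterministic.
    """
    step = max(1, len(descriptors) // k)
    codebook = [tuple(d) for d in descriptors[::step][:k]]
    for _ in range(iterations):
        groups = [[] for _ in codebook]
        for d in descriptors:
            groups[nearest(d, codebook)].append(d)
        for i, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster is empty
                codebook[i] = tuple(sum(col) / len(members)
                                    for col in zip(*members))
    return codebook

def encode(descriptors, codebook):
    """Normalized histogram of visual-word counts for one plot."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest(d, codebook)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

# Toy 2-D descriptors standing in for features detected in two plots
# (hypothetical values, not data from the study).
plot_a = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1)]
plot_b = [(5.1, 5.0), (4.9, 5.0), (0.0, 0.0)]

codebook = build_codebook(plot_a + plot_b, k=2)
hist_a = encode(plot_a, codebook)
hist_b = encode(plot_b, codebook)
print(hist_a, hist_b)
```

In a pipeline of this kind, the fixed-length histograms would then serve as the input vectors for a classifier such as an SVM, with one label per plot.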

Original language: English (US)
Pages (from-to): 10501-10508
Number of pages: 8
Journal: Analytical Chemistry
Issue number: 16
Publication status: Published - Aug 20 2019


ASJC Scopus subject areas

  • Analytical Chemistry
