Using the UMLS and Simple Statistical Methods to Semantically Categorize Causes of Death on Death Certificates

Bill Riedl, Nhan Than, Michael Hogarth

Research output: Contribution to journal (Article)

10 Scopus citations


Cause of death data is an invaluable resource for shaping our understanding of population health. Mortality statistics are one of the principal sources of health information and, in many countries, the most reliable source of health data [1]. A quick classification process for this data can significantly improve public health efforts. Currently, cause of death data is captured in unstructured form and takes months to process. We think this process can be automated, at least partially, using simple statistical Natural Language Processing (NLP) techniques and the Unified Medical Language System (UMLS) as a vocabulary resource. A system, Medical Match Master (MMM), was built to test this theory. We evaluate this simple NLP approach in the classification of causes of death. The technique performed well when we used a large biomedical vocabulary and applied certain syntactic maneuvers made possible by textual relationships within the vocabulary.

Original language: English (US)
Pages (from-to): 677-681
Number of pages: 5
Journal: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
State: Published - 2010


ASJC Scopus subject areas

  • Medicine (all)
