Using the UMLS and Simple Statistical Methods to Semantically Categorize Causes of Death on Death Certificates

Bill Riedl, Nhan Than, Michael Hogarth

Research output: Contribution to journal (Article)

8 Citations (Scopus)

Abstract

Cause-of-death data is an invaluable resource for shaping our understanding of population health. Mortality statistics are one of the principal sources of health information and, in many countries, the most reliable source of health data [1]. A quick classification process for this data can significantly improve public health efforts. Currently, cause-of-death data is captured in unstructured form and requires months to process. We think this process can be automated, at least partially, using simple statistical Natural Language Processing (NLP) techniques and the Unified Medical Language System (UMLS) as a vocabulary resource. A system, Medical Match Master (MMM), was built to test this theory. We evaluate this simple NLP approach in the classification of causes of death. The technique performed well when we used a large biomedical vocabulary and applied certain syntactic maneuvers made possible by textual relationships within the vocabulary.
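The abstract does not say which statistical measure MMM uses, so the following is only a minimal sketch of the general idea: score a free-text cause of death against the term strings of a vocabulary and keep the best match. The bag-of-words cosine similarity, the `VOCABULARY` dictionary (a tiny hand-made stand-in, not UMLS content), the `categorize` function, and the 0.5 threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in vocabulary: category label -> known term strings.
# These entries are illustrative only and are not taken from the UMLS.
VOCABULARY = {
    "Myocardial infarction": [
        "myocardial infarction",
        "heart attack",
        "acute myocardial infarction",
    ],
    "Pneumonia": ["pneumonia", "lung infection"],
    "Cerebrovascular accident": ["stroke", "cerebrovascular accident"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words token lists."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(text, vocabulary=VOCABULARY, threshold=0.5):
    """Return (label, score) for the best-matching term, or (None, score)
    when no term reaches the (assumed) similarity threshold."""
    tokens = tokenize(text)
    best_label, best_score = None, 0.0
    for label, terms in vocabulary.items():
        for term in terms:
            score = cosine(tokens, tokenize(term))
            if score > best_score:
                best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


if __name__ == "__main__":
    for entry in ["Acute myocardial infarction", "heart attack at home", "fell from ladder"]:
        print(entry, "->", categorize(entry))
```

Listing several term strings per category loosely mirrors the role a large vocabulary plays in the abstract: synonyms such as "heart attack" let an entry match a category even when it shares no words with the category's preferred name.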

Original language: English (US)
Pages (from-to): 677-681
Number of pages: 5
Journal: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
Volume: 2010
State: Published - 2010

Fingerprint

Unified Medical Language System
Death Certificates
Vocabulary
Cause of Death
Health
Natural Language Processing
Information Storage and Retrieval
Public Health
Mortality
Population

ASJC Scopus subject areas

  • Medicine(all)

Cite this

Using the UMLS and Simple Statistical Methods to Semantically Categorize Causes of Death on Death Certificates. / Riedl, Bill; Than, Nhan; Hogarth, Michael.

In: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, Vol. 2010, 2010, p. 677-681.


@article{de4bcaf5c69d486a87e774cf1ecd17b2,
title = "Using the UMLS and Simple Statistical Methods to Semantically Categorize Causes of Death on Death Certificates",
author = "Bill Riedl and Nhan Than and Michael Hogarth",
year = "2010",
language = "English (US)",
volume = "2010",
pages = "677--681",
journal = "AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium",
issn = "1559-4076",
publisher = "American Medical Informatics Association",

}

TY - JOUR

T1 - Using the UMLS and Simple Statistical Methods to Semantically Categorize Causes of Death on Death Certificates

AU - Riedl, Bill

AU - Than, Nhan

AU - Hogarth, Michael

PY - 2010

Y1 - 2010

UR - http://www.scopus.com/inward/record.url?scp=84902266281&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84902266281&partnerID=8YFLogxK

M3 - Article

VL - 2010

SP - 677

EP - 681

JO - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

JF - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

SN - 1559-4076

ER -