Automatic online news monitoring and classification for syndromic surveillance

Yulei Zhang, Yan Dang, Hsinchun Chen, Mark Thurmond, Cathy Larson

Research output: Contribution to journal (Article)

39 Citations (Scopus)

Abstract

Syndromic surveillance can play an important role in protecting the public's health against infectious diseases. Infectious disease outbreaks can have a devastating effect on society as well as the economy, and global awareness is therefore critical to protecting against major outbreaks. By monitoring online news sources and developing an accurate news classification system for syndromic surveillance, public health personnel can be apprised of outbreaks and potential outbreak situations. In this study, we have developed a framework for automatic online news monitoring and classification for syndromic surveillance. The framework is unique and none of the techniques adopted in this study have been previously used in the context of syndromic surveillance on infectious diseases. In recent classification experiments, we compared the performance of different feature subsets on different machine learning algorithms. The results showed that the combined feature subsets including Bag of Words, Noun Phrases, and Named Entities features outperformed the Bag of Words feature subsets. Furthermore, feature selection improved the performance of feature subsets in online news classification. The highest classification performance was achieved when using SVM upon the selected combination feature subset.
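The pipeline the abstract describes (combine feature subsets, select features, train an SVM) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The toy headlines, the adjacent-word-pair stand-in for noun phrases, the capitalized-token stand-in for named entities, the document-frequency-difference selection score, and the sub-gradient trainer without a bias term are all assumptions made for the example; the abstract names neither the feature selection method nor the SVM implementation used.

```python
import re

def extract_features(text):
    """Return the union of three feature subsets for one news item."""
    tokens = re.findall(r"[A-Za-z]+", text)
    lowered = [t.lower() for t in tokens]
    feats = {"bow:" + t for t in lowered}  # Bag of Words
    # Assumed stand-in for noun phrases: adjacent word pairs.
    feats |= {"np:" + a + "_" + b for a, b in zip(lowered, lowered[1:])}
    # Assumed stand-in for named entities: capitalized, non-initial tokens.
    feats |= {"ne:" + t for t in tokens[1:] if t[0].isupper()}
    return feats

def select_features(docs, labels, k):
    """Keep the k features whose document frequency differs most by class."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    pos_df, neg_df = {}, {}
    for feats, y in zip(docs, labels):
        target = pos_df if y == 1 else neg_df
        for f in feats:
            target[f] = target.get(f, 0) + 1
    vocab = set(pos_df) | set(neg_df)

    def score(f):
        return abs(pos_df.get(f, 0) / n_pos - neg_df.get(f, 0) / n_neg)

    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda f: (-score(f), f))[:k]

def train_linear_svm(docs, labels, vocab, epochs=50, eta=0.1, lam=0.01):
    """Sub-gradient descent on the L2-regularized hinge loss (no bias term)."""
    w = {f: 0.0 for f in vocab}
    for _ in range(epochs):
        for feats, y in zip(docs, labels):
            active = [f for f in feats if f in w]
            margin = y * sum(w[f] for f in active)
            for f in w:  # shrink step from the L2 regularizer
                w[f] *= 1.0 - eta * lam
            if margin < 1:  # hinge loss is active for this example
                for f in active:
                    w[f] += eta * y
    return w

def predict(w, feats):
    """Return 1 (outbreak-related) or -1 (other) for a feature set."""
    return 1 if sum(w.get(f, 0.0) for f in feats) > 0 else -1

# Hypothetical toy headlines; 1 = outbreak-related, -1 = other news.
TEXTS = [
    "Foot-and-mouth outbreak confirmed in Surrey cattle",
    "Officials report new outbreak of avian influenza",
    "Local team wins championship after dramatic final",
    "Stock markets rally as technology shares climb",
]
LABELS = [1, 1, -1, -1]

DOCS = [extract_features(t) for t in TEXTS]
SELECTED = select_features(DOCS, LABELS, k=10)
WEIGHTS = train_linear_svm(DOCS, LABELS, SELECTED)
PREDICTIONS = [predict(WEIGHTS, d) for d in DOCS]
```

On a real corpus, a proper noun-phrase chunker, a named-entity recognizer, and an established SVM library would replace the stand-ins used here; the sketch only shows how the three stages fit together.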

Original language: English (US)
Pages (from-to): 508-517
Number of pages: 10
Journal: Decision Support Systems
Volume: 47
Issue number: 4
DOI: 10.1016/j.dss.2009.04.016
State: Published - Nov 1 2009

Fingerprint

Disease Outbreaks
Monitoring
Public health
Communicable Diseases
Set theory
Health Personnel
Learning algorithms
Learning systems
Feature extraction
News
Surveillance
Personnel
Infectious Diseases
Experiments
Bag
Economy
Machine Learning
Experiment

Keywords

  • Feature selection
  • News classification
  • News monitoring
  • Syndromic surveillance

ASJC Scopus subject areas

  • Management Information Systems
  • Information Systems
  • Developmental and Educational Psychology
  • Arts and Humanities (miscellaneous)
  • Information Systems and Management

Cite this

Automatic online news monitoring and classification for syndromic surveillance. / Zhang, Yulei; Dang, Yan; Chen, Hsinchun; Thurmond, Mark; Larson, Cathy.

In: Decision Support Systems, Vol. 47, No. 4, 01.11.2009, p. 508-517.

@article{bdb140eb62d14904a945c0bca298e3c0,
title = "Automatic online news monitoring and classification for syndromic surveillance",
abstract = "Syndromic surveillance can play an important role in protecting the public's health against infectious diseases. Infectious disease outbreaks can have a devastating effect on society as well as the economy, and global awareness is therefore critical to protecting against major outbreaks. By monitoring online news sources and developing an accurate news classification system for syndromic surveillance, public health personnel can be apprised of outbreaks and potential outbreak situations. In this study, we have developed a framework for automatic online news monitoring and classification for syndromic surveillance. The framework is unique and none of the techniques adopted in this study have been previously used in the context of syndromic surveillance on infectious diseases. In recent classification experiments, we compared the performance of different feature subsets on different machine learning algorithms. The results showed that the combined feature subsets including Bag of Words, Noun Phrases, and Named Entities features outperformed the Bag of Words feature subsets. Furthermore, feature selection improved the performance of feature subsets in online news classification. The highest classification performance was achieved when using SVM upon the selected combination feature subset.",
keywords = "Feature selection, News classification, News monitoring, Syndromic surveillance",
author = "Yulei Zhang and Yan Dang and Hsinchun Chen and Mark Thurmond and Cathy Larson",
year = "2009",
month = "11",
day = "1",
doi = "10.1016/j.dss.2009.04.016",
language = "English (US)",
volume = "47",
pages = "508--517",
journal = "Decision Support Systems",
issn = "0167-9236",
publisher = "Elsevier",
number = "4"
}

TY - JOUR

T1 - Automatic online news monitoring and classification for syndromic surveillance

AU - Zhang, Yulei

AU - Dang, Yan

AU - Chen, Hsinchun

AU - Thurmond, Mark

AU - Larson, Cathy

PY - 2009/11/1

Y1 - 2009/11/1

N2 - Syndromic surveillance can play an important role in protecting the public's health against infectious diseases. Infectious disease outbreaks can have a devastating effect on society as well as the economy, and global awareness is therefore critical to protecting against major outbreaks. By monitoring online news sources and developing an accurate news classification system for syndromic surveillance, public health personnel can be apprised of outbreaks and potential outbreak situations. In this study, we have developed a framework for automatic online news monitoring and classification for syndromic surveillance. The framework is unique and none of the techniques adopted in this study have been previously used in the context of syndromic surveillance on infectious diseases. In recent classification experiments, we compared the performance of different feature subsets on different machine learning algorithms. The results showed that the combined feature subsets including Bag of Words, Noun Phrases, and Named Entities features outperformed the Bag of Words feature subsets. Furthermore, feature selection improved the performance of feature subsets in online news classification. The highest classification performance was achieved when using SVM upon the selected combination feature subset.

KW - Feature selection

KW - News classification

KW - News monitoring

KW - Syndromic surveillance

UR - http://www.scopus.com/inward/record.url?scp=70350564445&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70350564445&partnerID=8YFLogxK

U2 - 10.1016/j.dss.2009.04.016

DO - 10.1016/j.dss.2009.04.016

M3 - Article

VL - 47

SP - 508

EP - 517

JO - Decision Support Systems

JF - Decision Support Systems

SN - 0167-9236

IS - 4

ER -