Performance of automated and manual coding systems for occupational data

A case study of historical records

Mehul D. Patel, Kathryn M. Rose, Cindy R. Owens, Heejung Bang, Jay S. Kaufman

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Background: Occupational data are a common source of workplace exposure and socioeconomic information in epidemiologic research. We compared the performance of two occupation coding methods, an automated software and a manual coder, using occupation and industry titles from U.S. historical records.

Methods: We collected parental occupational data from 1920-40s birth certificates, Census records, and city directories on 3,135 deceased individuals in the Atherosclerosis Risk in Communities (ARIC) study. Unique occupation-industry narratives were assigned codes by a manual coder and the Standardized Occupation and Industry Coding software program. We calculated agreement between coding methods of classification into major Census occupational groups.

Results: Automated coding software assigned codes to 71% of occupations and 76% of industries. Of this subset coded by software, 73% of occupation codes and 69% of industry codes matched between automated and manual coding. For major occupational groups, agreement improved to 89% (kappa=0.86).

Conclusions: Automated occupational coding is a cost-efficient alternative to manual coding. However, some manual coding is required to code incomplete information. We found substantial variability between coders in the assignment of occupations although not as large for major groups.
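The abstract reports two agreement statistics: raw percent agreement and Cohen's kappa, which corrects agreement for chance. The sketch below shows how such figures are commonly computed for two coders. It is an illustration only, not the authors' analysis code; the function names and the six toy labels are hypothetical and are not taken from the ARIC data.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of records on which both coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two coders."""
    n = len(codes_a)
    p_o = percent_agreement(codes_a, codes_b)
    counts_a, counts_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement: product of each coder's marginal share per category.
    categories = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        # Both coders used a single identical category; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (major occupational groups), for illustration only.
manual = ["farmers", "farmers", "craftsmen", "laborers", "clerical", "laborers"]
software = ["farmers", "laborers", "craftsmen", "laborers", "clerical", "laborers"]

print(f"agreement = {percent_agreement(manual, software):.3f}")
print(f"kappa     = {cohens_kappa(manual, software):.3f}")
```

Kappa is lower than raw agreement whenever some matches would be expected by chance, which is why the abstract quotes both figures for the major occupational groups.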

Original language: English (US)
Pages (from-to): 228-231
Number of pages: 4
Journal: American Journal of Industrial Medicine
Volume: 55
Issue number: 3
DOI: 10.1002/ajim.22005
State: Published - Mar 2012

Fingerprint

Occupations
Information Systems
Industry
Software
Occupational Groups
Censuses
Birth Certificates
Directories
Workplace
Atherosclerosis
Costs and Cost Analysis
Research

Keywords

  • Automatic data processing
  • Computer systems
  • Occupation classification
  • Occupational coding
  • Social class

ASJC Scopus subject areas

  • Public Health, Environmental and Occupational Health

Cite this

Performance of automated and manual coding systems for occupational data : A case study of historical records. / Patel, Mehul D.; Rose, Kathryn M.; Owens, Cindy R.; Bang, Heejung; Kaufman, Jay S.

In: American Journal of Industrial Medicine, Vol. 55, No. 3, 03.2012, p. 228-231.

@article{c40911f481744679b6c2a94447c35578,
title = "Performance of automated and manual coding systems for occupational data: A case study of historical records",
abstract = "Background: Occupational data are a common source of workplace exposure and socioeconomic information in epidemiologic research. We compared the performance of two occupation coding methods, an automated software and a manual coder, using occupation and industry titles from U.S. historical records. Methods: We collected parental occupational data from 1920-40s birth certificates, Census records, and city directories on 3,135 deceased individuals in the Atherosclerosis Risk in Communities (ARIC) study. Unique occupation-industry narratives were assigned codes by a manual coder and the Standardized Occupation and Industry Coding software program. We calculated agreement between coding methods of classification into major Census occupational groups. Results: Automated coding software assigned codes to 71{\%} of occupations and 76{\%} of industries. Of this subset coded by software, 73{\%} of occupation codes and 69{\%} of industry codes matched between automated and manual coding. For major occupational groups, agreement improved to 89{\%} (kappa=0.86). Conclusions: Automated occupational coding is a cost-efficient alternative to manual coding. However, some manual coding is required to code incomplete information. We found substantial variability between coders in the assignment of occupations although not as large for major groups.",
keywords = "Automatic data processing, Computer systems, Occupation classification, Occupational coding, Social class",
author = "Patel, {Mehul D.} and Rose, {Kathryn M.} and Owens, {Cindy R.} and Bang, Heejung and Kaufman, {Jay S.}",
year = "2012",
month = "3",
doi = "10.1002/ajim.22005",
language = "English (US)",
volume = "55",
pages = "228--231",
journal = "American Journal of Industrial Medicine",
issn = "0271-3586",
publisher = "Wiley-Liss Inc.",
number = "3",

}

TY - JOUR

T1 - Performance of automated and manual coding systems for occupational data

T2 - A case study of historical records

AU - Patel, Mehul D.

AU - Rose, Kathryn M.

AU - Owens, Cindy R.

AU - Bang, Heejung

AU - Kaufman, Jay S.

PY - 2012/3

Y1 - 2012/3

AB - Background: Occupational data are a common source of workplace exposure and socioeconomic information in epidemiologic research. We compared the performance of two occupation coding methods, an automated software and a manual coder, using occupation and industry titles from U.S. historical records. Methods: We collected parental occupational data from 1920-40s birth certificates, Census records, and city directories on 3,135 deceased individuals in the Atherosclerosis Risk in Communities (ARIC) study. Unique occupation-industry narratives were assigned codes by a manual coder and the Standardized Occupation and Industry Coding software program. We calculated agreement between coding methods of classification into major Census occupational groups. Results: Automated coding software assigned codes to 71% of occupations and 76% of industries. Of this subset coded by software, 73% of occupation codes and 69% of industry codes matched between automated and manual coding. For major occupational groups, agreement improved to 89% (kappa=0.86). Conclusions: Automated occupational coding is a cost-efficient alternative to manual coding. However, some manual coding is required to code incomplete information. We found substantial variability between coders in the assignment of occupations although not as large for major groups.

KW - Automatic data processing

KW - Computer systems

KW - Occupation classification

KW - Occupational coding

KW - Social class

UR - http://www.scopus.com/inward/record.url?scp=84856950657&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84856950657&partnerID=8YFLogxK

U2 - 10.1002/ajim.22005

DO - 10.1002/ajim.22005

M3 - Article

VL - 55

SP - 228

EP - 231

JO - American Journal of Industrial Medicine

JF - American Journal of Industrial Medicine

SN - 0271-3586

IS - 3

ER -