Approaches to describing inter-rater reliability of the overall clinical appearance of febrile infants and toddlers in the emergency department

Paul Walsh, Justin Thornton, Julie Asato, Nicholas Walker, Gary McCoy, Joe Baal, Jed Baal, Nanse Mendoza, Faried Banimahd

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Objectives: To measure inter-rater agreement on the overall clinical appearance of febrile children aged less than 24 months and to compare methods for describing it.

Study Design and Setting: We performed an observational study of inter-rater reliability in the assessment of febrile children in a county hospital emergency department serving a mixed urban and rural population. Two emergency medicine healthcare providers independently evaluated the overall clinical appearance of children younger than 24 months who presented with fever. Each recorded an initial 'gestalt' assessment of whether the child was ill appearing, not ill appearing, or whether they were unsure, and then repeated this assessment after examining the child. Each rater was blinded to the other's assessment. Our primary analysis was graphical. We also calculated Cohen's κ, Gwet's agreement coefficient (AC), other measures of agreement, and weighted variants of these. We examined the effect of the time between examinations, and of patient and provider characteristics, on inter-rater agreement.

Results: We analyzed 159 of the 173 patients enrolled. Median age was 9.5 months (lower and upper quartiles 4.9 and 14.6 months); 99/159 (62%) were boys and 22/159 (14%) were admitted. On initial 'gestalt' impression, 118/159 (74%) and 119/159 (75%) children were classified as well appearing by the first and second examiner, respectively. Summary statistics ranged from 0.223 for weighted κ to 0.635 for Gwet's AC2. Inter-rater agreement was affected by the time interval between the evaluations and by the age of the child, but not by the experience levels of the rater pairs. Classifications of 'not ill appearing' were more reliable than the other classifications.

Conclusion: The inter-rater reliability of emergency providers' assessment of overall clinical appearance was adequate when described graphically and by Gwet's AC. Different summary statistics yield different results for the same dataset.
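The abstract contrasts Cohen's κ with Gwet's agreement coefficients and notes that they disagree on the same data. A minimal sketch of the standard two-rater formulas (not the authors' analysis code) may make the reason concrete. Both statistics share the same observed agreement, p_a = Σ w_kl p_kl; they differ only in chance agreement. κ multiplies each rater's own marginal proportions, while Gwet's AC averages the two raters' marginals into π_k and uses p_e = Σw / (q(q−1)) · Σ π_k(1 − π_k). Identity weights give κ and AC1; linear or quadratic weights give weighted κ and AC2. The function names, the ordering ill < unsure < well, and the example counts are assumptions for illustration; the counts are hypothetical and are not the study's data.

```python
"""Cohen's kappa and Gwet's AC1/AC2 (optionally weighted) for two raters.

Illustrative sketch of the textbook formulas; not the code used in the study.
Assumes q >= 2 ordered categories.
"""
from typing import List

# table[k][l] = number of subjects rated category k by rater A and l by rater B
Table = List[List[float]]


def weight_matrix(q: int, kind: str = "identity") -> Table:
    """Agreement weights for q ordered categories (1.0 on the diagonal)."""
    if kind not in ("identity", "linear", "quadratic"):
        raise ValueError(f"unknown weight kind: {kind}")
    w = []
    for k in range(q):
        row = []
        for l in range(q):
            if kind == "identity":
                row.append(1.0 if k == l else 0.0)
            elif kind == "linear":
                row.append(1.0 - abs(k - l) / (q - 1))
            else:  # quadratic
                row.append(1.0 - (k - l) ** 2 / (q - 1) ** 2)
        w.append(row)
    return w


def _proportions(table: Table):
    """Cell proportions plus row (rater A) and column (rater B) marginals."""
    q = len(table)
    n = sum(sum(r) for r in table)
    p = [[table[k][l] / n for l in range(q)] for k in range(q)]
    row = [sum(p[k]) for k in range(q)]
    col = [sum(p[k][l] for k in range(q)) for l in range(q)]
    return p, row, col


def _observed(p: Table, w: Table) -> float:
    """Weighted observed agreement p_a."""
    q = len(p)
    return sum(w[k][l] * p[k][l] for k in range(q) for l in range(q))


def cohen_kappa(table: Table, kind: str = "identity") -> float:
    """Cohen's kappa; weighted kappa when kind is 'linear' or 'quadratic'."""
    q = len(table)
    w = weight_matrix(q, kind)
    p, row, col = _proportions(table)
    pa = _observed(p, w)
    # Chance agreement: product of each rater's own marginal distribution.
    pe = sum(w[k][l] * row[k] * col[l] for k in range(q) for l in range(q))
    return (pa - pe) / (1.0 - pe)


def gwet_ac(table: Table, kind: str = "identity") -> float:
    """Gwet's AC1 (identity weights) or AC2 (linear/quadratic weights)."""
    q = len(table)
    w = weight_matrix(q, kind)
    p, row, col = _proportions(table)
    pa = _observed(p, w)
    # Chance agreement: averaged marginals pi_k, scaled by the weight total.
    pi = [(row[k] + col[k]) / 2.0 for k in range(q)]
    t_w = sum(sum(r) for r in w)
    pe = t_w / (q * (q - 1)) * sum(x * (1.0 - x) for x in pi)
    return (pa - pe) / (1.0 - pe)


if __name__ == "__main__":
    # Hypothetical counts, categories ordered ill / unsure / well.
    example = [[3, 2, 5],
               [2, 1, 8],
               [4, 9, 106]]
    for kind in ("identity", "linear", "quadratic"):
        print(kind, round(cohen_kappa(example, kind), 3),
              round(gwet_ac(example, kind), 3))
```

On this hypothetical table most subjects fall in the 'well' category, so observed agreement is high; κ nevertheless comes out far below AC1 because κ's chance-agreement term grows with skewed marginals, while Gwet's term shrinks. This is one mechanism by which different summary statistics can give different answers for the same dataset.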

Original language: English (US)
Article number: 651
Journal: PeerJ
Volume: 2014
Issue number: 1
DOIs: https://doi.org/10.7717/peerj.651
State: Published - 2014

Keywords

  • Clinical appearance
  • Cohen's kappa
  • Emergency medicine
  • Fever
  • Graphical analysis
  • Gwet's AC
  • Inter-rater agreement
  • Pediatric

ASJC Scopus subject areas

  • Agricultural and Biological Sciences(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Medicine(all)
  • Neuroscience(all)

Cite this

Approaches to describing inter-rater reliability of the overall clinical appearance of febrile infants and toddlers in the emergency department. / Walsh, Paul; Thornton, Justin; Asato, Julie; Walker, Nicholas; McCoy, Gary; Baal, Joe; Baal, Jed; Mendoza, Nanse; Banimahd, Faried.

In: PeerJ, Vol. 2014, No. 1, 651, 2014.

Walsh, P, Thornton, J, Asato, J, Walker, N, McCoy, G, Baal, J, Baal, J, Mendoza, N & Banimahd, F 2014, 'Approaches to describing inter-rater reliability of the overall clinical appearance of febrile infants and toddlers in the emergency department', PeerJ, vol. 2014, no. 1, 651. https://doi.org/10.7717/peerj.651
Walsh, Paul ; Thornton, Justin ; Asato, Julie ; Walker, Nicholas ; McCoy, Gary ; Baal, Joe ; Baal, Jed ; Mendoza, Nanse ; Banimahd, Faried. / Approaches to describing inter-rater reliability of the overall clinical appearance of febrile infants and toddlers in the emergency department. In: PeerJ. 2014 ; Vol. 2014, No. 1.
@article{0fe84d5e89ec472fa6a9e28856070de3,
title = "Approaches to describing inter-rater reliability of the overall clinical appearance of febrile infants and toddlers in the emergency department",
keywords = "Clinical appearance, Cohen's kappa, Emergency medicine, Fever, Graphical analysis, Gwet's AC, Inter-rater agreement, Pediatric",
author = "Paul Walsh and Justin Thornton and Julie Asato and Nicholas Walker and Gary McCoy and Joe Baal and Jed Baal and Nanse Mendoza and Faried Banimahd",
year = "2014",
doi = "10.7717/peerj.651",
language = "English (US)",
volume = "2014",
journal = "PeerJ",
issn = "2167-8359",
publisher = "PeerJ",
number = "1",

}

TY - JOUR

T1 - Approaches to describing inter-rater reliability of the overall clinical appearance of febrile infants and toddlers in the emergency department

AU - Walsh, Paul

AU - Thornton, Justin

AU - Asato, Julie

AU - Walker, Nicholas

AU - McCoy, Gary

AU - Baal, Joe

AU - Baal, Jed

AU - Mendoza, Nanse

AU - Banimahd, Faried

PY - 2014

Y1 - 2014

KW - Clinical appearance

KW - Cohen's kappa

KW - Emergency medicine

KW - Fever

KW - Graphical analysis

KW - Gwet's AC

KW - Inter-rater agreement

KW - Pediatric

UR - http://www.scopus.com/inward/record.url?scp=84911360390&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84911360390&partnerID=8YFLogxK

U2 - 10.7717/peerj.651

DO - 10.7717/peerj.651

M3 - Article

VL - 2014

JO - PeerJ

JF - PeerJ

SN - 2167-8359

IS - 1

M1 - 651

ER -