Purpose: To test the reliability, in a medical education setting, of concept map assessment, a method that can be used to assess an individual's "knowledge structure."

Method: In 2004, 52 senior residents (pediatrics and internal medicine) and fourth-year medical students at the University of California-Davis School of Medicine each created concept maps about two different subject domains (asthma and diabetes) on two separate occasions per domain, for a total of four maps per participant. Maps were rated using four different scoring systems: structural (S; counting propositions), quality (Q; rating the quality of propositions), importance/quality (I/Q; rating the importance and quality of propositions), and a hybrid system (H; combining elements of S with I/Q). The authors used generalizability theory to determine reliability.

Results: Learners (universe score) accounted for 40% to 44% of total score variance under the Q, I/Q, and H scoring systems, but only 10% under the S scoring system. There was a large learner-occasion-domain interaction effect (19%-23%). Subsequent analysis of each subject domain separately showed a large learner-occasion interaction effect (31%-37%) and indicated that administration on four to five occasions was necessary to achieve adequate reliability. Rater variance was uniformly low.

Conclusions: The Q, I/Q, and H scoring systems demonstrated similar reliability, and all were more reliable than the S system. The findings suggest that training and practice are required to perform the assessment task and that, as administered in this study, four to five testing occasions are required to achieve adequate reliability. Further research should examine whether changes to the concept mapping task could allow it to be administered over fewer occasions while maintaining adequate reliability.
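Projecting how many occasions are needed to reach adequate reliability is the job of a generalizability-theory decision (D) study. The sketch below illustrates that calculation for a single subject domain, assuming a fully crossed learner (p) × occasion (o) × rater (r) design. It computes the relative generalizability coefficient, in which only learner-interaction terms count as error, and searches for the smallest number of occasions that meets a target. The design, the class and function names, the variance components, the two-rater setting, and the 0.80 threshold are all hypothetical illustrations, not values or methods taken from the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VarianceComponents:
    """Hypothetical variance components for a crossed p x o x r design."""
    learner: float           # sigma^2_p: universe-score (true learner) variance
    learner_occasion: float  # sigma^2_po: learner-by-occasion interaction
    learner_rater: float     # sigma^2_pr: learner-by-rater interaction
    residual: float          # sigma^2_por,e: three-way interaction confounded with error


def g_coefficient(vc: VarianceComponents, n_occasions: int, n_raters: int) -> float:
    """Relative G coefficient when scores are averaged over occasions and raters.

    Relative error includes only terms that interact with learners; each is
    divided by the number of conditions it is averaged over.
    """
    relative_error = (
        vc.learner_occasion / n_occasions
        + vc.learner_rater / n_raters
        + vc.residual / (n_occasions * n_raters)
    )
    return vc.learner / (vc.learner + relative_error)


def occasions_needed(
    vc: VarianceComponents, n_raters: int, target: float = 0.80, max_occasions: int = 20
) -> Optional[int]:
    """Smallest number of occasions whose G coefficient reaches the target, or None."""
    for n_o in range(1, max_occasions + 1):
        if g_coefficient(vc, n_o, n_raters) >= target:
            return n_o
    return None


if __name__ == "__main__":
    # HYPOTHETICAL components (as proportions of total variance), invented for illustration.
    vc = VarianceComponents(learner=0.40, learner_occasion=0.34, learner_rater=0.01, residual=0.25)
    for n_o in range(1, 7):
        print(f"{n_o} occasion(s): G = {g_coefficient(vc, n_o, n_raters=2):.3f}")
    print("occasions needed for G >= 0.80:", occasions_needed(vc, n_raters=2))
```

With these made-up inputs, G rises from about 0.46 with one occasion to about 0.80 with five. This shows why a large learner-occasion interaction, which shrinks only as occasions are added, drives the number of administrations required, while a small rater term contributes little to error.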