Scalable quality assurance for large SNOMED CT hierarchies using subject-based subtaxonomies

Christopher Ochs, James Geller, Yehoshua Perl, Yan Chen, Junchuan Xu, Hua Min, James Case, Zhi Wei

Research output: Contribution to journal, Article

30 Citations (Scopus)

Abstract

Objective: Standards terminologies may be large and complex, making their quality assurance challenging. Some terminology quality assurance (TQA) methodologies are based on abstraction networks (AbNs), compact terminology summaries. We have tested AbNs and the performance of related TQA methodologies on small terminology hierarchies. However, some standards terminologies, for example, SNOMED, are composed of very large hierarchies. Scaling AbN TQA techniques to such hierarchies poses a significant challenge. We present a scalable subject-based approach for AbN TQA.

Methods: An innovative technique is presented for scaling TQA by creating a new kind of subject-based AbN called a subtaxonomy for large hierarchies. New hypotheses about concentrations of erroneous concepts within the AbN are introduced to guide scalable TQA.

Results: We test the TQA methodology for a subject-based subtaxonomy for the Bleeding subhierarchy in SNOMED's large Clinical finding hierarchy. To test the error concentration hypotheses, three domain experts reviewed a sample of 300 concepts. A consensus-based evaluation identified 87 erroneous concepts. The subtaxonomy-based TQA methodology was shown to uncover statistically significantly more erroneous concepts when compared to a control sample.

Discussion: The scalability of TQA methodologies is a challenge for large standards systems like SNOMED. We demonstrated innovative subject-based TQA techniques by identifying groups of concepts with a higher likelihood of having errors within the subtaxonomy. Scalability is achieved by reviewing a large hierarchy by subject.

Conclusions: An innovative methodology for scaling the derivation of AbNs and a TQA methodology was shown to perform successfully for the largest hierarchy of SNOMED.

Original language: English (US)
Pages (from-to): 507-518
Number of pages: 12
Journal: Journal of the American Medical Informatics Association
Volume: 22
Issue number: 3
DOI: 10.1136/amiajnl-2014-003151
State: Published - Jan 1 2015
Externally published: Yes

Fingerprint

Systematized Nomenclature of Medicine
Terminology

Keywords

  • Abstraction network
  • Scalable quality assurance
  • SNOMED CT
  • Standards quality assurance
  • Subject-based terminology quality assurance
  • Terminology quality assurance

ASJC Scopus subject areas

  • Health Informatics

Cite this

Scalable quality assurance for large SNOMED CT hierarchies using subject-based subtaxonomies. / Ochs, Christopher; Geller, James; Perl, Yehoshua; Chen, Yan; Xu, Junchuan; Min, Hua; Case, James; Wei, Zhi.

In: Journal of the American Medical Informatics Association, Vol. 22, No. 3, 01.01.2015, p. 507-518.

Ochs, Christopher ; Geller, James ; Perl, Yehoshua ; Chen, Yan ; Xu, Junchuan ; Min, Hua ; Case, James ; Wei, Zhi. / Scalable quality assurance for large SNOMED CT hierarchies using subject-based subtaxonomies. In: Journal of the American Medical Informatics Association. 2015 ; Vol. 22, No. 3. pp. 507-518.
@article{4b828603a1894c679c0952d2f3e1943c,
title = "Scalable quality assurance for large SNOMED CT hierarchies using subject-based subtaxonomies",
abstract = "Objective Standards terminologies may be large and complex, making their quality assurance challenging. Some terminology quality assurance (TQA) methodologies are based on abstraction networks (AbNs), compact terminology summaries. We have tested AbNs and the performance of related TQA methodologies on small terminology hierarchies. However, some standards terminologies, for example, SNOMED, are composed of very large hierarchies. Scaling AbN TQA techniques to such hierarchies poses a significant challenge. We present a scalable subject-based approach for AbN TQA. Methods An innovative technique is presented for scaling TQA by creating a new kind of subject-based AbN called a subtaxonomy for large hierarchies. New hypotheses about concentrations of erroneous concepts within the AbN are introduced to guide scalable TQA. Results We test the TQA methodology for a subject-based subtaxonomy for the Bleeding subhierarchy in SNOMED's large Clinical finding hierarchy. To test the error concentration hypotheses, three domain experts reviewed a sample of 300 concepts. A consensus-based evaluation identified 87 erroneous concepts. The subtaxonomy-based TQA methodology was shown to uncover statistically significantly more erroneous concepts when compared to a control sample. Discussion The scalability of TQA methodologies is a challenge for large standards systems like SNOMED. We demonstrated innovative subject-based TQA techniques by identifying groups of concepts with a higher likelihood of having errors within the subtaxonomy. Scalability is achieved by reviewing a large hierarchy by subject. Conclusions An innovative methodology for scaling the derivation of AbNs and a TQA methodology was shown to perform successfully for the largest hierarchy of SNOMED.",
keywords = "Abstraction network, Scalable quality assurance, SNOMED CT, Standards quality assurance, Subject-based terminology quality assurance, Terminology quality assurance",
author = "Christopher Ochs and James Geller and Yehoshua Perl and Yan Chen and Junchuan Xu and Hua Min and James Case and Zhi Wei",
year = "2015",
month = "1",
day = "1",
doi = "10.1136/amiajnl-2014-003151",
language = "English (US)",
volume = "22",
pages = "507--518",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "3",
}

TY - JOUR
T1 - Scalable quality assurance for large SNOMED CT hierarchies using subject-based subtaxonomies
AU - Ochs, Christopher
AU - Geller, James
AU - Perl, Yehoshua
AU - Chen, Yan
AU - Xu, Junchuan
AU - Min, Hua
AU - Case, James
AU - Wei, Zhi
PY - 2015/1/1
Y1 - 2015/1/1
AB - Objective Standards terminologies may be large and complex, making their quality assurance challenging. Some terminology quality assurance (TQA) methodologies are based on abstraction networks (AbNs), compact terminology summaries. We have tested AbNs and the performance of related TQA methodologies on small terminology hierarchies. However, some standards terminologies, for example, SNOMED, are composed of very large hierarchies. Scaling AbN TQA techniques to such hierarchies poses a significant challenge. We present a scalable subject-based approach for AbN TQA. Methods An innovative technique is presented for scaling TQA by creating a new kind of subject-based AbN called a subtaxonomy for large hierarchies. New hypotheses about concentrations of erroneous concepts within the AbN are introduced to guide scalable TQA. Results We test the TQA methodology for a subject-based subtaxonomy for the Bleeding subhierarchy in SNOMED's large Clinical finding hierarchy. To test the error concentration hypotheses, three domain experts reviewed a sample of 300 concepts. A consensus-based evaluation identified 87 erroneous concepts. The subtaxonomy-based TQA methodology was shown to uncover statistically significantly more erroneous concepts when compared to a control sample. Discussion The scalability of TQA methodologies is a challenge for large standards systems like SNOMED. We demonstrated innovative subject-based TQA techniques by identifying groups of concepts with a higher likelihood of having errors within the subtaxonomy. Scalability is achieved by reviewing a large hierarchy by subject. Conclusions An innovative methodology for scaling the derivation of AbNs and a TQA methodology was shown to perform successfully for the largest hierarchy of SNOMED.

KW - Abstraction network
KW - Scalable quality assurance
KW - SNOMED CT
KW - Standards quality assurance
KW - Subject-based terminology quality assurance
KW - Terminology quality assurance
UR - http://www.scopus.com/inward/record.url?scp=84940374235&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84940374235&partnerID=8YFLogxK
U2 - 10.1136/amiajnl-2014-003151
DO - 10.1136/amiajnl-2014-003151
M3 - Article
C2 - 25336594
AN - SCOPUS:84940374235
VL - 22
SP - 507
EP - 518
JO - Journal of the American Medical Informatics Association : JAMIA
JF - Journal of the American Medical Informatics Association : JAMIA
SN - 1067-5027
IS - 3
ER -