Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT

Licong Cui, Wei Zhu, Shiqiang Tao, James Case, Olivier Bodenreider, Guo Qiang Zhang

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Objective: Quality assurance of large ontological systems such as SNOMED CT is an indispensable part of the terminology management lifecycle. We introduce a hybrid structural-lexical method for scalable and systematic discovery of missing hierarchical relations and concepts in SNOMED CT.

Material and Methods: All non-lattice subgraphs (the structural part) in SNOMED CT are exhaustively extracted using a scalable MapReduce algorithm. Four lexical patterns (the lexical part) are identified among the extracted non-lattice subgraphs. Non-lattice subgraphs exhibiting such lexical patterns are often indicative of missing hierarchical relations or concepts. Each lexical pattern is associated with a potential specific type of error.

Results: Applying the structural-lexical method to SNOMED CT (September 2015 US edition), we found 6801 non-lattice subgraphs that matched these lexical patterns, of which 2046 were amenable to visual inspection. We evaluated a random sample of 100 small subgraphs, of which 59 were reviewed in detail by domain experts. All the subgraphs reviewed contained errors confirmed by the experts. The most frequent type of error was missing is-a relations due to incomplete or inconsistent modeling of the concepts.

Conclusions: Our hybrid structural-lexical method is innovative and proved effective not only in detecting errors in SNOMED CT, but also in suggesting remediation for these errors.
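
The abstract does not define a non-lattice subgraph. As a rough illustration of the structural part, the sketch below assumes the usual lattice-theoretic reading: two concepts form a non-lattice pair when they share more than one maximal common descendant. The toy hierarchy, the concept names, and the function names are all hypothetical, and the brute-force search only stands in for the MapReduce algorithm mentioned above; it is not the authors' implementation.

```python
from itertools import combinations

# Hypothetical toy is-a hierarchy (not SNOMED CT data): concept -> direct parents.
PARENTS = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"C", "D"},
}


def ancestors(node, parents):
    """Return all strict ancestors of `node` reachable through is-a edges."""
    seen = set()
    stack = list(parents[node])
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(parents[cur])
    return seen


def non_lattice_pairs(parents):
    """Map each pair with more than one maximal common descendant to those descendants.

    Assumption: this "more than one maximal common descendant" criterion is the
    definition of a non-lattice pair; the abstract itself does not spell it out.
    """
    anc = {n: ancestors(n, parents) for n in parents}
    result = {}
    for a, b in combinations(sorted(parents), 2):
        # Concepts that are descendants of both a and b.
        common = {n for n in parents if a in anc[n] and b in anc[n]}
        # Keep only the maximal ones: no ancestor of n is itself a common descendant.
        maximal = {n for n in common if not (anc[n] & common)}
        if len(maximal) > 1:
            result[(a, b)] = maximal
    return result


if __name__ == "__main__":
    for pair, lower_bounds in non_lattice_pairs(PARENTS).items():
        print(pair, sorted(lower_bounds))
```

In this toy hierarchy the pair (A, B) has two maximal common descendants, C and D, so it is flagged. Under the stated assumption, a lexical check on the concept names inside such a subgraph would then suggest whether an is-a relation or an intermediate concept is missing.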

Original language: English (US)
Article number: ocw175
Pages (from-to): 788-798
Number of pages: 11
Journal: Journal of the American Medical Informatics Association
Volume: 24
Issue number: 4
DOI: 10.1093/jamia/ocw175
State: Published - Jul 1 2017
Externally published: Yes

Keywords

  • Non-lattice subgraph
  • Ontology
  • Quality assurance
  • SNOMED CT

ASJC Scopus subject areas

  • Health Informatics

Cite this

Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT. / Cui, Licong; Zhu, Wei; Tao, Shiqiang; Case, James; Bodenreider, Olivier; Zhang, Guo Qiang.

In: Journal of the American Medical Informatics Association, Vol. 24, No. 4, ocw175, 01.07.2017, p. 788-798.

Cui, Licong ; Zhu, Wei ; Tao, Shiqiang ; Case, James ; Bodenreider, Olivier ; Zhang, Guo Qiang. / Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT. In: Journal of the American Medical Informatics Association. 2017 ; Vol. 24, No. 4. pp. 788-798.
@article{ec6ee07c42864276924075f028264dca,
title = "Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT",
keywords = "Non-lattice subgraph, Ontology, Quality assurance, SNOMED CT",
author = "Licong Cui and Wei Zhu and Shiqiang Tao and James Case and Olivier Bodenreider and Zhang, {Guo Qiang}",
year = "2017",
month = "7",
day = "1",
doi = "10.1093/jamia/ocw175",
language = "English (US)",
volume = "24",
pages = "788--798",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "4",

}

TY - JOUR

T1 - Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT

AU - Cui, Licong

AU - Zhu, Wei

AU - Tao, Shiqiang

AU - Case, James

AU - Bodenreider, Olivier

AU - Zhang, Guo Qiang

PY - 2017/7/1

Y1 - 2017/7/1

KW - Non-lattice subgraph

KW - Ontology

KW - Quality assurance

KW - SNOMED CT

UR - http://www.scopus.com/inward/record.url?scp=85026429812&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85026429812&partnerID=8YFLogxK

U2 - 10.1093/jamia/ocw175

DO - 10.1093/jamia/ocw175

M3 - Article

VL - 24

SP - 788

EP - 798

JO - Journal of the American Medical Informatics Association : JAMIA

JF - Journal of the American Medical Informatics Association : JAMIA

SN - 1067-5027

IS - 4

M1 - ocw175

ER -