Comparison of vector space model methodologies to reconcile cross-species neuroanatomical concepts

P. R. Srinivas, Shang Heng Wei, Nello Cristianini, E. G. Jones, Fredric A Gorin

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Generating informational thesauri that classify, cross-reference, and retrieve diverse and highly detailed neuroscientific information requires identifying related neuroanatomical terms and acronyms within and between species (Gorin et al., 2001). Manual construction of such thesauri is laborious, and we describe the implementation and evaluation of a neuroanatomical term and acronym reconciliation (NTAR) system to assist domain experts with this task. NTAR consists of two modules. The neuroanatomical term extraction (NTE) module employs a hidden Markov model (HMM) in conjunction with lexical rules to extract neuroanatomical terms (NT) and acronyms (NA) from textual material. The NTE output is formatted into collections of term- or acronym-indexed documents composed of sentences and word phrases extracted from the text. The second module, for information retrieval (IR), uses a vector space model (VSM) and includes a novel automated relevance feedback algorithm. In response to a queried neuroanatomical term or acronym, the IR module retrieves statistically related terms and acronyms. Retrieval of neuroanatomical terms and acronyms from term-based queries was compared with (1) term retrieval that included automated relevance feedback and (2) term retrieval using "document-to-document" comparisons (context-based VSM). For each of these three IR approaches, the retrieval of synonymous and similar primate and macaque thalamic terms and acronyms in response to a query list of human thalamic terminology was compared against a previously published, manually constructed concordance table of homologous cross-species terms and acronyms. Term-based VSM with automated relevance feedback retrieved 70% and 80% of the primate and macaque terms and acronyms, respectively, listed in the concordance table. The automated feedback algorithm correctly identified 87% of the macaque terms and acronyms that a domain expert had independently selected as appropriate for manual relevance feedback. Context-based VSM correctly retrieved 97% and 98% of the primate and macaque terms and acronyms, respectively, listed in the term homology table. These results indicate that the NTAR system could assist neuroscientists with thesaurus creation for closely related, highly detailed neuroanatomical domains.
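The abstract contrasts three retrieval modes over term-indexed documents: term-based VSM, term-based VSM with automated relevance feedback, and context-based ("document-to-document") VSM. The following is a minimal sketch of those three modes on a hypothetical four-entry collection; it is not the NTAR implementation. The TF-IDF weighting with smoothed IDF, the cosine ranking, and especially the feedback step are assumptions: the feedback here is standard positive-only Rocchio applied to the top-ranked entries (pseudo-relevance feedback), swapped in because the abstract does not describe the paper's novel algorithm. All function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def vectorize(tokens, idf):
    """Sparse TF-IDF vector; tokens absent from the collection are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    return {w: c * idf[w] for w, c in tf.items()}


def build_index(docs):
    """Weight each term-indexed document with smoothed TF-IDF.

    docs maps an index term (e.g. an acronym) to the tokens of the
    sentences and phrases it was found in. Returns (vectors, idf).
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    # Smoothed IDF so that a word occurring in every document keeps weight 1.
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}
    vectors = {key: vectorize(tokens, idf) for key, tokens in docs.items()}
    return vectors, idf


def cosine(u, v):
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query_vec, vectors):
    """All index terms sorted by descending cosine score (ties by name)."""
    scored = [(key, cosine(query_vec, vec)) for key, vec in vectors.items()]
    return sorted(scored, key=lambda kv: (-kv[1], kv[0]))


def rocchio(query_vec, relevant_vecs, alpha=1.0, beta=0.75):
    """Positive-only Rocchio update: alpha * q + beta * centroid(relevant)."""
    new = {w: alpha * x for w, x in query_vec.items()}
    for vec in relevant_vecs:
        for w, x in vec.items():
            new[w] = new.get(w, 0.0) + beta * x / len(relevant_vecs)
    return new


def term_based(query_term, vectors, idf, feedback_k=0):
    """Term-based VSM; feedback_k > 0 re-queries using the top-k entries."""
    q = vectorize(query_term.lower().split(), idf)
    ranking = rank(q, vectors)
    if feedback_k:
        top = [vectors[key] for key, _ in ranking[:feedback_k]]
        ranking = rank(rocchio(q, top), vectors)
    return ranking


def context_based(query_context, vectors, idf):
    """Document-to-document VSM: score a whole context document."""
    return rank(vectorize(query_context, idf), vectors)


# Hypothetical toy collection, not data from the paper.
docs = {
    "LGN": "lgn retinal ganglion afferents visual".split(),
    "lateral geniculate nucleus": "lgn lateral geniculate retinal afferents".split(),
    "MGN": "mgn inferior colliculus afferents auditory".split(),
    "medial geniculate nucleus": "mgn medial geniculate colliculus afferents".split(),
}
vectors, idf = build_index(docs)
print(term_based("lateral geniculate nucleus", vectors, idf)[0][0])
# -> lateral geniculate nucleus
print([k for k, _ in term_based("LGN", vectors, idf, feedback_k=1)[:2]])
# -> ['lateral geniculate nucleus', 'LGN']
print([k for k, _ in context_based("retinal afferents visual lgn".split(), vectors, idf)[:2]])
# -> ['LGN', 'lateral geniculate nucleus']
```

In this toy data, feedback also gives "medial geniculate nucleus" a nonzero score through the shared word "geniculate", which illustrates why expanded queries can drift toward related but non-synonymous structures.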

Original language: English (US)
Pages (from-to): 115-131
Number of pages: 17
Journal: Neuroinformatics
Volume: 3
Issue number: 2
DOI: 10.1385/NI:3:2:115
State: Published - 2005

Fingerprint

Space Simulation
Vector spaces
Macaca
Controlled Vocabulary
Thesauri
Information Storage and Retrieval
Feedback
Information retrieval
Primates
Hidden Markov models
Terminology

Keywords

  • Automated relevance feedback
  • Information retrieval
  • Neuroanatomical domains
  • Neuroanatomical term extraction
  • Vector space model

ASJC Scopus subject areas

  • Neuroscience (all)
  • Health Informatics

Cite this

Comparison of vector space model methodologies to reconcile cross-species neuroanatomical concepts. / Srinivas, P. R.; Wei, Shang Heng; Cristianini, Nello; Jones, E. G.; Gorin, Fredric A.

In: Neuroinformatics, Vol. 3, No. 2, 2005, p. 115-131.


Srinivas, P. R. ; Wei, Shang Heng ; Cristianini, Nello ; Jones, E. G. ; Gorin, Fredric A. / Comparison of vector space model methodologies to reconcile cross-species neuroanatomical concepts. In: Neuroinformatics. 2005 ; Vol. 3, No. 2. pp. 115-131.
@article{13c17202f9ac4eb49e926b69ad00e317,
title = "Comparison of vector space model methodologies to reconcile cross-species neuroanatomical concepts",
abstract = "Generating informational thesauri that classify, cross-reference, and retrieve diverse and highly detailed neuroscientific information requires identifying related neuroanatomical terms and acronyms within and between species (Gorin et al., 2001). Manual construction of such informational thesauri is laborious, and we describe implementing and evaluating a neuroanatomical term and acronym reconciliation (NTAR) system to assist domain experts with this task. NTAR is composed of two modules. The neuroanatomical term extraction (NTE) module employs a hidden Markov model (HMM) in conjunction with lexical rules to extract neuroanatomical terms (NT) and acronyms (NA) from textual material. The output of the NTE is formatted into collections of term- or acronym-indexed documents composed of sentences and word phrases extracted from textual material. The second information retrieval (IR) module utilizes a vector space model (VSM) and includes a novel, automated relevance feedback algorithm. The IR module retrieves statistically related neuroanatomical terms and acronyms in response to queried neuroanatomical terms and acronyms. Neuroanatomical terms and acronyms retrieval obtained from term-based inquiries were compared with (1) term retrieval obtained by including automated relevance feedback and with (2) term retrieval using {"}document-to-document{"} comparisons (context-based VSM). The retrieval of synonymous and similar primate and macaque thalamic terms and acronyms in response to a query list of human thalamic terminology by these three IR approaches was compared against a previously published, manually constructed concordance table of homologous cross-species terms and acronyms. Term-based VSM with automated relevance feedback retrieved 70{\%} and 80{\%} of these primate and macaque terms and acronyms, respectively, listed in the concordance table. Automated feedback algorithm correctly identified 87{\%} of the macaque terms and acronyms that were independently selected by a domain expert as being appropriate for manual relevance feedback. Context-based VSM correctly retrieved 97{\%} and 98{\%} of the primate and macaque terms and acronyms listed in the term homology table. These results indicate that the NTAR system could assist neuroscientists with thesauri creation for closely related, highly detailed neuroanatomical domains.",
keywords = "Automated relevance feedback, Information retrieval, Neuroanatomical domains, Neuroanatomical term extraction, Vector space model",
author = "Srinivas, {P. R.} and Wei, {Shang Heng} and Nello Cristianini and Jones, {E. G.} and Gorin, {Fredric A}",
year = "2005",
doi = "10.1385/NI:3:2:115",
language = "English (US)",
volume = "3",
pages = "115--131",
journal = "Neuroinformatics",
issn = "1539-2791",
publisher = "Humana Press",
number = "2",

}

TY  - JOUR
T1  - Comparison of vector space model methodologies to reconcile cross-species neuroanatomical concepts
AU  - Srinivas, P. R.
AU  - Wei, Shang Heng
AU  - Cristianini, Nello
AU  - Jones, E. G.
AU  - Gorin, Fredric A
PY  - 2005
Y1  - 2005
N2  - Generating informational thesauri that classify, cross-reference, and retrieve diverse and highly detailed neuroscientific information requires identifying related neuroanatomical terms and acronyms within and between species (Gorin et al., 2001). Manual construction of such informational thesauri is laborious, and we describe implementing and evaluating a neuroanatomical term and acronym reconciliation (NTAR) system to assist domain experts with this task. NTAR is composed of two modules. The neuroanatomical term extraction (NTE) module employs a hidden Markov model (HMM) in conjunction with lexical rules to extract neuroanatomical terms (NT) and acronyms (NA) from textual material. The output of the NTE is formatted into collections of term- or acronym-indexed documents composed of sentences and word phrases extracted from textual material. The second information retrieval (IR) module utilizes a vector space model (VSM) and includes a novel, automated relevance feedback algorithm. The IR module retrieves statistically related neuroanatomical terms and acronyms in response to queried neuroanatomical terms and acronyms. Neuroanatomical terms and acronyms retrieval obtained from term-based inquiries were compared with (1) term retrieval obtained by including automated relevance feedback and with (2) term retrieval using "document-to-document" comparisons (context-based VSM). The retrieval of synonymous and similar primate and macaque thalamic terms and acronyms in response to a query list of human thalamic terminology by these three IR approaches was compared against a previously published, manually constructed concordance table of homologous cross-species terms and acronyms. Term-based VSM with automated relevance feedback retrieved 70% and 80% of these primate and macaque terms and acronyms, respectively, listed in the concordance table. Automated feedback algorithm correctly identified 87% of the macaque terms and acronyms that were independently selected by a domain expert as being appropriate for manual relevance feedback. Context-based VSM correctly retrieved 97% and 98% of the primate and macaque terms and acronyms listed in the term homology table. These results indicate that the NTAR system could assist neuroscientists with thesauri creation for closely related, highly detailed neuroanatomical domains.
AB  - Generating informational thesauri that classify, cross-reference, and retrieve diverse and highly detailed neuroscientific information requires identifying related neuroanatomical terms and acronyms within and between species (Gorin et al., 2001). Manual construction of such informational thesauri is laborious, and we describe implementing and evaluating a neuroanatomical term and acronym reconciliation (NTAR) system to assist domain experts with this task. NTAR is composed of two modules. The neuroanatomical term extraction (NTE) module employs a hidden Markov model (HMM) in conjunction with lexical rules to extract neuroanatomical terms (NT) and acronyms (NA) from textual material. The output of the NTE is formatted into collections of term- or acronym-indexed documents composed of sentences and word phrases extracted from textual material. The second information retrieval (IR) module utilizes a vector space model (VSM) and includes a novel, automated relevance feedback algorithm. The IR module retrieves statistically related neuroanatomical terms and acronyms in response to queried neuroanatomical terms and acronyms. Neuroanatomical terms and acronyms retrieval obtained from term-based inquiries were compared with (1) term retrieval obtained by including automated relevance feedback and with (2) term retrieval using "document-to-document" comparisons (context-based VSM). The retrieval of synonymous and similar primate and macaque thalamic terms and acronyms in response to a query list of human thalamic terminology by these three IR approaches was compared against a previously published, manually constructed concordance table of homologous cross-species terms and acronyms. Term-based VSM with automated relevance feedback retrieved 70% and 80% of these primate and macaque terms and acronyms, respectively, listed in the concordance table. Automated feedback algorithm correctly identified 87% of the macaque terms and acronyms that were independently selected by a domain expert as being appropriate for manual relevance feedback. Context-based VSM correctly retrieved 97% and 98% of the primate and macaque terms and acronyms listed in the term homology table. These results indicate that the NTAR system could assist neuroscientists with thesauri creation for closely related, highly detailed neuroanatomical domains.
KW  - Automated relevance feedback
KW  - Information retrieval
KW  - Neuroanatomical domains
KW  - Neuroanatomical term extraction
KW  - Vector space model
UR  - http://www.scopus.com/inward/record.url?scp=21344438418&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=21344438418&partnerID=8YFLogxK
U2  - 10.1385/NI:3:2:115
DO  - 10.1385/NI:3:2:115
M3  - Article
C2  - 15988041
AN  - SCOPUS:21344438418
VL  - 3
SP  - 115
EP  - 131
JO  - Neuroinformatics
JF  - Neuroinformatics
SN  - 1539-2791
IS  - 2
ER  -