Text-mining protein-protein interaction corpus using concept clustering to identify intermittency

Leif E. Peterson, Matthew A Coleman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

We transformed human protein-protein interaction (PPI) data into documents and text-mined them using concept clusters. An advantage of text-mining PPI data is that words (proteins) that are very sparse or over-abundant can be dropped, leaving the bulk of the data for clustering and rule mining. Libraries of tissue-specific binary PPIs were constructed from a list of 36,137 binary PPIs in the Human Protein Reference Database (HPRD). Using scaled factorial moments, we developed a randomization test for intermittency, which appears as spikes and holes in the distributions of cluster-specific word frequencies. The test was based on a permutation form of a log-linear regression model to determine differences in the slopes of ln(F_2) vs. ln(M) between the intermittent and null distributions. After a Bonferroni adjustment for multiple tests, significant intermittency (p < 0.0005) in PPI was detected for prostate and testis tissue. The presence of intermittency reflects spikes and holes in histograms of cluster-specific word frequencies and may point to novel large signal transduction pathways or networks.
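
The slope of ln(F_2) vs. ln(M) mentioned in the abstract can be illustrated with a minimal sketch. It assumes that F_2 is the second-order scaled factorial moment of a single histogram and that M is the number of histogram bins; the normalization below is one common form and may differ from the one used in the paper. The function names, the [0, 1] value range, and the comparison against uniform random draws are all hypothetical stand-ins for illustration, not the authors' permutation form of the log-linear regression model.

```python
import numpy as np


def scaled_factorial_moment_2(values, n_bins, value_range=(0.0, 1.0)):
    """Second-order scaled factorial moment F_2 of one histogram with n_bins bins."""
    counts, _ = np.histogram(values, bins=n_bins, range=value_range)
    total = counts.sum()
    if total == 0:
        raise ValueError("no observations fall inside value_range")
    mean_per_bin = total / n_bins
    # F_2 = mean over bins of n(n-1), scaled by the squared mean bin count.
    return float(np.mean(counts * (counts - 1)) / mean_per_bin ** 2)


def log_log_slope(values, bin_counts, value_range=(0.0, 1.0)):
    """Least-squares slope of ln(F_2) against ln(M) over the given bin counts.

    Assumes enough data that F_2 > 0 for every M (otherwise ln(F_2) is undefined).
    """
    ln_m = np.log(bin_counts)
    ln_f2 = np.log(
        [scaled_factorial_moment_2(values, m, value_range) for m in bin_counts]
    )
    slope, _intercept = np.polyfit(ln_m, ln_f2, 1)
    return float(slope)


def null_slope_p_value(values, bin_counts, n_draws=199, seed=0,
                       value_range=(0.0, 1.0)):
    """Monte Carlo p-value: share of uniform null samples whose slope is at
    least as large as the observed slope (a simplified illustration only)."""
    rng = np.random.default_rng(seed)
    observed = log_log_slope(values, bin_counts, value_range)
    low, high = value_range
    exceed = 0
    for _ in range(n_draws):
        null_values = rng.uniform(low, high, size=len(values))
        if log_log_slope(null_values, bin_counts, value_range) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (1 + exceed) / (1 + n_draws)
```

Under these assumptions, data concentrated in a single spike give a slope near 1, while evenly spread data give a slope near 0. When the same test is run for k tissues, a Bonferroni adjustment compares each p-value against alpha / k.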

Original language: English (US)
Title of host publication: Proceedings of the International Joint Conference on Neural Networks
Pages: 3634-3640
Number of pages: 7
DOIs: https://doi.org/10.1109/IJCNN.2008.4634318
State: Published - 2008
Externally published: Yes
Event: 2008 International Joint Conference on Neural Networks, IJCNN 2008 - Hong Kong, China
Duration: Jun 1 2008 to Jun 8 2008

Other

Other: 2008 International Joint Conference on Neural Networks, IJCNN 2008
Country: China
City: Hong Kong
Period: 6/1/08 to 6/8/08

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

  • Cite this

    Peterson, L. E., & Coleman, M. A. (2008). Text-mining protein-protein interaction corpus using concept clustering to identify intermittency. In Proceedings of the International Joint Conference on Neural Networks (pp. 3634-3640). [4634318] https://doi.org/10.1109/IJCNN.2008.4634318