Biclustering techniques have become popular in cancer genetics studies because they are expected to connect phenotypes to genotypes: they identify subgroups of cancer patients that share similar gene expression patterns and, at the same time, subgroups of genes that are specific to these cancer subtypes and could therefore serve as biomarkers. In this paper we propose a new approach for identifying such relationships, or biclusters, between patients and gene expression profiles. The method, named biDCG, rests on two key concepts. First, it uses a new clustering technique, DCG-tree [Fushing et al., PLoS ONE, 8, e56259 (2013)], that generates ultrametric topological spaces capturing the geometries of both the patient data set and the gene data set. Second, it optimizes the definitions of bicluster membership through an iterative two-way reclustering procedure in which patients and genes are reclustered in turn, based respectively on the subsets of genes and patients defined in the previous round. We have validated biDCG on simulated and real data. On the simulated data, biDCG compares favorably to other biclustering techniques applied to cancer genomics data. On the real data sets, biDCG retrieves relevant biological information.
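The alternating structure of the two-way reclustering step can be illustrated with a minimal sketch. This is not the biDCG implementation. DCG-tree is replaced here by SciPy average-linkage hierarchical clustering, chosen only because it also produces a tree. The function names (`cluster_rows`, `two_way_recluster`), the fixed numbers of groups, and the use of group means to summarise the subsets from the previous round are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def cluster_rows(data, n_clusters):
    """Cluster the rows of `data` by cutting an average-linkage tree.

    Stand-in for DCG-tree (assumption): any tree-based clustering
    could be substituted here.
    """
    tree = linkage(data, method="average", metric="euclidean")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def group_means(matrix, column_labels):
    """Summarise each row by its mean over every group of columns."""
    groups = np.unique(column_labels)
    return np.column_stack(
        [matrix[:, column_labels == g].mean(axis=1) for g in groups]
    )


def two_way_recluster(expr, n_patient_groups=2, n_gene_groups=2, max_rounds=10):
    """Alternately recluster patients and genes (hypothetical sketch).

    expr: array of shape (n_patients, n_genes).
    Returns (patient_labels, gene_labels).
    """
    # Initial round: cluster each side on the full data matrix.
    patient_labels = cluster_rows(expr, n_patient_groups)
    gene_labels = cluster_rows(expr.T, n_gene_groups)

    for _ in range(max_rounds):
        # Recluster patients using the gene groups from the previous round.
        new_patients = cluster_rows(group_means(expr, gene_labels),
                                    n_patient_groups)
        # Recluster genes using the patient groups just obtained.
        new_genes = cluster_rows(group_means(expr.T, new_patients),
                                 n_gene_groups)
        unchanged = (np.array_equal(new_patients, patient_labels)
                     and np.array_equal(new_genes, gene_labels))
        patient_labels, gene_labels = new_patients, new_genes
        if unchanged:
            break  # simple stopping rule: labels repeated exactly

    return patient_labels, gene_labels


if __name__ == "__main__":
    # Simulated data with two planted biclusters (illustrative only).
    rng = np.random.default_rng(0)
    expr = rng.normal(0.0, 0.5, size=(20, 30))
    expr[:10, :15] += 5.0   # patients 0-9 over-express genes 0-14
    expr[10:, 15:] += 5.0   # patients 10-19 over-express genes 15-29
    patients, genes = two_way_recluster(expr)
    print("patient groups:", patients)
    print("gene groups:", genes)
```

A bicluster in this sketch is the pairing of a patient group with the gene group whose mean expression is elevated in it. How biDCG actually defines and refines bicluster membership is described in the paper itself and is not reproduced here.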
ASJC Scopus subject areas
- Agricultural and Biological Sciences(all)
- Biochemistry, Genetics and Molecular Biology(all)