Abstract
DNA microarrays are powerful tools for exploring gene expression and predicting disease state. However, since the number of variables (genes) typically exceeds the number of samples (tissue specimens), many potentially spurious genes may be selected for a predictor function. Principal component analysis (PCA) can greatly reduce the dimensionality of the microarray data space while retaining most of the inherent variability. We propose a methodology that uses PCA to identify a predictor vector between two mutually exclusive and collectively exhaustive classes. Projecting the training set onto this vector yields a distribution of projections for each class. A log-likelihood ratio for class membership is then calculated. We used this methodology to classify 48 biopsy specimens as either oral squamous cell carcinoma or normal oral mucosa using oligonucleotide microarrays. The system was trained on half of the samples and correctly predicted the class membership of the other half. The three most positively predictive and the three most negatively predictive genes were all keratins, which are known markers of squamous cell carcinoma.
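The pipeline described in the abstract (PCA predictor vector, per-class distribution of projections, log-likelihood ratio) can be illustrated with a minimal sketch. The abstract does not say how the predictor vector is chosen or how the projection distributions are modeled, so two assumptions are made here: the first principal component of the training data serves as the predictor vector, and each class's projections are modeled as a univariate Gaussian. The function names `fit_predictor` and `log_likelihood_ratio` are hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_predictor(X, y):
    """Fit a one-dimensional PCA predictor.

    X: (n_samples, n_genes) expression matrix; y: array of 0/1 class labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered training data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    v = vt[0]  # assumption: the first principal component is the predictor vector
    proj = (X - mean) @ v
    # Assumption: each class's projections follow a univariate Gaussian.
    stats = {c: (proj[y == c].mean(), proj[y == c].std(ddof=1)) for c in (0, 1)}
    return mean, v, stats


def log_likelihood_ratio(X, mean, v, stats):
    """Return log p(proj | class 1) - log p(proj | class 0) for each sample."""
    proj = (np.asarray(X, dtype=float) - mean) @ v

    def log_pdf(x, mu, sigma):
        # Log density of a Gaussian with mean mu and standard deviation sigma.
        return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

    return log_pdf(proj, *stats[1]) - log_pdf(proj, *stats[0])
```

Under these assumptions, a positive ratio favors class 1 and a negative ratio favors class 0. The entries of `v` with the largest positive and negative values would correspond to the most positively and most negatively predictive genes.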
Original language | English (US) |
---|---|
Title of host publication | Studies in Health Technology and Informatics |
Pages | 823-826 |
Number of pages | 4 |
Volume | 107 |
State | Published - 2004 |
Externally published | Yes |
Keywords
- Carcinoma, Squamous Cell
- Oligonucleotide Array Sequence Analysis
- Principal Component Analysis
ASJC Scopus subject areas
- Biomedical Engineering
- Health Informatics
- Health Information Management