Novel Techniques for Microarray Data Analysis: Probabilistic Principal Surfaces and Competitive Evolution on Data

CIARAMELLA, Angelo; STAIANO, Antonino
2005

Abstract

Microarrays are among the most powerful tools in biological research, but to realize their full potential it is imperative to develop techniques capable of effectively exploiting the huge quantity of data they produce. In this paper, two machine learning methodologies for microarray data analysis are proposed. (1) Probabilistic Principal Surfaces (PPS) is a nonlinear latent variable model that offers very appealing visualization and classification abilities and can be effectively employed for clustering purposes. More specifically, the PPS method builds a probability density function of a given data set of patterns lying in a D-dimensional space (with D > 3), expressed in terms of a fixed number of latent variables lying in a Q-dimensional space (Q is usually 2 or 3), which can be used (after proper manipulation) to visualize, classify, and cluster the data. (2) Competitive Evolution on Data (CED) is instead an evolutionary system in which the candidate solutions (cluster centroids) compete to conquer the largest possible number of resources (data) and thus partition the input data set into clusters. We discuss the application of both methods to the analysis of microarray data obtained for the yeast genome.

Use this identifier to cite or link to this document: http://hdl.handle.net/11367/28728