Protein Structural Blocks Representation and Search through Unsupervised NN

FERONE, Alessio; PETROSINO, Alfredo
2012

Abstract

This paper aims to develop methods for representing protein structure and for retrieving structural blocks within a single macromolecule or across the entire Protein Data Bank (PDB). The first problem is how to represent structural blocks. One option is the Extended Gaussian Image (EGI), which maps the histogram of an object's surface orientations onto the unit sphere. We propose an 'abstract' data structure, the Protein Gaussian Image (PGI), for representing the orientation of protein secondary structures (helices and sheets). The 'concrete' data structure is the same as for the EGI; however, each point on the Gaussian sphere stores not the area of the surface patches with that orientation but features of the secondary structures (SSs) with that direction. These features may include the sense (e.g. + for origin toward surface, - for the reverse), the length of the structure (number of amino acids), biochemical properties, and even the amino acid sequence (for example, as a list). We consider this representation very effective for preliminary screening when searching the PDB. We propose to employ the PGI in a fold recognition problem in which learning is performed by means of an unsupervised method for structured data.
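The PGI idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `SecondaryStructure`, the functions `direction_bin` and `build_pgi`, the latitude/longitude binning of the sphere, and the bin counts are all assumptions made for illustration. Each secondary structure is reduced to its start-to-end axis, the axis is mapped to a cell on the unit sphere, and the cell stores a list of features instead of a surface area. The sense (+/-) feature is omitted because it would need the protein's origin, which this sketch does not model.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SecondaryStructure:
    """Hypothetical container for one helix or sheet strand."""
    kind: str        # "helix" or "sheet"
    start: tuple     # (x, y, z) of the first residue
    end: tuple       # (x, y, z) of the last residue
    sequence: str    # one-letter amino acid codes


def direction_bin(ss, n_theta=6, n_phi=12):
    """Map the start-to-end axis of a structure to a (theta, phi) cell."""
    dx, dy, dz = (e - s for s, e in zip(ss.start, ss.end))
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    if norm == 0.0:
        raise ValueError("start and end coincide; direction is undefined")
    # Polar angle in [0, pi]; clamp guards against rounding just outside [-1, 1].
    theta = math.acos(max(-1.0, min(1.0, dz / norm)))
    # Azimuth wrapped into [0, 2*pi).
    phi = math.atan2(dy, dx) % (2.0 * math.pi)
    i = min(int(theta / math.pi * n_theta), n_theta - 1)
    j = min(int(phi / (2.0 * math.pi) * n_phi), n_phi - 1)
    return i, j


def build_pgi(structures, n_theta=6, n_phi=12):
    """Group structure features by the sphere cell their axis points to."""
    pgi = defaultdict(list)
    for ss in structures:
        cell = direction_bin(ss, n_theta, n_phi)
        pgi[cell].append({
            "kind": ss.kind,
            "length": len(ss.sequence),   # number of amino acids
            "sequence": ss.sequence,
        })
    return dict(pgi)


if __name__ == "__main__":
    # Made-up coordinates and sequences, for illustration only.
    example = [
        SecondaryStructure("helix", (0, 0, 0), (0, 0, 10), "AEELLKK"),
        SecondaryStructure("helix", (5, 5, 0), (5, 5, 8), "LKELA"),
        SecondaryStructure("sheet", (0, 0, 0), (4, 4, 4), "VTVKV"),
    ]
    for cell, features in sorted(build_pgi(example).items()):
        print(cell, [f["kind"] for f in features])
```

Two structures with parallel axes land in the same cell, which is what would make such a table usable as a coarse pre-filter before a more expensive comparison. An equal-angle grid gives cells of unequal area near the poles; a tessellation with more uniform cells would be a natural alternative.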
978-3-642-33265-4
978-3-642-33266-1

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11367/30618