Etude bioinformatique des lectines: nouvelle classification et prédiction dans les génomes

by François Bonnardel

Thesis project in Chemistry and Biology

Supervised by Anne Imberty.

Thesis in preparation at Université Grenoble Alpes, jointly supervised (cotutelle) with Université de Genève, within the École doctorale chimie et science du vivant, in partnership with the CEntre de Recherche sur les MAcromolécules Végétales (laboratory) and Glycobiologie moléculaire (research team), since 29-09-2017.

  • Abstract

    Establishment of a classification of lectins, which will then be used to develop bioinformatics methods for predicting lectins in genomes. The knowledge and databases developed by CERMAV and PIG will be combined and improved as part of this project.

  • Translated title

    Bioinformatics study of lectins: new classification and prediction in genomes

  • Abstract

    Lectins are proteins that interact non-covalently, reversibly and specifically with carbohydrates while displaying no catalytic or immunological activity. Lectins are thought to be able to decipher the glycocode, i.e., the structural information present on complex carbohydrates at the cell surface. Lectins are involved in many biological processes; their interactions with carbohydrates often play an essential role in the onset, detection, and potentially prevention of human diseases such as cancer, inflammation, diabetes, neurodegenerative diseases, and bacterial and viral infections. In pathogenic microorganisms, lectins are often involved in host recognition and tissue adhesion. In invertebrates, they are often active partners in symbiosis and are part of the innate immune system. Lectins exhibit a variable oligomeric assembly that ranges from mono- to deca-valency. This property gives lectins the ability to bind complex carbohydrates in a multivalent fashion, a feature that underlines the reversibility of biological recognition. Despite the sequence data deluge and the 130,000 protein 3D structures that have been solved, lectins are still poorly characterised. Moreover, lectins are not classified according to clear structural and functional criteria. As a result, knowledge of lectins is rather scattered and difficult to use in predictive methods. We propose to combine the knowledge of lectins from CERMAV with the bioinformatics expertise of SIB to give new dimensions to the lectin3D database and to develop new tools for identifying and annotating new lectins. The first step will be to elaborate a lectin classification from different perspectives. The knowledge of 3D structures (approximately 1500, of which 70% form a complex with a carbohydrate) can be used to define precise criteria for characterising folding and oligomerisation. 
The possible correlation between multivalence, function, subcellular location and the topology of monomer assemblies will be investigated as a source of further useful criteria. In the absence of 3D data, the domain composition as described in large protein family databases can provide another view of the possible relationship between domain architecture, subcellular location and function. So far, there is no such mapping between lectins and domains. The second step will be to propose tools for the correct annotation of lectins in newly sequenced genomes, especially in fungal, algal, invertebrate and bacterial species. At present, lectins are not correctly annotated because of the lack of glycoscience expertise in bioinformatics. The proposed research will lead to the design of appropriate software to be included in annotation packages in order to identify the many lectins currently recorded as “hypothetical proteins” in sequence or genome databases. The tool should take into account the knowledge of lectin 3D structures and should be able to overcome difficulties such as the presence of tandem repeats or other particularities of lectins. Finally, this approach will be applied to a couple of genomes of interest. The team at CERMAV is involved in identifying the mucus-binding strategies of bacteria and fungi that threaten the lives of cystic fibrosis patients, and such emerging pathogens (Scedosporium apiospermum, Mycobacterium abscessus…) could be investigated. If promising sequences are identified, the corresponding lectins will be cloned, produced and purified, and their structure-function characterisation will be performed by the team at CERMAV, by allocating a master's student to the project. Besides the production of useful classification and predictive tools, another expected output of this work is the assessment of the evolutionary steps that lectins underwent to acquire their functionality in the different domains of life. 
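As a purely illustrative sketch of the annotation step described above, and not the project's actual tool, the following Python snippet scans FASTA records annotated as "hypothetical protein" for a sequence motif and groups all hits within one protein into a single candidate, so that a protein carrying tandem repeats is reported once together with its repeat count. The motif `TOY_MOTIF`, the toy sequences in `TOY_FASTA`, and the names `Candidate`, `parse_fasta` and `find_candidates` are all assumptions invented for this example; a real pipeline would more plausibly rely on curated profiles built from known lectin families and 3D structures.

```python
import re
from dataclasses import dataclass

# Hypothetical motif, invented for illustration only. It is NOT a real
# lectin signature; a real tool would use curated family profiles.
TOY_MOTIF = re.compile(r"W.{2}G.{3,5}[FY]")

# Toy sequences, invented for illustration only.
TOY_FASTA = """\
>P1 hypothetical protein
MKWAAGTTTFAAWCCGSSSSYLL
>P2 kinase
MKWAAGTTTF
>P3 hypothetical protein
MKKLLAAA
"""


@dataclass
class Candidate:
    protein_id: str
    repeat_count: int  # number of non-overlapping motif hits
    positions: list    # 0-based start position of each hit


def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def find_candidates(records, motif=TOY_MOTIF, min_repeats=1):
    """Flag 'hypothetical protein' entries that carry the motif.

    All non-overlapping hits in one sequence are grouped into a single
    candidate, so tandem repeats do not produce duplicate reports.
    """
    candidates = []
    for header, seq in records.items():
        if "hypothetical protein" not in header.lower():
            continue  # already annotated with something else
        positions = [m.start() for m in motif.finditer(seq)]
        if len(positions) >= min_repeats:
            protein_id = header.split()[0]
            candidates.append(Candidate(protein_id, len(positions), positions))
    return candidates


candidates = find_candidates(parse_fasta(TOY_FASTA))
for c in candidates:
    print(c.protein_id, c.repeat_count, c.positions)
```

On the toy input, only `P1` is reported: `P2` is skipped because it already carries an annotation, and `P3` contains no motif hit. Raising `min_repeats` would restrict the output to proteins with several copies of the motif, which is one simple way to make a repeat count part of the selection criterion.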
Expected results/Integration in the Glyco@Alps program:
1. A classification of lectins, which is requested by the international glyco-community. Such classifications have already been performed for CBMs (see CAZy) and for chemokines, and give much better visibility to the domain.
2. A new version of the lectin-3D database with a novel architecture and an upgrade.
3. An integration of data between the Swiss and French databases focused on lectins.
4. A route to better annotation of lectins in protein and genome databases.
5. A tool for searching for lectins in the genomes of pathogens.
All these results are in line with WP4 of Glyco@Alps, which aims at proposing novel tools for the glycosciences, in particular for the semantic web.