Interactive analysis of phylomemetic structures

by Ian Jeantet

Thesis project in Computer Science

Supervised by David Gross-Amblard.

Thesis in preparation at Rennes 1, within Mathématiques et STIC, in partnership with Institut de Recherche en Informatique et Systèmes Aléatoires (research team), since 10-01-2018.

  • Translated title

    Analyse interactive de structures phylomémétiques

  • Abstract

    Understanding how scientific fields evolve is important for our society. Obtaining a general picture of the major evolutions of entire scientific fields is challenging, given the proliferation of scientific publishing and the presence of overspecialized scientific journals. Recent papers [1,2] propose text analysis techniques that reconstruct important aspects of this evolution from large corpora of scientific publications (such as Web of Science or PubMed). The Epique project proposes to develop automated tools that can assist (social) scientists in studying empirically particular aspects of the social dynamics of science.

    Existing methods for phylomemetic structure reconstruction rely on the following schema: 1) extraction of key terms from the articles; 2) construction of a term co-occurrence graph (over the scientific publications); 3) identification of densely connected subgraphs in this term co-occurrence graph; and 4) inter-temporal analysis of the dense subgraphs. The result of the analysis is represented as phylomemetic lattices, which are analogous to the phylogenetic trees used in biology to represent the evolution of natural species.

    While automatic phylomemetic structure reconstruction gives promising results, scientists studying the evolution of science would like to interact with the tools and influence the construction algorithms. The thesis should develop techniques that enable the interactive construction of phylomemetic structures. Through this interaction, scientists can add or refine pieces of information in order to reduce the uncertainties present at the various stages of the reconstruction procedure.

    The thesis will focus on some of the following aspects.

      • Developing a model of phylomemetic structure as a (structured) knowledge extraction task.
      • Enriching the extraction model with quality metrics.
      • Developing algorithms that support scientists exploring the graph (lattice). This requires data exploration techniques [8,9], as the phylomemetic structure is rather large in practice.
      • Provenance. As provenance questions can be important in the reconstruction process, the model should also deal with provenance information [10].
      • Developing a workflow model of phylomemetic structure maintenance that can update parts of the network, in particular in the case of quality problems.
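
    The four-step reconstruction schema described in the abstract can be illustrated with a minimal Python sketch. It is not the method of the cited papers or of the Epique project: it assumes key terms have already been extracted (step 1), uses connected components of a thresholded graph as a simple stand-in for dense subgraph detection (step 3), and links term groups across two periods by Jaccard similarity (step 4). All function names, parameters, and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(docs, min_count=2):
    """Step 2: build a term co-occurrence graph.

    docs is a list of sets of key terms (step 1 is assumed already done).
    Two terms are connected if they appear together in at least
    min_count documents (an illustrative threshold).
    """
    pair_counts = Counter()
    for terms in docs:
        for a, b in combinations(sorted(terms), 2):
            pair_counts[(a, b)] += 1
    graph = {}
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def term_groups(graph):
    """Step 3 (simplified): connected components of the graph.

    This is a stand-in for dense subgraph detection, not a real
    clique or community detection algorithm.
    """
    seen = set()
    groups = []
    for start in sorted(graph):
        if start in seen:
            continue
        group = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(frozenset(group))
    return groups


def jaccard(a, b):
    """Jaccard similarity between two non-empty sets of terms."""
    return len(a & b) / len(a | b)


def link_periods(groups_before, groups_after, threshold=0.5):
    """Step 4 (simplified): link groups of consecutive periods
    whose term sets are similar enough (hypothetical threshold)."""
    return [
        (g, h)
        for g in groups_before
        for h in groups_after
        if jaccard(g, h) >= threshold
    ]
```

    Applying `cooccurrence_graph` and `term_groups` to the documents of each time period, then calling `link_periods` on consecutive periods, yields parent-child links between term groups. Such links are the kind of edges a phylomemetic lattice is built from, and the thresholds are examples of parameters a scientist might want to adjust interactively.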