HEMIC

HEMIC Meta-analysis and HEterogeneous data aggregation: an application to predictive models in food MICrobiology

PhD thesis project of Maëva Caillat

Context:

Producers, processors, and distributors in the agri-food sector must guarantee the safety and quality of the food products they place on the market. To do so, they can use mathematical predictions based on microbiological knowledge and data from the literature. The Sym'Previus GIS (scientific interest group) provides food microbiology prediction tools for these stakeholders (https://symprevius.eu/fr/). These tools can also facilitate the development of new recipes, identify microbiological hazards in new products, and qualify the use of new ingredients. These needs are becoming increasingly important in the context of the ecological transition, which involves industrial, food, and energy transitions.
In this context, it is important to diversify and consolidate the information used to simulate microbial behavior in order to ensure food safety and quality. Few tools are available to stakeholders in the agri-food sector, and those that exist require continuous updating. The literature contains a large amount of exploitable data. However, these data are difficult to use directly: they come from different products, they were obtained with different biological and statistical strategies or methodologies, and they differ in nature because the behaviors studied are diverse (growth, inactivation).

Objectives:

The work will be divided into three phases:

1-    Develop statistical methods for aggregating different types of data. 
This involves semi-automated data collection (data mining), particularly of fragmentary data; taking into account new environmental factors that explain the growth or inactivation of microorganisms; estimating and accounting for experimental variability (species, strains, serovars, environments, etc.); and applying a confidence index based on data quality.
2-    Develop quantitative approaches (models) to estimate predictive microbiology parameters. 
To enrich the database produced by the Sym'Previus GIS, the data collected in phase 1 will need to be integrated and aggregated with existing data. This modeling stage (with potential extrapolation) will use probabilistic models to integrate the biological variability of microorganisms, the physicochemical variability of foods, and the uncertainty of predictions. Simulations of the behavior of different serovars, strains, and species on a wider variety of foods can then be carried out.
3-    Explore different approaches based on machine learning or artificial intelligence.
In addition to these statistical methods, we will explore numerical approaches such as machine learning and artificial intelligence to assess whether they can provide additional information or partially replace the modeling/simulation tools currently used in the Sym'Previus software.
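
As an illustration of the kind of aggregation targeted in phase 1, the sketch below pools per-study estimates of a single parameter with a classical random-effects meta-analysis (DerSimonian-Laird estimator). It is only a minimal example under stated assumptions, not the method the project will necessarily adopt: the function name, the choice of estimator, and the numerical values are hypothetical and are not taken from the Sym'Previus database or from the literature.

```python
import math


def pool_random_effects(estimates, variances):
    """Pool per-study estimates with the DerSimonian-Laird random-effects model.

    estimates: one estimate per study (e.g. a growth rate); hypothetical inputs.
    variances: the within-study variance of each estimate (must be > 0).
    Returns (pooled estimate, its standard error, between-study variance tau2).
    """
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies, one variance per estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be strictly positive")

    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q measures the dispersion of studies around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w

    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


if __name__ == "__main__":
    # Illustrative values only: growth rates (1/h) reported by three imaginary studies.
    rates = [0.42, 0.55, 0.38]
    rate_variances = [0.004, 0.009, 0.002]
    pooled, se, tau2 = pool_random_effects(rates, rate_variances)
    print(f"pooled = {pooled:.3f}, se = {se:.3f}, tau2 = {tau2:.4f}")
```

In this sketch the between-study variance tau2 plays the role of the experimental variability mentioned in phase 1. A confidence index based on data quality could, for instance, be introduced as an additional weighting of each study, which is left out here.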

Start date: October 2025

Duration: 3 years

Coordination: Jeanne-Marie Membré

Funding: CIFRE thesis funded by ADRIA