Curriculum Vitae

Education

Doctor of Philosophy (Ph.D.) in Machine Learning

Oct. 2016 - Mar. 2021

Poznan University of Technology, Poznan, Poland

Thesis: End-to-end approach to classification in unstructured spaces with application to judicial decisions

Development of new AI methods for justice prediction and explanation.
A new mathematical theory for classification with “good” properties (explainability, no metric required, no hyperparameters, ...), based on hypergraphs and metric learning.
A generic method to automate most of the data preparation using standard hyperparameter tuning techniques.
The largest curated datasets about the European Court of Human Rights, on which I reached over 94% accuracy predicting the outcome of a judgment.

Master of Science in Mathematics and Computer Science

Oct. 2009 - Mar. 2015

INSA Rouen (National Institute of Applied Sciences), Rouen, France

Mathematics and Software Engineering Department - Top 5 student
“Musique Étude” - additional curriculum for musicians

Master of Science in High Performance Computing and Artificial Intelligence

Oct. 2013 - Jun. 2014

Cracow University of Technology, Cracow, Poland

Experience

Mar. 2024 - PRESENT

Proofs.io

Technical Staff

Core member of the R&D/engineering team, focusing on the development of an LLM-based platform for building complex software PoCs in minutes.

Research and development of autonomous-learning multi-agent systems for building complex software PoCs in minutes.
Continuous and automatic learning via RAG agents and complex nested multi-agent systems.
Improving the robustness, reliability and reproducibility of LLM-based systems, notably via advanced prompt engineering.
CI/CD and tooling around LLMs, multi-agent systems and knowledge management.

Nov. 2023 - PRESENT

Consultant via Hother.io

Data and AI Consultant

I take on targeted missions and short-term consulting engagements for specific data- and AI-related problems. Some examples include:

Synthetic data generation and simulation with counterfactual analysis for clinical study data.
Semantic search engine and conversational agent over a SQL database.
SaaS architecture and cost estimation based on scope statements.
Predictive modeling and recommendation for Mergers & Acquisitions.

Apr. 2022 - Mar. 2024

YData

Head of Data and AI / Technical Lead

I joined YData to take over the data science and AI team from the CDO in order to build a SaaS platform for synthetic data generation and automated data quality assessment and remediation. My responsibilities included building the team, leading the implementation effort, shaping the core of the company’s intellectual property, and answering fundamental research questions at a fast pace.

Co-author and maintainer of ydata-profiling (previously pandas-profiling), a library for data profiling (10k+ stars on GitHub).
Design and development of methods to automate data quality assessment and issue remediation.
Research, PoV/PoC and implementation of new generative models for synthetic tabular, time-series and multi-table data.
Distributed systems using Dask to scale our models and data processing to terabytes of data.
R&D to integrate LLMs into our synthesizer flow and data profiler, notably for table-to-text and row-to-text generation.

Dec. 2020 - Mar. 2022

HSBC

Senior Assistant Vice President @ Financial Crime Threat Mitigation, Research

As Assistant Vice President of Data Science, I followed the entire governance lifecycle of global models (>200M users, millions to billions of transactions per month), from data quality to model monitoring.

Leading cross-functional teams from PoC to operational deployment and monitoring in the field of financial crime detection.
Design, implementation and validation of one of the main global Name Screening models used across the group.
Product owner and main developer of several innovative internal projects to improve the quality of our data and model governance.
Improving the internal model governance lifecycle, including scientific culture, code governance, engineering risks and internal controls.
As part of the Research team, I proposed and led R&D projects for the Surveillance & Name Screening value stream, including new methodologies and improvements to existing models or processes.

Lead Manager (AVP) @ Financial Crime Threat Mitigation, Compliance Analytics

Leading cross-functional teams from PoC to operational deployment and monitoring in the field of financial crime detection
Focus on global-scale Name Screening systems
Establishing design standards for machine-learning-based models
Identifying and closing technical gaps in team members' skills by providing effective training

May 2015 - Dec. 2020

IBM

Senior Engineer & Data Scientist

Software Architecture and Engineering on IBM Integrated Analytics System and IBM Cloud Pak for Data
Responsible for the monitoring solutions of GPFS and Red Hat OpenShift in a hybrid cloud environment
Architecture and R&D of an AI platform for financial process automation (IBM Cobee)
Architecture and R&D of an AI platform to analyze code quality and predict regressions (IBM Code Quality Center)
Local trainer for Machine Learning and Data Science
Design and implementation of an NLP and Topic Modeling service
Responsible for the collaboration with universities (conferences, lectures, joint research projects)
Supervising master's students and interns on research-oriented subjects

Engineer

Design, implementation and testing of the software stack for IBM Integrated Analytics System (Python)
Development of a server to transfer and synchronize data between on-premises appliances and the Cloud with High Availability and Disaster Recovery constraints (C++)
Prototype of a semantic search engine for jurisprudence and law-related documents

Aug. 2017 -

Watussi

Data Scientist

Design and implementation of an NLP tool dedicated to Search Engine Optimization (SEO)

Nov. 2014 - Apr. 2015

X-Formation

Software Engineer & Product Owner

Design of a predictive module in Go (based on time series)
Improvement of the company development workflow
Development of the sustainability and valorization strategy
Database optimization and normalization
Software architecture, including highly distributed architecture

Jun. 2012 - Apr. 2015

Inria

Research Collaborator

Solving a temporal planning problem (MultiZenoTravel) from the International Planning Competition
Development of the ZenoSolver, a C++14 solver for MultiZenoTravel instances
Design, programming and experimental validation (using R) of the online adaptation of (hyper)parameters
Software engineering on Descarwin (French National Research Agency: ANR-09-COSI-002)

Research Collaborator

Genetic algorithms, stochastic optimization and statistics (ANOVA model), ...
Implementation in C++11, JSON parser, UML modelling, parallel computing and HPC, ...

Internship

Software engineering on ParadisEO, a C++ framework for metaheuristics
Development of a module for shared memory parallelism
Stabilisation for a new release of ParadisEO
Integration of engineering tools: bug tracking, profiling, builds, continuous integration, ...