Curriculum Vitae

Education

Doctor of Philosophy (Ph.D.) in Machine Learning

Oct. 2016 - Mar. 2021

Poznan University of Technology, Poznan, Poland

Thesis: End-to-end approach to classification in unstructured spaces with application to judicial decisions

  • Development of new AI methods for predicting and explaining judicial decisions.
  • A new mathematical theory for classification with “good” properties (explainability, no metric required, no hyperparameters, ...), based on hypergraphs and metric learning.
  • A generic method to automate most of the data preparation using standard hyperparameter tuning techniques.
  • The largest curated datasets about the European Court of Human Rights, on which I reached over 94% accuracy predicting the outcome of a judgment.

Master of Science in Mathematics and Computer Science

Oct. 2009 - Mar. 2015

INSA Rouen (National Institute of Applied Sciences), Rouen, France

  • Mathematics and Software Engineering Department - Top 5 student
  • “Musique Étude” - additional program for musicians

Master of Science in High Performance Computing and Artificial Intelligence

Oct. 2013 - Jun. 2014

Cracow University of Technology, Cracow, Poland

Experience

Mar. 2024 - PRESENT

Proofs.io

Technical Staff

Core member of the R&D and engineering team, focusing on the development of an LLM-based platform for building complex software PoCs in minutes.

  • Research and development of autonomously learning multi-agent systems that build complex software PoCs in minutes.
  • Continuous and automated learning via RAG agents and complex nested multi-agent systems.
  • Improving the robustness, reliability and reproducibility of LLM-based systems, notably via advanced prompt engineering.
  • CI/CD and tooling around LLMs, multi-agent systems and knowledge management.
Nov. 2023 - PRESENT

Consultant via Hother.io

Data and AI Consultant

I take on short-term consulting missions for specific data and AI problems. Some examples include:

  • Synthetic data generation and simulation with counter-factual analysis for clinical studies data.
  • Semantic search engine and conversational agent over SQL database.
  • SaaS architecture and cost estimation based on scope statements.
  • Predictive modeling and recommendation for Mergers & Acquisitions.
Apr. 2022 - Mar. 2024

YData

Head of Data and AI / Technical Lead

I joined YData to take over the data science and AI team from the CDO in order to build a SaaS platform for synthetic data generation and automated data quality assessment and remediation. My responsibilities included building the team, leading the implementation effort, shaping the core of the company’s intellectual property, and answering fundamental research questions at a fast pace.

  • Co-author and maintainer of ydata-profiling (previously pandas-profiling), a library for data profiling (10k+ stars on GitHub).
  • Conception and development of methods to automate data quality assessment and issue remediation.
  • Research, PoV/PoC and implementation of new generative models for synthetic tabular, time-series and multi-table data.
  • Distributed systems using Dask to scale our models and data processing to terabytes of data.
  • R&D to integrate LLMs into our synthesizer flow and data profiler, notably for table-to-text and row-to-text generation.
Dec. 2020 - Mar. 2022

HSBC

Senior Assistant Vice President @ Financial Crime Threat Mitigation, Research

As Assistant Vice President of Data Science, I followed the whole governance lifecycle of global models (>200M users, millions to billions of transactions per month), from data quality to model monitoring.

  • Leading cross-functional teams from PoC to operational deployment and monitoring in the field of financial crimes detection.
  • Design, implementation and validation of one of the main global Name Screening models used across the group.
  • Product owner and main developer of some internal innovative projects to improve the quality of our data and model governance.
  • Improving internal model governance lifecycle, including scientific culture, code governance, engineering risks and internal controls.
  • As part of the Research team, I propose and lead R&D projects for the Surveillance & Name Screening value stream, including new methodologies and improvement of existing models or processes.

Lead Manager (AVP) @ Financial Crime Threat Mitigation, Compliance Analytics

  • Leading cross-functional teams from PoC to operational deployment and monitoring in the field of financial crimes detection
  • Focus on global scale Name Screening systems
  • Establishing design standards for machine learning based models
  • Identifying and closing technical gaps in team member skills by providing effective training
May 2015 - Dec. 2020

IBM

Senior Engineer & Data Scientist

  • Software Architecture and Engineering on IBM Integrated Analytics System and IBM Cloud Pak for Data
  • Responsible for the monitoring solutions of GPFS and Red Hat OpenShift in Hybrid Cloud environment
  • Architecture and R&D of an AI platform for financial process automation (IBM Cobee)
  • Architecture and R&D of an AI platform to analyze code quality and predict regressions (IBM Code Quality Center)
  • Machine Learning and Data Science local trainer
  • Design and implementation of an NLP and Topic Modeling service.
  • Responsible for the collaboration with universities (conference, lectures, joint research projects)
  • Supervising master's students and interns on research-oriented subjects

Engineer

  • Design, implementation and tests of the software stack for IBM Integrated Analytics System (Python)
  • Development of a server to transfer and synchronize data between on-premises appliances and the Cloud with High Availability and Disaster Recovery constraints (C++)
  • Prototype of a semantic search engine for jurisprudence and law-related documents
Aug. 2017 -

Watussi

Data Scientist

  • Design and implementation of an NLP tool dedicated to Search Engine Optimization (SEO)
Nov. 2014 - Apr. 2015

X-Formation

Software Engineer & Product Owner

  • Design of a predictive module in Go (based on time series)
  • Improvement of the company development workflow
  • Development of the sustainability and valorization strategy
  • Database optimization and normalization
  • Software architecture, including highly distributed architecture
Jun. 2012 - Apr. 2015

Inria

Research Collaborator

  • Solving a temporal planning problem (MultiZenoTravel) from the International Planning Competition
  • Development of the ZenoSolver, a C++14 solver for MultiZenoTravel instances
  • Design, programming and experimental validation (using R) of the online adaptation of (hyper)parameters
  • Software engineering on Descarwin (French National Agency for Research: ANR-09-COSI-002)

Research Collaborator

  • Genetic algorithms, stochastic optimization and statistics (ANOVA model), ...
  • Implementation in C++11, JSON parser, UML modelling, parallel computing and HPC, ...

Internship

  • Software engineering on ParadisEO, a C++ framework for metaheuristics
  • Development of a module for shared memory parallelism
  • Stabilisation for a new release of ParadisEO
  • Integration of engineering tools: bug tracking, profiling, builds, continuous integration, ...