Education
-
Title: End-to-end approach to classification in unstructured spaces with application to judicial decisions
- Development of new AI methods for predicting and explaining judicial decisions.
- A new mathematical theory of classification based on hypergraphs and metric learning, with "good" properties (explainability, no metric required, no hyperparameters, ...)
- A generic method to automate most of the data preparation process using standard hyperparameter tuning techniques
- The largest curated datasets on the European Court of Human Rights, on which I reached over 94% accuracy in predicting the outcome of a judgment.
-
- Mathematics and Software Engineering Department - Top 5 student
- “Musique Étude” - additional curriculum for musicians
-
Experience
-
Proofs.io
Mar. 2024 - PRESENT
Core member of the R&D/engineering team, focusing on the development of an LLM-based platform for building complex software PoCs in minutes.
- Research and development of autonomously learning multi-agent systems that build complex software PoCs.
- Continuous self-learning via RAG agents and complex nested multi-agent systems.
- Improving the robustness, reliability and reproducibility of LLM-based systems, notably via advanced prompt engineering.
- CI/CD and tooling around LLMs, multi-agent systems and knowledge management.
-
Consultant via Hother.io
Nov. 2023 - PRESENT
I take on targeted missions and short-term consulting engagements for specific data- and AI-related problems. Examples include:
- Synthetic data generation and simulation with counterfactual analysis for clinical study data.
- Semantic search engine and conversational agent over an SQL database.
- SaaS architecture and cost estimation based on scope statements.
- Predictive modeling and recommendation for Mergers & Acquisitions.
-
YData
Apr. 2022 - Mar. 2024
I joined YData to take over the data science and AI team from the CDO in order to build a SaaS platform for synthetic data generation and automated data quality assessment and remediation. My responsibilities included building the team, leading the implementation effort, shaping the core of the company’s intellectual property, and investigating and answering fundamental research questions at a fast pace.
- Co-author and maintainer of ydata-profiling (previously pandas-profiling), a library for data profiling (10k+ stars on GitHub).
- Design and development of methods to automate data quality assessment and issue remediation.
- Research, PoV/PoC and implementation of new generative models for tabular, time-series and multi-table synthetic data.
- Distributed systems using Dask to scale our models and data processing to terabytes of data
- R&D to integrate LLMs into our synthesizer workflow and data profiler, notably for table-to-text and row-to-text generation.
-
HSBC
Dec. 2020 - Mar. 2022
As Assistant Vice President of Data Science, I oversaw the entire governance lifecycle of global models (>200M users, millions to billions of transactions per month), from data quality to model monitoring.
- Leading cross-functional teams from PoC to operational deployment and monitoring in the field of financial crime detection.
- Design, implementation and validation of one of the main global Name Screening models used across the group.
- Product owner and main developer of several innovative internal projects to improve the quality of our data and model governance.
- Improving the internal model governance lifecycle, including scientific culture, code governance, engineering risks and internal controls.
- As part of the Research team, I proposed and led R&D projects for the Surveillance & Name Screening value stream, including new methodologies and improvements to existing models and processes.
- Focus on global-scale Name Screening systems
- Establishing design standards for machine-learning-based models
- Identifying and closing technical gaps in team member skills by providing effective training
-
IBM
May 2015 - Dec. 2020
- Software Architecture and Engineering on IBM Integrated Analytics System and IBM Cloud Pak for Data
- Responsible for the monitoring solutions for GPFS and Red Hat OpenShift in hybrid cloud environments
- Architecture and R&D of an AI platform for financial process automation (IBM Cobee)
- Architecture and R&D of an AI platform to analyze code quality and predict regressions (IBM Code Quality Center)
- Machine Learning and Data Science local trainer
- Design and implementation of an NLP and Topic Modeling service.
- Responsible for the collaboration with universities (conferences, lectures, joint research projects)
- Supervising master's students and interns on research-oriented subjects
- Design, implementation and testing of the software stack for IBM Integrated Analytics System (Python)
- Development of a server to transfer and synchronize data between on-premises appliances and the Cloud with High Availability and Disaster Recovery constraints (C++)
- Prototype of a semantic search engine for jurisprudence and law related documents
-
Watussi
Aug. 2017
- Design and implementation of an NLP tool dedicated to Search Engine Optimization (SEO)
-
X-Formation
Nov. 2014 - Apr. 2015
- Design of a predictive module in Go (based on time series)
- Improvement of the company development workflow
- Development of the sustainability and valorization strategy
- Database optimization and normalization
- Software architecture, including highly distributed architecture
-
Inria
Jun. 2012 - Apr. 2015
Title: Insertion of adaptive modalities into the mono- or multi-objective evolutionary planner Divide-and-Evolve
Supervisor: Marc Schoenauer, TAO team
- Solving a temporal planning problem (MultiZenoTravel) from the International Planning Competition
- Development of the ZenoSolver, a C++14 solver for MultiZenoTravel instances
- Design, programming and experimental validation (using R) of the online adaptation of (hyper)parameters
- Software engineering on Descarwin (French National Research Agency: ANR-09-COSI-002)
Title: Parallel Island Model for Metaheuristics.
Supervisor: Clive Canape, DOLPHIN team
- Genetic algorithms, stochastic optimization and statistics (ANOVA models), ...
- Implementation in C++11, JSON parser, UML modelling, parallel computing and HPC...
- Software engineering on ParadisEO, a C++ framework for metaheuristics
- Development of a module for shared memory parallelism
- Stabilisation of ParadisEO for a new release
- Integration of engineering tools: bug tracking, profiling, builds, continuous integration, ...