Hello, It's me πŸ‘‹

SoulaΓ―man Marsou

Your go-to

I'm an IT Engineer passionate about technology and innovation.
Specialize in Data & AI, with a solid background in Software Development, I'm always eager to learn and grow in new areas.
And there's so much more I'm excited to share with you...

More About Me
AI Ambassador

What I do? πŸ’ͺ

Development

πŸ’» I develop robust software, web applications and REST APIs in Python. I'm experienced with frameworks like Flask and FastAPI. 🧠 I'm gaining experience with React and Next.js, which I enjoy using to develop better UIs. πŸ”„ I use efficient CI/CD pipelines with GitHub, Gitlab and Docker to streamline the deployment process. πŸ–₯️ I also manage Linux server configurations and possess a broad range of skills in System Engineering.

Data Science

πŸ“Š I extract meaningful features from complex datasets through Exploratory Data Analysis (EDA). 🐍 I leverage Python in Jupyter Notebooks with libraries like pandas, matplotlib, and scikit-learn to clean, transform, and unlock the full potential of data. πŸ“¦ I develop strong data pipelines using low-level tools like Python for data collection, cleaning, and transformation, and I use CRON jobs on Linux servers to automate processing tasks.

AI & LLM

πŸ€– I train ML models using TensorFlow and PyTorch to solve complex problems. πŸ› οΈ I fine-tune, optimize and serve LLM models using Hugging Face, PEFT methods and llama.cpp. 🌐 I integrate these solutions into web applications by implementing REST APIs, making AI accessible and functional in real-world scenarios.

Big Data

πŸ—„οΈ I design and manage databases using robust technologies, handling both SQL and NoSQL systems. 🐘 I'm comfortable with PostgreSQL and MongoDB. 🌐 I am familiar with Hadoop and its HDFS ecosystem. πŸš€ To stay up-to-date, I continually train on modern tools and technologies, including Azure Data solutions like Databricks.

Latest Experiences

Project Image
CMDB & ETL Software Development

During my internship at Anidris S.A. in Contern, Luxembourg, I developed a comprehensive Configuration Management Database (CMDB) and automated data processing systems.
πŸ’» Designed and implemented the CMDB as a Tomcat application (CMDBuild) with a PostgreSQL database, ensuring robust and scalable architecture.
πŸ”„ Developed a fully automated ETL process using Python for multi-source data acquisition, transformation, and integration.
🌐 Integrated external APIs, including the Microsoft Graph API, and internal REST APIs in the Python software implemented.
🧩 Managed a Red Hat Enterprise Linux environment and configured CI/CD pipelines using GitLab, ensuring efficient deployment workflows.
πŸ“š Authored user documentation, trained employees on tool usage, and provided ongoing technical support for seamless adoption.
πŸ› οΈ Technologies: Python, PostgreSQL, Tomcat, MS Graph API, Red Hat Enterprise Linux, and GitLab CI/CD.

Project Image
LLM Research & Development

During my time at Sopra Steria in Strasbourg, France, I contributed to the research and development of Large Language Models (LLMs) within the scope of the company, helping them explore and understand their potential applications.
πŸ“š Co-authored a detailed state-of-the-art review on LLM enhancement techniques, including training, optimization, and evaluation methodologies.
🧼 Conducted data collection, cleaning, and transformation to prepare a high-quality text dataset for training and evaluation.
πŸš€ Trained and optimized a CodeLLaMa LLM, providing insights and conclusions based on the dataset and performance metrics.
πŸ› οΈ Technologies: Python, HuggingFace, Jupyter Notebook, IBM Watson, and LaTeX.

Latest Projects

RAG Chatbot API – AI & Flask Integration

A chatbot API project built with Flask, Langchain and OpenAI's API, powering the assistant on my portfolio website.
πŸ”Œ Developed a modular REST API using Flask Blueprints.
🧠 Enhanced with a RAG system using LangChain & PostgreSQL Vector DB.
πŸ“„ Documents are ingested from a file and processed via a Python script.
πŸ”’ Includes input validation and rate limiting for secure use.
🐳 Deployed using Docker on an Ubuntu server alongside other services.

BI Analysis – Tax Gap Detection on Built Property Tax (TFPB)

A Business Intelligence project based on French fiscal open data (2021).
🎯 Goal: detect anomalies between expected and collected tax revenue.
🧹 Data preparation using Python + DuckDB for transformation and gap calculation.
πŸ—οΈ Designed a star-schema data model (fact + dimensions).
πŸ“Š Built an interactive Power BI dashboard including: 4 KPIs, top-10 outliers histogram, scatter plot, and a detailed table.
πŸ•΅οΈ Findings: tax collection is globally consistent, but smaller towns show significant individual gaps worth reviewing manually.

DockerizeLLM – Serve LLMs Easily with Docker

A Python automation tool to deploy LLMs from Hugging Face using llama.cpp and Docker.
πŸ” Search for gguf models by keywords directly from the script.
⬇️ Download model files and auto-build a Docker image.
🧠 Serve the LLM with an OpenAI-compatible API using `llama-cpp-python[server]`.
🐳 Fully Dockerized for easy deployment and reproducibility.
πŸ’¬ Test endpoints with curl using the exposed REST API.

Project Image
AI Text Detector - Web Application

I developed, with my team members, an AI web application that detects whether an English text was written by a human or a large language model (LLM).
πŸ€– The AI model is a Random Forest classifier, selected after performing a GridSearch across various models.
πŸ” Explore the Exploratory Data Analysis (EDA) and training process here.
🧼 The dataset underwent thorough analysis and cleaning prior to training.
πŸ“š FastText was used as the NLP method for data vectorization.
πŸš€ The AI model is deployed via a REST API built with Flask.
🌐 It is hosted on an Ubuntu VM from OVH Cloud, configured with Nginx and Docker.
πŸ“¨ The web page interacts with the REST API using AJAX requests.

Project Image
EDA for Tree Defect Analysis and Prediction

A project aimed at predicting tree defects and enhancing vegetation management through data analysis and machine learning.
🧹 Clean and prepare data by handling missing values and removing redundant attributes.
πŸ“Š Analyze data correlations using Pearson correlation, χ² tests, and Cramer’s V.
πŸ€– Build predictive models with Random Forest for both unilabel and multilabel classification.
βš–οΈ Handle imbalanced data using SMOTE and class weight adjustments for improved accuracy.
πŸ–ΌοΈ Visualize trends and patterns using Matplotlib.
πŸ’» Developed in a collaborative environment using Jupyter Notebook with Python.

Project Image
Portfolio Web Development

The current website on which you are is a web app developed in Java with the framework Spring Boot.
βš™οΈ Backend in Java with Spring Boot.
πŸ§ͺ JUnit and Mockito for unit and integration testing.
✨ SonarCloud for code quality assurance.
πŸ“¦ Microservice REST API for data management through a dedicated Spring Boot application.
🐳 Docker with Docker Compose and Systemd for deploying applications (Web app, API, Nginx).
🌐 Nginx used as a reverse proxy.
πŸš€ Workflow with GitHub Actions for managing the CI/CD pipeline.

AI Bots Multitenant Manager – Deployment Platform

A full-stack infrastructure designed to automate and manage Mindcraft AI servers for multiple users.
πŸ€– Mindcraft is an open-source project that connects Minecraft bots to LLMs to interact with players in-game.
🎯 Problem: Too complex to deploy for regular Minecraft users.
βœ… My goal: allow users to launch their own AI Minecraft bot server in a few clicks.

Key Features:
- Deploy Mindcraft servers dynamically in Docker containers
- Custom Python API (FastAPI) to control servers remotely (start/stop, edit config, interact with bots)
- Web dashboard built with React + Next.js to manage servers visually
- LLM Proxy (LiteLLM) to control API usage per user (limits, tracking, rules)
- Architecture uses 3 Linux servers: dashboard, Mindcraft hosts, and LLM proxy

πŸ–ΌοΈ Below are screenshots from the dashboard showing server and bot management interfaces.
πŸ”’ Source code is private but demo access can be provided upon request.

Project Image
Wildfire Data Visualization Mapping - D3.js

A visualization project to enhance the evolution of forest fires in France over the years.
🌐 Collect CSV data from official government websites.
πŸ—ΊοΈ Find relevant SVG maps of France.
🧹 Clean and transform data to apply it to maps.
πŸ’» Create a simple and efficient static website to display data.
πŸ“Š Implement visualizations with JavaScript using the D3.js library.

Who Am I? 🫣

Adaptable IT Engineer with a Passion for Innovation

Hi! I'm an IT Engineer with diverse skills and a passion for problem-solving.
πŸ’‘ My expertise spans Data Science, generative AI, backend developement and system administration.
πŸ› οΈ I work with tools like Docker and GitHub Actions to streamline workflows, and I have solid experience in Linux server management.
🀝 Combining technical expertise with strong soft skills, I focus on teamwork, adaptability, and delivering impactful results.

Check my LinkedIn profile

Why work with me? πŸ€”

πŸŽ›οΈ Technical expertise with broad skills

I possess strong technical skills across various domains: Development, Generative AI, Data Science, and System Administration. With this diverse experience, no new IT field is beyond my reach, and I'm eager to become an expert in one of these fields.

⚑ A Quick Learner

My greatest strength lies in my adaptability and ability to learn quickly. I excel at mastering the best tools and techniques required to meet project goals. I am eager to embrace new technologies and innovative approaches, believing that success stems from leveraging both innovation and the valuable insights of others.

πŸ’‘ Soft skills matter

Soft skills are the key to a project's success. They allow me to understand the needs and concerns of others and to communicate effectively, fostering collaboration to achieve shared goals. Communication is pivotal, and I also excel in crisis management when needed.