Jeremy Vachier

Theoretical Physicist | Associate Director Data Science

Projects

Scientific Literature RAG System

AI-powered introduction generator for scientific papers

RAG System Demo

Live demonstration of the Scientific Literature RAG system generating introductions with semantic search and citation extraction.

A production-ready RAG system that automatically generates well-structured, literature-informed introductions for scientific papers. The system indexes research papers and leverages LLMs to synthesize relevant literature into comprehensive introductions with properly formatted citations.
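The retrieve-then-generate flow described above can be illustrated with a minimal, dependency-free sketch. It is not the project's implementation: the bag-of-words `embed` function stands in for a learned embedding model such as SPECTER2, the in-memory list stands in for a vector store such as ChromaDB, and the corpus entries, citation keys, and function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Toy corpus. Keys and texts are invented placeholders, not real papers.
PAPERS = [
    {"key": "smith2020", "text": "Active Brownian particles accumulate at confining walls."},
    {"key": "lee2021", "text": "Transformer attention improves neural machine translation."},
    {"key": "garcia2019", "text": "Confined active particles show collective motion near walls."},
]

def embed(text):
    """Bag-of-words term counts (assumption: stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, papers, k=2):
    """Return the k papers most similar to the query."""
    q = embed(query)
    ranked = sorted(papers, key=lambda p: cosine(q, embed(p["text"])), reverse=True)
    return ranked[:k]

def build_prompt(topic, hits):
    """Assemble an LLM prompt that exposes each source under its citation key."""
    context = "\n".join(f"[{p['key']}] {p['text']}" for p in hits)
    return (
        f"Write an introduction on: {topic}\n"
        "Cite sources by their bracketed keys.\n\n"
        f"Sources:\n{context}"
    )

hits = retrieve("active particles near walls under confinement", PAPERS)
prompt = build_prompt("active particles in confinement", hits)
print(prompt)
```

Passing the retrieved passages to the model together with their keys is one simple way to keep generated citations traceable to indexed papers. The final LLM call is left out because the source does not say which model or API is used.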

Key Features:

Performance:

Technologies: Python 3.11+, GPU, LLM, ChromaDB, SPECTER2, RAG, Dash
Status: Production Ready | Apache 2.0 License
→ View on GitHub


Speech-to-Text with Sentiment Analysis and Translation

Real-time multilingual processing pipeline

Transformer Architecture

Detailed architecture of the Transformer model showing encoder-decoder structure with multi-head attention mechanisms.

A comprehensive end-to-end system integrating speech recognition, sentiment analysis, and neural machine translation. Built on a from-scratch Transformer implementation, it demonstrates a deep understanding of attention mechanisms and encoder-decoder architectures.
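As an illustration of the attention mechanism mentioned above, here is a minimal sketch of single-head scaled dot-product attention, softmax(QKᵀ/√d_k)·V, in plain Python. The project itself is built with TensorFlow/Keras. The list-of-lists representation and the function names here are assumptions chosen to keep the example self-contained.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention on small matrices given as lists of rows."""
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value rows.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out, weights = scaled_dot_product_attention(Q, K, V)
```

Multi-head attention repeats this computation on several learned projections of Q, K, and V and concatenates the results.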

Key Features:

Performance Metrics:

Architecture Highlights:

Technologies: Python 3.11+, Transformer, Bidirectional LSTM, TensorFlow, Keras, Vosk, Dash, Optuna
Status: Research/Educational | Apache 2.0 License
→ View on GitHub


Active Particles in 3D Confinement

GPU-accelerated molecular dynamics simulation

Performance Benchmark

Performance comparison across CPU and GPU implementations showing up to 27× speedup on Apple Silicon.

Particles Video

Simulation of 1000 active Brownian particles under 3D cylindrical confinement, demonstrating collective motion and particle-particle interactions with Lennard-Jones repulsion.

A high-performance C++ simulation framework for modeling active Brownian particles (ABPs) under cylindrical confinement in 3D space. Implements the Euler-Maruyama algorithm for Langevin dynamics, with both CPU and GPU implementations.
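To make the integration scheme concrete, here is a minimal Python sketch of one Euler-Maruyama step for a single non-interacting active Brownian particle. It is not the project's C++ code. The parameter names (`v0`, `D_t`, `D_r`, `R`), the cross-product form of the rotational noise, and the wall handling (projecting the particle back onto the cylinder surface) are assumptions made for illustration, and particle-particle forces are omitted.

```python
import math
import random

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    """Rescale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def em_step(r, e, dt, v0, D_t, D_r, R, rng):
    """One Euler-Maruyama step: self-propulsion plus translational and rotational noise."""
    xi = [rng.gauss(0.0, 1.0) for _ in range(3)]   # translational noise
    eta = [rng.gauss(0.0, 1.0) for _ in range(3)]  # rotational noise
    s_t = math.sqrt(2.0 * D_t * dt)
    s_r = math.sqrt(2.0 * D_r * dt)
    # Position: drift along the orientation e, plus diffusion.
    r_new = [r[i] + v0 * e[i] * dt + s_t * xi[i] for i in range(3)]
    # Orientation: random rotation perpendicular to e, then renormalize.
    rot = cross(eta, e)
    e_new = normalize([e[i] + s_r * rot[i] for i in range(3)])
    # Confinement (assumption): project back onto a cylinder of radius R along z.
    rho = math.hypot(r_new[0], r_new[1])
    if rho > R:
        r_new[0] *= R / rho
        r_new[1] *= R / rho
    return tuple(r_new), e_new

rng = random.Random(0)
r, e = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
for _ in range(1000):
    r, e = em_step(r, e, dt=1e-3, v0=1.0, D_t=0.1, D_r=1.0, R=0.5, rng=rng)
```

The noise amplitudes scale with the square root of the timestep, which is the defining feature of the Euler-Maruyama scheme for stochastic differential equations.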

Key Features:

Performance Benchmarks (Apple M2, 1000 timesteps):

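| Particles | CPU (1 thread) | CPU (OpenMP 6) | GPU (Metal) | GPU vs 1 CPU | GPU vs OpenMP |
|-----------|----------------|----------------|-------------|--------------|---------------|
| 100       | 0.026s         | 0.103s         | 0.096s      | 0.3×         | 0.9×          |
| 200       | 0.075s         | 0.125s         | 0.115s      | 0.7×         | 1.1×          |
| 500       | 0.401s         | 0.269s         | 0.225s      | 1.8×         | 1.2×          |
| 1,000     | 1.514s         | 0.747s         | 0.445s      | 3.4×         | 1.7×          |
| 2,000     | 5.638s         | 2.389s         | 0.601s      | 9.4×         | 4.0×          |
| 5,000     | 34.393s        | 13.903s        | 1.262s      | 27.3×        | 11.0×         |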
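The speedup columns are ratios of wall-clock times. The short sketch below recomputes them from the timings listed in the table; the variable and function names are illustrative only.

```python
# (particles, CPU 1 thread [s], CPU OpenMP 6 [s], GPU Metal [s]), copied from the table.
TIMINGS = [
    (100, 0.026, 0.103, 0.096),
    (200, 0.075, 0.125, 0.115),
    (500, 0.401, 0.269, 0.225),
    (1000, 1.514, 0.747, 0.445),
    (2000, 5.638, 2.389, 0.601),
    (5000, 34.393, 13.903, 1.262),
]

def speedups(rows):
    """Return (particles, GPU vs 1 CPU, GPU vs OpenMP) as time ratios."""
    return [(n, t_cpu / t_gpu, t_omp / t_gpu) for n, t_cpu, t_omp, t_gpu in rows]

results = speedups(TIMINGS)
for n, vs_cpu, vs_omp in results:
    print(f"{n:>5}  GPU vs 1 CPU: {vs_cpu:5.1f}x   GPU vs OpenMP: {vs_omp:5.1f}x")

# Smallest system size at which the GPU beats the single-thread CPU run.
crossover = next(n for n, vs_cpu, _ in results if vs_cpu > 1.0)
```

A ratio below 1 means the GPU run was slower, which is what the timings show for the smallest systems, where launch and transfer overhead likely dominates. Ratios recomputed from the rounded timings can differ in the last digit from the listed values, especially in the small-N rows.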

Physical Model:

Technologies: C++17, OpenMP, Metal GPU
Status: Research-Grade | Apache 2.0 License
→ View on GitHub


Personality Classification with Ensemble Learning

Production ML pipeline achieving Kaggle top 5%

Dashboard Demo

Interactive dashboard for real-time personality classification with model explainability and performance metrics.

A production-ready machine learning pipeline for personality classification using ensemble learning, achieving top 5% (200/4,329) in the Kaggle Personality Classification Competition. Features modular architecture, automated hyperparameter optimization, and interactive visualization.

Key Features:

Ensemble Architecture:

| Stack | Focus                    | Algorithms                             | Special Features        |
|-------|--------------------------|----------------------------------------|-------------------------|
| A     | Traditional ML (Narrow)  | RF, LR, XGBoost, LightGBM, CatBoost    | Stable baseline         |
| B     | Traditional ML (Wide)    | Same as A                              | Extended search space   |
| C     | Gradient Boosting        | XGBoost, CatBoost                      | Tree specialists        |
| D     | Sklearn Ensemble         | Extra Trees, Hist GB, SVM, Gaussian NB | Diverse mix             |
| E     | Neural Networks          | MLPClassifier, Deep architectures      | Non-linear patterns     |
| F     | Noise-Robust             | Same as A + label noise                | Improved generalization |
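How the six stacks are combined is not stated here. One common option, shown below purely as an assumption, is weighted soft voting: each stack outputs a probability for the positive class, and the final prediction is a weighted average. The stack subset, probabilities, weights, and the 0.5 threshold are invented for illustration.

```python
def blend(stack_probs, weights):
    """Weighted average of per-stack positive-class probabilities."""
    total = sum(weights[name] for name in stack_probs)
    n_samples = len(next(iter(stack_probs.values())))
    return [
        sum(weights[name] * probs[i] for name, probs in stack_probs.items()) / total
        for i in range(n_samples)
    ]

# Hypothetical out-of-fold probabilities from three of the stacks, for three samples.
stack_probs = {
    "A": [0.9, 0.2, 0.6],
    "C": [0.8, 0.4, 0.4],
    "E": [0.7, 0.1, 0.7],
}
# Hypothetical blend weights; in practice these could be tuned, e.g. with Optuna.
weights = {"A": 0.5, "C": 0.3, "E": 0.2}

blended = blend(stack_probs, weights)
labels = [int(p >= 0.5) for p in blended]
```

Averaging probabilities rather than hard labels lets a confident stack outweigh two uncertain ones, which is one reason to combine models with different inductive biases.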

Competition Results:

Technologies: Python 3.11+, scikit-learn, XGBoost, LightGBM, CatBoost, Optuna, Dash, Plotly
Status: Production Ready | Apache 2.0 License
→ View on GitHub


Open Source Contributions

All public projects are available on GitHub.