Data Science Projects

Peer-reviewed ML research turned into interactive presentations, with more demos and open-source releases coming soon.

14 Projects · 3 Domains · 3 Publications · 3 HF Demos
Surgical Duration Predictor
🏥
● Live
01
Healthcare · OR Scheduling
Neural Network vs. Surgeon: OR Duration Prediction — ANN, XGBoost, scikit-learn
Can AI predict how long a surgery will take better than the surgeon?
Benchmarked 8 ML models (ANN, XGBoost, Random Forest, GBM, and more) against real surgeon estimates on 17,246 procedures. The ANN achieved near-zero bias (−0.37 min), while surgeons consistently underestimated durations (bias −18.52 min). Published in Surgical Endoscopy 2025.
Neural Network · XGBoost · scikit-learn · Random Forest
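The bias figures above are mean signed errors. A minimal sketch of how such a comparison can be scored, assuming bias is defined as predicted minus actual duration (so a negative value means underestimation); the function name and toy durations are hypothetical, not taken from the study:

```python
from statistics import mean

def bias_and_mae(predicted, actual):
    """Return (bias, MAE) in minutes.

    Bias = mean(predicted - actual): negative means durations are
    underestimated on average. MAE ignores the sign of each error.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    errors = [p - a for p, a in zip(predicted, actual)]
    return mean(errors), mean(abs(e) for e in errors)

# Hypothetical toy data (minutes), for illustration only.
actual = [120, 95, 180, 60]
surgeon = [100, 80, 150, 50]
bias, mae = bias_and_mae(surgeon, actual)
print(f"bias={bias:.2f} min, MAE={mae:.2f} min")
```

Reporting both numbers matters: a model can have near-zero bias while still making large errors in both directions, which the MAE exposes.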
NLP Benchmark
💬
● Live
02
NLP · Clinical Text
Clinical NLP Benchmark: Sentence-BERT vs ClinicalBERT — TF-IDF, 5 Strategies, 180K Cases
Which AI reading strategy works best on doctors' notes?
Benchmarked 5 text-encoding strategies — Label Encoding, Count Vectorization, TF-IDF, ClinicalBERT, and Sentence-BERT — on 180K surgical cases. Sentence-BERT reduced prediction error by up to 16% over traditional encodings. SSRN preprint 2025.
Sentence-BERT · ClinicalBERT · NLP · Benchmarking
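TF-IDF, one of the five encodings benchmarked, weights each term by how often it appears in a note and how rare it is across all notes. A minimal pure-Python sketch using the plain formula tf × log(N / df); the function name `tfidf` and the sample notes are hypothetical. scikit-learn's `TfidfVectorizer` applies a smoothed IDF and L2 row normalization by default, so its numbers will differ:

```python
import math
from collections import Counter

def tfidf(docs):
    """Plain TF-IDF: tf = count / doc length, idf = log(N / df).

    Returns one {term: weight} dict per document. Terms present in
    every document get weight 0 because log(N / N) = 0.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical procedure descriptions.
notes = [
    "laparoscopic cholecystectomy elective",
    "open cholecystectomy emergency",
    "laparoscopic appendectomy emergency",
]
for vec in tfidf(notes):
    print({t: round(w, 3) for t, w in vec.items()})
```

Rare, specific terms such as "elective" end up weighted above common ones such as "laparoscopic", which is the intuition behind TF-IDF. Contextual encoders like Sentence-BERT go further by placing related wording close together even when the exact terms differ.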
Energy Anomaly XAI
● Live
03
Energy · Explainable AI
Explainable AI (XAI) for Energy Anomaly Detection: SHAP, LSTM, Deep Learning
Why is this building wasting energy — and how do we know for sure?
Context-aware SHAP explanations for deep learning anomaly detection in building energy data. Reduces explanation variability by 38% on average. Published in Energy & Buildings 2025.
SHAP · LSTM · XAI · Anomaly Detection
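SHAP explanations are built on Shapley values. The sketch below computes them exactly by enumerating feature coalitions and filling absent features from a single baseline input. It is the plain textbook computation, not the project's context-aware variant; the toy model and the feature meanings in the comments are hypothetical. The attributions always sum to f(x) − f(baseline):

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at input x.

    Features outside a coalition take their baseline value. The cost
    grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical energy model: z = [outdoor temp, occupancy, HVAC hours].
def toy_energy_model(z):
    return 2.0 * z[0] + 0.5 * z[1] * z[2]

x, baseline = [3.0, 2.0, 4.0], [1.0, 0.0, 0.0]
phi = shapley_values(toy_energy_model, x, baseline)
print([round(p, 6) for p in phi])
```

Here the linear term is credited entirely to the first feature, while the interaction term is split evenly between the two features that form it. Production SHAP libraries approximate this sum, since exact enumeration is infeasible for deep models with many inputs.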
Sarcopenia Genetics
🧬
● Live
04
Healthcare · Genetics
GWAS for Sarcopenia Risk: IL10 Gene Discovery — MultiPhen, GATES, Biostatistics
How do genes shape muscle loss in aging?
Multivariate candidate-gene GWAS using MultiPhen and GATES on 2,772 elderly participants. First genomic validation of IL10 as a sarcopenia risk gene in an Iranian population. Published in J. of Biostatistics & Epidemiology 2022.
GWAS · SNP · Biostatistics · MultiPhen
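GATES combines the SNP-level p-values within a gene into a single gene-level p-value using an extended Simes procedure. When the SNPs are independent (no linkage disequilibrium), it reduces to the classic Simes test sketched below. The full method replaces m and the rank j with effective numbers of independent tests estimated from the LD matrix, which this sketch does not do. The p-values are hypothetical:

```python
def simes_p(p_values):
    """Simes combined p-value: min over ranks j of m * p_(j) / j.

    p_(j) is the j-th smallest of the m SNP p-values. Equivalent to
    GATES only when the SNPs are independent.
    """
    p_sorted = sorted(p_values)
    m = len(p_sorted)
    return min(m * p / rank for rank, p in enumerate(p_sorted, start=1))

# Hypothetical p-values for three SNPs in one candidate gene.
print(simes_p([0.01, 0.04, 0.5]))
```

Unlike a Bonferroni correction on the single best SNP (m × p_min), Simes also rewards genes where several SNPs show moderate signal.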
ASCVD Metabolomics
🫀
● Live
05
Healthcare · Metabolomics
ASCVD Risk with Metabolomics: PCA, Logistic Regression & Biomarker Discovery Pipeline
Can blood metabolites reveal hidden cardiovascular risk?
PCA and logistic regression analysis of 50 plasma metabolites (acylcarnitines + amino acids) for ASCVD risk prediction. Identified 14 significant biomarkers across four ACC/AHA risk groups. Published in Frontiers in Cardiovascular Medicine 2023.
Metabolomics · ASCVD · PCA · Logistic Regression
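The logistic-regression step relates a metabolite level to case status, and the exponentiated coefficient on a standardized predictor is an odds ratio per standard deviation. A minimal gradient-descent sketch for a single metabolite with made-up data. The PCA step and multi-metabolite models are not shown, and a real analysis would use a statistics package that also reports standard errors and p-values:

```python
import math

def fit_logistic(x, y, lr=0.1, epochs=5000):
    """Fit P(y=1) = sigmoid(b0 + b1 * z) by batch gradient descent,
    where z is x standardized to mean 0 and SD 1."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / (n - 1))
    z = [(v - mu) / sd for v in x]
    b0 = b1 = 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for zi, yi in zip(z, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * zi)))
            # Gradient of the mean negative log-likelihood.
            g0 += p - yi
            g1 += (p - yi) * zi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Hypothetical metabolite levels and case status (not separable).
level = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]
case = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(level, case)
print(f"odds ratio per SD: {math.exp(b1):.2f}")
```

An odds ratio above 1 means higher levels of the metabolite are associated with higher odds of being a case.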
CABG Gamification
🎮
● Live
06
Healthcare · Gamification
Gamified CABG Recovery RCT: Delban App vs. Teach-Back — ANOVA, SPSS, STATA
Can gamification improve surgical outcomes?
3-arm randomized clinical trial comparing a gamified Android app (Delban) against teach-back training and usual care for post-CABG recovery. Statistical analysis via one-way ANOVA with Dunnett post-hoc tests. Published in JMIR 2021.
Gamification · RCT · ANOVA · Clinical Trial
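A one-way ANOVA compares mean outcomes across the three arms using the ratio of between-group to within-group variance. A minimal sketch of the F statistic with hypothetical scores. It computes only F and its degrees of freedom, not the p-value or the Dunnett comparisons against the control arm:

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical outcome scores for three arms.
print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

In practice, `scipy.stats.f_oneway` returns F together with its p-value, and recent SciPy releases also provide `scipy.stats.dunnett` for many-to-one comparisons against a control group.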
Drug Target AI
🧪
● Live
07
Healthcare · Drug Discovery
LLM + AlphaFold Drug Target Discovery: Hugging Face, UniProt, Protein AI Pipeline
Can AI find the right protein to target for drug discovery?
LLM pipeline that takes a plain-language biomedical query and identifies the most relevant human protein targets via live UniProt data, complete with 3D AlphaFold structures rendered in the browser.
LLM · UniProt · AlphaFold · Drug Discovery
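The live UniProt lookup can be sketched as building a UniProtKB REST search URL after the LLM has mapped the question to a gene symbol. Here the LLM step is omitted and the gene symbol is supplied by hand. The function names are hypothetical, and the query fields (`gene`, `organism_id:9606` for human, `reviewed:true` for Swiss-Prot entries) and the AlphaFold DB prediction endpoint reflect the public APIs as we understand them, so they should be checked against current documentation. No network request is made:

```python
from urllib.parse import urlencode

UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"
ALPHAFOLD_API = "https://alphafold.ebi.ac.uk/api/prediction"

def uniprot_search_url(gene_symbol, size=5):
    """URL for reviewed human UniProtKB entries matching a gene symbol."""
    query = f"gene:{gene_symbol} AND organism_id:9606 AND reviewed:true"
    params = {"query": query, "format": "json", "size": size}
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"

def alphafold_prediction_url(accession):
    """AlphaFold DB metadata endpoint for a UniProt accession."""
    return f"{ALPHAFOLD_API}/{accession}"

# In the full pipeline an LLM would turn a plain-language question into
# a gene symbol such as "EGFR"; here it is hard-coded.
print(uniprot_search_url("EGFR"))
print(alphafold_prediction_url("P00533"))  # P00533 is human EGFR
```

Fetching these URLs (for example with `urllib.request.urlopen`) would return JSON; the UniProt accession links the two services.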
RiskSight Pro
🩺
● Live
08
Data Science · Risk Analytics
RiskSight Pro: Production ML Platform — XGBoost, Scikit-learn, Credit & Fraud Risk
Full-stack risk intelligence platform for banking, insurance, and financial risk.
Seven interactive risk modules — credit default, fraud detection, market risk (VaR), insurance analytics, and more — each powered by a real ML model on live synthetic data. Demonstrates end-to-end production risk analytics with Flask, Plotly, and REST API endpoints.
Risk Analytics · Banking · ML · Flask
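Market risk (VaR) is one of the seven modules. A minimal historical-simulation sketch: sort past returns and read off the loss in the chosen tail. The quantile convention (VaR is minus the k-th worst return, with k = ⌈tail × n⌉) is an assumption, since libraries interpolate differently, and the return series is synthetic:

```python
import math

def historical_var(returns, tail=0.05):
    """One-period historical VaR as a positive loss fraction.

    Convention (an assumption): VaR = -(k-th worst return),
    with k = ceil(tail * n).
    """
    ordered = sorted(returns)
    # round() absorbs float noise such as 0.07 * 100 = 7.000000000000001
    k = max(1, math.ceil(round(tail * len(ordered), 9)))
    return -ordered[k - 1]

# Synthetic daily returns from -5.0% to +4.9% in 0.1% steps.
returns = [(i - 50) / 1000 for i in range(100)]
print(historical_var(returns, tail=0.05))
```

Read this as: on 95% of days in the sample, the loss did not exceed the reported fraction of portfolio value.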
VAE Project
🧠
● Live
09
Deep Learning · Generative Models
Variational Autoencoder (VAE): Generative AI & Deep Learning — PyTorch, Neural Networks
Train and explore a Variational Autoencoder in your browser.
Fully interactive deep learning playground: configure and train a VAE on MNIST from scratch, then navigate its 2-D latent space with sliders, compare reconstructions side-by-side, and generate new digit images — all streamed live from a PyTorch backend.
VAE · PyTorch · Generative Models · Flask
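Two pieces make a VAE trainable. The reparameterization trick writes the latent sample as a deterministic function of the encoder outputs plus independent noise, so gradients can flow through it. The KL term pulls the approximate posterior toward a standard normal. A framework-free sketch of both for a diagonal Gaussian (the project itself uses PyTorch, where the same formulas would operate on tensors):

```python
import math
import random

def reparameterize(mu, logvar, rng=random):
    """z = mu + sigma * eps, with sigma = exp(logvar / 2) and eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

# A 2-D latent code, as in the project's latent-space explorer.
rng = random.Random(0)
mu, logvar = [0.5, -1.0], [-0.2, 0.1]
print(reparameterize(mu, logvar, rng))
print(kl_to_standard_normal(mu, logvar))
```

The training loss adds this KL term to the reconstruction error. The KL is zero exactly when the encoder outputs mu = 0 and logvar = 0.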
Transformer From Scratch
🤖
● Live
10
NLP · Education
Transformer Training: Attention Visualization & Seq2Seq — PyTorch, NLP, Deep Learning
Build, train, and watch a Transformer learn — step by step in your browser.
A fully interactive 6-step educational playground: configure the dataset, build vocabulary, design architecture, watch live training, evaluate BLEU scores, and translate sentences, all powered by a real PyTorch training loop streamed via Server-Sent Events (SSE).
PyTorch · Transformer · Seq2Seq · Flask
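The attention maps such a playground visualizes come from scaled dot-product attention, softmax(QKᵀ / √d_k)V. A pure-Python sketch for small lists of vectors, without masking or multiple heads; in PyTorch 2.x, `torch.nn.functional.scaled_dot_product_attention` provides an optimized version:

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    weights = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights.append(softmax(scores))
    # Each output row is a weighted average of the value vectors.
    output = [
        [sum(w * v[j] for w, v in zip(w_row, V)) for j in range(len(V[0]))]
        for w_row in weights
    ]
    return output, weights

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out, w = attention(Q, K, V)
print(w)  # the matrix an attention heatmap would display
```

Each row of `w` sums to 1 and shows how much one query position attends to each key position. Dividing by √d_k keeps the scores from growing with dimension and saturating the softmax.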
ExperimentLab
🧪
● Live
11
Data Science · Experimentation
End-to-End A/B Testing Platform: SciPy, Statistical Experimentation & Power Analysis
End-to-end statistical experimentation platform for controlled experiments.
Interactive experimentation suite covering the full controlled experiment lifecycle — from pre-experiment power planning to post-experiment decision-making. Four production-quality modules with rigorous statistical methods and interactive Plotly visualizations.
Statistics · SciPy · Power Analysis · Hypothesis Testing
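Pre-experiment power planning answers one question: how many users per arm are needed to detect a given lift? A sketch of the standard normal-approximation formula for a two-sided test of two proportions with equal group sizes, using only the standard library. The function name and the 10% to 12% conversion rates are hypothetical:

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-proportion z-test
    (normal approximation, equal allocation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Baseline conversion 10%, hoping to detect a lift to 12%.
print(sample_size_two_proportions(0.10, 0.12))
```

The required n scales with 1 / (p1 − p2)², so halving the detectable lift roughly quadruples the sample size.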
Research RAG Benchmark
🔍
● Live
12
NLP · Information Retrieval
Hybrid RAG Pipeline: BM25, FAISS, Dense Vector & Reciprocal Rank Fusion on arXiv
Full-stack hybrid RAG system with BM25, dense vectors, and Reciprocal Rank Fusion.
A benchmarking platform that fetches real arXiv papers, indexes them with BM25 and dense vector retrieval, fuses results using Reciprocal Rank Fusion, and evaluates answer quality live. No API key required — runs entirely on open-source models.
RAG · LLM · FAISS · BM25 · Flask
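Reciprocal Rank Fusion merges ranked lists by summing 1 / (k + rank) for each document across retrievers, so documents ranked well by both BM25 and the dense retriever rise to the top. A minimal sketch with k = 60, the constant proposed in the original RRF paper; the paper IDs are placeholders:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["paper_a", "paper_b", "paper_c"]
dense = ["paper_c", "paper_a", "paper_d"]
print(reciprocal_rank_fusion([bm25, dense]))
# → ['paper_a', 'paper_c', 'paper_b', 'paper_d']
```

RRF uses only ranks, so BM25 scores and cosine similarities never need to be put on a common scale.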
DocMind
📄
● Live
13
Agentic AI · LLM Orchestration
DocMind: Agentic RAG Document Q&A — LangChain, FAISS, Multi-Agent LLM Pipeline
Multi-agent document Q&A with planning, retrieval, grading, and verification.
An intelligent document analysis system using a 5-agent pipeline: the Planner routes user queries, the Retriever finds relevant chunks with hybrid search, the Grader scores chunks in place of LLM calls, the Generator produces answers, and the Critic verifies factuality. Reduces latency by 60% compared to naive RAG.
RAG · Multi-Agent · LLM · Hybrid Search · Flask
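One way a Grader can score chunks without calling an LLM is a cheap lexical relevance check. The sketch below keeps chunks whose share of query terms meets a threshold. The scoring rule, threshold, and function names are hypothetical and are not DocMind's actual implementation:

```python
import re

def tokenize(text):
    """Lowercase alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def grade_chunks(query, chunks, threshold=0.3):
    """Return (score, chunk) pairs at or above threshold, best first.

    score = fraction of query terms that appear in the chunk.
    """
    q_terms = tokenize(query)
    if not q_terms:
        return []
    scored = [(len(q_terms & tokenize(c)) / len(q_terms), c) for c in chunks]
    kept = [(s, c) for s, c in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept

chunks = ["The refund period is 30 days.", "Shipping takes 5 business days."]
print(grade_chunks("What is the refund period?", chunks))
```

A rule like this runs in microseconds per chunk, so filtering chunks before generation costs no extra model calls, which is one plausible source of latency savings over grading each chunk with an LLM.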
TechStore Support Agent
💬
● Live
14
Agentic AI · Conversational AI
LangGraph ReAct Agent: LangChain, Tool Calling, Streaming LLM, Production AI System
AI-powered customer support with LangGraph ReAct, 5 tools, and live token streaming.
An intelligent customer support chatbot using LangGraph's ReAct architecture. Features 4 switchable free Hugging Face LLMs, 5 real dispatch tools (order lookup, FAQ search, ticket creation, etc.), real-time token streaming, and full execution transparency. Watch every agent decision unfold.
LangGraph · ReAct · Agent · LLM · Flask
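The ReAct pattern alternates model reasoning with tool calls: the model emits an action, the runtime dispatches it to a tool, and the resulting observation is appended to the context for the next step. A framework-free sketch with a scripted stand-in for the LLM. The tool names, the `Action: tool[arg]` text format, and the data are hypothetical; LangGraph handles this loop with structured tool calls rather than regex parsing:

```python
import re

# Hypothetical tools with hard-coded data.
def order_lookup(order_id):
    orders = {"1001": "shipped", "1002": "processing"}
    return orders.get(order_id, "not found")

def faq_search(topic):
    faq = {"returns": "Items can be returned within 30 days."}
    return faq.get(topic, "No FAQ entry found.")

TOOLS = {"order_lookup": order_lookup, "faq_search": faq_search}
ACTION = re.compile(r"Action:\s*(\w+)\[(.*?)\]")

def react_loop(model, question, max_steps=5):
    """Run model steps, dispatching tool actions, until 'Final Answer:'."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION.search(step)
        if match:
            name, arg = match.groups()
            tool = TOOLS.get(name)
            observation = tool(arg) if tool else f"unknown tool {name}"
            transcript += f"Observation: {observation}\n"
    return None  # step budget exhausted

def scripted_model(transcript):
    """Stand-in for an LLM: look up the order, then answer."""
    if "Observation:" not in transcript:
        return "Thought: I should check the order.\nAction: order_lookup[1001]"
    return "Final Answer: Order 1001 has shipped."

print(react_loop(scripted_model, "Where is order 1001?"))
```

The `max_steps` cap keeps a model that never produces a final answer from looping indefinitely.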