Banking · Insurance · FinRisk | Python · Flask · Scikit-learn · Plotly | Live on HuggingFace · Industry Project

RiskSight Pro — 7 ML Risk Models, One Live Dashboard

Credit defaults. Fraudulent transactions. Market drawdowns. Insurance loss ratios. RiskSight Pro unifies seven risk modules, each driven by a trained ML model or a statistical risk engine running on synthetic data, into a single interactive dashboard deployable on HuggingFace Spaces. Built entirely in Python with Flask and Plotly, it demonstrates what production-style risk analytics looks like end to end.

Industry Project · 2025 · Noorchenarboo

7: Interactive risk modules across banking & insurance
3: Live ML models (Random Forest, GBM, Logistic Regression)
5,200+: Synthetic records covering credit, fraud, insurance, and market data
VaR 99%: Market risk quantified via rolling Value-at-Risk engine
Live: Real-time ML scoring via REST API endpoints

Architecture Overview

From Synthetic Data to a Live Risk Dashboard

RiskSight Pro is structured as a single Flask application with a shared dark-theme shell injected into every page. On startup, three ML models are trained in memory on 5,200+ synthetic records covering credit borrowers, fraud transactions, insurance policies, and one year of daily market returns. Every chart is a server-side Plotly figure serialised to JSON and rendered client-side — no static images, no stale data.

🗄️ Synthetic Data
NumPy/Pandas: credit, fraud, insurance, market
🧠 ML Training
RF · GBM · LogReg trained at startup
📊 Plotly Figures
Server-side JSON, rendered in browser
🌐 Flask Routes
7 pages + 3 REST API endpoints
🚀 HF Spaces
Docker container on port 7860
💡 Why a Shared Shell Template?

All seven pages share a single SHELL string injected via render_template_string. This gives a consistent dark sidebar + topbar with zero static file dependencies. The entire frontend — sidebar, nav, all charts, all forms — is generated dynamically by one Python file. No HTML files needed on the server.
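
The shared-shell idea can be sketched roughly as follows. This is a minimal illustration, not the app's actual template: the SHELL contents, the NAV entries, the `/credit` route, and the `render_page` helper are all assumptions made for the example.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Hypothetical, heavily trimmed shell. The real SHELL string is not shown
# in this write-up, so the markup and colours here are placeholders.
SHELL = """<!doctype html>
<html>
<head><title>{{ title }} | RiskSight Pro</title></head>
<body style="background:#0f172a;color:#e2e8f0">
  <nav>{% for label, href in nav %}<a href="{{ href }}">{{ label }}</a> {% endfor %}</nav>
  <main>{{ body|safe }}</main>
</body>
</html>"""

# Assumed sidebar entries; the real app lists all seven modules.
NAV = [("Credit Risk", "/credit"), ("Fraud", "/fraud")]


def render_page(title, body_html):
    """Wrap page-specific HTML in the shared shell (hypothetical helper)."""
    return render_template_string(SHELL, title=title, nav=NAV, body=body_html)


@app.route("/credit")
def credit_page():
    # Each page only supplies its own body; the shell adds nav and styling.
    return render_page("Credit Risk", "<h1>Credit Risk Scorer</h1>")
```

Because the template is a Python string, no `templates/` directory or static files have to ship with the container.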

Module Breakdown

Seven Risk Modules in One App

💳 Credit Risk
Credit Risk Scorer
Random Forest trained on 1,200 borrower records predicts Probability of Default. An interactive form lets users score any applicant in real time.
Model: Random Forest (100 trees)
Output: PD · LGD proxy · Expected Loss
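
One common way to combine these three outputs is the standard expected-loss identity EL = PD × LGD × EAD. The sketch below assumes that formula; the write-up does not say how the app derives its LGD proxy, so the 0.45 value and the loan amount are purely illustrative.

```python
def expected_loss(pd_default: float, lgd: float, exposure: float) -> float:
    """Expected loss as PD x LGD x exposure at default (standard identity)."""
    if not 0.0 <= pd_default <= 1.0:
        raise ValueError("pd_default must be a probability in [0, 1]")
    if not 0.0 <= lgd <= 1.0:
        raise ValueError("lgd must be a fraction in [0, 1]")
    if exposure < 0:
        raise ValueError("exposure must be non-negative")
    return pd_default * lgd * exposure


# Illustrative inputs only: an 8% PD from the classifier, an assumed
# 45% LGD proxy, and a hypothetical 25,000 loan.
el = expected_loss(pd_default=0.08, lgd=0.45, exposure=25_000)
print(f"Expected loss: {el:,.2f}")
```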
🕵️ Fraud Detection
Fraud Transaction Monitor
Gradient Boosting classifier on 3,000 transactions flags fraud by amount, hour, merchant risk, foreign flag, and velocity. Live flagged transaction table included.
Model: Gradient Boosting (100 est.)
Threshold: Fraud probability > 0.25
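
The 0.25 cut-off is lower than the usual 0.5 default, which trades some precision for recall on a rare class. A minimal sketch of applying it to predicted probabilities follows; the `flag_transactions` helper and the score values are made up for illustration.

```python
import numpy as np

FRAUD_THRESHOLD = 0.25  # from the module card above


def flag_transactions(fraud_proba, threshold=FRAUD_THRESHOLD):
    """Return a boolean mask of transactions whose fraud probability exceeds the threshold."""
    proba = np.asarray(fraud_proba, dtype=float)
    return proba > threshold


# Made-up classifier scores for five transactions.
scores = np.array([0.02, 0.25, 0.31, 0.48, 0.77])

flags_low = flag_transactions(scores)        # dashboard threshold
flags_default = flag_transactions(scores, 0.5)  # conventional 0.5 cut-off

print("flagged at 0.25:", int(flags_low.sum()))
print("flagged at 0.50:", int(flags_default.sum()))
```

Note that the comparison is strict, so a score of exactly 0.25 is not flagged.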
📈 Market Risk
VaR / CVaR Engine
VaR at 95% and 99%, Expected Shortfall, Sharpe ratio, max drawdown, and a rolling 21-day VaR chart are computed from 252 days of simulated daily returns.
Method: Historical simulation
Portfolio size: $10M synthetic book
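
A percentile-based version of these metrics could look like the sketch below. The return distribution (normal, with the mean and volatility shown), the zero risk-free rate in the Sharpe ratio, and the function names are assumptions; only the 252-day length, 21-day window, confidence levels, and $10M book come from the module description.

```python
import numpy as np

PORTFOLIO_VALUE = 10_000_000  # $10M synthetic book, per the module card
TRADING_DAYS = 252

# Assumed return process: the real simulation parameters are not documented.
rng = np.random.default_rng(42)
returns = rng.normal(loc=0.0005, scale=0.01, size=TRADING_DAYS)


def historical_var(r, level):
    """Historical-simulation VaR, reported as a positive loss fraction."""
    return -np.percentile(r, 100 * (1 - level))


def expected_shortfall(r, level):
    """Mean loss over the returns at or below the VaR cut-off (CVaR)."""
    cutoff = np.percentile(r, 100 * (1 - level))
    return -r[r <= cutoff].mean()


def rolling_var(r, window=21, level=0.95):
    """VaR over a trailing window; NaN until the first full window."""
    out = np.full(len(r), np.nan)
    for end in range(window, len(r) + 1):
        out[end - 1] = historical_var(r[end - window:end], level)
    return out


def max_drawdown(r):
    """Largest peak-to-trough decline of the compounded return path (<= 0)."""
    wealth = np.cumprod(1 + r)
    running_peak = np.maximum.accumulate(wealth)
    return (wealth / running_peak - 1).min()


def sharpe_ratio(r):
    """Annualised Sharpe ratio, assuming a zero risk-free rate."""
    return r.mean() / r.std(ddof=1) * np.sqrt(TRADING_DAYS)


var_95 = historical_var(returns, 0.95)
var_99 = historical_var(returns, 0.99)
es_99 = expected_shortfall(returns, 0.99)
roll = rolling_var(returns)

print(f"1-day VaR 99%: ${var_99 * PORTFOLIO_VALUE:,.0f}")
print(f"1-day ES 99%:  ${es_99 * PORTFOLIO_VALUE:,.0f}")
```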
🏦 Loan Portfolio
Portfolio Concentration Analyser
Explores loan volume by purpose, credit grade mix, income vs loan scatter, and a region × purpose default-rate heatmap for concentration risk identification.
Metrics: EL · Gross exposure · Grade mix
Breakdown: 4 purposes · 4 regions
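
The region × purpose default-rate grid behind such a heatmap can be built with a pandas pivot table. The sketch below uses a tiny hand-made loan table; the region names, purpose names, and column names are hypothetical, since the write-up does not list them.

```python
import pandas as pd

# Hypothetical loan records: 1 means the loan defaulted, 0 means it did not.
loans = pd.DataFrame(
    {
        "region": ["North", "North", "North", "South", "South", "South"],
        "purpose": ["auto", "auto", "home", "auto", "home", "home"],
        "defaulted": [1, 0, 0, 0, 1, 1],
    }
)

# Mean of a 0/1 flag per cell is the default rate for that region/purpose pair.
default_rate = loans.pivot_table(
    index="region", columns="purpose", values="defaulted", aggfunc="mean"
)

print(default_rate)
```

The resulting frame maps directly onto a heatmap: rows become the y-axis, columns the x-axis, and cell values the colour scale.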
🏥 Claims
Insurance Claims Analytics
Monthly claim volume trends, smoker vs non-smoker distributions, BMI vs claim scatter, and policy-type breakdowns across 1,000 synthetic policyholders.
Segments: Basic · Standard · Premium
Risk flags: Smoker · BMI>35 · Age>60
⚖️ Loss Ratio
Loss & Combined Ratio Monitor
Tracks loss ratio vs combined ratio (LR + 25% expense ratio) by month, with a region × policy-type heatmap and profitability threshold line at LR = 1.0.
Threshold: Break-even at LR = 1.0
Expense ratio: Fixed 25% for combined ratio
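
The arithmetic behind this module is simple enough to sketch directly. The example assumes the loss ratio is claims paid divided by premiums earned; the function names and the monetary figures are illustrative, while the fixed 25% expense ratio and the 1.0 break-even come from the card above.

```python
EXPENSE_RATIO = 0.25  # fixed expense ratio, per the module card
BREAK_EVEN = 1.0


def loss_ratio(claims_paid: float, premiums_earned: float) -> float:
    """Claims paid as a fraction of premiums earned."""
    if premiums_earned <= 0:
        raise ValueError("premiums_earned must be positive")
    return claims_paid / premiums_earned


def combined_ratio(lr: float, expense_ratio: float = EXPENSE_RATIO) -> float:
    """Loss ratio plus expense ratio."""
    return lr + expense_ratio


def is_profitable(cr: float) -> bool:
    """Underwriting is profitable when the combined ratio is below break-even."""
    return cr < BREAK_EVEN


# Illustrative month: 820k of claims against 1M of premiums.
lr = loss_ratio(820_000, 1_000_000)
cr = combined_ratio(lr)
print(f"LR={lr:.2f}  CR={cr:.2f}  profitable={is_profitable(cr)}")
```

In this illustration the loss ratio alone looks healthy, but the book is unprofitable once expenses are included.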
Machine Learning Stack

Three Production-Style Models + Statistical Risk Engine

Each model is trained at application startup using Scikit-learn on in-memory synthetic data, wrapped in a StandardScaler pipeline, and held in memory rather than serialised to disk. The REST API endpoints receive JSON from browser forms and return scored results in milliseconds, the same pattern used in real-world risk systems.

Random Forest Classifier — Credit Risk
Features: Age, Income, Debt Ratio, Credit Score, Employment Years, Loan Amount
100 trees · sklearn
Gradient Boosting Classifier — Fraud Detection
Features: Amount, Hour, Foreign flag, Velocity, Merchant Risk (OHE)
100 estimators · sklearn
Logistic Regression — Underwriting Risk
Features: Age, BMI, Smoker, Children, Vehicle Age, Region (OHE)
Binary · L2 reg · sklearn
Historical Simulation — Market Risk (VaR/CVaR)
252 daily returns · Rolling 21-day window · Sharpe, Drawdown, Expected Shortfall
NumPy · percentile-based
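
As a rough sketch of the train-at-startup pattern, the block below fits a scaled logistic regression on synthetic policyholders. The data-generating process, the reduced feature set (age, BMI, smoker), and the helper names are assumptions for illustration; the app's real features also include children, vehicle age, and a one-hot region.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42  # fixed seed so every boot produces the same model


def make_policyholders(n=1000, seed=SEED):
    """Generate hypothetical policyholders and a binary high-risk label."""
    rng = np.random.default_rng(seed)
    age = rng.integers(18, 80, n)
    bmi = rng.normal(27.0, 5.0, n)
    smoker = rng.integers(0, 2, n)
    # Assumed risk relationship: older, heavier, and smoking raise risk.
    logit = -4.0 + 0.03 * age + 0.05 * bmi + 1.2 * smoker
    high_risk = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))
    X = np.column_stack([age, bmi, smoker]).astype(float)
    return X, high_risk.astype(int)


def train_underwriting_model(seed=SEED):
    """Fit scaler + logistic regression (L2 is the scikit-learn default)."""
    X, y = make_policyholders(seed=seed)
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return model.fit(X, y)


# Trained once at import time and kept in memory for later requests.
MODEL = train_underwriting_model()


def score(age, bmi, smoker):
    """Probability of the high-risk class for one applicant."""
    return float(MODEL.predict_proba([[age, bmi, smoker]])[0, 1])


print(f"Smoker, 45, BMI 31: {score(45, 31.0, 1):.3f}")
```

Putting the scaler inside the pipeline means request-time inputs are standardised with the statistics learned at startup, without any extra bookkeeping.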
⚙️ REST API Endpoints for Live Scoring

Three POST routes — /api/credit, /api/fraud, and /api/underwriting — accept JSON from browser forms and return model predictions without a page reload. This mirrors the microservice pattern of real-world risk systems where scoring engines are decoupled from the reporting layer.
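
A scoring route of this kind might look like the following sketch. The request fields (`credit_score`, `debt_ratio`), the `pd` response key, and the `score_credit` stand-in are all assumptions; in the real app the score would come from the trained Random Forest rather than a hand-written formula.

```python
import math

from flask import Flask, jsonify, request

app = Flask(__name__)

REQUIRED_FIELDS = ("credit_score", "debt_ratio")  # assumed subset of the form


def score_credit(credit_score: float, debt_ratio: float) -> float:
    """Placeholder for the trained model: a hand-written logistic score."""
    z = 0.01 * (650 - credit_score) + 3.0 * (debt_ratio - 0.35) - 2.0
    return 1.0 / (1.0 + math.exp(-z))


@app.route("/api/credit", methods=["POST"])
def api_credit():
    payload = request.get_json(silent=True) or {}
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        return jsonify(error=f"missing fields: {missing}"), 400
    try:
        pd_default = score_credit(
            float(payload["credit_score"]), float(payload["debt_ratio"])
        )
    except (TypeError, ValueError):
        return jsonify(error="fields must be numeric"), 400
    # The browser form reads this JSON and updates the page without a reload.
    return jsonify(pd=round(pd_default, 4))
```

From the browser, the form would send its values with a `fetch` POST and a JSON body; Flask's test client can exercise the same route with `app.test_client().post("/api/credit", json={...})`.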

Interactive Explorer

Simulated Risk Outputs by Module

Illustrative outputs based on synthetic data patterns in the app. Live app scores inputs in real time via trained ML models.

Performance Snapshot

Risk Metrics Across the Synthetic Portfolio

Default Rate by Credit Grade
Grade F borrowers show ~4× the default rate of Grade A. The Random Forest feature importances rank credit score and debt ratio as the two dominant predictors.

Fraud by Transaction Channel
ATM and online channels carry the highest fraud rates, consistent with skimming and card-not-present patterns respectively. Night-hour (0–5h) transactions account for a disproportionate share of fraud flags.

Loss Ratio by Policy Type
All three policy tiers exceed the break-even of 1.0 on a combined-ratio basis, that is, once the fixed 25% expense ratio is added to the loss ratio. This illustrates the underwriting risk challenge the platform is designed to surface.

Design Decisions

Three Engineering Choices That Define the App

🧩 Single-file deploy
The entire app (data generation, ML training, 7 pages, 3 REST endpoints, and all CSS) lives in one app.py. Beyond that file, HuggingFace Spaces needs only a Dockerfile and requirements.txt.
🔁 Train-on-boot
Models are trained fresh at startup on synthetic data with random_state=42. This keeps the deployment artefact-free while ensuring reproducible, deterministic outputs every time.
📡 JSON-first charts
Plotly figures are serialised server-side and injected into HTML as JSON literals. The browser calls Plotly.react() — ensuring charts are responsive, interactive, and theme-aware without extra round-trips.