Operating rooms run late every day — and it's costing everyone.
Average scheduling error
Surgeons estimate case duration from memory and experience. On average, their estimates run more than 18 minutes short: a systematic bias toward underestimating.
Per hour of OR time wasted
Operating rooms are the most expensive resource in a hospital. One overrun case delays the next patient, causes staff overtime, and can cancel planned surgeries.
No two patients are the same
A hernia repair in a healthy 30-year-old takes far less time than in a 70-year-old with diabetes and heart disease. Traditional scheduling ignores these differences entirely.
5 years of real surgery records from 3 hospitals.
Cases by procedure type
From hospital records to accurate predictions — 4 steps.
Collect & Clean
5 years of de-identified records. Outliers removed. Only elective adult cases included. 17,246 clean records.
Extract Patient Signals
20 pre-surgery variables: age, health complexity (ASA score), BMI, procedure type, surgeon, and day of week.
Train 8 AI Models
From simple statistics to deep neural networks. Each is evaluated with 10-fold cross-validation so the comparison is fair.
Benchmark vs. Surgeons
Every AI model compared directly against real surgeon estimates on 5 accuracy and bias metrics.
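The "Collect & Clean" step above can be sketched in a few lines. This is a minimal illustration, not the study's actual pipeline: the field names (`case_type`, `cancelled`, `age`, `duration_min`) are hypothetical, and the source does not say how outliers were defined, so the interquartile-range fence used here is an assumption.

```python
from statistics import quantiles


def clean_cases(records, iqr_multiplier=1.5):
    """Keep elective, non-cancelled adult cases, then drop duration outliers.

    Field names are hypothetical; the IQR fence is an assumed outlier rule.
    """
    eligible = [
        r for r in records
        if r["case_type"] == "elective"
        and not r["cancelled"]
        and r["age"] >= 18
    ]
    durations = [r["duration_min"] for r in eligible]
    if len(durations) < 4:
        return eligible  # too few cases to estimate quartiles

    q1, _, q3 = quantiles(durations, n=4)
    iqr = q3 - q1
    low, high = q1 - iqr_multiplier * iqr, q3 + iqr_multiplier * iqr
    return [r for r in eligible if low <= r["duration_min"] <= high]
```

Applying the inclusion criteria before computing the fence keeps emergency or paediatric cases from shifting the quartiles used to flag outliers.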
- Cerner EMR system
- 3 tertiary hospitals
- Elective cases only
- No emergency or cancelled cases
- Adults (18+) only
- Procedure type & approach
- ASA health complexity score
- Patient age & BMI
- Surgeon identity
- First/last case of day
- Day of week & ICD code
- Linear / Ridge / Lasso
- Support Vector Regression
- Random Forest
- Gradient Boosting (GBM)
- XGBoost
- Neural Network (ANN) ✓ Best
- 10-fold cross-validation
- Per-fold data preprocessing
- Bayesian hyperparameter tuning
- Statistical bias testing
- TRIPOD-AI reporting standard
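The pairing of 10-fold cross-validation with per-fold preprocessing listed above matters because scaling statistics computed on the full dataset would leak information from the held-out cases. The sketch below shows the pattern with a deliberately simple stand-in model (single-feature least squares), not any of the eight models from the study; all function names are illustrative.

```python
import random
from statistics import mean, pstdev


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices once and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, y_bar - slope * x_bar


def cross_validated_mae(xs, ys, k=10, seed=0):
    """Return (mean MAE across folds, list of per-fold MAEs)."""
    fold_maes = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        train_x = [xs[i] for i in train]

        # Scaling statistics come from the training folds only (no leakage).
        mu = mean(train_x)
        sigma = pstdev(train_x) or 1.0

        slope, intercept = fit_line(
            [(x - mu) / sigma for x in train_x],
            [ys[i] for i in train],
        )
        # Held-out cases are scaled with the training-fold statistics.
        errors = [
            abs(ys[i] - (slope * (xs[i] - mu) / sigma + intercept))
            for i in held_out
        ]
        fold_maes.append(mean(errors))
    return mean(fold_maes), fold_maes
```

Reusing the same `seed` for every model gives each one identical folds, which is one simple way to keep the comparison like-for-like.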
The Neural Network shows no significant scheduling bias and matches or beats surgeons on every metric.
31.8 min average error, vs 35.3 min for surgeons
= 10% more accurate
26% error, vs 34% for surgeons
= 8 percentage points better
78% of the variation in surgery length is explained by the model (R² = 0.78)
The Neural Network is the only model with no significant bias
(p = 0.34; all others p < 0.05)
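The two headline comparisons use different kinds of difference: the first is a relative reduction in average error, the second an absolute gap in percentage points. A quick check using the values from the results table:

```python
# Values taken from the results table (Neural Network vs. surgeon estimate).
ann_mae_min, surgeon_mae_min = 31.8, 35.3   # average error, minutes
ann_pct_err, surgeon_pct_err = 26, 34       # percentage error

# Relative reduction in average error.
relative_gain = (surgeon_mae_min - ann_mae_min) / surgeon_mae_min

# Absolute difference, expressed in percentage points.
point_gain = surgeon_pct_err - ann_pct_err

print(f"{relative_gain:.1%}")  # 9.9%, reported as roughly 10%
print(point_gain)              # 8
```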
How does each model compare?
Average prediction error in minutes — lower bars are better. The red dashed line shows how surgeons perform today.
Which models have a scheduling bias?
A bar extending left means the model consistently underestimates. Zero is perfect. Only the Neural Network achieves near-zero systematic error.
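To make the sign convention concrete, the sketch below computes the quantities shown in the comparison table from paired predictions and actual durations. Bias is taken as predicted minus actual, so a negative value means underestimation. The source does not name the statistical test behind the p-values; the two-sided test with a normal approximation used here is an assumption that is reasonable only for large samples, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def evaluate(predicted, actual):
    """Accuracy and bias metrics for predicted vs. actual durations (minutes)."""
    errors = [p - a for p, a in zip(predicted, actual)]  # negative = underestimate
    n = len(errors)

    bias = mean(errors)                                   # signed mean error
    mae = mean(abs(e) for e in errors)                    # average error
    mape = 100 * mean(abs(e) / a for e, a in zip(errors, actual))

    a_bar = mean(actual)
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((a - a_bar) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot

    # Assumed test: is the mean signed error different from zero?
    std_err = stdev(errors) / sqrt(n)
    if std_err == 0:
        p_value = 1.0 if bias == 0 else 0.0
    else:
        z = bias / std_err
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    return {"mae": mae, "mape": mape, "r2": r2, "bias": bias, "p_value": p_value}
```

A predictor can have a small average error and still be biased, or the reverse, which is why the table reports both columns.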
| Model | Avg. Error ↓ | % Error ↓ | Fit (R²) ↑ | Scheduling Bias | Systematically Biased? |
|---|---|---|---|---|---|
| 🤖 Neural Network (ANN), best | 31.8 min | 26% | 0.78 | −0.37 min ✓ | No (p=0.34) |
| XGBoost | 32.5 min | 27% | 0.78 | −2.63 min | Yes |
| Gradient Boosting (GBM) | 32.4 min | 27% | 0.78 | −2.67 min | Yes |
| Random Forest | 36.8 min | 26% | 0.77 | −2.69 min | Yes |
| Linear / Ridge / Lasso | 36.9 min | 31% | 0.72 | −1.7 to −2.1 min | Yes |
| 👨‍⚕️ Surgeon Estimate (current practice) | 35.3 min | 34% | 0.78 | −18.52 min ✗ | Yes (p<0.001) |
For Hospital Administrators
More accurate OR scheduling means fewer overruns, less overtime pay, and more procedures completed per day — directly improving throughput and revenue per OR.
For Surgical Teams
A schedule that reflects reality — not wishful thinking. Less rushed finishes, fewer late-night cases, and better preparation time between procedures.
For Patients
Fewer same-day cancellations caused by overruns. Less waiting. When your procedure is scheduled, it happens on time.