Code (GitHub) → GitHub Actions
├─ Train + test (pytest)
├─ Lint (ruff)
├─ Build Docker image
├─ Push to GHCR
└─ Deploy to Azure Container Apps
↓
https://<endpoint>.azurecontainerapps.io
/health · /predictThe problem
A trained model in a Kaggle notebook is not a product. To use it from anywhere, it needs to be:
- Packaged into a reproducible artefact (Docker image).
- Served over HTTP with a stable contract (FastAPI).
- Deployed on infrastructure that scales when called, sleeps when idle.
- Re-deployable safely on every code change (CI/CD).
- Monitored so you don't wake up to a €300 cloud bill.
This project ticks all five — on Microsoft Azure specifically, because the CV signal I needed was Azure / MLOps / cloud / CI-CD.
The model
- Dataset: Forest Cover Type Prediction (Kaggle / UCI). 581k samples, 54 features. 7 forest cover classes.
- Algorithm: Random Forest (scikit-learn). Tuned with
RandomizedSearchCV. - Accuracy: ~87 % on held-out test set.
- Serialisation:
joblib.dump()→ loaded by FastAPI on startup.
The model itself is intentionally simple — the project is about the infrastructure around it, not the algorithm.
The architecture
- FastAPI —
/healthfor liveness,/predictfor inference. - Docker —
python:3.11-slim, copy model + code, runuvicorn. - GitHub Actions — two workflows: CI on every push, deploy on merge to main.
- OIDC federated credential — no long-lived secrets stored in GitHub.
- Azure Container Apps — managed, scale-to-zero, autoscale on traffic.
The cost discipline
Cloud bills are the silent risk. Three guardrails:
- Scale-to-zero —
--min-replicas 0. Idle cost: €0. - Budget alert at €5/month. Email + Slack on spend.
- GHCR public package — Container Apps pulls without registry credentials, no extra ACR cost.
Real monthly cost: €0–2. Worst case (scale-to-zero fails): ~€30/month.
What I learned
- The IE alumni Microsoft account is a mess — IE tenant, blocked trial. Used a personal Gmail-tied account instead.
- OIDC federated credentials > stored secrets. Setting it up takes 20 min, saves you from a leaked key every quarter.
scale-to-zerois non-negotiable for portfolio projects. Without it, your "live endpoint" is a slow bleed.- The model serialisation is the boring part. The boring part is what breaks in production.
Tech stack
| Layer | Tool |
|---|---|
| Model | scikit-learn (Random Forest) |
| Serving | FastAPI + Uvicorn |
| Container | Docker, python:3.11-slim |
| Registry | GitHub Container Registry (ghcr.io) |
| CI/CD | GitHub Actions (CI + Deploy workflows) |
| Auth | OIDC federated credential (Azure AD ↔ GitHub) |
| Hosting | Azure Container Apps |
| Cost mgmt | Azure budget alerts |
| Test | pytest |
| Lint | ruff |