ML in Production
Models that ship and stay up
A prototype in a notebook is the easy 20%. These are the systems built for the other 80%: serving, monitoring, retraining, and graceful failure.
Real-time ranking service
In production
Low-latency ranking model serving personalized results, retrained nightly with automated evaluation gates.
- p99 latency under 40 ms at production traffic
- Automated offline-eval gate blocks regressions before deploy
- Shadow deployment + canary rollout for every model version
Python, FastAPI, Docker, Redis, MLflow
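To illustrate the offline-eval gate mentioned above, here is a minimal sketch, not the service's actual code. It assumes higher-is-better metrics with positive baseline values and a relative tolerance. The function name `eval_gate`, the metric names, and the 1% default are all hypothetical.

```python
from typing import List, Mapping


def eval_gate(
    candidate: Mapping[str, float],
    baseline: Mapping[str, float],
    max_relative_drop: float = 0.01,  # hypothetical tolerance: 1% relative drop
) -> List[str]:
    """Return the names of metrics on which the candidate regresses.

    Assumes every metric is higher-is-better and baseline values are positive.
    An empty list means the candidate may proceed to deployment.
    """
    failures = []
    for name, base_value in baseline.items():
        cand_value = candidate.get(name)
        if cand_value is None:
            # A metric missing from the candidate report counts as a failure.
            failures.append(name)
            continue
        floor = base_value * (1.0 - max_relative_drop)
        if cand_value < floor:
            failures.append(name)
    return failures


if __name__ == "__main__":
    baseline = {"ndcg@10": 0.50, "mrr": 0.40}      # illustrative numbers only
    candidate = {"ndcg@10": 0.49, "mrr": 0.41}
    regressed = eval_gate(candidate, baseline)
    if regressed:
        raise SystemExit(f"Gate failed, regressed metrics: {regressed}")
```

A deploy job could run a check like this after nightly retraining and exit non-zero on failure, so the shadow and canary stages never see a regressed model.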
Forecasting pipeline (batch)
In production
Orchestrated daily forecasting jobs with data-quality checks, backfills, and drift monitoring.
- Idempotent DAGs with automated backfill on late data
- Data-quality tests fail the run before bad numbers ship
- Drift alerts route to on-call with a one-click runbook
Airflow, dbt, BigQuery, Python
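As an illustration of a data-quality test that fails the run, the sketch below validates a batch of rows in plain Python and raises on any problem. It is a simplified stand-in, not the pipeline's real checks. The column names (`series_id`, `date`, `value`), the non-negativity rule, and the `DataQualityError` class are assumptions made for the example.

```python
import math
from typing import Iterable, Mapping


class DataQualityError(Exception):
    """Raised when a batch fails validation (hypothetical exception type)."""


def validate_rows(rows: Iterable[Mapping[str, object]], min_rows: int = 1) -> int:
    """Check a batch and raise DataQualityError if any rule is violated.

    Rules assumed for this sketch: a minimum row count, a unique
    (series_id, date) key, and a numeric, non-NaN, non-negative value.
    Returns the number of rows when the batch is clean.
    """
    rows = list(rows)
    problems = []
    if len(rows) < min_rows:
        problems.append(f"expected at least {min_rows} rows, got {len(rows)}")

    seen = set()
    for i, row in enumerate(rows):
        key = (row.get("series_id"), row.get("date"))
        if None in key:
            problems.append(f"row {i}: missing series_id or date")
        elif key in seen:
            problems.append(f"row {i}: duplicate key {key}")
        else:
            seen.add(key)

        value = row.get("value")
        if not isinstance(value, (int, float)) or math.isnan(value) or value < 0:
            problems.append(f"row {i}: invalid value {value!r}")

    if problems:
        raise DataQualityError("; ".join(problems))
    return len(rows)
```

When a function like this runs inside an orchestrated task, the uncaught exception marks the task as failed, so downstream publishing steps do not run on bad data.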
LLM-assisted text classification
Pilot
Hybrid pipeline using a small fine-tuned model with an LLM fallback for long-tail cases.
- Cost-aware routing: cheap model first, LLM only when uncertain
- Human-in-the-loop review queue for low-confidence outputs
- Full request/response logging for evaluation and audits
Python, Transformers, LLM API, Prometheus
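The cost-aware routing idea can be sketched as follows, under stated assumptions: the small model returns a `(label, confidence)` pair, the LLM fallback returns a label, and a single confidence threshold decides the route. The names `classify` and `Decision` and the 0.85 threshold are hypothetical. Both models are passed in as plain callables, so no specific model or LLM API is implied.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Decision:
    label: str
    source: str        # "small_model" or "llm"
    confidence: float  # confidence reported by the small model


def classify(
    text: str,
    small_model: Callable[[str], Tuple[str, float]],
    llm_fallback: Callable[[str], str],
    threshold: float = 0.85,  # hypothetical cut-off; tune on held-out data
) -> Decision:
    """Try the cheap model first and call the LLM only when it is uncertain."""
    label, confidence = small_model(text)
    if confidence >= threshold:
        return Decision(label=label, source="small_model", confidence=confidence)
    # Below the threshold: pay for the LLM call and record why it was needed.
    return Decision(label=llm_fallback(text), source="llm", confidence=confidence)


if __name__ == "__main__":
    # Stub models for demonstration only.
    def stub_small(text: str) -> Tuple[str, float]:
        return ("billing", 0.95) if "invoice" in text else ("other", 0.40)

    def stub_llm(text: str) -> str:
        return "account_access"

    print(classify("invoice overdue", stub_small, stub_llm))
    print(classify("cannot log in", stub_small, stub_llm))
```

Recording `source` and `confidence` on each decision also gives a review queue or audit log something to filter on, for example by sending every `llm`-sourced result to human review.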