ML in Production

Models that ship and stay up

A prototype in a notebook is the easy 20%. These are the systems built for the other 80%: serving, monitoring, retraining, and graceful failure.

Real-time ranking service

In production

Low-latency ranking model serving personalized results, retrained nightly with automated evaluation gates.

  • p99 latency under 40ms at production traffic
  • Automated offline-eval gate blocks regressions before deploy
  • Shadow deployment + canary rollout for every model version
Python · FastAPI · Docker · Redis · MLflow
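
As an illustration of the offline-eval gate mentioned above, here is a minimal sketch. It is not the production code: the function name `eval_gate`, the metric names, and the 1% relative tolerance are all assumptions, and it treats every metric as non-negative and higher-is-better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    passed: bool
    reasons: list[str]


def eval_gate(
    candidate: dict[str, float],
    baseline: dict[str, float],
    max_regression: float = 0.01,  # assumed tolerance: 1% relative drop
) -> GateResult:
    """Fail the gate if any baseline metric is missing or regresses too far.

    Assumes all metrics are non-negative and higher-is-better.
    """
    reasons = []
    for name, base_value in baseline.items():
        if name not in candidate:
            reasons.append(f"{name}: missing from candidate metrics")
            continue
        floor = base_value * (1.0 - max_regression)
        if candidate[name] < floor:
            reasons.append(
                f"{name}: {candidate[name]:.4f} is below allowed floor {floor:.4f}"
            )
    return GateResult(passed=not reasons, reasons=reasons)
```

A deploy step could call `eval_gate(new_metrics, current_metrics)` and stop the rollout whenever `passed` is false, logging `reasons` for whoever investigates.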

Forecasting pipeline (batch)

In production

Orchestrated daily forecasting jobs with data-quality checks, backfills, and drift monitoring.

  • Idempotent DAGs with automated backfill on late data
  • Data-quality tests fail the run before bad numbers ship
  • Drift alerts route to on-call with a one-click runbook
Airflow · dbt · BigQuery · Python
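
The idea of data-quality tests failing the run can be sketched in plain Python. This is an illustrative stand-in, not the pipeline's actual checks (which, given the stack, may well live in dbt): `DataQualityError`, `check_rows`, and the column names `date` and `value` are hypothetical.

```python
import math


class DataQualityError(Exception):
    """Raised so the orchestrator marks the task failed before anything ships."""


def check_rows(
    rows: list[dict],
    required: tuple[str, ...] = ("date", "value"),  # assumed column names
    key: str = "date",                               # assumed unique key
    min_rows: int = 1,
) -> None:
    """Validate a batch of rows; raise DataQualityError listing every problem."""
    problems = []
    if len(rows) < min_rows:
        problems.append(f"expected at least {min_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        for col in required:
            value = row.get(col)
            is_nan = isinstance(value, float) and math.isnan(value)
            if value is None or is_nan:
                problems.append(f"row {i}: {col} is null")
    keys = [row.get(key) for row in rows]
    if len(keys) != len(set(keys)):
        problems.append(f"duplicate values in key column {key!r}")
    if problems:
        raise DataQualityError("; ".join(problems))
```

Raising an exception, rather than logging a warning, is what lets the scheduler stop downstream tasks: an uncaught exception in a task is the usual way to mark it failed.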

LLM-assisted text classification

Pilot

Hybrid pipeline using a small fine-tuned model with an LLM fallback for long-tail cases.

  • Cost-aware routing: cheap model first, LLM only when uncertain
  • Human-in-the-loop review queue for low-confidence outputs
  • Full request/response logging for evaluation and audits
Python · Transformers · LLM API · Prometheus
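
Cost-aware routing with a review queue can be sketched as follows. This is a simplified illustration under stated assumptions: both models are passed in as plain callables returning `(label, confidence)`, and the names `classify`, `Routed`, and the two thresholds are hypothetical, not the pilot's actual values.

```python
from typing import Callable, NamedTuple

# Assumed interface: a classifier takes text and returns (label, confidence).
Classifier = Callable[[str], tuple[str, float]]


class Routed(NamedTuple):
    label: str
    confidence: float
    source: str        # "small_model" or "llm"
    needs_review: bool


def classify(
    text: str,
    small_model: Classifier,
    llm_fallback: Classifier,
    accept_threshold: float = 0.85,  # assumed: accept the cheap model above this
    review_threshold: float = 0.60,  # assumed: queue for humans below this
) -> Routed:
    """Try the cheap model first; call the LLM only when it is uncertain."""
    label, confidence = small_model(text)
    if confidence >= accept_threshold:
        return Routed(label, confidence, "small_model", needs_review=False)
    # The cheap model was not confident enough, so pay for the LLM call.
    label, confidence = llm_fallback(text)
    return Routed(label, confidence, "llm", needs_review=confidence < review_threshold)
```

Because the models are injected as callables, the routing logic can be tested with stubs, and the `source` field gives a direct way to measure how often the LLM is actually invoked.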