diff options
Diffstat (limited to 'goals')
| -rw-r--r-- | goals | 18 |
1 files changed, 18 insertions, 0 deletions
| @@ -0,0 +1,18 @@ | |||
| 1 | ### Task | ||
| 2 | 1. Stand up a local K8s cluster with `kind`, `k3d`, or `minikube`. Document exact versions. | ||
| 3 | 2. Write a Helm chart (or use the upstream vLLM/SGLang chart and extend it) that deploys a small open-weights model — e.g. `Qwen2.5-0.5B-Instruct`, `Llama-3.2-1B-Instruct`, or any model that fits on CPU/small GPU. CPU-only inference is acceptable. | ||
| 4 | 3. Wrap it in Terraform (or OpenTofu) using the `helm` and `kubernetes` providers. | ||
| 5 | 4. Expose an OpenAI-compatible endpoint through a K8s Service / Ingress and prove it works with a `curl` example in the README. | ||
| 6 | 5. Observability: scrape `/metrics` from the inference pod with Prometheus and show at least one dashboard or PromQL query for request latency and GPU/CPU utilization. | ||
| 7 | 6. Two environments — `dev` and `prod` — differ by at least: replica count, resource requests/limits, and model choice. Use Terraform workspaces, tfvars, or environment directories; justify your choice. | ||
| 8 | |||
| 9 | Stretch Goals | ||
| 10 | - Deploy a separate application container containing an agentic system utilizing the deployed vLLM/SGLang as the backend model server. The agent system's use-case is free to you to choose. | ||
| 11 | - HPA based on a custom metric (e.g. queue depth or tokens/sec) | ||
| 12 | - Image digest pinning and an `atlantis.yaml` or equivalent GitOps config | ||
| 13 | - A smoke-test job that runs post-deploy and fails the apply if the endpoint is unhealthy | ||
| 14 | |||
| 15 | You will be assessed on the following criteria: | ||
| 16 | - the correctness of its output (stochastic functions notwithstanding); | ||
| 17 | - how reliable, testable, modular and clean your code is; | ||
| 18 | - other interesting add-ons you can think of. | ||
