1 files changed, 18 insertions, 0 deletions
diff --git a/goals b/goals
new file mode 100644
index 0000000..9bbec82
--- /dev/null
+++ b/goals
@@ -0,0 +1,18 @@
+### Task
+1. Stand up a local K8s cluster with `kind`, `k3d`, or `minikube`. Document exact versions.
+2. Write a Helm chart (or use the upstream vLLM/SGLang chart and extend it) that deploys a small open-weights model — e.g. `Qwen2.5-0.5B-Instruct`, `Llama-3.2-1B-Instruct`, or any model that fits on CPU/small GPU. CPU-only inference is acceptable.
+3. Wrap it in Terraform (or OpenTofu) using the `helm` and `kubernetes` providers.
+4. Expose an OpenAI-compatible endpoint through a K8s Service / Ingress and prove it works with a `curl` example in the README.
+5. Observability: scrape `/metrics` from the inference pod with Prometheus and show at least one dashboard or PromQL query for request latency and GPU/CPU utilization.
+6. Two environments — `dev` and `prod` — differ by at least: replica count, resource requests/limits, and model choice. Use Terraform workspaces, tfvars, or environment directories; justify your choice.
+Stretch Goals
+- Deploy a separate application container containing an agentic system utilizing the deployed vLLM/SGLang as the backend model server. The agent system's use-case is free to you to choose.
+- HPA based on a custom metric (e.g. queue depth or tokens/sec)
+- Image digest pinning and an `atlantis.yaml` or equivalent GitOps config
+- A smoke-test job that runs post-deploy and fails the apply if the endpoint is unhealthy
+You will be assessed on the following criteria:
+- the correctness of its output (stochastic functions notwithstanding);
+- how reliable, testable, modular and clean your code is;
+- other interesting add-ons you can think of.

diff --git a/goals b/goals new file mode 100644 index 0000000..9bbec82 --- /dev/null +++ b/goals
@@ -0,0 +1,18 @@
	1	### Task
	2	1. Stand up a local K8s cluster with `kind`, `k3d`, or `minikube`. Document exact versions.
	3	2. Write a Helm chart (or use the upstream vLLM/SGLang chart and extend it) that deploys a small open-weights model — e.g. `Qwen2.5-0.5B-Instruct`, `Llama-3.2-1B-Instruct`, or any model that fits on CPU/small GPU. CPU-only inference is acceptable.
	4	3. Wrap it in Terraform (or OpenTofu) using the `helm` and `kubernetes` providers.
	5	4. Expose an OpenAI-compatible endpoint through a K8s Service / Ingress and prove it works with a `curl` example in the README.
	6	5. Observability: scrape `/metrics` from the inference pod with Prometheus and show at least one dashboard or PromQL query for request latency and GPU/CPU utilization.
	7	6. Two environments — `dev` and `prod` — differ by at least: replica count, resource requests/limits, and model choice. Use Terraform workspaces, tfvars, or environment directories; justify your choice.
	8
	9	Stretch Goals
	10	- Deploy a separate application container containing an agentic system utilizing the deployed vLLM/SGLang as the backend model server. The agent system's use-case is free to you to choose.
	11	- HPA based on a custom metric (e.g. queue depth or tokens/sec)
	12	- Image digest pinning and an `atlantis.yaml` or equivalent GitOps config
	13	- A smoke-test job that runs post-deploy and fails the apply if the endpoint is unhealthy
	14
	15	You will be assessed on the following criteria:
	16	- the correctness of its output (stochastic functions notwithstanding);
	17	- how reliable, testable, modular and clean your code is;
	18	- other interesting add-ons you can think of.