Fixing the UX for one-time tasks on Kubernetes
I wanted to run a container for a customer only once, but the UX just wasn't simple enough. So I created a new OSS utility with Go and the Kubernetes API.
A Deployment will usually restart, so will a Pod. You can set the "restartPolicy" to "Never", but it's still not a good fit for something that's meant to start - run and complete.
Why would you want to run a one-time job?
Here are a few ideas:
- Running kubectl within the cluster under a specific Service Account
- Checking
curl
works from within the cluster - Checking inter-pod DNS is working
- Cleaning up a DB index
- A network scan with nmap
- Tasks that you'd usually run on cron
- Dynamic DNS updates
- Generating a weekly feed / report / update / sync
- Running a container build with Kaniko
Another use-case may be where you're running bash, but need to execute something within the cluster and get results back before going further.
I'll show you what working with a Job looks like today and how we can make it better with a little Go code.
What a Job looks like
So I looked again at the Kubernetes Job, which is an API that I rarely see used. They look a bit like this:
apiVersion: batch/v1
kind: Job
metadata:
name: pi
spec:
suspend: true
parallelism: 1
completions: 5
template:
spec:
containers:
- name: pi
image: perl:5.34.0
command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
restartPolicy: Always
backoffLimit: 4
Example from the Kubernetes docs
So the overall spec is similar to a Pod, but it's unfamiliar enough to cause syntax errors when composing these by hand.
What's more: a Job isn't something that you run, then get its logs and continue with your day.
You have to describe the job, to find a Pod that it created with a semi-random name, and then get the logs from that.
kubectl describe jobs/myjob
Name: myjob
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 12m job-controller Created pod: myjob-hlrpl
Normal SuccessfulDelete 11m job-controller Deleted pod: myjob-hlrpl
Normal Suspended 11m job-controller Job suspended
Normal SuccessfulCreate 3s job-controller Created pod: myjob-jvb44
Normal Resumed 3s job-controller Job resumed
So you end up with something like this:
apiVersion: batch/v1
kind: Job
metadata:
name: checker
namespace: openfaas
spec:
completions: 1
parallelism: 1
template:
metadata:
name: checker
spec:
serviceAccount: openfaas-checker
containers:
- name: checker
image: ghcr.io/openfaas/config-checker:latest
imagePullPolicy: Always
restartPolicy: Never
And in my instance, I also wanted RBAC.
Then you'd:
- Apply the RBAC file
- Apply the Job
- Describe the Job
- Find the Pod name
- Get the Pod logs
- Delete the Job/Pod
#!/bin/bash
JOBUUID=$(kubectl get job -n openfaas $JOBNAME -o "jsonpath={.metadata.labels.controller-uid}")
PODNAME=$(kubectl get po -n openfaas -l controller-uid=$JOBUUID -o name)
kubectl logs -n openfaas $PODNAME > $(date '+%Y-%m-%d_%H_%M_%S').txt
kubectl delete -n openfaas $PODNAME
kubectl delete -f ./artifacts/job.yaml
Can we do better?
I think we can. And in a past life, all the way back in 2017, before Kubernetes won me over, I was using Docker Swarm.
I wrote a little tool called "jaas" and it turned out to be quite popular with people using it for one-time tasks that would run on Docker Swarm.
View the code: alexellis/jaas
I also remembered that Stefan Prodan who was a friend of mine and a past contributor to OpenFaaS had once had this itch and created something called kjob.
Stefan's not touched the code for three years, but his approach was to take a CronJob as a template, then to add in a few alterations. It's flexible because it means you can do just about anything you want with the spec, then you override one or two fields.
But I wanted to move away from lesser-known Kubernetes APIs and get a user-experience that could be as simple as:
kubectl apply -f ./artifacts/rbac.yaml
run-job ./job.yaml -o report.txt
Enter run-job
So that's how run-job was born.
My previous example became:
name: checker
image: ghcr.io/openfaas/config-checker:latest
namespace: openfaas
service_account: openfaas-checker
And this was run with run-job -f job.yaml
.
Then I thought, let's make this more fun. The cows function in OpenFaaS is one of the most popular for demos and was also part of the codebase for jaas.
Here's its Dockerfile:
FROM --platform=${TARGETPLATFORM:-linux/amd64} alpine:3.16 as builder
ARG TARGETPLATFORM
ARG BUILDPLATFORM
ARG TARGETOS
ARG TARGETARCH
RUN mkdir -p /home/app
RUN apk add --no-cache nodejs npm
# Add non root user
RUN addgroup -S app && adduser app -S -G app
RUN chown app /home/app
WORKDIR /home/app
USER app
COPY package.json .
RUN npm install --omit=dev
COPY index.js .
CMD ["/usr/bin/node", "./index.js"]
It's multi-arch and you can either use buildx or faas-cli to compile it for various architectures.
$ cat <<EOF > cows.yaml
# Multi-arch image for arm64, amd64 and armv7l
image: alexellis2/cows:2022-09-05-1955
name: cows
EOF
Here's what it looks like:
$ run-job -f cows.yaml
() ()
()()
(oo)
/-------UU
/ | ||
* ||w---||
^^ ^^
Eh, What's up Doc?
With kubectl get events -w
we can see what's actually happening behind the scenes:
0s Normal SuccessfulCreate job/cows Created pod: cows-5qld5
0s Normal Scheduled pod/cows-5qld5 Successfully assigned default/cows-5qld5 to k3s-eu-west-agent-1
0s Normal Pulling pod/cows-5qld5 Pulling image "alexellis2/cows:2022-09-05-1955"
0s Normal Pulled pod/cows-5qld5 Successfully pulled image "alexellis2/cows:2022-09-05-1955" in 877.707423ms
1s Normal Created pod/cows-5qld5 Created container cows
0s Normal Started pod/cows-5qld5 Started container cows
0s Normal Completed job/cows Job completed
An example to use at work
But we can also do for-work kinds of things with one-shot tasks.
Imagine that you wanted to get metadata on all your Kubernetes nodes.
You'd need an RBAC file for that:
apiVersion: v1
kind: ServiceAccount
metadata:
name: kubectl-run-job
namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
app: run-job
name: kubectl-run-job
rules:
- apiGroups: [""]
resources: ["nodes"]
verbs:
- get
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
app: run-job
name: kubectl-run-job
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: kubectl-run-job
subjects:
- kind: ServiceAccount
name: kubectl-run-job
namespace: default
---
Then you'd need a container image with kubectl pre-installed, so let's make that:
FROM --platform=${TARGETPLATFORM:-linux/amd64} alpine:3.16 as builder
ARG TARGETPLATFORM
ARG BUILDPLATFORM
ARG TARGETOS
ARG TARGETARCH
RUN mkdir -p /home/app
# Add non root user
RUN addgroup -S app && adduser app -S -G app
RUN chown app /home/app
RUN apk add --no-cache curl && curl -SLs https://get.arkade.dev | sh
WORKDIR /home/app
USER app
RUN arkade get kubectl@v1.24.1 --quiet
CMD ["/home/app/.arkade/bin/kubectl", "get", "nodes", "-o", "wide"]
We'd also create a YAML file for the job:
name: get-nodes
image: alexellis2/kubectl:2022-09-05-2243
namespace: default
service_account: kubectl-run-job
Finally, we'd run the job:
$ kubectl apply ./examples/kubectl/rbac.yaml
$ run-job -f ./examples/kubectl/kubectl_get_nodes_job.yaml
Created job get-nodes.default (4097ed06-9422-41c2-86ac-6d4a447d10ab)
....
Job get-nodes.default (4097ed06-9422-41c2-86ac-6d4a447d10ab) succeeded
Deleted job get-nodes
Recorded: 2022-09-05 21:43:57.875629 +0000 UTC
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k3s-server-1 Ready control-plane,etcd,master 25h v1.24.4+k3s1 192.168.2.1 <none> Raspbian GNU/Linux 10 (buster) 5.10.103-v7l+ containerd://1.6.6-k3s1
k3s-server-2 Ready control-plane,etcd,master 25h v1.24.4+k3s1 192.168.2.2 <none> Raspbian GNU/Linux 10 (buster) 5.10.103-v7l+ containerd://1.6.6-k3s1
k3s-server-3 Ready control-plane,etcd,master 25h v1.24.4+k3s1 192.168.2.3 <none> Raspbian GNU/Linux 10 (buster) 5.10.103-v7l+ containerd://1.6.6-k3s1
Now, you can also specify the command and arguments for the command in the job file, so you could get the same output in JSON and pipe it through to jq to automate things in bash:
name: get-nodes
image: alexellis2/kubectl:2022-09-05-2243
namespace: default
service_account: kubectl-run-job
command:
- kubectl
args:
- get nodes
- -o
- json
Summing up
The point of run-job really is to make running a one-time container on Kubernetes simpler than the experience of crafting YAML and typing in half a dozen kubectl commands.
If you think it may be useful for you, then run-job is available via arkade get run-job
along with over 100 other CLIs that download much quicker than brew.
We've also had the first contribution to the repo, which was to add the "command" and "args" options to the YAML file. The idea isn't to replicate every field in the Pod and Job specification, but just enough to make "run-job -f job-file.yaml` useful and convenient for supporting customers and running ad-hoc tasks.
Star on GitHub: alexellis/run-job
At OpenFaaS Ltd, we've used run-job with a customer to check their OpenFaaS Deployments and the configuration of their functions. The container is a Go program and you may like to take a look at it, to adapt it to check your own systems?
For those of us who are interested in manipulating and querying a Kubernetes cluster from code, the Go client provides a really good experience with hundreds of OSS examples available.
Read the code: openfaas/config-checker
If you're really into checking remote Kubernetes clusters for your customers, then Replicated have a similar, but more generic tool to openfaas/config-checker called "troubleshoot" which can be used to create bundles.
The bundles collect data from the customer's environment for you and you get to write in a DSL.