In a previous post, I created a simple deployment pipeline using GitHub Actions. A major drawback was having to trigger deployments from outside the Kubernetes cluster, which risks exposing credentials. Additionally, with a push-based approach, a transient error when invoking the deployment operation fails the pipeline and requires manual intervention.
Kubernetes-native CI/CD tools promise to solve these problems. The idea is to run operators inside Kubernetes that execute the pipeline within the cluster, driven by changes in your code repo.
My plan was to start with the CD portion first, keeping the CI portion in GitHub Actions. After some cursory research, ArgoCD seemed to be the clear winner here, so I chose it.
Installation was straightforward. However, to expose it using an ingress rule, I had to pass `--enable-ssl-passthrough` to my ingress-nginx controller. On DigitalOcean Kubernetes, this option is not set by default, so I needed to patch the controller deployment:
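(A sketch; the deployment and namespace names below are the ingress-nginx defaults and may differ in your cluster.)

```bash
kubectl patch deployment ingress-nginx-controller -n ingress-nginx \
  --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--enable-ssl-passthrough"}]'
```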
To make it easier to access from my local machine, I added an `/etc/hosts` entry with a fake hostname (argocd.empapi.io) pointing to the public IP of the ingress load balancer. Then, I created an ingress rule (note the annotations for SSL passthrough):
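(A sketch; `argocd-server` and port 443 are the ArgoCD install defaults, and the hostname is the fake one from above.)

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: argocd-server-ingress
  namespace: argocd
  annotations:
    # SSL passthrough lets ArgoCD terminate TLS itself
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  ingressClassName: nginx
  rules:
    - host: argocd.empapi.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: argocd-server
                port:
                  number: 443
```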
We’ll keep our manifests in a separate git repo and structure it in the following way:
```
manifests
├── base
│   ├── emp-api-deployment.yaml
│   ├── emp-api-ingress.yaml
│   ├── emp-api-svc.yaml
│   └── kustomization.yaml
├── production
│   ├── increase_replicas.yaml
│   ├── ingress_hostname_patch.yaml
│   └── kustomization.yaml
└── staging
    ├── ingress_hostname_patch.yaml
    └── kustomization.yaml
```
This is the recommended layout for GitOps with kustomize: base configs with overlays for different environments. You can view the manifests for the sample app on GitHub.
What ArgoCD does is conceptually very simple: it takes a repo with a set of manifests and makes sure they are synced with the cluster. You can set it to continually watch a repo for changes, or manually trigger the sync process. It supports many kinds of manifests, including kustomize, which is perfect for us.
The basic resource type is an “Application”. Let’s create an ArgoCD app for the staging environment:
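(A sketch of the create command; the manifests repo URL is a hypothetical stand-in for the sample repo linked above.)

```bash
argocd app create emp-api-staging \
  --repo git@github.com:example/manifests.git \
  --path staging \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace staging
```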
The command is fairly self-explanatory. Note that the destination server is the local Kubernetes API server. ArgoCD can deploy to different clusters, which is probably what you want in an actual production setup: separate clusters for dev/staging/prod and ArgoCD running in an "admin" cluster capable of deploying to each.
When you first create the app, the resources in the manifest are in state "OutOfSync". Let's sync it and wait for the sync to complete.
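With the argocd CLI, that sequence looks roughly like this (using the app name from the create step above):

```bash
argocd app get emp-api-staging    # resources show as "OutOfSync" initially
argocd app sync emp-api-staging
argocd app wait emp-api-staging --health
```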
You can verify that the deploy was successful by checking the deployment status as well.
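For example, assuming the deployment name from the sample manifests:

```bash
kubectl rollout status deployment/emp-api
```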
ArgoCD, as the name suggests, only does the deployment portion of a proper CI/CD pipeline. We'll need to pair it up with another tool that handles the CI portion and hooks into ArgoCD to do deployment. I was planning to use Jenkins since I have experience with it, but coincidentally, I came across the DigitalOcean challenge. The challenge requires you to use Tekton. I'd never heard of Tekton before, and their claim of "K8s native CI/CD tool" got me interested.
Tekton is "serverless" in that you don't need to maintain a Jenkins-like server. Everything runs in containers on Kubernetes. Pipelines and tasks are specified as CRDs, so you can just `kubectl apply` them. Keep in mind that the learning curve is a bit steep and there are a lot of concepts to learn. I recommend going through the official docs, but here's a summary:
- A step is some command(s) that run in a container
- A task is a series of steps run in order. A task runs in a K8s pod. You can write your own tasks (see the minimal example after this list), but there's also a collection of reusable tasks in the Tekton Catalog
- A pipeline is a DAG of tasks. Pipelines are what you’d normally trigger when events of interest occur (a git push for example)
- An eventListener is a K8s service that listens for events from external systems, e.g. a webhook from GitHub.
- A taskRun is a specific execution of a task
- A pipelineRun is a specific execution of a pipeline. You’ll want to inspect the pipelineRun to debug issues.
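To make the step/task relationship concrete, here's a minimal, hypothetical Task with a single step:

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: hello
spec:
  steps:
    - name: say-hello       # a step: command(s) run in a container
      image: alpine
      script: |
        echo "Hello from a Tekton step"
```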
Test PR pipeline
We'll create two pipelines: a pipeline to test PRs, and a post-merge deployment pipeline. Let's start with a barebones version of the "test PR" pipeline, which simply checks out a git repo and merges a PR branch locally. First, we'll need to install the `git-cli` task from the Tekton catalog:
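(One way to do this, assuming a recent tkn CLI with hub support; you can also `kubectl apply` the task's raw YAML from the catalog repo.)

```bash
tkn hub install task git-cli
```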
Here's the pipeline YAML, annotated with comments:
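(The original annotated YAML lives in the linked repo; this is a rough sketch of the barebones version, with parameter and workspace names being my approximations.)

```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: test-pr-pipeline
spec:
  params:
    - name: repo-url       # SSH URL of the repo under test
      type: string
    - name: pr-branch      # the PR branch to merge locally
      type: string
  workspaces:
    - name: build          # shared checkout area
    - name: git-ssh-creds  # SSH private key for cloning
  tasks:
    - name: checkout-and-merge
      taskRef:
        name: git-cli      # the catalog task installed above
      workspaces:
        - name: source
          workspace: build
        - name: ssh-directory
          workspace: git-ssh-creds
      params:
        - name: GIT_SCRIPT
          value: |
            git clone $(params.repo-url) .
            git checkout main
            git merge origin/$(params.pr-branch)
```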
I generated an SSH key-pair using `ssh-keygen` and added the public key as a deploy key to my repo. Then, I created a directory `git-ssh-creds` with the private key inside it and created a secret out of it:

```bash
kubectl create secret generic git-ssh-creds --from-file=git-ssh-creds
```
Apply the above YAML using `kubectl apply -f`. Let's create a `PipelineRun` resource that will enable us to actually run the pipeline:
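(A sketch; the repo URL is a hypothetical stand-in.)

```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: test-pr-pipeline-run
spec:
  pipelineRef:
    name: test-pr-pipeline
  params:
    - name: repo-url
      value: git@github.com:example/emp-api.git  # hypothetical
    - name: pr-branch
      value: juggernaut-patch-1
  workspaces:
    # bind the SSH credentials secret to the git-ssh-creds workspace
    - name: git-ssh-creds
      secret:
        secretName: git-ssh-creds
    # dynamically provision a PVC for the build workspace
    - name: build
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 100Mi
```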
To test the pipeline, I created a dummy PR, `juggernaut-patch-1`, on the repo. As mentioned earlier, we bind the SSH credentials secret to the `git-ssh-creds` workspace. For the build workspace, we'll request a K8s Persistent Volume Claim (PVC) with the specified storage; the storage will be dynamically provisioned at runtime. Applying the pipeline run resource will kick off pipeline execution. You can tail logs from the pipeline run:
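(Using the tkn CLI and the run name from above.)

```bash
tkn pipelinerun logs test-pr-pipeline-run -f
```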
NOTE: You will need to manually delete the pipeline run to clean up the associated PVC that is created dynamically. Use `tkn pipelinerun delete --all` to clean up all finished pipeline runs. You can also specify a pre-provisioned static PV instead to avoid dynamic provisioning.
Next, we'll verify that the PR is good by running tests. Let's create a custom task to run golang tests. There already exists a catalog task to do this, but we need one that outputs a "succeeded" or "failed" result that we can pass back to GitHub to set the commit status; the catalog task simply returns a non-zero status and fails the pipeline if the tests fail. Again, I've provided inline comments where required:
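(A sketch of such a task; the result name and Go image are my choices, and the original lives in the linked repo.)

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: golang-test-with-status
spec:
  workspaces:
    - name: source           # the checked-out repo
  results:
    - name: test-status      # "succeeded" or "failed", passed on to GitHub later
  steps:
    - name: run-tests
      image: golang:1.17
      workingDir: $(workspaces.source.path)
      script: |
        #!/bin/sh
        # Don't fail the step when tests fail; record the outcome as a result instead
        if go test ./...; then
          printf succeeded > $(results.test-status.path)
        else
          printf failed > $(results.test-status.path)
        fi
```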
We can now include this task in our pipeline. Notice the `runAfter` directive to make sure we run the tests after checking out the repo; otherwise, Tekton is free to run the tasks in any order (or in parallel).
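The relevant pipeline snippet could look like this (task names as in the sketches above):

```yaml
    - name: run-tests
      taskRef:
        name: golang-test-with-status
      runAfter:
        - checkout-and-merge   # force ordering; Tekton may otherwise parallelize
      workspaces:
        - name: source
          workspace: build
```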
I've also added tasks to set the GitHub commit status based on test success/failure, which I'll omit from this post for brevity. You can view the full pipeline YAML here.
Hooking up the pipeline to GitHub
So far, we’ve only manually triggered the pipeline. For it to be actually useful, we’ll need to hook it up to listen to GitHub webhooks. The pipeline should be run whenever a PR is created or updated.
EventListeners listen to external webhooks, and associated Triggers kick off pipeline execution. First, we'll create a dedicated service account to run the pipeline, instead of running it as admin:
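(A sketch using the example ClusterRoles that ship with Tekton Triggers; exact role names may vary by version.)

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tekton-triggers-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tekton-triggers-binding
subjects:
  - kind: ServiceAccount
    name: tekton-triggers-sa
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tekton-triggers-eventlistener-roles
```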
Next, specify the EventListener and its associated trigger resources. The idea is that the EventListener receives webhooks and passes them to a referenced TriggerBinding that can extract values from the webhook body and bind them to parameters. These parameters are accessible in the TriggerTemplate, which in turn passes them to the pipeline run (yes, this seems quite convoluted!).
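A condensed sketch of the three resources; the parameter names are assumptions based on the pipeline above, and the extracted fields come from GitHub's pull_request webhook payload:

```yaml
apiVersion: triggers.tekton.dev/v1alpha1
kind: TriggerBinding
metadata:
  name: pr-binding
spec:
  params:
    # extract values from the GitHub webhook body
    - name: repo-url
      value: $(body.repository.ssh_url)
    - name: pr-branch
      value: $(body.pull_request.head.ref)
---
apiVersion: triggers.tekton.dev/v1alpha1
kind: TriggerTemplate
metadata:
  name: pr-template
spec:
  params:
    - name: repo-url
    - name: pr-branch
  resourcetemplates:
    # each trigger instantiates a fresh PipelineRun
    - apiVersion: tekton.dev/v1beta1
      kind: PipelineRun
      metadata:
        generateName: test-pr-run-
      spec:
        pipelineRef:
          name: test-pr-pipeline
        # (workspaces omitted for brevity)
        params:
          - name: repo-url
            value: $(tt.params.repo-url)
          - name: pr-branch
            value: $(tt.params.pr-branch)
---
apiVersion: triggers.tekton.dev/v1alpha1
kind: EventListener
metadata:
  name: pr-listener
spec:
  serviceAccountName: tekton-triggers-sa
  triggers:
    - name: pr-trigger
      bindings:
        - ref: pr-binding
      template:
        ref: pr-template
```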
After you apply the above resources, you'll see a K8s service running for the event listener. Let's get the name of the service:
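(EventListener services are labeled with the listener name, so something like this should work.)

```bash
kubectl get svc -l eventlistener=pr-listener
```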
The generated name of the service in this case is `el-pr-listener`. We need to expose this service externally for GitHub to be able to access it. I have an nginx ingress already set up, so I created an ingress rule for it:
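(A sketch; the hostname is hypothetical, and 8080 is the EventListener service's default port.)

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: pr-listener-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: tekton.empapi.io   # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: el-pr-listener
                port:
                  number: 8080
```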
Now, we’re ready to create a GitHub webhook pointing to https://
Securing the webhook
We’ll need to ensure that we run the pipeline only on requests legitimately originating from GitHub. GitHub allows setting a secret token that we can verify on the server-side. Fortunately, Tekton already implements this validation as part of interceptors.
Let’s generate a secret string:
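(Any random string works, for example:)

```bash
openssl rand -hex 20
```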
Configure this string as a secret in GitHub. Then, create a K8s secret:
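(The secret and key names here are my choices; they just have to match what the interceptor references below.)

```bash
kubectl create secret generic github-webhook-secret \
  --from-literal=secretToken=<generated-string>
```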
Add the GitHub interceptor to the event listener:
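(Using the interceptor `ref` syntax; older Triggers versions use a slightly different inline form.)

```yaml
  triggers:
    - name: pr-trigger
      interceptors:
        - ref:
            name: "github"
          params:
            - name: "secretRef"
              value:
                secretName: github-webhook-secret
                secretKey: secretToken
            - name: "eventTypes"
              value: ["pull_request"]
      bindings:
        - ref: pr-binding
      template:
        ref: pr-template
```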
We still have one more minor issue: we don't want to run the pipeline when the PR is "closed". Currently, any `pull_request` event will kick off the pipeline. CEL interceptors help us solve this. Add a filter that only allows "opened", "reopened" and "synchronize" PR events:
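(Appended after the GitHub interceptor, the CEL filter could look like this.)

```yaml
      interceptors:
        # ... github interceptor from above ...
        - ref:
            name: "cel"
          params:
            - name: "filter"
              value: "body.action in ['opened', 'reopened', 'synchronize']"
```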
Find the full YAML for the EventListener, Trigger and TriggerTemplate here.
Let’s now focus on building the continuous deployment pipeline after the PR is merged1. At a high level, it’ll do the following steps:
- On “push” event from branch “main”, start the pipeline
- Check out repo
- Run unit tests
- Build container image and push to registry
- Deploy image to the staging environment
- Run cluster tests
- Deploy image to production
Building the container image
Steps 1-3 are similar to what we already covered earlier, so I'll skip them for brevity (get the code here). Building Docker images normally requires access to the Docker daemon, which poses security issues from within a container environment. Since all Tekton steps run in a container, this presents a problem for us. Cue Kaniko. Kaniko can build images from Dockerfiles without needing access to the Docker daemon.
Again, we’ll make use of the Tekton catalog to reference the kaniko task.
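As with git-cli, one way to install it:

```bash
tkn hub install task kaniko
```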
Kaniko can also push images to a registry. It looks within the "dockerconfig" workspace for a docker-style `config.json`. Create a `config.json` secret with your Docker Hub credentials:
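(Assuming you've already run `docker login` and your credentials are in the default location.)

```bash
kubectl create secret generic docker-config \
  --from-file=config.json=$HOME/.docker/config.json
```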
We’ll mount the docker-config secret as a workspace similar to how we did the git credentials.
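For illustration, the build-and-push entry in the pipeline might look like this; the image name and the commit-sha parameter are hypothetical:

```yaml
    - name: build-and-push
      taskRef:
        name: kaniko
      runAfter:
        - run-tests
      params:
        - name: IMAGE
          value: docker.io/example/emp-api:$(params.commit-sha)  # hypothetical image and tag
      workspaces:
        - name: source
          workspace: build
        - name: dockerconfig
          workspace: docker-config
```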
Deploying to staging
Now that we have our application image, we'll deploy it to staging and run some integration tests to validate it before promoting it to production. We already have the deployment piece set up with ArgoCD. All we need to do is update the image version in Git and have ArgoCD sync it. Let's write a task that runs `kustomize edit` to bump the image tag. Unfortunately, the Tekton catalog doesn't have one handy, so I ended up writing one myself:
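(A sketch of the task; the real one is in the linked repo, so treat the names here as approximations.)

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: kustomize-set-image
spec:
  workspaces:
    - name: manifests        # checkout of the manifests repo
  params:
    - name: overlay-path     # e.g. "staging"
    - name: image            # image name as referenced in the kustomization
    - name: tag              # new tag to set
  steps:
    - name: set-image
      # any image containing the kustomize binary works here
      image: registry.k8s.io/kustomize/kustomize:v4.5.7
      workingDir: $(workspaces.manifests.path)/$(params.overlay-path)
      script: |
        kustomize edit set image $(params.image):$(params.tag)
```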
And reference it from the pipeline:
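(The corresponding pipeline snippet, with hypothetical values.)

```yaml
    - name: bump-image-tag
      taskRef:
        name: kustomize-set-image
      runAfter:
        - build-and-push
      params:
        - name: overlay-path
          value: staging
        - name: image
          value: docker.io/example/emp-api   # hypothetical
        - name: tag
          value: $(params.commit-sha)        # hypothetical pipeline param
      workspaces:
        - name: manifests
          workspace: manifests-repo          # checkout of the manifests repo
```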
Next, we'll commit the change to Git using the `git-cli` task (skipping details here, see code).
We're now ready to sync the app using ArgoCD; there's a minor catch, though. We've so far run the `argocd` commands as the default `admin` user, which has all possible permissions. Following the principle of least privilege, we should only give the Tekton pipeline enough permissions to be able to sync the app, and not delete it, for example. Additionally, we should avoid providing the admin password to the pipeline.
ArgoCD has a built-in RBAC system we can use for this. Let's create a `syncbot` ArgoCD user:
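Local ArgoCD users are declared in the argocd-cm ConfigMap; "apiKey" allows generating API tokens for the account:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  accounts.syncbot: apiKey
```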
Create a role that can only get or sync applications, and bind it to the `syncbot` user:
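(Using ArgoCD's RBAC policy format in the argocd-rbac-cm ConfigMap; the wildcard scopes all apps, and you could restrict it to a single app instead.)

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.csv: |
    p, role:syncbot, applications, get, */*, allow
    p, role:syncbot, applications, sync, */*, allow
    g, syncbot, role:syncbot
```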
As usual, we'll leverage the argocd-task-sync-and-wait catalog task in our pipeline. The task requires a secret named `argocd-env-secret` containing credentials. Let's generate an API token for the `syncbot` user and create the required secret:
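(Roughly like this; generating the token requires being logged in as admin.)

```bash
argocd account generate-token --account syncbot

# the catalog task picks up credentials from this secret
kubectl create secret generic argocd-env-secret \
  --from-literal=ARGOCD_AUTH_TOKEN=<token-from-above>
```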
We also need to pass the ArgoCD server address. Since we are running it on the same cluster, simply give it the K8s service DNS name:
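(The catalog task reads the server address from a ConfigMap named argocd-env-configmap.)

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-env-configmap
data:
  ARGOCD_SERVER: argocd-server.argocd.svc.cluster.local
```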
Finally, use the task in our pipeline:
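(A sketch of the pipeline snippet; the --insecure flag accounts for the in-cluster server's self-signed certificate, and the runAfter task name is hypothetical.)

```yaml
    - name: sync-staging
      taskRef:
        name: argocd-task-sync-and-wait
      runAfter:
        - commit-manifest-change   # hypothetical name of the git commit task
      params:
        - name: application-name
          value: emp-api-staging
        - name: flags
          value: --insecure
```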
That’s it! I’ll skip describing a couple more steps like running integration tests and promoting the deployment to prod because they’re similar to what I’ve already covered. In any case, you can find the full pipeline configuration here.
ArgoCD and Tekton are powerful tools that can be combined to build Kubernetes-native CI/CD pipelines. The learning curve (especially for Tekton) and initial setup time are high, but in the end you get a more capable and flexible result than, say, Jenkins. Also, the pain of maintaining a Jenkins server goes away. That said, there are downsides too. Writing Tekton pipelines felt similar to programming, but in YAML, and YAML is not a programming language2. The experience feels clunky and error-prone, requiring a lot of trial-and-error to get right. To add to that, the documentation is subpar and the provided examples use deprecated features like PipelineResources.