Maël Valais
Antoine Le Squéren
Code and live slides!
Maël Valais, Software Engineer
Antoine Le Squéren, DevOps Engineer
A story about Vault, external-secrets, slow skaffold run, and how a one-liner Bash controller did the trick.
"I maintain the cert-manager project. I aim to build the best Let's Encrypt experience on Kubernetes."
"I improve the developer experience at OneStock by providing an efficient development environment."
website
(3) order
(1) visit
(2) fetch stock
warehouse
(4) deliver
t-shirt in stock
website
(3) order
(1) visit
warehouse
store
(4) deliver
t-shirt not in stock
t-shirt in stock
(2) fetch stock
60 clients
=> 300 secrets (60 clients × 5 external secrets)
Internal secrets
10 secrets
3 environments
(dev, staging, prod)
External secrets:
5 secrets
Prod
deploy
ssh
docker-compose.yaml
secrets.env
bastion
Swarm
Staging (on 40 dev laptops)
secrets.env
docker daemon
deploy
docker-compose.yaml
🔥
(2) skaffold run
Dev1 laptop
OVH managed cloud
Secret
Deployment
dev1-ns
Secret
Deployment
dev2-ns
skaffold.yaml
helm manifests
secrets.env
(1) load
(2) skaffold run
Dev laptop
Kubernetes
Secret
Deployment
dev1-ns
Secret
Deployment
dev2-ns
skaffold.yaml
helm manifests
secrets.env
(1) load
leak!
Vault
Dev laptop
skaffold.yaml
helm manifests
(1) vault kv get
(2) skaffold run
Secret
Deployment
dev1-ns
/secrets/prod/postgres
/secrets/dev/postgres
⏳
Kubernetes
(1) skaffold run
Secret
"postgres"
(2) fetch
(3)
create
Kubernetes
ExternalSecret
"postgres"
/secrets/prod/postgres
/secrets/dev/postgres
Vault
External secrets operator
Developer commands
Self-healing, Consistency, Desired vs. Observed state
edge-triggered action
Kubernetes
Desired state
"replicas=5"
"linux processes=2"
Observed state
≠
level-triggered action
user interaction
user interaction
action
transfer money
consistent state (desired = observed)
(transactional)
observed state
desired state
replicas=5
action
linux processes=2
kubelet creates container
but always consistent
not able to recover data inconsistencies
no data consistency
but can recover from inconsistencies
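The level-triggered idea above can be sketched as a tiny reconcile loop. This is a hedged illustration, not Kubernetes code: `observe` and `act` are hypothetical stand-ins for the kubelet reading the number of running containers and starting one more.

```shell
# Minimal sketch of a level-triggered reconcile loop (hypothetical
# observe/act functions): re-observe the full state on every iteration
# and act until observed matches desired, the way the kubelet keeps
# starting containers until "linux processes" matches "replicas".
desired=5
observed=2

observe() { echo "$observed"; }
act()     { observed=$((observed + 1)); }  # e.g. start one more container

while [ "$(observe)" -ne "$desired" ]; do
  act
done
echo "consistent: observed=$observed desired=$desired"
```

Because the loop compares against the full observed state rather than reacting to individual events, it recovers no matter how the state drifted.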
SELECT SUM(balance)
FROM accounts;
fact
constraint
Bank
Desired state
"sum of balances is 0"
Observed state
"sum of balances in DB"
=
observed state
desired state
action
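The bank's "level" check can be sketched the same way: the desired state is an invariant (sum of balances is 0), and the observed state is computed over the whole dataset. The balances below are illustrative, standing in for `SELECT SUM(balance) FROM accounts;`.

```shell
# Sketch: checking the bank's invariant over the full dataset.
# The array is a hypothetical stand-in for the accounts table.
balances=(120 -50 -70)

sum=0
for b in "${balances[@]}"; do
  sum=$((sum + b))
done

if [ "$sum" -eq 0 ]; then echo "consistent"; else echo "inconsistent"; fi
```

Unlike the kubelet, the bank has no reconcile action: when the invariant is violated, there is no generic way to recover the lost data.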
Self-healing in external-secrets operator
postgres
in sync
redis
out of sync
postgres in sync
password in Vault matches Secret in Kubernetes
postgres out of sync
password in Vault does not match Secret in Kubernetes
vault get && kubectl patch secret
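Spelled out as a runnable sketch, with shell functions standing in for the real CLIs (the function names, secret path, and value are all illustrative, not the actual commands from the talk):

```shell
# Sketch of the manual re-sync. The two functions are hypothetical
# stand-ins so the flow can be run without a cluster:
#   vault_get     ~ vault kv get -field=password secret/dev-1/postgres
#   kubectl_patch ~ kubectl patch secret postgres ...
vault_get()     { echo "s3cr3t-from-vault"; }
kubectl_patch() { echo "patched Secret postgres with password=$1"; }

# Read the password from Vault, then patch the out-of-sync Secret with it.
password=$(vault_get)
kubectl_patch "$password"
```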
Kubernetes
leak!
Secret
ingress API
dev2-ns
Data
Secret
ingress API
dev1-ns
Data
Vault
/secrets/dev/api
dev1-ns
ingress API
Secret
40 developers
=> 400 random passwords (40 developers × 10 internal secrets)
60 clients
Internal secrets
10 secrets
External secrets:
5 secrets
(0) vault put <randompass>
(1) skaffold run
Secret
"postgres"
(2) fetch
(3) create
Kubernetes
ExternalSecret
"postgres"
/secrets/dev-2/postgres
/secrets/dev-1/postgres
Vault
⏳
External secrets operator
Developer commands
Defining a controller: what are the desired and observed states?
Action:
$ vault kv put secret/dev-1/postgres password=random
======= Metadata =======
Key                Value
---                -----
created_time       2022-06-26T15:37:26.01313574Z
custom_metadata    <nil>
deletion_time      n/a
destroyed          false
version            30
"Run vault kv put password=random"
Desired state:
"No external secret is stuck with 'secret not found' due to a missing secret in Vault."
$ kubectl get externalsecret
NAME       KEY                     PROPERTY   READY   REASON              MESSAGE
redis      secret/dev-1/redis      password   True    SecretSynced        Secret was synced
postgres   secret/dev-1/postgres   password   False   SecretSyncedError   Could not get secret data from provider
This means 'secret not found'
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
status:
  conditions:
  - type: Ready
    status: "False"
    reason: SecretSyncedError
    message: Secret key was not found
Observed state:
"Run kubectl get externalsecret and I look for SecretSyncedError"
kubectl --watch to avoid polling ExternalSecrets
Get alerted as soon as SecretSyncedError appears
Writing our one-liner controller
kubectl get externalsecret --watch -ojson \
  | jq 'select(.status.conditions[]?.reason == "SecretSyncedError")' --unbuffered \
  | jq '.spec.data[0].remoteRef' --unbuffered \
  | jq '"\(.key) \(.property)"' -r \
  | while read -r key property
    do
      vault kv put "$key" "$property=somerandomvalue"
    done
-ojson so that we can use jq
--unbuffered because we are piping jq
We only need to take action when SecretSyncedError exists
Action
Observe state
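The jq stages of the one-liner can be checked offline by feeding them a hand-written ExternalSecret JSON instead of the kubectl watch stream (the sample document below is illustrative):

```shell
# A sample ExternalSecret, as `kubectl get -ojson` would emit it,
# with the fields the one-liner's jq stages look at.
sample='{"spec":{"data":[{"remoteRef":{"key":"secret/dev-1/postgres","property":"password"}}]},"status":{"conditions":[{"type":"Ready","status":"False","reason":"SecretSyncedError"}]}}'

# Run the same three jq stages as the one-liner (no cluster needed):
# keep only out-of-sync objects, extract the remoteRef, print "key property".
result=$(echo "$sample" \
  | jq 'select(.status.conditions[]?.reason == "SecretSyncedError")' \
  | jq '.spec.data[0].remoteRef' \
  | jq -r '"\(.key) \(.property)"')
echo "$result"
```

The final line is exactly what `read -r key property` consumes before calling `vault kv put`.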
Our one-liner controller in action!
A real controller runs inside a Pod, right?
Kubernetes
helm install
dev-1
dev-1 namespace
ExternalSecret
"postgres"
Secret
"postgres"
= external-secrets operator
= our controller
./controller.sh
controller pod
kubectl get --watch
(observe state)
(reconcile action)
vault put
vault
secret/dev-1/postgres
Visualising the controller pod in the cluster
fetch
create
A real controller runs inside a Pod, right?
Let us write a Dockerfile and a Deployment manifest
#!/bin/bash
kubectl get externalsecret --watch -ojson \
  | jq 'select(.status.conditions[]?.reason == "SecretSyncedError")' --unbuffered \
  | jq '.spec.data[0].remoteRef' --unbuffered \
  | jq '"\(.key) \(.property)"' -r \
  | while read -r key property
    do
      vault kv put "$key" "$property=somerandomvalue"
    done
controller.sh
FROM alpine:3.16
# The "setcap -r" is detailed in https://github.com/hashicorp/vault/issues/10924.
RUN echo "@testing http://dl-cdn.alpinelinux.org/alpine/edge/testing" >>/etc/apk/repositories \
 && apk add --update --no-cache bash curl jq kubectl@testing vault libcap \
 && setcap -r /usr/sbin/vault
COPY controller.sh /usr/local/bin/controller.sh
CMD ["controller.sh"]
Dockerfile
apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller
spec:
  replicas: 1
  selector:
    matchLabels: {name: controller}
  template:
    metadata:
      labels: {name: controller}
    spec:
      containers:
      - name: controller
        image: controller:local
        imagePullPolicy: Never
        env:
        - name: VAULT_ADDR
          value: http://vault.vault:8200
        - name: VAULT_TOKEN
          valueFrom:
            secretKeyRef:
              name: vault-token
              key: vault-token
      serviceAccountName: controller
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: controller
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: controller
subjects:
- kind: ServiceAccount
  name: controller
roleRef:
  name: external-secrets-reader
  kind: Role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: external-secrets-reader
rules:
- apiGroups: [external-secrets.io]
  resources: [externalsecrets]
  verbs: [get, list, watch, update, patch]
deploy.yaml
A real controller runs inside a Pod, right?
The controller pod in action
What now?
Use conditions to alert user when something goes wrong
Users won't know when something goes wrong
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: postgres
spec:
  data:
  - remoteRef:
      conversionStrategy: Default
      key: secret/dev-1/postgres
      property: password
    secretKey: password
  refreshInterval: 5s
  secretStoreRef:
    name: vault-backend
  target:
    name: postgres
status:
  conditions:
  - type: Ready
    status: "False"
    reason: SecretSyncedError
    message: Secret key was not found
  - type: Created
    status: "False"
    reason: VaultConnError
    message: Vault returned 403 unauthorized
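A sketch of how such conditions could be surfaced to users, feeding jq a JSON equivalent of the status above (the sample document is hand-written, not real controller output):

```shell
# JSON mirror of the ExternalSecret status above, with two failing conditions.
status='{"status":{"conditions":[{"type":"Ready","status":"False","reason":"SecretSyncedError"},{"type":"Created","status":"False","reason":"VaultConnError"}]}}'

# Print the reason of every condition whose status is "False",
# i.e. everything a user should be alerted about.
reasons=$(echo "$status" \
  | jq -r '.status.conditions[] | select(.status == "False") | .reason')
echo "$reasons"
```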
What now?
Use Go and controller-runtime
Slow sequential processing due to the while loop
func main() {
	mgr, _ := manager.New(config.GetConfigOrDie(), manager.Options{})
	_, err := controller.New("ext-secrets-vault-creator", mgr, controller.Options{
		Reconciler: reconcile.Func(func(ctx context.Context, r reconcile.Request) (reconcile.Result, error) {
			extsecret := v1.ExternalSecret{}
			err := mgr.GetClient().Get(ctx, r.NamespacedName, &extsecret)
			if err != nil {
				return reconcile.Result{}, err
			}
			// vault kv put
			return reconcile.Result{}, nil
		}),
	})
	if err != nil {
		panic(err)
	}
	_ = mgr.Start(context.Background())
}
No more slow sequential processing: the controller can handle hundreds of ExternalSecrets.
Maël Valais
Antoine Le Squéren