Provision an Amazon EKS (Elastic Kubernetes Service) cluster
IAM: EKS User (required)
This user is required so Valohai can access the cluster and deploy new images to your ECR.
Create a user and enable Programmatic access and Console access.

- Attach the following existing policies:
  - AmazonEC2ContainerRegistryFullAccess
  - AmazonEKSServicePolicy
Click on Create policy to open a new tab. Describe the new policy with the JSON below.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "1",
      "Effect": "Allow",
      "Action": "eks:ListClusters",
      "Resource": "*"
    }
  ]
}
Name the policy VH_EKS_USER and create it.
Back in your Add user tab, click on the refresh button and select the VH_EKS_USER policy.
Store the access key & secret in a safe place.
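The console steps above can also be approximated from the command line. The following is a minimal sketch, not part of the original guide: it assumes the user is named valohai-eks-user (the name used in the RBAC section), uses a placeholder account ID, and introduces a hypothetical run helper that only prints the commands unless DRY_RUN=0 is set. Console access is not covered here.

```shell
#!/bin/sh
# Sketch: CLI counterpart of the console steps (assumed names, see lead-in).
USER_NAME="valohai-eks-user"
POLICY_NAME="VH_EKS_USER"
ACCOUNT_ID="${ACCOUNT_ID:-111122223333}"              # placeholder account ID
POLICY_FILE="${POLICY_FILE:-vh-eks-user-policy.json}" # hypothetical file name

# Hypothetical helper: print the command by default, execute when DRY_RUN=0.
# Printed commands are for review only; shell quoting is not preserved.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# The custom policy from this section, written to a local file.
cat > "$POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    { "Sid": "1", "Effect": "Allow", "Action": "eks:ListClusters", "Resource": "*" }
  ]
}
EOF

create_eks_user() {
  run aws iam create-user --user-name "$USER_NAME"
  run aws iam attach-user-policy --user-name "$USER_NAME" \
    --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess
  run aws iam attach-user-policy --user-name "$USER_NAME" \
    --policy-arn arn:aws:iam::aws:policy/AmazonEKSServicePolicy
  run aws iam create-policy --policy-name "$POLICY_NAME" \
    --policy-document "file://$POLICY_FILE"
  run aws iam attach-user-policy --user-name "$USER_NAME" \
    --policy-arn "arn:aws:iam::${ACCOUNT_ID}:policy/${POLICY_NAME}"
  # Returns the access key ID and secret once; store them in a safe place.
  run aws iam create-access-key --user-name "$USER_NAME"
}

create_eks_user
```

Running the script as-is only lists the six commands, which makes it easy to review them before executing anything against the account.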
IAM: Admin user (optional)
This user is needed only if you want to give Valohai elevated permissions to install an EKS cluster in your subscription.
You can skip this IAM user if you’re creating the cluster yourself or using an existing cluster.
Create a user and enable Programmatic access and Console access.

- Attach the following existing policies:
  - AmazonEKSClusterPolicy
  - AmazonEC2FullAccess
  - AmazonVPCFullAccess
  - AmazonEC2ContainerRegistryPowerUser
  - AmazonEKSServicePolicy
Click on Create policy to open a new tab. Describe the new policy with the JSON below.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "1",
      "Effect": "Allow",
      "Action": [
        "iam:GetRole",
        "iam:ListRoleTags",
        "iam:CreateRole",
        "iam:DeleteRole",
        "iam:AttachRolePolicy",
        "iam:PutRolePolicy",
        "iam:PassRole",
        "iam:DetachRolePolicy",
        "iam:CreateServiceLinkedRole",
        "iam:GetRolePolicy",
        "iam:CreateOpenIDConnectProvider",
        "eks:*",
        "cloudformation:*"
      ],
      "Resource": "*"
    },
    {
      "Sid": "2",
      "Effect": "Allow",
      "Action": "ecr:*",
      "Resource": "arn:aws:ecr:*:*:repository/*"
    }
  ]
}
Name the policy VH_EKS_ADMIN and create it.
Back in your Add user tab, click on the refresh button and select the VH_EKS_ADMIN policy.
Store the access key & secret in a safe location.
Create the EKS cluster
You can also skip this section and use an existing cluster - or define different settings.
We’ll use eksctl, a simple CLI tool, to create the cluster on EKS.
Start by logging in to the AWS CLI and passing in the right keys:
aws configure --profile valohai-eks-admin
Then set the current profile:
export AWS_PROFILE=valohai-eks-admin
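Before going further, it may help to confirm that the command-line tools used in this guide are installed. The sketch below is an illustration only; missing_tools is a hypothetical helper name, and the tool list (aws, eksctl, kubectl, helm) is taken from the commands that appear in later sections.

```shell
#!/bin/sh
# Print the name of each given tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tools used in the rest of this guide.
missing="$(missing_tools aws eksctl kubectl helm)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found."
  # To confirm which IAM identity the active profile resolves to, run:
  #   aws sts get-caller-identity --profile valohai-eks-admin
fi
```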
Start the cluster creation
Below is a sample command that creates a new cluster with a maximum of four t3.medium nodes and a dedicated VPC.
Create a couple of env variables to make life easier:
export CLUSTER=<customer-name>-valohai
export REGION=<aws-region>
Then create the cluster:
eksctl create cluster \
--name $CLUSTER \
--region $REGION \
--nodegroup-name standard-workers \
--node-type t3.medium \
--nodes 1 \
--nodes-min 1 \
--nodes-max 4 \
--managed \
--asg-access
Cluster creation takes 10-15 minutes.
Logs are available under CloudFormation in the AWS console or through the CLI:
aws cloudformation describe-stack-events --stack-name eksctl-$CLUSTER-cluster
aws cloudformation describe-stack-events --stack-name eksctl-$CLUSTER-nodegroup-standard-workers
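The same cluster settings can also be kept in an eksctl config file, which is easier to review and version than a long command line. The following is a sketch under assumptions: the field names follow the eksctl ClusterConfig schema as I understand it (eksctl.io/v1alpha5), autoScaler under withAddonPolicies is assumed to be the counterpart of --asg-access, and the default cluster name, region, and file name are placeholders.

```shell
#!/bin/sh
CLUSTER="${CLUSTER:-example-valohai}"      # placeholder cluster name
REGION="${REGION:-eu-west-1}"              # placeholder region
CONFIG_FILE="${CONFIG_FILE:-cluster.yaml}" # hypothetical file name

# Write a config file mirroring the flags of the create command above.
cat > "$CONFIG_FILE" <<EOF
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${CLUSTER}
  region: ${REGION}
managedNodeGroups:
  - name: standard-workers
    instanceType: t3.medium
    desiredCapacity: 1
    minSize: 1
    maxSize: 4
    iam:
      withAddonPolicies:
        autoScaler: true   # assumed counterpart of --asg-access
EOF

echo "Wrote $CONFIG_FILE; create the cluster with:"
echo "  eksctl create cluster -f $CONFIG_FILE"
```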
Set up kubeconfig
We’re defining a custom location for the config file (with --kubeconfig) to ensure we’re writing to an empty file instead of modifying the default config.
aws eks --region $REGION update-kubeconfig --name $CLUSTER --kubeconfig ~/.kube/$CLUSTER
# now you can either give '--kubeconfig ~/.kube/$CLUSTER' to 'kubectl' commands
# or define `KUBECONFIG` for the session like below:
export KUBECONFIG=~/.kube/$CLUSTER
Check that the cluster is up and running:
kubectl get svc --kubeconfig ~/.kube/$CLUSTER
Set up the RBAC user on Kubernetes (required)
Create the files below to enable the valohai-eks-user to deploy from Valohai to your cluster.
Create a Kubernetes user and map it to the IAM user:
# replace <ACCOUNT-ID> with your own AWS account's ID
cat <<EOF > aws-auth-patch.yaml
data:
  mapUsers: |
    - userarn: arn:aws:iam::<ACCOUNT-ID>:user/valohai-eks-user
      username: valohai-eks-user
EOF
vim aws-auth-patch.yaml
kubectl -n kube-system patch configmap/aws-auth --patch "$(cat aws-auth-patch.yaml)" --kubeconfig ~/.kube/$CLUSTER
# you can check what it looks like with:
# kubectl -n kube-system get configmap/aws-auth -o yaml --kubeconfig ~/.kube/$CLUSTER
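To avoid editing the account ID by hand, the patch file can be generated. This is a minimal sketch, assuming the patch lives under the ConfigMap's data.mapUsers key; render_aws_auth_patch is a hypothetical helper, and 111122223333 is a placeholder account ID.

```shell
#!/bin/sh
# Print an aws-auth patch for the given 12-digit AWS account ID.
render_aws_auth_patch() {
  account_id="$1"
  case "$account_id" in
    ""|*[!0-9]*) echo "account ID must be numeric" >&2; return 1 ;;
  esac
  if [ "${#account_id}" -ne 12 ]; then
    echo "account ID must have 12 digits" >&2
    return 1
  fi
  cat <<EOF
data:
  mapUsers: |
    - userarn: arn:aws:iam::${account_id}:user/valohai-eks-user
      username: valohai-eks-user
EOF
}

# 111122223333 is a placeholder; the real ID can be read with
#   aws sts get-caller-identity --query Account --output text
render_aws_auth_patch "${ACCOUNT_ID:-111122223333}" > aws-auth-patch.yaml
cat aws-auth-patch.yaml
```

Note that mapUsers is a single string value, so patching it replaces any user mappings already present; on an existing cluster it is worth checking the current ConfigMap first.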
Create a namespace-reader role that will give valohai-eks-user permissions on the cluster:
cat <<EOF > rbacuser-clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: namespace-reader
rules:
  - apiGroups: [ "" ]
    resources: [ "namespaces", "services" ]
    verbs: [ "get", "watch", "list", "create", "update", "patch", "delete" ]
  - apiGroups: [ "" ]
    resources: [ "pods", "pods/log", "events" ]
    verbs: [ "list", "get", "watch" ]
  - apiGroups: [ "extensions", "apps" ]
    resources: [ "deployments", "ingresses" ]
    verbs: [ "get", "list", "watch", "create", "update", "patch", "delete" ]
  # Ingress resources are served from the networking.k8s.io API group
  - apiGroups: [ "networking.k8s.io" ]
    resources: [ "ingresses" ]
    verbs: [ "get", "list", "watch", "create", "update", "patch", "delete" ]
EOF
kubectl apply -f rbacuser-clusterrole.yaml --kubeconfig ~/.kube/$CLUSTER
# and verify changes with...
# kubectl get clusterrole/namespace-reader -o yaml --kubeconfig ~/.kube/$CLUSTER
Bind our cluster role and user together:
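cat <<EOF > rbacuser-clusterrole-binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: namespace-reader-global
subjects:
  - kind: User
    name: valohai-eks-user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: namespace-reader
  apiGroup: rbac.authorization.k8s.io
EOF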
kubectl apply -f rbacuser-clusterrole-binding.yaml --kubeconfig ~/.kube/$CLUSTER
# and verify changes with...
# kubectl get clusterrolebinding/namespace-reader-global -o yaml --kubeconfig ~/.kube/$CLUSTER
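Once the role and binding are applied, the effective permissions can be spot-checked by impersonating the user with kubectl auth can-i. The sketch below only prints the check commands so they can be reviewed first; can_i_commands is a hypothetical helper, and the verb/resource pairs are examples chosen from the role above. Impersonation with --as requires that your own identity is allowed to impersonate users, which a cluster admin normally is.

```shell
#!/bin/sh
K8S_USER="valohai-eks-user"

# Print one "kubectl auth can-i" command per "verb resource" pair on stdin.
can_i_commands() {
  while read -r verb resource; do
    [ -n "$verb" ] || continue
    echo "kubectl auth can-i $verb $resource --as $K8S_USER"
  done
}

# According to the namespace-reader role, the first three checks should
# answer "yes" and the last one "no" (pods are read-only for this user).
can_i_commands <<'EOF'
list pods
create deployments
delete services
delete pods
EOF
```

The printed lines can be pasted into a shell that has KUBECONFIG set, or piped to sh once they look right.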
Set up AWS EKS autoscaling
We’ll install cluster-autoscaler to manage autoscaling on the AWS EKS cluster.
Create an IAM OIDC identity provider
eksctl utils associate-iam-oidc-provider --cluster $CLUSTER --approve
# whichever entity runs the above command must be allowed to perform "iam:CreateOpenIDConnectProvider"
# aws eks describe-cluster \
#   --name $CLUSTER \
#   --query "cluster.identity.oidc.issuer" \
#   --output text
# note that the resource target comes from the previous command
# {
#   "Version": "2012-10-17",
#   "Statement": [
#     {
#       "Effect": "Allow",
#       "Action": "iam:CreateOpenIDConnectProvider",
#       "Resource": "arn:aws:iam::<ACCOUNT-ID>:oidc-provider/EXAMPLE7B896A512D065990B999222FC84"
#     }
#   ]
# }
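If you scope the permission to a single provider, the resource ARN has to be derived from the issuer URL. The sketch below assumes the format shown in AWS documentation, where the provider ARN contains the issuer URL without its https:// prefix (so it includes the oidc.eks host, unlike the abbreviated example above); verify this against your own account. oidc_provider_arn is a hypothetical helper, and both arguments are placeholders.

```shell
#!/bin/sh
# Build the IAM ARN of an OIDC provider from an account ID and issuer URL.
oidc_provider_arn() {
  account_id="$1"
  issuer_url="$2"
  # Assumption: IAM identifies the provider by the issuer URL without scheme.
  echo "arn:aws:iam::${account_id}:oidc-provider/${issuer_url#https://}"
}

# Placeholder values; the issuer URL comes from
#   aws eks describe-cluster --name "$CLUSTER" \
#     --query "cluster.identity.oidc.issuer" --output text
oidc_provider_arn 111122223333 \
  "https://oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE7B896A512D065990B999222FC84"
```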
Create AWS IAM policy for cluster-autoscaler
In the next policy, you can replace the "Resource" limitation with "*" if getting the autoscaling group ARNs is troublesome; the included Condition should be enough. Otherwise, list all ASG ARNs that are part of the cluster.
# lists all ARNs of the autoscaling groups of the cluster...
aws autoscaling describe-auto-scaling-groups \
--query "AutoScalingGroups[?Tags[?Value == \`$CLUSTER\`]].AutoScalingGroupARN" \
--output text
# arn:aws:autoscaling:eu-west-1:<ACCOUNT-ID>:autoScalingGroup:EXAMPLE:autoScalingGroupName/eks-EXAMPLE
# note that you will have to be able to create new AWS IAM roles...
cat <<EOF >> cluster-autoscaler-policy.json
"Version": "2012-10-17",
"Statement": [
"Effect": "Allow",
"Action": [
"Resource": [
"Condition": {
"StringEquals": {
"autoscaling:ResourceTag/": "true"
"Effect": "Allow",
"Action": [
"Resource": "*"
aws iam \
create-policy \
--policy-name ValohaiClusterAutoscalerPolicy \
--policy-document file://cluster-autoscaler-policy.json
rm cluster-autoscaler-policy.json
# record the printed ARN e.g. "arn:aws:iam::<ACCOUNT-ID>:policy/ValohaiClusterAutoscalerPolicy"
Create AWS IAM role and service account for cluster-autoscaler
eksctl create iamserviceaccount \
--name cluster-autoscaler \
--namespace kube-system \
--cluster $CLUSTER \
--attach-policy-arn arn:aws:iam::<ACCOUNT-ID>:policy/ValohaiClusterAutoscalerPolicy \
--approve
# creates a Role that is something like...
# arn:aws:iam::<ACCOUNT-ID>:role/eksctl-sandbox-valohai-addon-iamserviceaccou-Role1-1M0AUUY1YCW5S
# and a Kubernetes service account like...
kubectl get -n kube-system serviceaccount/cluster-autoscaler -o yaml
Install cluster-autoscaler
# open in text editor and...
vim cluster-autoscaler-autodiscover.yaml
- Remove the “kind: ServiceAccount” section, as we already created it with eksctl.
- Find the “kind: Deployment” section and:
  - Replace <YOUR CLUSTER NAME> with the cluster name.
  - Add the following env definition right below it, on the same level as command:
env:
  - name: AWS_REGION
    value: eu-west-1 # or whichever region the cluster is in
- Then apply these changes:
kubectl apply -f cluster-autoscaler-autodiscover.yaml
kubectl get pods -n kube-system
# cluster-autoscaler-7dd5d74dc5-qs8gj 1/1 Running
kubectl logs -n kube-system cluster-autoscaler-7dd5d74dc5-qs8gj -f
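The placeholder replacement can be scripted instead of done in an editor. The following is a small sketch: fill_cluster_name is a hypothetical helper, the default cluster name is a placeholder, and the echoed line is only a stand-in for the manifest line that carries the placeholder, not the full manifest.

```shell
#!/bin/sh
CLUSTER="${CLUSTER:-example-valohai}"   # placeholder cluster name

# Replace the cluster-name placeholder in a manifest read from stdin.
# Assumes the cluster name contains no "|" or "&" characters.
fill_cluster_name() {
  sed "s|<YOUR CLUSTER NAME>|${CLUSTER}|g"
}

# Stand-in for the relevant manifest line (assumed for illustration).
echo "- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME>" \
  | fill_cluster_name

# With the real file:
#   fill_cluster_name < cluster-autoscaler-autodiscover.yaml > patched.yaml
```

The ServiceAccount removal and the env addition still need to be done by hand, since they depend on the layout of the downloaded manifest.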
Install NGINX Ingress Controller
Install the NGINX Ingress Controller, which we use for routing incoming connections to individual endpoints.
kubectl create namespace ingress-nginx
helm repo add ingress-nginx
helm repo update
# you can use `helm show values ingress-nginx/ingress-nginx` to list all possible installation customizations
helm install \
ingress-nginx \
ingress-nginx/ingress-nginx \
--version v3.31.0 \
--namespace ingress-nginx
Make sure the ingress-nginx-controller pod is running; this might take a minute:
kubectl get pods --all-namespaces
# ingress-nginx nginx-ingress-controller-6885bc7778-rckm6 1/1 Running 0 2m15s
Check that the ingress-nginx service is running and get the external address:
kubectl -n ingress-nginx get service/ingress-nginx-controller
# The external IP is something along the lines of `` or a raw IP
Now we should get a default NGINX 404 from the load balancer external IP:
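One way to run this check is with curl, reading only the HTTP status code. This is a sketch under assumptions: LB_ADDRESS is a hypothetical variable that you set to the external address from the previous step, and check_ingress_status is an illustrative helper. curl reports 000 when no HTTP response was received, which often just means the load balancer is still being provisioned.

```shell
#!/bin/sh
# Interpret the HTTP status code returned by the load balancer.
check_ingress_status() {
  case "$1" in
    404) echo "ok: NGINX answered with the default 404"; return 0 ;;
    000) echo "no response: load balancer may still be provisioning"; return 1 ;;
    *)   echo "unexpected status: $1"; return 1 ;;
  esac
}

# LB_ADDRESS is a placeholder for the external address of the service.
if [ -n "${LB_ADDRESS:-}" ]; then
  status="$(curl -s -o /dev/null -w '%{http_code}' "http://${LB_ADDRESS}")"
  check_ingress_status "$status"
else
  echo "Set LB_ADDRESS to the load balancer address first."
fi
```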
Optional: Modify Load Balancer Security Group
You can now modify the NGINX configuration, for example by adding a load balancer security group on AWS so that only certain CIDR ranges or Security Groups can access the endpoints.
kubectl -n ingress-nginx edit service/ingress-nginx-controller
# e.g. adding to annotations:
# **If** you want to use TLS certificates generated through AWS, replace with the correct value of
# the generated certificate in the AWS console. Otherwise, set up free TLS/HTTPS in `3-deployment-https`.
"arn:aws:acm:YYYYYYY:XXXXXXXX:certificate/XXXXXX-XXXXXXX-XXXXXXX-XXXXXXXX"
# Ensure the ELB idle timeout is less than the NGINX keep-alive timeout. By default,
# NGINX keep-alive is set to 75s and the ELB idle timeout to 60s. If using WebSockets, the value will need to be
# increased to '3600' to avoid any potential issues.
"60"
# **If** a customer wants to limit access to the endpoints, you might want to create a new security group where
# you can set the inbound rules, but you can also simply modify the existing one.
"sg-XXXXXXXXX"
Send details to Valohai
Send Valohai engineers:
- valohai-eks-user access key ID and secret.
- AWS region of the cluster
- Details of the created cluster (find these on the cluster’s page on EKS):
  - Cluster name
  - API server endpoint
  - Cluster ARN
  - Certificate authority
- External IP of the Load Balancer tied to the NGINX Ingress Controller (run kubectl -n ingress-nginx get service/ingress-nginx-controller)
- ECR name - Copy the URL you see when creating a new repository in your ECR (for example
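Most of the items in this list can be read from the command line rather than the console. The sketch below is illustrative only: the run helper is hypothetical and prints the commands unless DRY_RUN=0 is set, the default cluster name and region are placeholders, and the access key is deliberately left out because it should be shared through a secure channel.

```shell
#!/bin/sh
CLUSTER="${CLUSTER:-example-valohai}"   # placeholder cluster name
REGION="${REGION:-eu-west-1}"           # placeholder region

# Hypothetical helper: print the command by default, execute when DRY_RUN=0.
# Printed commands are for review only; shell quoting is not preserved.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

collect_details() {
  # Cluster name, API server endpoint, ARN and certificate authority data.
  run aws eks describe-cluster --name "$CLUSTER" --region "$REGION" \
    --query "cluster.{name:name,endpoint:endpoint,arn:arn,certificateAuthority:certificateAuthority.data}" \
    --output json
  # External address of the load balancer in front of the ingress controller.
  run kubectl -n ingress-nginx get service/ingress-nginx-controller \
    -o "jsonpath={.status.loadBalancer.ingress[0].hostname}"
  # URIs of the ECR repositories in the region.
  run aws ecr describe-repositories --region "$REGION" \
    --query "repositories[].repositoryUri" --output text
}

collect_details
```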
