Fix health checks failing on target group’s nodes on EKS NLB
Problem
When using the NGINX ingress controller on EKS with a Service that provisions an AWS NLB, health checks on the nodes in the NLB's target group can fail in some instances. Even though the health checks fail, all workloads sometimes appear to work normally. This is especially true when the number of nodes in the cluster is small. Symptoms include only one node showing as healthy while the remaining nodes show as unhealthy. Another symptom is pod-to-pod communication failing. This can be quite confusing.
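To see the symptom directly, you can list the targets the NLB reports as unhealthy. A minimal sketch with the AWS CLI, assuming you look up the ARNs of the load balancer and target group that were created for the service (the placeholders below are yours to fill in):

# Find the NLB and its target group(s).
aws elbv2 describe-load-balancers --query 'LoadBalancers[].{Name:LoadBalancerName,Arn:LoadBalancerArn}'
aws elbv2 describe-target-groups --load-balancer-arn <load-balancer-arn>

# Show per-node health; with externalTrafficPolicy: Local you will typically
# see only the node(s) running an ingress controller pod as healthy.
aws elbv2 describe-target-health --target-group-arn <target-group-arn>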
Root cause
This problem is a side effect of a setting in the Service definition that provisions the NLB.
A Kubernetes Service has a property called externalTrafficPolicy. The value of this property can be either Local or Cluster.
Local means that this service (which is handled by kube-proxy on each node) can only route traffic to pods on the same node where it is running.
Cluster means this service can also route traffic to pods running on other nodes.
But why do my workloads work sometimes, and why is there one node passing the health check on the NLB?
This is because the service (kube-proxy) on the node that actually hosts the ingress controller pod will pass the NLB's health check. When Local is set, the other nodes cannot serve this traffic, hence they fail the health checks on the NLB. When the target pod coincidentally gets scheduled on the same node where the pod backing the NLB service is running, all pods on that node are reachable, which is why workloads sometimes appear to work normally.
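To verify that this is what is happening, compare the Service's traffic policy with where the ingress controller pods are scheduled. A rough sketch, assuming the names and namespace from the example manifest below:

# Which policy is the Service using?
kubectl -n internal-ingress-nginx get svc internal-ingress-nginx \
  -o jsonpath='{.spec.externalTrafficPolicy}{"\n"}'

# Which nodes actually run an ingress controller pod? With Local, only these
# nodes should show up as healthy targets on the NLB.
kubectl -n internal-ingress-nginx get pods -o wide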
Example YAML definition of a Service for an AWS NLB:
kind: Service
apiVersion: v1
metadata:
  name: internal-ingress-nginx
  namespace: internal-ingress-nginx
  labels:
    app.kubernetes.io/name: internal-ingress-nginx
    app.kubernetes.io/part-of: internal-ingress-nginx
  annotations:
    # by default the type is elb (classic load balancer).
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
spec:
  # Cluster lets every node forward traffic to the ingress pods, so all nodes
  # pass the NLB health check; note that the client source IP is not preserved.
  externalTrafficPolicy: Cluster
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: internal-ingress-nginx
    app.kubernetes.io/part-of: internal-ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: http
    - name: https
      port: 443
      targetPort: https
Pros and Cons
So it looks like the Cluster setting is a win-win. However, there might be cases where the target pods need to be aware of the source IP addresses of incoming requests. In those situations, Local is needed. With the Cluster setting, the original source IP address is dropped and overwritten with a cluster-internal IP address as traffic is forwarded between nodes. Hence the terminology: Local implies that traffic is only routed to the local node where the service receives it, which is also what preserves the client's source IP.
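If preserving the client source IP is required, the policy can be switched back to Local. A sketch of that change, applied to the example Service above:

# Switch the example Service to Local so pods see the real client IP.
kubectl -n internal-ingress-nginx patch svc internal-ingress-nginx \
  -p '{"spec":{"externalTrafficPolicy":"Local"}}'

# With Local, Kubernetes allocates a dedicated health check node port;
# only nodes running an ingress controller pod answer on it.
kubectl -n internal-ingress-nginx get svc internal-ingress-nginx \
  -o jsonpath='{.spec.healthCheckNodePort}{"\n"}'

Either way, the trade-off is the same: Cluster gives you healthy targets on every node at the cost of the client IP, while Local keeps the client IP but only passes health checks on nodes that actually host an ingress controller pod.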