Monitoring OpenShift through its own and/or the Kubernetes API

While proceeding with a CloudForms installation and configuration, I started thinking about monitoring some pieces of the OpenShift v3 infrastructure with an existing monitoring tool (Zabbix, for example).

In the example I'll describe, I want (for now) to monitor the state of all the OpenShift nodes and the restart count of pods in some critical projects (I experienced some issues with Docker that resulted in multiple restarts for containers running on one node).

In this document I'll show the whole process I went through to make it work (unfortunately by googling and testing, because there is no great documentation on this topic).

PLEASE NOTE: all the commands you'll see have been executed as system:admin. That is not necessary if you stay within your own project's scope and/or projects in which you're an admin (but it is mandatory for cluster-wide policies).

First of all we need to set up a service account: this is the preferred approach, since it avoids tying the monitoring to a standard user.

A service account is usually associated with a project, so let's define it:

# cat > monitor-sa.yaml << EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: monitor
EOF

And then create it:

# oc create -f ./monitor-sa.yaml -n yourProject

serviceaccount "monitor" created


Then you have to give the service account some permissions to view the resources in its project:

# oc policy add-role-to-user view system:serviceaccount:yourProject:monitor


Finally, supposing you want this service account to also view/read resources of other projects, you can extend its permissions with:

# oc policy add-role-to-user view system:serviceaccount:yourProject:monitor -n anotherProject

# oc policy add-role-to-user view system:serviceaccount:yourProject:monitor -n oneMoreProject


Thanks to these policies you should be able to query the Kubernetes API for the projects enabled above; we'll see at the end of this doc how to query these APIs.
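As a small sketch of what "querying the API for a project" means: the namespaced endpoints all follow the same URL pattern, so it can be handy to build the base URL once and reuse it. The host and project names below are the placeholders used throughout this article, not a real cluster:

```shell
# Namespaced (per-project) resources live under /api/v1/namespaces/<project>/...
# "console.mydomain.local" and "yourProject" are this article's placeholders.
API="https://console.mydomain.local:8443"
PROJECT="yourProject"
echo "${API}/api/v1/namespaces/${PROJECT}/pods"
```

The same pattern applies to other resource types (services, replicationcontrollers, ...) by swapping the last path segment.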


Going forward, to allow our service account "monitor" to see the OpenShift cluster status, we need to assign a special policy:

# oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:yourProject:monitor

Note the "add-cluster-role-to-user" part of the command: that is what makes this role assignment work at cluster scope. (If you try to use the previous instruction "add-role-to-user", it won't work.)


Now that we've finally completed the configuration process, we can start testing whether the policies we applied are actually working.


First of all we have to grab the name of the token generated for our service account:

# oc describe sa monitor -n yourProject
Name:                monitor
Namespace:           yourProject
Labels:              <none>
Tokens:              monitor-token-ds2b6
Image pull secrets:  monitor-dockercfg-0u01t
Mountable secrets:   monitor-token-re2s3

Then we can pick one of the tokens (OpenShift has already generated two of them for us) and read its secret to get the actual bearer token for querying the REST API:

# oc describe secret monitor-token-ds2b6 -n yourProject
Name:        monitor-token-ds2b6
Namespace:   yourProject
Labels:      <none>

Type:        kubernetes.io/service-account-token

Data
====
ca.crt:    1066 bytes
token:     eyKhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJhdGxhcy1wcm9kdWN0aW9uIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6Im1vbml0b3ItdG9rZW4tZHMyYjYiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoibW9uaXRvciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjZmYzJlZTAyLTE4MjYtMTFlNi05NjFiLTAwNTA1NjhjNjY3OCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDphdGxhcy1wcm9kdWN0aW9uOm1vbml0b3IifQ.hmYza9ongOwAfNb4HhpaJBah22VBQxEsAFIjQu4RXVkE2OR-meYcZzp16YYDimvq6QAUama7PF5id9x5l8e_g9T6JdBPrlmAagAAAedNWOdmDtQ9OH4wQYfOipo1pddsyll4bO682VNAf9CW9adXdgXIUEiM9rKHhbt--B4Rehe-OOKcIxFeVe1deo46adziwfMQYjoa6K9CAFQPoLGUj4M3slb6XyBGSbmDWPTB8LRP0O1bfP9ZXRqsXb-O0IGNOMkkzR2Ox2ZaVU9CjjmkrchxB-_Pe4pkkv6rty1lRba0colhL4kwIHWi-yll2frEteNy0ZF6GcvpZw1ucd6WWz


So we're ready to test it:

# export TOKEN=eyKhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJhdGxhcy1wcm9kdWN0aW9uIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6Im1vbml0b3ItdG9rZW4tZHMyYjYiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoibW9uaXRvciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjZmYzJlZTAyLTE4MjYtMTFlNi05NjFiLTAwNTA1NjhjNjY3OCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDphdGxhcy1wcm9kdWN0aW9uOm1vbml0b3IifQ.hmYza9ongOwAfNb4HhpaJBah22VBQxEsAFIjQu4RXVkE2OR-meYcZzp16YYDimvq6QAUama7PF5id9x5l8e_g9T6JdBPrlmAagAAAedNWOdmDtQ9OH4wQYfOipo1pddsyll4bO682VNAf9CW9adXdgXIUEiM9rKHhbt--B4Rehe-OOKcIxFeVe1deo46adziwfMQYjoa6K9CAFQPoLGUj4M3slb6XyBGSbmDWPTB8LRP0O1bfP9ZXRqsXb-O0IGNOMkkzR2Ox2ZaVU9CjjmkrchxB-_Pe4pkkv6rty1lRba0colhL4kwIHWi-yll2frEteNy0ZF6GcvpZw1ucd6WWz


# curl -H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" -X GET https://console.mydomain.local:8443/api/v1/namespaces/yourProject/pods -k 2>/dev/null | jq '.items[]."status"."containerStatuses"[]."restartCount"'
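To make the jq filter itself easier to follow, here is a self-contained demo of it run against a made-up sample response (the JSON below is invented, but it has the same shape the pods endpoint returns):

```shell
# Sample of the pods API response shape; the values are made up.
cat > /tmp/pods-sample.json << 'EOF'
{"items":[
  {"status":{"containerStatuses":[{"restartCount":0},{"restartCount":3}]}},
  {"status":{"containerStatuses":[{"restartCount":12}]}}
]}
EOF
# Same filter as in the curl pipeline above: one restartCount per container.
jq '.items[]."status"."containerStatuses"[]."restartCount"' /tmp/pods-sample.json
```

This prints one number per container (0, 3 and 12 for the sample above), which is trivial to feed into a monitoring item or trigger.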
# curl -H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" -X GET https://console.mydomain.local:8443/api/v1/nodes -k 2>/dev/null | jq '.items[]."status"."conditions"[]."status"'
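The conditions array contains several entries per node (OutOfDisk, Ready, ...), and in practice the "Ready" one is what a node-health check usually cares about. jq's select() narrows the filter down to it; here is a local demo on invented sample data with the same shape as the nodes endpoint's response:

```shell
# Sample of the nodes API response shape; names and values are made up.
cat > /tmp/nodes-sample.json << 'EOF'
{"items":[
  {"metadata":{"name":"node1.mydomain.local"},
   "status":{"conditions":[{"type":"OutOfDisk","status":"False"},
                           {"type":"Ready","status":"True"}]}}
]}
EOF
# Keep only the Ready condition, printed next to the node name.
jq -r '.items[] | .metadata.name + " " + (.status.conditions[] | select(.type=="Ready") | .status)' /tmp/nodes-sample.json
```

For the sample above this prints "node1.mydomain.local True", one line per node.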
As you can see, the first HTTP GET retrieves information about the pods running in yourProject, while the second one requests information about the cluster's nodes.


If you remove the "jq" command, which parses and filters the JSON response, you'll obtain the full output, so you can keep digging into the available details and decide what else you could monitor.
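One possibility along those lines: aggregate the restarts of all containers in a pod into a single number, which maps nicely to one numeric monitoring item per pod. A hedged sketch on made-up sample data (same response shape as before):

```shell
# Sample pods response; the pod name and counts are invented.
cat > /tmp/pods-sample.json << 'EOF'
{"items":[
  {"metadata":{"name":"app-1"},
   "status":{"containerStatuses":[{"restartCount":2},{"restartCount":5}]}}
]}
EOF
# Sum the restartCount of every container in each pod: "<pod> <total>".
jq -r '.items[] | .metadata.name + " " + ([.status.containerStatuses[].restartCount] | add | tostring)' /tmp/pods-sample.json
```

For the sample this prints "app-1 7"; a Zabbix trigger can then alert when the total crosses a threshold.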


That's all!

If you have any questions or feedback, please leave a comment!