Monitoring OpenShift through its own and/or the Kubernetes API

While proceeding with a CloudForms installation and configuration, I started thinking about monitoring some pieces of the OpenShift v3 infrastructure with an existing monitoring tool (Zabbix, for example).

In the example I'll describe, I want (for now) to monitor the state of all the OpenShift nodes and the restart count of pods in some critical projects (I experienced some issues with Docker that resulted in multiple restarts for containers running on one node).

In this document I'll show the whole process I went through to make it work (unfortunately by googling and testing, because there is no great documentation on this topic).

PLEASE NOTE: all the commands you'll see have been executed as system:admin. That is not necessary if you stay within your own project's scope and/or projects in which you're an admin (but it is mandatory for cluster-wide policies).

First of all we need to set up a service account: this is the preferred approach, since it avoids tying the monitoring to a standard user.

A service account is usually associated with a project, so let's define it:

# cat > monitor-sa.yaml << EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: monitor
EOF

And then create it:

# oc create -f ./monitor-sa.yaml -n yourProject

serviceaccount "monitor" created


Then you have to give the service account some permissions to view the resources in its project:

# oc policy add-role-to-user view system:serviceaccount:yourProject:monitor


Finally, supposing you want this service account to also view/read resources of other projects, you can extend its permissions with:

# oc policy add-role-to-user view system:serviceaccount:yourProject:monitor -n anotherProject

# oc policy add-role-to-user view system:serviceaccount:yourProject:monitor -n oneMoreProject


Thanks to these policies you should be able to query the Kubernetes API for the projects enabled above; we'll see at the end of this doc how to query these APIs.
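As a small sketch of what "querying the API for a project" means: the namespaced endpoints all follow the same URL pattern, so it can be handy to build the base URL once and reuse it. The host and project names below are the placeholders used throughout this article, not a real cluster:

```shell
# Namespaced (per-project) resources live under /api/v1/namespaces/<project>/...
# "console.mydomain.local" and "yourProject" are this article's placeholders.
API="https://console.mydomain.local:8443"
PROJECT="yourProject"
echo "${API}/api/v1/namespaces/${PROJECT}/pods"
```

The same pattern applies to other resource types (services, replicationcontrollers, ...) by swapping the last path segment.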


Going forward, to allow our service account "monitor" to see the OpenShift cluster status, we need to assign a special policy:

# oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:yourProject:monitor

Note the "add-cluster-role-to-user" part of the command: that is what makes this role assignment work at cluster scope. (If you try to use the previous instruction "add-role-to-user", it won't work.)


Now that we've finally completed the configuration process, we can start testing whether the policies we applied are actually working.


First of all we have to grab the name of the token generated for our service account:

# oc describe sa monitor -n yourProject
Name:                monitor
Namespace:           yourProject
Labels:              <none>
Tokens:              monitor-token-ds2b6
Image pull secrets:  monitor-dockercfg-0u01t
Mountable secrets:   monitor-token-re2s3

Then we can pick one of the tokens (OpenShift has already generated two of them for us) and read its secret to get the actual bearer token for querying the REST API:

# oc describe secret monitor-token-ds2b6 -n yourProject
Name:        monitor-token-ds2b6
Namespace:   yourProject
Labels:      <none>

Type:        kubernetes.io/service-account-token

Data
====
ca.crt:    1066 bytes
token:     eyKhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJhdGxhcy1wcm9kdWN0aW9uIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6Im1vbml0b3ItdG9rZW4tZHMyYjYiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoibW9uaXRvciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjZmYzJlZTAyLTE4MjYtMTFlNi05NjFiLTAwNTA1NjhjNjY3OCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDphdGxhcy1wcm9kdWN0aW9uOm1vbml0b3IifQ.hmYza9ongOwAfNb4HhpaJBah22VBQxEsAFIjQu4RXVkE2OR-meYcZzp16YYDimvq6QAUama7PF5id9x5l8e_g9T6JdBPrlmAagAAAedNWOdmDtQ9OH4wQYfOipo1pddsyll4bO682VNAf9CW9adXdgXIUEiM9rKHhbt--B4Rehe-OOKcIxFeVe1deo46adziwfMQYjoa6K9CAFQPoLGUj4M3slb6XyBGSbmDWPTB8LRP0O1bfP9ZXRqsXb-O0IGNOMkkzR2Ox2ZaVU9CjjmkrchxB-_Pe4pkkv6rty1lRba0colhL4kwIHWi-yll2frEteNy0ZF6GcvpZw1ucd6WWz


So we're ready to test it:

# export TOKEN=eyKhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJhdGxhcy1wcm9kdWN0aW9uIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6Im1vbml0b3ItdG9rZW4tZHMyYjYiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoibW9uaXRvciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjZmYzJlZTAyLTE4MjYtMTFlNi05NjFiLTAwNTA1NjhjNjY3OCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDphdGxhcy1wcm9kdWN0aW9uOm1vbml0b3IifQ.hmYza9ongOwAfNb4HhpaJBah22VBQxEsAFIjQu4RXVkE2OR-meYcZzp16YYDimvq6QAUama7PF5id9x5l8e_g9T6JdBPrlmAagAAAedNWOdmDtQ9OH4wQYfOipo1pddsyll4bO682VNAf9CW9adXdgXIUEiM9rKHhbt--B4Rehe-OOKcIxFeVe1deo46adziwfMQYjoa6K9CAFQPoLGUj4M3slb6XyBGSbmDWPTB8LRP0O1bfP9ZXRqsXb-O0IGNOMkkzR2Ox2ZaVU9CjjmkrchxB-_Pe4pkkv6rty1lRba0colhL4kwIHWi-yll2frEteNy0ZF6GcvpZw1ucd6WWz


# curl -H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" -X GET https://console.mydomain.local:8443/api/v1/namespaces/yourProject/pods -k 2>/dev/null | jq '.items[]."status"."containerStatuses"[]."restartCount"'
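To make the jq filter itself easier to follow, here is a self-contained demo of it run against a made-up sample response (the JSON below is invented, but it has the same shape the pods endpoint returns):

```shell
# Sample of the pods API response shape; the values are made up.
cat > /tmp/pods-sample.json << 'EOF'
{"items":[
  {"status":{"containerStatuses":[{"restartCount":0},{"restartCount":3}]}},
  {"status":{"containerStatuses":[{"restartCount":12}]}}
]}
EOF
# Same filter as in the curl pipeline above: one restartCount per container.
jq '.items[]."status"."containerStatuses"[]."restartCount"' /tmp/pods-sample.json
```

This prints one number per container (0, 3 and 12 for the sample above), which is trivial to feed into a monitoring item or trigger.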
# curl -H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" -X GET https://console.mydomain.local:8443/api/v1/nodes -k 2>/dev/null | jq '.items[]."status"."conditions"[]."status"'
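The conditions array contains several entries per node (OutOfDisk, Ready, ...), and in practice the "Ready" one is what a node-health check usually cares about. jq's select() narrows the filter down to it; here is a local demo on invented sample data with the same shape as the nodes endpoint's response:

```shell
# Sample of the nodes API response shape; names and values are made up.
cat > /tmp/nodes-sample.json << 'EOF'
{"items":[
  {"metadata":{"name":"node1.mydomain.local"},
   "status":{"conditions":[{"type":"OutOfDisk","status":"False"},
                           {"type":"Ready","status":"True"}]}}
]}
EOF
# Keep only the Ready condition, printed next to the node name.
jq -r '.items[] | .metadata.name + " " + (.status.conditions[] | select(.type=="Ready") | .status)' /tmp/nodes-sample.json
```

For the sample above this prints "node1.mydomain.local True", one line per node.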
As you can see, the first HTTP GET retrieves information about the pods running in yourProject, while the second one requests information about the cluster's nodes.


If you remove the "jq" command, which parses and filters the JSON response, you'll obtain the full output, so you can keep digging into the available details and decide what else you could monitor.
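One possibility along those lines: aggregate the restarts of all containers in a pod into a single number, which maps nicely to one numeric monitoring item per pod. A hedged sketch on made-up sample data (same response shape as before):

```shell
# Sample pods response; the pod name and counts are invented.
cat > /tmp/pods-sample.json << 'EOF'
{"items":[
  {"metadata":{"name":"app-1"},
   "status":{"containerStatuses":[{"restartCount":2},{"restartCount":5}]}}
]}
EOF
# Sum the restartCount of every container in each pod: "<pod> <total>".
jq -r '.items[] | .metadata.name + " " + ([.status.containerStatuses[].restartCount] | add | tostring)' /tmp/pods-sample.json
```

For the sample this prints "app-1 7"; a Zabbix trigger can then alert when the total crosses a threshold.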


That's all!

If you have any questions or feedback, please leave a comment!