Ansible playbook for installing Zabbix and its templates for OpenStack (through Zabbix user parameters).
In the following screenshot you can see the final result:
As you can see, hosts are organized into groups (controllers, compute, Ceph, external Horizon, IdM, storage, etc.), and the items we monitor differ from group to group.
- SNMP of the servers' IPMI interfaces
- CPU (especially load above 30), memory (above 85% because of KSM), disk and network interfaces
- Status of the network bonds
- Number of processes/workers for each OpenStack Linux service
- Response times of every OpenStack API
- Status of Open vSwitch services
- Pacemaker cluster status
- Galera SQL cluster status
- Health of RabbitMQ and Redis
- Status of OpenStack agents, as reported by:
  - openstack compute service list
  - openstack agent list
  - openstack volume service list
- The presence of virtual machines, volumes or volume snapshots in error state
- Presence of failed multipath routes
- Problems with LVM mappings
- Inconsistencies in the Device Mapper (DM)
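To illustrate how one of the checks above might be backed by a script, here is a minimal sketch that counts compute services not reported as up. It is not taken from the templates: the function names are hypothetical, and it assumes the `openstack` CLI is called with `-f json` and that each returned row has a `State` field.

```python
import json
import subprocess


def count_down_services(rows):
    """Count service rows whose State is anything other than 'up'.

    `rows` is the parsed output of `openstack compute service list -f json`.
    The 'State' key is an assumption about that output's column names;
    a row without it is counted as not up.
    """
    return sum(1 for row in rows if str(row.get("State", "")).lower() != "up")


def fetch_compute_services():
    """Run the CLI and return the parsed rows.

    Assumes the OS_* credentials are already exported in the environment.
    """
    result = subprocess.run(
        ["openstack", "compute", "service", "list", "-f", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Made-up rows standing in for real CLI output.
sample = json.loads(
    '[{"Binary": "nova-compute", "Host": "compute-0", "State": "up"},'
    ' {"Binary": "nova-compute", "Host": "compute-1", "State": "down"}]'
)
down = count_down_services(sample)
print(down)  # Zabbix reads the item value from standard output
```

In a real user parameter script the last lines would call `count_down_services(fetch_compute_services())` instead of using sample data, and a trigger could fire whenever the value is greater than zero.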
Some Zabbix user parameters required changes to the standard SELinux policies in order to work properly.
- Number of virtual routers
- Number of namespaces in the controller nodes
- Number of virtual machines
- Number of assigned floating IPs
- Memory used by virtual machines
- vCPUs used
- Storage IOPS / throughput
- North-South traffic
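As an example of how the router and namespace counts could be derived, the sketch below tallies network namespaces by name prefix from the text printed by `ip netns list`. It assumes the Neutron L3 and DHCP agent naming (`qrouter-<uuid>`, `qdhcp-<uuid>`); the function name is hypothetical and the templates may well collect these numbers differently.

```python
from collections import Counter


def count_namespaces(netns_output):
    """Tally namespaces by prefix from the output of `ip netns list`.

    Each line looks like 'qrouter-<uuid> (id: 3)' or just 'qrouter-<uuid>'.
    """
    counts = Counter()
    for line in netns_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name = fields[0]                # drops a trailing "(id: N)" if present
        prefix = name.split("-", 1)[0]  # 'qrouter', 'qdhcp', ...
        counts[prefix] += 1
    return counts


# Made-up sample output; the identifiers are placeholders, not real UUIDs.
sample = "qrouter-1111 (id: 0)\nqdhcp-2222 (id: 1)\nqrouter-3333\n"
counts = count_namespaces(sample)
total = sum(counts.values())
print(counts["qrouter"], total)
```

On a controller node, `counts["qrouter"]` would approximate the number of virtual routers hosted there and `total` the number of namespaces.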
The templates will almost certainly need some customization to fit each customer environment. They come with an Ansible playbook that automates the Zabbix agent installation along with the firewall rules, sudoers file customizations, Zabbix user parameters and their corresponding scripts, SELinux policies, etc.
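To give an idea of the kind of configuration such a playbook deploys, the sketch below renders a Zabbix agent `UserParameter=<key>,<command>` line and a matching sudoers rule. The helper names, the item key and the command are hypothetical examples, not the ones shipped with the templates; the sudoers rule assumes the agent runs as the `zabbix` user.

```python
def userparameter_line(key, command):
    """Return one Zabbix agent configuration line: UserParameter=<key>,<command>."""
    return f"UserParameter={key},{command}"


def sudoers_line(command, user="zabbix"):
    """Return a sudoers rule letting `user` run one command as root without a password."""
    return f"{user} ALL=(root) NOPASSWD: {command}"


# Hypothetical item: number of network namespaces on a controller node.
conf = userparameter_line(
    "openstack.netns.count",
    "sudo /usr/sbin/ip netns list | wc -l",
)
rule = sudoers_line("/usr/sbin/ip netns list")
print(conf)
print(rule)
```

Keeping the sudoers rule limited to the exact command the item needs avoids granting the agent broader root access than the check requires.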