Zabbix Server Cluster Deployment Best Practices
Architecture Design

Software used:
- Red Hat Enterprise Linux 8.4
- MySQL 8.0
- Zabbix 5.4
IP Plan

VIPs for the cluster:

```
192.168.2.28  zabbix-ha-db   # database VIP
192.168.2.29                 # Zabbix server VIP
```

DB nodes:

```
192.168.2.24  zabbix-db1
192.168.2.25  zabbix-db2
```

Web nodes:

```
192.168.2.26  zabbix-server1
192.168.2.27  zabbix-server2
```
Common Server Configuration

- Time synchronization

```
0 0 * * * /usr/sbin/ntpdate ntpserver >> /root/ntpdate.log 2>&1 ; /sbin/hwclock -w
```

- Disable the firewall

```
systemctl stop firewalld
```

- Disable SELinux

```
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
```

- Configure the hosts file (same entries on every node)

```
# vips for cluster
192.168.2.28  zabbix-ha-db
# db nodes
192.168.2.24  zabbix-db1
192.168.2.25  zabbix-db2
# web nodes
192.168.2.26  zabbix-server1
192.168.2.27  zabbix-server2
```

- Configure the yum repository

```
[base]
```
Database HA Cluster

Cluster installation

Run on all nodes:

- Install the HA components

```
yum install pcs pacemaker fence-agents-all
systemctl start pcsd.service
systemctl enable pcsd.service
```

- Set a password for the hacluster user (ideally the same on every node)

```
echo hacluster | passwd --stdin hacluster
```

Run on any one node:

- Authenticate all nodes with that password

```
pcs host auth zabbix-db1 zabbix-db2 -u hacluster -p hacluster
```

- Create the database cluster

```
pcs cluster setup zabbix_db_cluster zabbix-db1 zabbix-db2
```

- Start and enable the cluster

```
pcs cluster start --all
pcs cluster enable --all
```

- Check the cluster status

```
[root@zabbix-db1 pcsd]# pcs status
```
Cluster parameter configuration

- Disable fencing

```
pcs property set stonith-enabled=false
```

- Ignore loss of quorum

```
pcs property set no-quorum-policy=ignore
```

- Configure the failover policy (move the resource after a single failure)

```
pcs resource defaults migration-threshold=1
```
Create the Service and Test Failover

- Create the VIP resource

```
pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.2.28 op monitor interval=5s --group zabbix_db_cluster
```

```
[root@zabbix-db1 pcsd]# pcs status
```

The VIP responds to ping:

```
[root@zabbix-db1 pcsd]# ping 192.168.2.28
```

It also shows up in the interface addresses:

```
[root@zabbix-db1 pcsd]# ip addr
```

- Force-stop the resource with crm_resource:

```
crm_resource --resource VirtualIP --force-stop
```

Watch the resource with crm_mon:

```
# crm_mon
```

Here VirtualIP has automatically moved to the zabbix-db2 node, which confirms automatic failover:

```
[root@zabbix-db2 pcsd]# ip addr
```

- To avoid the resource bouncing between nodes, add stickiness: once the resource has moved to node 2 it keeps running there until node 2 itself fails.

```
pcs resource defaults resource-stickiness=100
```
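To put a number on the failover, probe the VIP once per second during the forced stop and feed the timestamps of the successful replies to a small helper that reports the longest gap. This is a sketch; `gap` is a hypothetical helper and 192.168.2.28 is the database VIP from this guide:

```shell
# Report the longest gap (in seconds) between successful probe timestamps.
gap() {  # args: epoch seconds of successful probes, in order
  prev=""; max=0
  for t in "$@"; do
    if [ -n "$prev" ]; then
      d=$((t - prev))
      if [ "$d" -gt "$max" ]; then max=$d; fi
    fi
    prev=$t
  done
  echo "$max"
}

# Collect timestamps while the failover happens, then inspect the largest gap:
# while ping -c1 -W1 192.168.2.28 >/dev/null 2>&1; do date +%s; sleep 1; done > probes.log
# gap $(cat probes.log)
```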
MySQL Installation

Per-node installation

- Use the newer MySQL 8.0

```
yum install mysql-community-server
```

Because this Red Hat release (8.4) is too new for the online repository, the packages had to be installed manually from local RPMs:

```
[root@zabbix-db1 ~]# ll *.rpm
```

- Start MySQL

```
systemctl start mysqld
```

- Change the initial password

```
sudo grep 'temporary password' /var/log/mysqld.log
```

- Edit my.cnf

```
[client]
```

- On the second node, change the configuration:

```
server_id = 2  ## Last number of IP
```
MySQL replication configuration

Log in to zabbix-db1:

```
# mysql -uroot -p
```

Log in to zabbix-db2.

Configure db2 as a replica of db1:

```
mysql -uroot -p<MYSQL_ROOT_PASSWORD>
```

Create the replication account on db2:

```
mysql> create user 'rep'@'192.168.2.24' identified by 'MyNewPass4!';
```

Reset db2's master state and start the slave:

```
RESET MASTER;
```

Check db2's master status:

```
mysql> show master status\G
```

Log in to zabbix-db1.

Configure db1 as a replica of db2:

```
CHANGE MASTER TO MASTER_HOST = '192.168.2.25', MASTER_USER = 'rep', MASTER_PASSWORD='MyNewPass4!', MASTER_LOG_FILE = 'mysql-bin.000001', MASTER_LOG_POS = 156;
```
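The binlog file and position printed by `SHOW MASTER STATUS` feed straight into the `CHANGE MASTER TO` statement on the other node. A small helper can turn the `\G`-formatted output into the statement; this is a sketch, and the `<REP_PASSWORD>` placeholder is an assumption to fill in:

```shell
# Build a CHANGE MASTER TO statement from SHOW MASTER STATUS\G output on stdin.
build_change_master() {  # $1 = the other node's IP
  awk -v host="$1" '
    $1 == "File:"     { file = $2 }
    $1 == "Position:" { pos  = $2 }
    END {
      printf "CHANGE MASTER TO MASTER_HOST = \047%s\047, MASTER_USER = \047rep\047, MASTER_PASSWORD = \047<REP_PASSWORD>\047, MASTER_LOG_FILE = \047%s\047, MASTER_LOG_POS = %s;\n", host, file, pos
    }'
}

# Example with the output captured above:
build_change_master 192.168.2.25 <<'EOF'
             File: mysql-bin.000001
         Position: 156
EOF
```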
Zabbix Server MySQL Performance Tuning

For Zabbix the biggest bottleneck is usually the database: the constant reads and writes of history data put it under heavy load, so partition the history tables.

Even when housekeeping is disabled in the frontend, zabbix server still writes to the housekeeper table, so neutralize that table:

```
ALTER TABLE housekeeper ENGINE = BLACKHOLE;
```

First, initialize partitioning on the seven history/trends tables:

```
ALTER TABLE `history` PARTITION BY RANGE ( clock)
```
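The same DDL has to be applied to all seven tables. As a convenience, the statements can be generated with a small script; the table list is the set usually partitioned in Zabbix setups, and the initial partition name and boundary date are illustrative assumptions to adapt:

```shell
# Generate the initial RANGE-partitioning DDL for the Zabbix history/trends tables.
gen_partition_ddl() {
  boundary=$(date -u -d "2021-06-24 00:00:00" +%s)  # clock value of the first boundary
  for t in history history_uint history_str history_text history_log trends trends_uint; do
    printf 'ALTER TABLE `%s` PARTITION BY RANGE (clock)\n' "$t"
    printf '(PARTITION p2021_06_23 VALUES LESS THAN (%s) ENGINE = InnoDB);\n' "$boundary"
  done
}
gen_partition_ddl
```

Review the generated statements before running them against the zabbix database.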
Enable the event scheduler:

```
mysql> show variables like '%event_scheduler%';
```

Use the bundled stored procedures to add and drop partitions automatically on a schedule:

```
USE `zabbix`;
```
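The scheduled procedures boil down to computing, each day, the name of the next partition to add and the expired one to drop. The date arithmetic can be sketched in shell; the `pYYYY_MM_DD` naming scheme matches the initialization above, and the 30-day retention window is an assumption:

```shell
# Map a date expression to a daily partition name in the pYYYY_MM_DD scheme.
partition_name() { date -u -d "$1" +p%Y_%m_%d; }

echo "add:  $(partition_name 'tomorrow')"      # partition to create ahead of time
echo "drop: $(partition_name '30 days ago')"   # partition past the retention window
```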
Zabbix Proxy MySQL Performance Tuning

```
# Rebuild the proxy_history table
```
Zabbix Database Preparation

Create the zabbix database:

```
# mysql -uroot -p
```

Import the Zabbix schema:

```
## create.sql.gz is copied over from the zabbix-server host
```
Zabbix Server HA Cluster

Cluster installation

Run on all nodes:

- Install the HA components

```
yum install pcs pacemaker fence-agents-all
systemctl start pcsd.service
systemctl enable pcsd.service
```

- Set a password for the hacluster user (ideally the same on every node)

```
echo hacluster | passwd --stdin hacluster
```
Run on any one node:

- Authenticate all nodes with that password

```
pcs host auth zabbix-server1 zabbix-server2 -u hacluster -p hacluster
```

- Create the server cluster

```
pcs cluster setup zabbix_server_cluster zabbix-server1 zabbix-server2
```
- Start and enable the cluster

```
pcs cluster start --all
# start the pacemaker service
systemctl start pacemaker.service
pcs cluster enable --all
systemctl enable pacemaker.service
```

- Check the cluster status
```
[root@zabbix-server1 ~]# pcs status
Cluster name: zabbix_server_cluster

WARNINGS:
No stonith devices and stonith-enabled is not false

Cluster Summary:
  * Stack: corosync
  * Current DC: zabbix-server2 (version 2.0.5-9.el8-ba59be7122) - partition with quorum
  * Last updated: Thu Jun 24 16:16:52 2021
  * Last change:  Thu Jun 24 16:16:52 2021 by hacluster via crmd on zabbix-server2
  * 2 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ zabbix-server1 zabbix-server2 ]

Full List of Resources:
  * No resources

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
```
Cluster parameter configuration

- Disable fencing

```
pcs property set stonith-enabled=false
```

- Ignore loss of quorum

```
pcs property set no-quorum-policy=ignore
```

- Configure the failover policy

```
pcs resource defaults migration-threshold=1
```
Create the Service and Test Failover

- Create the VIP resource

```
pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.2.29 op monitor interval=5s --group zabbix_server_cluster
```
```
[root@zabbix-server1 ~]# pcs status
Cluster name: zabbix_server_cluster

Cluster Summary:
  * Stack: corosync
  * Current DC: zabbix-server2 (version 2.0.5-9.el8-ba59be7122) - partition with quorum
  * Last updated: Thu Jun 24 16:19:07 2021
  * Last change:  Thu Jun 24 16:19:00 2021 by root via cibadmin on zabbix-server1
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ zabbix-server1 zabbix-server2 ]

Full List of Resources:
  * Resource Group: zabbix_server_cluster:
    * VirtualIP (ocf::heartbeat:IPaddr2): Started zabbix-server1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
```

The VIP responds to ping:
```
[root@zabbix-server1 ~]# ping 192.168.2.29
PING 192.168.2.29 (192.168.2.29) 56(84) bytes of data.
64 bytes from 192.168.2.29: icmp_seq=1 ttl=64 time=0.023 ms
64 bytes from 192.168.2.29: icmp_seq=2 ttl=64 time=0.037 ms
64 bytes from 192.168.2.29: icmp_seq=3 ttl=64 time=0.035 ms
```

It also shows up in the interface addresses:
```
[root@zabbix-server1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:94:1c:50 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.26/24 brd 192.168.2.255 scope global noprefixroute ens192
       valid_lft forever preferred_lft forever
    inet 192.168.2.29/24 brd 192.168.2.255 scope global secondary ens192
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe94:1c50/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
```

Force-stop the resource with crm_resource:

```
crm_resource --resource VirtualIP --force-stop
```
Watch the resource with crm_mon:

```
# crm_mon
Cluster Summary:
  * Stack: corosync
  * Current DC: zabbix-server2 (version 2.0.5-9.el8-ba59be7122) - partition with quorum
  * Last updated: Thu Jun 24 16:20:09 2021
  * Last change:  Thu Jun 24 16:19:00 2021 by root via cibadmin on zabbix-server1
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ zabbix-server1 zabbix-server2 ]

Active Resources:
  * Resource Group: zabbix_server_cluster:
    * VirtualIP (ocf::heartbeat:IPaddr2): Started zabbix-server2

Failed Resource Actions:
  * VirtualIP_monitor_5000 on zabbix-server1 'not running' (7): call=7, status='complete', exitreason='', last-rc-change='2021-06-24 16:20:06 +08:00', queued=0ms, exec=0ms
```

Here VirtualIP has automatically moved to the zabbix-server2 node, which confirms automatic failover.
```
[root@zabbix-server2 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:94:27:39 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.27/24 brd 192.168.2.255 scope global noprefixroute ens192
       valid_lft forever preferred_lft forever
    inet 192.168.2.29/24 brd 192.168.2.255 scope global secondary ens192
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe94:2739/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
```

To avoid the resource bouncing between nodes, add stickiness: once the resource has moved to node 2 it keeps running there until node 2 itself fails.

```
pcs resource defaults resource-stickiness=100
```
Zabbix Installation

Configure the Zabbix yum repository:

```
[zabbix]
name=zabbix
baseurl=http://yumserver/zabbix/zabbix/5.4/rhel/8/x86_64/
enabled=1
gpgcheck=0
```

Install the Zabbix server, frontend, and agent:

```
dnf install zabbix-server-mysql zabbix-web-mysql zabbix-nginx-conf zabbix-sql-scripts zabbix-agent
```
Configure zabbix_server.conf:

```
# set SourceIP to the server VIP
SourceIP=192.168.2.29
# set DBHost to the database VIP
DBHost=192.168.2.28
DBName=zabbix
DBUser=zabbix
DBPassword=<DB_ZABBIX_PASS>
```

Create the zabbixserver resource on the zabbix server nodes:

```
pcs resource create zabbixserver systemd:zabbix-server op monitor interval=10s --group zabbix_server_cluster
```

The two zabbix server instances must never run at the same time, so colocate the VIP with zabbixserver to make sure zabbix server runs on only one node:

```
pcs constraint colocation add VirtualIP with zabbixserver INFINITY
```

Make sure VirtualIP starts before zabbixserver:

```
pcs constraint order VirtualIP then zabbixserver
```

Configure the resource operation timeouts:

```
pcs resource op add zabbixserver start interval=0s timeout=60s
pcs resource op add zabbixserver stop interval=0s timeout=120s
```

Check the resource status:
```
[root@zabbix-server1 zabbix]# pcs status
Cluster name: zabbix_server_cluster

Cluster Summary:
  * Stack: corosync
  * Current DC: zabbix-server1 (version 2.0.5-9.el8-ba59be7122) - partition with quorum
  * Last updated: Thu Jun 24 17:58:18 2021
  * Last change:  Thu Jun 24 17:56:40 2021 by root via crm_resource on zabbix-server1
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ zabbix-server1 zabbix-server2 ]

Full List of Resources:
  * Resource Group: zabbix_server_cluster:
    * VirtualIP (ocf::heartbeat:IPaddr2): Started zabbix-server1
    * zabbixserver (systemd:zabbix-server): Started zabbix-server1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
```

Edit /etc/nginx/conf.d/zabbix.conf:
```
listen 80;
server_name 192.168.2.29;
```

Start the Zabbix server and agent:

```
systemctl restart zabbix-server zabbix-agent nginx php-fpm
systemctl enable zabbix-server zabbix-agent nginx php-fpm
```

Disable IPv6:

```
# temporarily disable
sysctl -w net.ipv6.conf.all.disable_ipv6=1
# disable IPv6 persistently in NetworkManager
nmcli connection modify ens192 ipv6.method "disabled"
```

Change the default timezone:

```
[root@zabbix-server1 ~]# vim /etc/php.ini
date.timezone = Asia/Shanghai
# restart the service
systemctl restart php-fpm
```

Zabbix server parameters (for reference):
```
[root@zabbix-server1 include]# egrep -v '^#|^$' /etc/zabbix/zabbix_server.conf
SourceIP=192.168.2.29
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
PidFile=/var/run/zabbix/zabbix_server.pid
SocketDir=/var/run/zabbix
DBHost=192.168.2.28
DBName=zabbix
DBUser=zabbix
DBPassword=zabbix
StartPollers=200
StartPreprocessors=20
StartPollersUnreachable=5
StartTrappers=20
StartPingers=5
StartDiscoverers=5
StartHTTPPollers=5
StartTimers=5
StartEscalators=5
StartAlerters=5
SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
StartSNMPTrapper=1
CacheSize=2G
StartDBSyncers=20
HistoryCacheSize=1G
HistoryIndexCacheSize=512M
TrendCacheSize=512M
TrendFunctionCacheSize=128M
ValueCacheSize=128M
Timeout=30
LogSlowQueries=3000
StartLLDProcessors=20
AllowRoot=1
StatsAllowedIP=127.0.0.1
```
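Before failing over, it is worth confirming that both nodes carry identical, complete configuration. A quick sanity check is sketched below; the key list is just the HA-critical subset from the configuration above:

```shell
# Verify that a zabbix_server.conf contains the settings the cluster depends on.
check_conf() {  # $1 = path to the conf file
  for key in SourceIP DBHost DBName DBUser DBPassword; do
    grep -q "^${key}=" "$1" || { echo "missing: $key"; return 1; }
  done
  echo "ok"
}

# Usage on each node:
# check_conf /etc/zabbix/zabbix_server.conf
```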
Troubleshooting

Node authentication fails with "Unable to communicate":

```
rm -rf /var/lib/pcsd/
```

Cluster creation fails with "Unable to read the known-hosts file: No such file or directory: '/var/lib/pcsd/known-hosts'":

```
pcs cluster destroy
```

Authentication plugin 'caching_sha2_password' reported error: Authentication requires secure connection

Request the server's public key as the replication user:

```
mysql -u rep -p -h 192.168.2.24 -P3306 --get-server-public-key
```

In this case the server sends its RSA public key to the client, which uses it to encrypt the password and returns the result to the server. The plugin decrypts the password with the server-side RSA private key and accepts or rejects the connection depending on whether the password is correct.

Run CHANGE MASTER TO on the replica again and START SLAVE; replication then starts normally:

```
# stop replication
```

Missing locales/fonts

List the locales shipped with the system:

```
locale -a
```

Install the missing language pack:

```
yum install langpacks-en.noarch
```