#日常监控
通过查询系统视图可以检查主备机的运行状态和同步情况。视图中每个字段的含义见参考手册系统视图。
这些系统视图可以帮助管理员监控主备机之间的同步情况,也可确认高可用环境的关键信息,如主备角色,保护模式,同步性能等。
在分布式部署中,还可以通过yasboot工具查看分布式集群各节点状况进行监控。 Note:
本文所列视图在主备机都存在,独有视图表示该视图只在主机或者备机上才存在记录。
# 主备机通用的视图
V$DATABASE/DV$DATABASE
数据库状态视图,包含database_role主备角色,log_mode归档模式、open_mode读写模式和protection_mode保护模式等信息。
--单机高可用
--备机执行
SELECT database_name,log_mode,open_mode,protection_mode,database_role,block_size,status FROM V$DATABASE;
DATABASE_NAME LOG_MODE OPEN_MODE PROTECTION_MODE DATABASE_ROLE BLOCK_SIZE STATUS
-------------- ----------- ---------- -------------------- ------------- ------------ -------
yasdb ARCHIVELOG READ_ONLY MAXIMUM PERFORMANCE STANDBY 8192 NORMAL
--分布式高可用
-- 在CN节点上执行
SELECT group_id, group_node_id, database_name,log_mode,open_mode,protection_mode,database_role,block_size,status FROM DV$DATABASE;
GROUP_ID GROUP_NODE_ID DATABASE_NAME LOG_MODE OPEN_MODE PROTECTION_MODE DATABASE_ROLE BLOCK_SIZE STATUS
---------- -------------- --------------- ----------------- ----------------- --------------------- ----------------- ------------ -------
1 1 YashanDB NOARCHIVELOG READ_WRITE MAXIMUM PERFORMANCE PRIMARY 8192 NORMAL
2 1 YashanDB NOARCHIVELOG READ_WRITE MAXIMUM PERFORMANCE PRIMARY 8192 NORMAL
3 1 YashanDB ARCHIVELOG READ_WRITE MAXIMUM PROTECTION PRIMARY 8192 NORMAL
3 2 YashanDB ARCHIVELOG READ_ONLY MAXIMUM PROTECTION STANDBY 8192 NORMAL
3 3 YashanDB ARCHIVELOG READ_ONLY MAXIMUM PROTECTION STANDBY 8192 NORMAL
V$ARCHIVED_LOG/DV$ARCHIVED_LOG
归档日志统计视图,包含日志成员、日志大小、日志号和完成时间等信息。
--单机高可用
SELECT name, SEQUENCE#, blocks, block_size, completion_time FROM V$ARCHIVED_LOG;
NAME SEQUENCE# BLOCKS BLOCK_SIZE COMPLETION_TIME
----------------------------------------- ------------ ------------ ---------- ------------------
/home/yashan/node_1/archive/arch_0_1.ARC 1 11 4096 2022-07-19
/home/yashan/node_1/archive/arch_0_2.ARC 2 1812 4096 2022-07-19
--分布式高可用
SELECT group_id, group_node_id, name, SEQUENCE#, blocks, block_size, completion_time FROM DV$ARCHIVED_LOG;
GROUP_ID GROUP_NODE_ID NAME SEQUENCE# BLOCKS BLOCK_SIZE COMPLETION_TIME
---------- -------------- ---------------------------------------------- ------------ ------------ ---------- ---------------
3 1 /data/yashan/dn-3-1/archive/arch_0_1.ARC 1 14 4096 2022-08-26
DV$ELECTION
自动选举视图,包含当前LEADER、任期、同步日志点、状态、最后心跳时间和节点组成员等信息。
--分布式高可用
SELECT * FROM DV$ELECTION;
GROUP_ID GROUP_NODE_ID LEADER_GROUP_ID LEADER_GROUP_NODE_ID TERM LFN LFN_TERM STATE LAST_HEARTBEAT_TIME PEERS
---------- -------------- --------------- --------------------- ------ ------ --------- ---------- --------------------------- -----------
2 3 2 1 1 1022 0 Follower 2022-08-26 11:24:01.089933 2-1,2-2
2 2 2 1 1 1022 0 Follower 2022-08-26 11:24:01.089965 2-1,2-3
2 1 2 1 1 1022 0 Leader 2-2,2-3
# 主机独有的视图
# 单机高可用
V$ARCHIVE_DEST_STATUS/DV$ARCHIVE_DEST_STATUS
备机列表和状态视图,包含dest_id备机编号、connection连接状态、status备机状态、received_lfn备机接收点和applied_lfn备机回放点等信息。该视图也可以在级联备的上级备机上查询,显示其下属级联备列表信息。
--单机高可用
SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized FROM V$ARCHIVE_DEST_STATUS;
DEST_ID CONNECTION STATUS DATABASE_MODE RECEIVED_LFN APPLIED_LFN SYNCHRONIZED
------- -------------- ------- -------------- ------------ ------------ ------------
2 CONNECTED NORMAL OPEN 1179 1179 YES
3 DISCONNECTED UNKOWN UNKOWN 985 985 NO
--分布式高可用
SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized FROM DV$ARCHIVE_DEST_STATUS WHERE GROUP_ID=2 AND GROUP_NODE_ID=3;
DEST_ID CONNECTION STATUS DATABASE_MODE RECEIVED_LFN APPLIED_LFN SYNCHRONIZED
------- -------------- ------- -------------- ------------ ------------ ------------
2 CONNECTED NORMAL OPEN 1179 1179 YES
3 DISCONNECTED UNKOWN UNKOWN 985 985 NO
# 备机独有的视图
V$REPLICATION_STATUS/DV$REPLICATION_STATUS
备机同步状态视图,包含status连接状态、primary_lfn主机日志点、received_lfn备机接收点、applied_lfn备机回放点、transport_lag日志传输和apply_lag日志应用延时情况等信息,例如:
--单机高可用
SELECT connection,status,peer_addr,transport_lag,apply_lag FROM V$REPLICATION_STATUS;
CONNECTION STATUS PEER_ADDR TRANSPORT_LAG APPLY_LAG
----------- -------- ------------------------- ------------- ------------
CONNECTED NORMAL 192.168.4.201:1689 0 0
--分布式高可用
SELECT group_id, group_node_id,connection,status,peer_addr,transport_lag,apply_lag FROM DV$REPLICATION_STATUS;
GROUP_ID GROUP_NODE_ID CONNECTION STATUS PEER_ADDR TRANSPORT_LAG APPLY_LAG
---------- --------------------- ----------------- ----------------- --------------------- ------------- ------------
2 2 CONNECTED NORMAL 127.0.0.1:1411 0 0
2 3 CONNECTED NORMAL 127.0.0.1:1411 2041932861 0
connection字段一般有以下值:
- CONNECTED:备机能够跟主机通信。
- DISCONNECTED:备机不能跟主机通信。
status字段一般有以下值:
- NORMAL:主备同步正常。
- NEED REPAIR:出现不一致情况,需要修复该备机。
- REDO MISMATCH:主备redo日志出现分歧,可以执行alter system ignore standby mismatched redo快速清理备机不匹配的redo日志,清理后的redo无法恢复,请谨慎操作。
peer_addr字段表示所连接的主机的IP与监听端口。
V$RECOVERY_PROGRESS/DV$RECOVERY_PROGRESS
备机回放进度视图,包含Average Apply Rate、Redo Remain、Remain Time和Parallelism等信息,用于分析备机日志回放性能。一般情况下,Remain Time应小于1秒。
--单机高可用
SELECT * FROM V$RECOVERY_PROGRESS;
ITEM UNITS VALUE
--------------------------------- ----------------- ---------------------
Active Apply Rate KB/sec 337
Average Apply Rate KB/sec 22832
Maximum Apply Rate KB/sec 309180
Redo Applied MB 5
Redo Remain MB 0
First Applied Redo LFN 987
Last Applied Redo LFN 1180
Active Time Seconds 0
Remain Time Seconds 0
Non Physical Log Count Number 0
Parallelism Threads 8
Latest Load Size KB 4
Latest Load Time Millisecond 0
Latest Analysis Time Millisecond 0
Wait Previous Apply Time Millisecond 11
Latest Apply Time Millisecond 11
--分布式高可用
SELECT item, units, value FROM DV$RECOVERY_PROGRESS WHERE GROUP_ID=2 AND GROUP_NODE_ID=3;
ITEM UNITS VALUE
------------------------- ----------------- ------
Active Apply Rate KB/sec 1843
Average Apply Rate KB/sec 1843
Maximum Apply Rate KB/sec 1843
Redo Applied MB 0
Redo Remain MB 0
First Applied Redo LFN 0
Last Applied Redo LFN 1022
Active Time Seconds 0
Remain Time Seconds 0
Non Physical Log Count Number 0
Parallelism Threads 0
Latest Load Size KB 4
Latest Load Time Millisecond 2
Latest Analysis Time Millisecond 0
Wait Previous Apply Time Millisecond 0
Latest Apply Time Millisecond 2
# yasboot工具监控
通过yasboot工具查询分布式集群中所有节点的状态,status字段值为PENDING或-
时,表示该节点状态异常,需进行干预处理。
$ yasboot cluster status -c yashan -d
Host: 192.168.3.112:22, User: yashan
group type | node id | group name | listen addr | pid | status | data path
mn | 1-1 | manage | 192.168.3.112:1658 | 81088 | OPEN | /dev/shm/mn-1-1
mn | 1-2 | manage | 192.168.3.112:1668 | 81222 | OPEN | /dev/shm/mn-1-2
dn | 3-1 | dn1 | 192.168.3.112:1708 | 81805 | OPEN | /dev/shm/dn-3-1
dn | 3-2 | dn1 | 192.168.3.112:1718 | 81969 | OPEN | /dev/shm/dn-3-2
cn | 2-1 | cn1 | 192.168.3.112:1688 | 82074 | OPEN | /dev/shm/cn-2-1
Host: 192.168.3.113:22, User: yashan2
group type | node id | group name | listen addr | pid | status | data path
mn | 1-3 | manage | 192.168.3.113:1678 | 61221 | OPEN | /dev/shm/mn-1-3
dn | 5-1 | dn3 | 192.168.3.113:1738 | 61807 | OPEN | /dev/shm/dn-5-1
dn | 4-1 | dn2 | 192.168.3.113:1728 | 61809 | OPEN | /dev/shm/dn-4-1
cn | 1-2 | cn1 | 192.168.3.113:1698 | 62075 | OPEN | /dev/shm/cn-2-2