#日常监控

通常,管理员可以通过查询特定的系统视图检查主备库的运行状态以及同步情况,确认高可用环境的关键信息,例如主备角色、保护模式、同步性能等。

在分布式部署中,还可以通过yasboot工具查看分布式集群各节点状况。

Note

本文所列视图在主备库都存在,独有视图表示该视图只在特定节点上才存在记录。

# 主备库通用视图

# V$DATABASE/DV$DATABASE

数据库状态视图,包含database_role主备角色,log_mode归档模式、open_mode读写模式和protection_mode保护模式等信息。

示例(单机部署)

-- 单机高可用环境中,在备库执行
SELECT database_name,log_mode,open_mode,protection_mode,database_role,block_size,status FROM  V$DATABASE; 
DATABASE_NAME  LOG_MODE    OPEN_MODE  PROTECTION_MODE      DATABASE_ROLE   BLOCK_SIZE STATUS
-------------- ----------- ---------- -------------------- ------------- ------------ -------
yasdb          ARCHIVELOG  READ_ONLY  MAXIMUM PERFORMANCE  STANDBY               8192 NORMAL
Copied!

示例(分布式部署)

-- 分布式高可用环境中,在CN节点上执行
SELECT group_id, group_node_id, database_name,log_mode,open_mode,protection_mode,database_role,block_size,status FROM DV$DATABASE;
  GROUP_ID  GROUP_NODE_ID DATABASE_NAME   LOG_MODE          OPEN_MODE         PROTECTION_MODE       DATABASE_ROLE       BLOCK_SIZE STATUS 
---------- -------------- --------------- ----------------- ----------------- --------------------- ----------------- ------------ -------
         1              1 YashanDB        NOARCHIVELOG      READ_WRITE        MAXIMUM PERFORMANCE   PRIMARY                   8192 NORMAL 
         2              1 YashanDB        NOARCHIVELOG      READ_WRITE        MAXIMUM PERFORMANCE   PRIMARY                   8192 NORMAL 
         3              1 YashanDB        ARCHIVELOG        READ_WRITE        MAXIMUM PROTECTION    PRIMARY                   8192 NORMAL 
         3              2 YashanDB        ARCHIVELOG        READ_ONLY         MAXIMUM PROTECTION    STANDBY                   8192 NORMAL 
         3              3 YashanDB        ARCHIVELOG        READ_ONLY         MAXIMUM PROTECTION    STANDBY                   8192 NORMAL 
Copied!

# V$ARCHIVED_LOG/DV$ARCHIVED_LOG

归档日志统计视图,包含日志成员、日志大小、日志号和完成时间等信息。

示例(单机部署)

-- 单机高可用环境
SELECT name, SEQUENCE#, blocks, block_size, completion_time FROM V$ARCHIVED_LOG;
NAME                                         SEQUENCE#       BLOCKS BLOCK_SIZE COMPLETION_TIME                  
----------------------------------------- ------------ ------------ ---------- ------------------
/home/yashan/node_1/archive/arch_0_1.ARC             1           11       4096 2022-07-19                      
/home/yashan/node_1/archive/arch_0_2.ARC             2         1812       4096 2022-07-19 
Copied!

示例(分布式部署)

-- 分布式高可用环境
SELECT group_id, group_node_id, name, SEQUENCE#, blocks, block_size, completion_time FROM DV$ARCHIVED_LOG;
  GROUP_ID  GROUP_NODE_ID NAME                                              SEQUENCE#       BLOCKS BLOCK_SIZE COMPLETION_TIME
---------- -------------- ---------------------------------------------- ------------ ------------ ---------- ---------------
         3              1 /data/yashan/dn-3-1/archive/arch_0_1.ARC              1           14       4096 2022-08-26 
Copied!

# DV$ELECTION

自动选主视图,包含当前LEADER、任期、同步日志点、状态、最后心跳时间和节点组成员等信息。

示例(分布式部署)

-- 分布式高可用环境
SELECT group_id,group_node_id,leader_group_id,leader_group_node_id,term,lfn,lfn_term,state,last_heartbeat_time,peers FROM DV$ELECTION;
  GROUP_ID  GROUP_NODE_ID LEADER_GROUP_ID  LEADER_GROUP_NODE_ID   TERM    LFN  LFN_TERM STATE      LAST_HEARTBEAT_TIME         PEERS
---------- -------------- --------------- --------------------- ------ ------ --------- ---------- --------------------------- -----------
         2              3               2                     1      1   1022         0 Follower   2022-08-26 11:24:01.089933  2-1,2-2
         2              2               2                     1      1   1022         0 Follower   2022-08-26 11:24:01.089965  2-1,2-3
         2              1               2                     1      1   1022         0 Leader                                 2-2,2-3
Copied!

# 主库独有的视图

V$ARCHIVE_DEST_STATUS/DV$ARCHIVE_DEST_STATUS

备库列表和状态视图,包含dest_id备库编号、connection连接状态、status备库状态、received_lfn备库接收点和applied_lfn备库回放点等信息。该视图也可以在级联备的上级备库上查询,显示其下属级联备列表信息。

示例(单机部署)

-- 单机高可用环境
SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized FROM V$ARCHIVE_DEST_STATUS;
DEST_ID CONNECTION     STATUS  DATABASE_MODE  RECEIVED_LFN  APPLIED_LFN SYNCHRONIZED 
------- -------------- ------- -------------- ------------ ------------ ------------ 
      2 CONNECTED      NORMAL  OPEN                   1179         1179 YES         
      3 DISCONNECTED   UNKOWN  UNKOWN                  985          985 NO     
Copied!

示例(分布式部署)

-- 分布式高可用环境
SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized FROM DV$ARCHIVE_DEST_STATUS WHERE GROUP_ID=2 AND GROUP_NODE_ID=3;
DEST_ID CONNECTION     STATUS  DATABASE_MODE  RECEIVED_LFN  APPLIED_LFN SYNCHRONIZED 
------- -------------- ------- -------------- ------------ ------------ ------------ 
      2 CONNECTED      NORMAL  OPEN                   1179         1179 YES         
      3 DISCONNECTED   UNKOWN  UNKOWN                  985          985 NO       
Copied!

# 备库独有的视图

# V$REPLICATION_STATUS/DV$REPLICATION_STATUS

备库同步状态视图,包含status连接状态、peer_point主库日志点、received_point备库接收点、applied_point备库回放点、transport_lag日志传输延迟时间和apply_lag日志回放延迟时间等信息。

示例(单机部署)

-- 单机高可用环境
SELECT connection,status,peer_addr,peer_point,received_point,applied_point,transport_lag,apply_lag FROM V$REPLICATION_STATUS;
CONNECTION        STATUS     PEER_ADDR               PEER_POINT     RECEIVED_POINT    APPLIED_POINT     TRANSPORT_LAG   APPLY_LAG
----------------- ---------- ----------------------- -------------- ----------------- ----------------- --------------- ------------
CONNECTED         NORMAL     192.168.4.201:1689      0-2-150        0-2-150           0-2-150           0               0
Copied!

示例(分布式部署)

-- 分布式高可用环境
SELECT group_id, group_node_id,connection,status,peer_addr,transport_lag,apply_lag FROM DV$REPLICATION_STATUS;
  GROUP_ID         GROUP_NODE_ID CONNECTION        STATUS            PEER_ADDR             TRANSPORT_LAG    APPLY_LAG 
---------- --------------------- ----------------- ----------------- --------------------- ------------- ------------ 
         2                     2 CONNECTED         NORMAL            127.0.0.1:1411                   0            0
         2                     3 CONNECTED         NORMAL            127.0.0.1:1411          2041932861            0
Copied!

connection字段一般有以下值:

  • CONNECTED:备库能够跟主库通信。
  • DISCONNECTED:备库不能跟主库通信。

status字段一般有以下值:

  • NORMAL:主备同步正常。
  • NEED REPAIR:出现不一致情况,需要修复该备库。
  • REDO MISMATCH:主备redo日志出现分歧,可以执行alter system ignore standby mismatched redo快速清理备库不匹配的redo日志,清理后的redo无法恢复,请谨慎操作。

peer_addr字段表示所连接的主库的IP与监听端口。

# V$RECOVERY_PROGRESS/DV$RECOVERY_PROGRESS

备库回放进度视图,包含Average Apply Rate、Redo Remain、Remain Time和Parallelism等信息,用于分析备库日志回放性能。一般情况下,Remain Time应小于1秒。

示例(单机部署)

-- 单机高可用环境
SELECT item,units,value FROM V$RECOVERY_PROGRESS;
ITEM                              UNITS                             VALUE 
--------------------------------- ----------------- --------------------- 
Active Apply Rate                 KB/sec                              337
Average Apply Rate                KB/sec                            22832
Maximum Apply Rate                KB/sec                           309180
Redo Applied                      MB                                    5
Redo Remain                       MB                                    0
First Applied Redo                LFN                                 987
Last Applied Redo                 LFN                                1180
Active Time                       Seconds                               0
Remain Time                       Seconds                               0
Non Physical Log Count            Number                                0
Parallelism                       Threads                               8
Latest Load Size                  KB                                    4
Latest Load Time                  Millisecond                           0
Latest Analysis Time              Millisecond                           0
Wait Previous Apply Time          Millisecond                          11
Latest Apply Time                 Millisecond                          11
Copied!

示例(分布式部署)

-- 分布式高可用环境
SELECT item, units, value FROM DV$RECOVERY_PROGRESS WHERE GROUP_ID=2 AND GROUP_NODE_ID=3;
ITEM                      UNITS              VALUE 
------------------------- ----------------- ------ 
Active Apply Rate         KB/sec              1843
Average Apply Rate        KB/sec              1843
Maximum Apply Rate        KB/sec              1843
Redo Applied              MB                     0
Redo Remain               MB                     0
First Applied Redo        LFN                    0
Last Applied Redo         LFN                 1022
Active Time               Seconds                0
Remain Time               Seconds                0
Non Physical Log Count    Number                 0
Parallelism               Threads                0
Latest Load Size          KB                     4
Latest Load Time          Millisecond            2
Latest Analysis Time      Millisecond            0
Wait Previous Apply Time  Millisecond            0
Latest Apply Time         Millisecond            2
Copied!

# yasboot工具监控

通过yasboot工具查询分布式集群中所有节点的状态,status字段值为PENDING或-时,表示该节点状态异常,需进行干预处理。

$ yasboot cluster status -c yashan -d
Host: 192.168.3.112:22, User: yashan
group type | node id         | group name      | listen addr            | pid        | status     | data path                     
mn         |   1-1           | manage          | 192.168.3.112:1658     | 81088      | OPEN       | /dev/shm/mn-1-1                       
mn         |   1-2           | manage          | 192.168.3.112:1668     | 81222      | OPEN       | /dev/shm/mn-1-2             
dn         |   3-1           | dn1             | 192.168.3.112:1708     | 81805      | OPEN       | /dev/shm/dn-3-1                      
dn         |   3-2           | dn1             | 192.168.3.112:1718     | 81969      | OPEN       | /dev/shm/dn-3-2                        
cn         |   2-1           | cn1             | 192.168.3.112:1688     | 82074      | OPEN       | /dev/shm/cn-2-1   

Host: 192.168.3.113:22, User: yashan2
group type | node id         | group name      | listen addr            | pid        | status     | data path                     
mn         |   1-3           | manage          | 192.168.3.113:1678     | 61221      | OPEN       | /dev/shm/mn-1-3   
dn         |   5-1           | dn3             | 192.168.3.113:1738     | 61807      | OPEN       | /dev/shm/dn-5-1           
dn         |   4-1           | dn2             | 192.168.3.113:1728     | 61809      | OPEN       | /dev/shm/dn-4-1
cn         |   1-2           | cn1             | 192.168.3.113:1698     | 62075      | OPEN       | /dev/shm/cn-2-2
Copied!