#日常监控

通过查询系统视图可以检查主备机的运行状态和同步情况。视图中每个字段的含义见参考手册系统视图

这些系统视图可以帮助管理员监控主备机之间的同步情况,也可确认高可用环境的关键信息,如主备角色,保护模式,同步性能等。

在分布式部署中,还可以通过yasboot工具查看分布式集群各节点状况进行监控。 Note

本文所列视图在主备机都存在,独有视图表示该视图只在主机或者备机上才存在记录。

# 主备机通用的视图

V$DATABASE/DV$DATABASE

数据库状态视图,包含database_role主备角色,log_mode归档模式、open_mode读写模式和protection_mode保护模式等信息。

--单机高可用
--备机执行
SELECT database_name,log_mode,open_mode,protection_mode,database_role,block_size,status FROM  V$DATABASE; 
DATABASE_NAME  LOG_MODE    OPEN_MODE  PROTECTION_MODE      DATABASE_ROLE   BLOCK_SIZE STATUS
-------------- ----------- ---------- -------------------- ------------- ------------ -------
yasdb          ARCHIVELOG  READ_ONLY  MAXIMUM PERFORMANCE  STANDBY               8192 NORMAL

--分布式高可用
-- 在CN节点上执行
SELECT service_id, service_group_id, group_node_id, database_name,log_mode,open_mode,protection_mode,database_role,block_size,status FROM DV$DATABASE;
  SERVICE_ID SERVICE_GROUP_ID  GROUP_NODE_ID DATABASE_NAME   LOG_MODE          OPEN_MODE         PROTECTION_MODE       DATABASE_ROLE       BLOCK_SIZE STATUS 
------------ ---------------- -------------- --------------- ----------------- ----------------- --------------------- ----------------- ------------ -------
           0                1              1 YashanDB        NOARCHIVELOG      READ_WRITE        MAXIMUM PERFORMANCE   PRIMARY                   8192 NORMAL 
           1                1              1 YashanDB        NOARCHIVELOG      READ_WRITE        MAXIMUM PERFORMANCE   PRIMARY                   8192 NORMAL 
           1                2            101 YashanDB        ARCHIVELOG        READ_WRITE        MAXIMUM PROTECTION    PRIMARY                   8192 NORMAL 
           1                2            102 YashanDB        ARCHIVELOG        READ_ONLY         MAXIMUM PROTECTION    STANDBY                   8192 NORMAL 
           1                2            103 YashanDB        ARCHIVELOG        READ_ONLY         MAXIMUM PROTECTION    STANDBY                   8192 NORMAL 

V$ARCHIVED_LOG/DV$ARCHIVED_LOG

归档日志统计视图,包含日志成员、日志大小、日志号和完成时间等信息。

--单机高可用
SELECT name, SEQUENCE#, blocks, block_size, completion_time FROM V$ARCHIVED_LOG;
NAME                                         SEQUENCE#       BLOCKS BLOCK_SIZE COMPLETION_TIME                  
----------------------------------------- ------------ ------------ ---------- ------------------
/home/yashan/node_1/archive/arch_0_1.ARC             1           11       4096 2022-07-19                      
/home/yashan/node_1/archive/arch_0_2.ARC             2         1812       4096 2022-07-19 

--分布式高可用
SELECT service_id, service_group_id, group_node_id, name, SEQUENCE#, blocks, block_size, completion_time FROM DV$ARCHIVED_LOG;
  SERVICE_ID SERVICE_GROUP_ID  GROUP_NODE_ID NAME                                              SEQUENCE#       BLOCKS BLOCK_SIZE COMPLETION_TIME
------------ ---------------- -------------- ---------------------------------------------- ------------ ------------ ---------- ---------------
           1                2            101 /data/yashan/dn-1-2-101/archive/arch_0_1.ARC              1           14       4096 2022-08-26 

DV$ELECTION

自动选举视图,包含当前LEADER、任期、同步日志点、状态和最后心跳时间等信息。

--分布式高可用
SELECT * FROM DV$ELECTION;
  SERVICE_ID SERVICE_GROUP_ID  GROUP_NODE_ID LEADER_SERVICE_ID LEADER_SERVICE_GROUP_ID  LEADER_GROUP_NODE_ID   TERM    LFN  LFN_TERM STATE      LAST_HEARTBEAT_TIME        
------------ ---------------- -------------- ----------------- ----------------------- --------------------- ------ ------ --------- ---------- ---------------------------
           1                2            103                 1                       2                   101      1   1022         0 Follower   2022-08-26 11:24:01.089933 
           1                2            102                 1                       2                   101      1   1022         0 Follower   2022-08-26 11:24:01.089965 
           1                2            101                 1                       2                   101      1   1022         0 Leader                                

# 主机独有的视图

# 单机高可用

V$ARCHIVE_DEST_STATUS/DV$ARCHIVE_DEST_STATUS

备机列表和状态视图,包含dest_id备机编号、connection连接状态、status备机状态、received_lfn备机接收点和applied_lfn备机回放点等信息。该视图也可以在级联备的上级备机上查询,显示其下属级联备列表信息。

--单机高可用
SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized FROM V$ARCHIVE_DEST_STATUS;
DEST_ID CONNECTION     STATUS  DATABASE_MODE  RECEIVED_LFN  APPLIED_LFN SYNCHRONIZED 
------- -------------- ------- -------------- ------------ ------------ ------------ 
      2 CONNECTED      NORMAL  OPEN                   1179         1179 YES         
      3 DISCONNECTED   UNKOWN  UNKOWN                  985          985 NO     
      
--分布式高可用
SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized FROM DV$ARCHIVE_DEST_STATUS WHERE SERVICE_ID=1 AND SERVICE_GROUP_ID=2 AND GROUP_NODE_ID=103;
DEST_ID CONNECTION     STATUS  DATABASE_MODE  RECEIVED_LFN  APPLIED_LFN SYNCHRONIZED 
------- -------------- ------- -------------- ------------ ------------ ------------ 
      2 CONNECTED      NORMAL  OPEN                   1179         1179 YES         
      3 DISCONNECTED   UNKOWN  UNKOWN                  985          985 NO       

# 备机独有的视图

V$REPLICATION_STATUS/DV$REPLICATION_STATUS

备机同步状态视图,包含status连接状态、peer_point主机日志点、received_point备机接收点、applied_point备机回放点、transport_lag日志传输延迟时间和apply_lag日志回放延迟时间等信息。

--单机高可用
SELECT connection,status,peer_addr,peer_point,received_point,applied_point,transport_lag,apply_lag FROM V$REPLICATION_STATUS;
CONNECTION        STATUS     PEER_ADDR               PEER_POINT     RECEIVED_POINT    APPLIED_POINT     TRANSPORT_LAG   APPLY_LAG
----------------- ---------- ----------------------- -------------- ----------------- ----------------- --------------- ------------
CONNECTED         NORMAL     192.168.4.201:1689      0-2-150        0-2-150           0-2-150           0               0

--分布式高可用
SELECT service_id, service_group_id, group_node_id,connection,status,peer_addr,transport_lag,apply_lag FROM DV$REPLICATION_STATUS;
  SERVICE_ID SERVICE_GROUP_ID         GROUP_NODE_ID CONNECTION        STATUS            PEER_ADDR             TRANSPORT_LAG    APPLY_LAG 
------------ ---------------- --------------------- ----------------- ----------------- --------------------- ------------- ------------ 
           1                2                   102 CONNECTED         NORMAL            127.0.0.1:1411                   0            0
           1                2                   103 CONNECTED         NORMAL            127.0.0.1:1411          2041932861            0

connection字段一般有以下值:

  • CONNECTED:备机能够跟主机通信。
  • DISCONNECTED:备机不能跟主机通信。

status字段一般有以下值:

  • NORMAL:主备同步正常。
  • NEED REPAIR:出现不一致情况,需要修复该备机。
  • REDO MISMATCH:主备redo日志出现分歧,可以执行alter system ignore standby mismatched redo快速清理备机不匹配的redo日志,清理后的redo无法恢复,请谨慎操作。

peer_addr字段表示所连接的主机的IP与监听端口。

V$RECOVERY_PROGRESS/DV$RECOVERY_PROGRESS

备机回放进度视图,包含Average Apply Rate、Redo Remain、Remain Time和Parallelism等信息,用于分析备机日志回放性能。一般情况下,Remain Time应小于1秒。

--单机高可用
SELECT * FROM V$RECOVERY_PROGRESS;
ITEM                              UNITS                             VALUE 
--------------------------------- ----------------- --------------------- 
Active Apply Rate                 KB/sec                              337
Average Apply Rate                KB/sec                            22832
Maximum Apply Rate                KB/sec                           309180
Redo Applied                      MB                                    5
Redo Remain                       MB                                    0
First Applied Redo                LFN                                 987
Last Applied Redo                 LFN                                1180
Active Time                       Seconds                               0
Remain Time                       Seconds                               0
Non Physical Log Count            Number                                0
Parallelism                       Threads                               8
Latest Load Size                  KB                                    4
Latest Load Time                  Millisecond                           0
Latest Analysis Time              Millisecond                           0
Wait Previous Apply Time          Millisecond                          11
Latest Apply Time                 Millisecond                          11

--分布式高可用
SELECT item, units, value FROM DV$RECOVERY_PROGRESS WHERE SERVICE_ID=1 AND SERVICE_GROUP_ID=2 AND GROUP_NODE_ID=103;
ITEM                      UNITS              VALUE 
------------------------- ----------------- ------ 
Active Apply Rate         KB/sec              1843
Average Apply Rate        KB/sec              1843
Maximum Apply Rate        KB/sec              1843
Redo Applied              MB                     0
Redo Remain               MB                     0
First Applied Redo        LFN                    0
Last Applied Redo         LFN                 1022
Active Time               Seconds                0
Remain Time               Seconds                0
Non Physical Log Count    Number                 0
Parallelism               Threads                0
Latest Load Size          KB                     4
Latest Load Time          Millisecond            2
Latest Analysis Time      Millisecond            0
Wait Previous Apply Time  Millisecond            0
Latest Apply Time         Millisecond            2

# yasboot工具监控

通过yasboot工具查询分布式集群中所有节点的状态,status字段值为PENDING或-时,表示该节点状态异常,需进行干预处理。

$ yasboot cluster status -c yashan -d
Host: 192.168.3.112:22, User: yashan
group type | node id         | service name    | group name      | listen addr            | pid        | status     | data path                     
mn         | 0-1-1           | manage          | manage          | 192.168.3.112:1658     | 81088      | OPEN       | /dev/shm/mn-0-1-1                       
mn         | 0-1-2           | manage          | manage          | 192.168.3.112:1668     | 81222      | OPEN       | /dev/shm/mn-0-1-2             
dn         | 1-2-101         | my_service      | dn1             | 192.168.3.112:1708     | 81805      | OPEN       | /dev/shm/dn-1-2-101                      
dn         | 1-2-102         | my_service      | dn1             | 192.168.3.112:1718     | 81969      | OPEN       | /dev/shm/dn-1-2-102                        
cn         | 1-1-1           | my_service      | cn1             | 192.168.3.112:1688     | 82074      | OPEN       | /dev/shm/cn-1-1-1   

Host: 192.168.3.113:22, User: yashan2
group type | node id         | service name    | group name      | listen addr            | pid        | status     | data path                     
mn         | 0-1-3           | manage          | manage          | 192.168.3.113:1678     | 61221      | OPEN       | /dev/shm/mn-0-1-3   
dn         | 1-4-101         | my_service      | dn3             | 192.168.3.113:1738     | 61807      | OPEN       | /dev/shm/dn-1-4-101           
dn         | 1-3-101         | my_service      | dn2             | 192.168.3.113:1728     | 61809      | OPEN       | /dev/shm/dn-1-3-101
cn         | 1-1-2           | my_service      | cn1             | 192.168.3.113:1698     | 62075      | OPEN       | /dev/shm/cn-1-1-2