Author: ivan@mysqlab.net
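I was woken by a phone call at 6 a.m.: an application was alerting. From experience, the likely cause was a drained RAID card battery, and a check confirmed it. The batteries on 2 machines had run down, and 2 more had already finished recharging.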
[root@userminfo2:2.20 ~]# MegaCli -AdpBbuCmd -GetBbuStatus -aALL
BBU status for Adapter: 0
BatteryType: TBBU
Voltage: 3858 mV
Current: 932 mA
Temperature: 24 C
Firmware Status: 00000008
Battery state:
GasGuageStatus:
Fully Discharged : No
Fully Charged : No
Discharging : No
Initialized : Yes
Remaining Time Alarm : No
Remaining Capacity Alarm: No
Discharge Terminated : No
Over Temperature : No
Charging Terminated : No
Over Charged : No
Relative State of Charge: 17 %
Charger Status: In Progress
Remaining Capacity: 253 mAh
Full Charge Capacity: 1502 mAh
isSOHGood: Yes
BBU status for Adapter: 0
BatteryType: TBBU
Voltage: 4065 mV
Current: 0 mA
Temperature: 22 C
Firmware Status: 00000000
Battery state:
GasGuageStatus:
Fully Discharged : No
Fully Charged : Yes
Discharging : Yes
Initialized : Yes
Remaining Time Alarm : No
Remaining Capacity Alarm: No
Discharge Terminated : No
Over Temperature : No
Charging Terminated : Yes
Over Charged : No
Relative State of Charge: 100 %
Charger Status: Complete
Remaining Capacity: 1639 mAh
Full Charge Capacity: 1642 mAh
isSOHGood: Yes
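I have actually hit this problem N times, so why does it only get dealt with after it has already caused trouble? Because the BBU charge percentage is not monitored at all.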
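As a rough sketch of what such monitoring could read, the snippet below parses `MegaCli -AdpBbuCmd -GetBbuStatus -aALL` output of the shape shown above and pulls out the "Relative State of Charge" value per adapter. The function names (`read_bbu_status`, `parse_bbu_status`, `relative_charge`) are made up for illustration, and the parsing assumes the output keeps the simple "key: value" layout seen in this post.

```python
import re
import subprocess


def read_bbu_status():
    """Run the same MegaCli command used in this post and return its text.

    Assumes a MegaCli binary is on PATH; adjust the name for your install.
    """
    result = subprocess.run(
        ["MegaCli", "-AdpBbuCmd", "-GetBbuStatus", "-aALL"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_bbu_status(text):
    """Split the output into one dict of "key: value" fields per adapter."""
    adapters = []
    current = None
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "BBU status for Adapter":
            # Each adapter section starts with this header line.
            current = {"adapter": int(value)}
            adapters.append(current)
        elif current is not None:
            current[key] = value
    return adapters


def relative_charge(adapter):
    """Return the charge percentage as an int, or None if it is missing."""
    match = re.match(r"(\d+)", adapter.get("Relative State of Charge", ""))
    return int(match.group(1)) if match else None
```

A monitoring agent could call `parse_bbu_status(read_bbu_status())` on a schedule and report `relative_charge(...)` for each adapter as a metric.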
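For an affected machine, the usual temporary fix is to change the trx parameter (a very dangerous operation: a power loss, crash, or similar failure during this window would cause data loss). Yet all of this can be alerted on in advance and handled automatically.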
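The new monitoring system will add monitoring of the RAID card BBU charge level so that we get early warning. Whether to automatically change the redo log write policy will be configurable, depending on the needs of each service.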
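A minimal sketch of how that configurable behavior might look, given a charge percentage already read from the BBU. Everything here is hypothetical: the `BbuPolicy` fields, the threshold values, and the action names are placeholders. It also assumes the "trx parameter" mentioned above is MySQL's `innodb_flush_log_at_trx_commit`, which the post does not spell out.

```python
from dataclasses import dataclass


@dataclass
class BbuPolicy:
    # Hypothetical thresholds; tune them per service.
    warn_below: int = 50        # send an early warning under this charge %
    critical_below: int = 20    # consider the cache unprotected under this %
    auto_relax_flush: bool = False  # opt in per service, off by default


def decide(charge_pct, policy):
    """Return the list of actions to take for one adapter's charge level."""
    actions = []
    if charge_pct < policy.warn_below:
        actions.append("alert")
    if charge_pct < policy.critical_below and policy.auto_relax_flush:
        # Assumed meaning of "trx parameter": something like
        #   SET GLOBAL innodb_flush_log_at_trx_commit = 2
        # This trades durability for latency, so it is only done when the
        # service has explicitly opted in.
        actions.append("relax_redo_flush")
    return actions
```

With the default policy, the 17 % reading above would only raise an alert; a service that sets `auto_relax_flush=True` would additionally get the redo log write policy changed, and would need it restored once the battery is charged again.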
ivan, 2009-05-19, 6 a.m.