Setting up pre-requisites for RAS monitoring

We need to enable the Baseboard Management Controller (BMC) and configure the IPMI libraries for some of the functionality under RAS monitoring. Especially to enable the "ipmi" plugin. Following are the instructions for setting up BMC in BIOS for compute nodes and service nodes (aggregator):

Enable BMC in BIOS, how to do this is specific to your BIOS. Refer to your server manual for guidance.

These instructions work on our specific development machine:

In BIOS settings, go to Server Management tab:

  1. Enable Plug & Play BMC Detection.
  2. Select BMC Lan Configuration
    • Select IP source as dynamic
    • Scroll down, select user Id, and select root
    • Enable User Status.
    • Select Password, and enter a password

Once BMC is enabled, you can optionally configure it within Linux using ipmitool:

for module in ipmi_devintf ipmi_si ipmi_msghandler; do
    /sbin/modprobe $module
done

$ ipmitool lan set 1 ipsrc static
$ ipmitool lan set 1 ipaddr <IP>
$ ipmitool lan set 1 netmask <NETMASK>
$ ipmitool lan set 1 defgw ipaddr <GATEWAY>

Check settings with:

$ ipmitool lan print 1

Setup user:

$ ipmitool user set name 2 <user>
$ ipmitool user set password 2 <password>
$ ipmitool channel setaccess 1 2 link=on ipmi=on callin=on privilege=4
$ ipmitool user enable 2

If users need to use the "coretemp" plug-in they need the following kernel module on all nodes:

% modprobe coretemp

If users need to use the "componentpower" plug-in they need the following for setting up MSR on compute nodes and aggregator nodes:

$ ls -la /dev/cpu/<cpu #>/msr

If the "msr" file is not present, please load MSR module by running:

% modprobe msr

If users need to use the "dmidata" plug-in they need the following for accessing DMI information:

$ ls -la /sys/firmware/dmi

If the "dmi" file is not present, please load the dmi-sysfs module by running:

% modprobe dmi_sysfs

If users need to use the "errcounts" plug-in they need the following for accessing ECC error counts information:

$ ls -la /sys/devices/system/edac/mc*

If one or more "mcN" directories are not present, please load the edac modules by running:

% modprobe edac-core
% modprobe sb_edac