octl
Admin-focused tool for interacting with Sensys. The tool can run as an interactive shell or as a single one-shot command. Currently the tool provides information about configured resources and sensors, and allows modifying the workflows of Sensys.
The octl command itself takes the following options:
Usage: octl [OPTIONS]
Open Resilient Cluster Manager "octl" Tool
-am <arg0> Aggregate MCA parameter set file list
-gomca|--gomca <arg0> <arg1>
Pass global MCA parameters that are applicable to
all contexts (arg0 is the parameter name; arg1 is
the parameter value)
-h|--help This help message
-l|-config-file|--config-file <arg0>
Logical group configuration file for this orcm
chain
-omca|--omca <arg0> <arg1>
Pass context-specific MCA parameters; they are
considered global if --gomca is not used and only
one context is specified (arg0 is the parameter
name; arg1 is the parameter value)
-tune <arg0> Application profile options file list
-v|--verbose Be Verbose
-V|--version Show version information
Interactive shell:
Use 'quit' or 'exit' - for exiting the shell
Use '<tab>' or '<?>' for help
Each subcommand may also take arguments specific to that command.
Interactive CLI
The interactive mode of the CLI is invoked by running the command without any subcommands. Optional arguments such as MCA parameters can be specified as well.
% octl
*** WELCOME TO OCTL ***
Possible commands:
resource Resource Information
diag Diagnostics
sensor sensor
notifier notifier
grouping Logical Grouping Information
Workflow Workflow information
store Sensor store Commands
query Query data from DB
chassis-id Enable/Disable chassis identify LED.
quit Exit the shell
octl>
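MCA parameters can be passed at launch with the options listed above; for instance, using the db_postgres_uri parameter described in the Query section below:
% octl -omca db_postgres_uri localhost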
Once in the interactive shell, the <tab> key can be used to either autocomplete unambiguous partial commands or list possible completions for ambiguous partial commands. The <?> key will display more information about all of the commands at the current hierarchy.
For example, pressing <tab> after entering res will autocomplete the resource command, and pressing <tab> after the resource command is fully entered will show:
octl> resource
status add remove drain resume
octl> resource
As another example, pressing <?> after entering sensor will show:
octl> sensor
Possible commands:
set Set Sensor Commands
get Get Sensor Commands
enable Enable sampling for the current datagroup or sensor for a node-list: enable <node-list> <datagroup|"all"[:{sensor_label|"all"}]>
disable Disable sampling for the current datagroup or sensor for a node-list: disable <node-list> <datagroup|"all"[:{sensor_label|"all"}]>
reset Reset sampling to service load defaults for the current datagroup or sensor for a node-list: reset <node-list> <datagroup|"all"[:{sensor_label|"all"}]>
octl> sensor
Notice how in both cases, control is returned to the user to complete the command as desired.
To exit interactive mode, type quit at the command prompt.
In the following sections, examples for each command will be given using normal one-shot execution mode. However, they can also be executed in interactive mode.
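For example, the one-shot invocation
% octl resource status
produces the same result as entering resource status at the octl> prompt.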
Resource
The resource command set (i.e. status, add/remove, drain/resume) is used to display and change information about the resources (nodes) configured in the system. Currently resource add/remove is not supported administratively.
resource status
The current implementation displays the node connection state, either up (U), down (D), or unknown (?), and the job state, allocated or unallocated. The node specification is a Sensys node regex.
Usage:
% octl resource status
Example output:
TOTAL NODES : 10
NODES : STATE SCHED_STATE
-----------------------------------------
node001 : U UNALLOCATED
node[3:2-10] : ? UNDEF
resource drain
This command only takes the nodes out of the available resource pool of the scheduler. It has nothing to do with the running status of the nodes.
Usage:
% octl resource drain <nodelist>
Example:
% octl resource drain c[2:1-10]
resource resume
This command only brings the nodes into the available resource pool of the scheduler. It has nothing to do with the running status of the nodes.
Usage:
% octl resource resume <nodelist>
Example:
% octl resource resume c[2:1-10]
Diag
The diag command set allows running diagnostics on remote Sensys daemons. These commands require a node regex specification to determine which remote daemons to run on. See the Sensys Regex section for details on how to construct the regex.
Run cpu diagnostics on node001 through node010
% octl diag cpu node[3:1-10]
Success
Run ethernet diagnostics on node001 through node010
% octl diag eth node[3:1-10]
Success
Run memory diagnostics on node001 through node010
% octl diag mem node[3:1-10]
Success
Workflow
The workflow command allows the user to add/remove/list workflows from any management node on a cluster.
Adding the workflow
% octl workflow add workflow.xml [target]
After using the above command, the user will see the workflow_id on the console. Here target is the aggregator list and is an optional argument. The user can use the target argument to override the aggregator information provided in the XML file.
List the workflows
% octl workflow list target
After using the above command, the user will see a list of workflows running on a target/aggregator, with workflow names and workflow IDs.
Remove the workflow
% octl workflow remove target workflow_name workflow_id
After using the above command, the user can remove a workflow running on the target. Here, the wildcard character '*' is allowed for workflow_name and workflow_id. Remember that if the wildcard character is used for workflow_name, all the workflows on the target will be removed regardless of their workflow_id.
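A hypothetical end-to-end sequence (the aggregator name aggregator01 and workflow name my_workflow are illustrative):
% octl workflow add workflow.xml aggregator01
% octl workflow list aggregator01
% octl workflow remove aggregator01 my_workflow '*'
The last command removes every workflow named my_workflow on aggregator01, regardless of its workflow_id.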
Sample workflow XML files can be found at section Data Smoothing Algorithms Analytics.
It is mandatory to have a name attribute on the workflow and step tags. In every workflow, it is mandatory to have filter as the first step. The parser will not accept an attribute or element other than name or step in the workflow element.
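Putting those rules together, a minimal workflow file might look like the following sketch (the names and the step contents are assumptions; consult the sample files referenced above for the authoritative format):
<workflow name="example_workflow">
   <step name="filter">
      <!-- filter configuration (assumed) -->
   </step>
   <step name="average">
      <!-- step-specific configuration (assumed) -->
   </step>
</workflow>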
Default loading of a workflow file is supported; for the feature to work, the workflow file should be located at the following location:
<user install directory>/etc/orcm-default-config.xml
If a default workflow is found at the above location, it is loaded regardless of the aggregator tag present in the XML file.
Sensor Inventory Listing
This command is designed to retrieve from the inventory database the list of sensors for given node(s) that can be used in the Analytics plugins (described immediately above in 3.1.1.6). The command syntax is:
% octl sensor get inventory <node-regex|logical-group> [sensor_search_string]
Or in the interactive shell:
octl> sensor get inventory <node-regex|logical-group> [sensor_search_string]
The node-regex is described in section Sensys Regex and the logical grouping is described in section Sensys Logical Grouping.
The sensor_search_string is currently limited to searching the stored plugin names. Omitting the parameter returns all sensors matching the regex or logical name; using '*' for sensor_search_string does the same as omitting it.
Other search syntax includes partial plugin names (NOTE: the search is done as if your string were followed by the wildcard character '*').
Examples:
% octl sensor get inventory node01 ip
This will match all plugins whose names start with ip, such as ipmi. An equivalent would be:
% octl sensor get inventory node01 'ip*'
This makes explicit the wildcard implied by the first example. All wildcard usage must be a starts-with wildcard. In 99% of use cases this is sufficient; if it is not ideal for a specific case, the suggestion is to post-filter with a script in bash, Python, or another language appropriate to the job. An example:
% octl sensor get inventory node01 '*pm*'
This will not match the ipmi plugin and will in fact return an error.
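For a contains-style match, the CSV output can instead be post-filtered in the shell, per the scripting suggestion above (a sketch; it assumes one-shot mode writes its CSV to stdout):
% octl sensor get inventory node01 | grep pm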
The output is in standard CSV format with double quotes, for both machine parsing (it is importable into most spreadsheet programs) and human readability. The first line of each node's output is a CSV header naming the columns. Nodes are output one after another, since each node may have different sensors listed in the database; this feature of the octl tool does not merge results into one set or union of sensors.
Sensor Sample Rate Listing
This command is designed to retrieve the sample rate for an individual sensor for a list of nodes. The command syntax is:
% octl sensor get sample-rate <sensor-name> <node-regex|logical-group>
Or in the interactive shell:
octl> sensor get sample-rate <sensor-name> <node-regex|logical-group>
Examples
% octl sensor get sample-rate base c[2:1-10]
% octl sensor get sample-rate coretemp c[2:1-10]
% octl sensor get sample-rate freq c[2:1-10]
Sensor Sample Rate Setting
This command is designed to set the sample rate for an individual sensor for a list of nodes. Note that for a per-sensor (e.g. coretemp, freq) sample rate, this command takes effect only if the sensor is started in a progress thread. Setting the base sample rate does not have this requirement. The command syntax is:
% octl sensor set sample-rate <sensor-name> <sample-rate> <node-regex|logical-group>
Or in the interactive shell:
octl> sensor set sample-rate <sensor-name> <sample-rate> <node-regex|logical-group>
Examples
% octl sensor set sample-rate base 10 c[2:1-10]
Setting per sensor sample rate in the examples below will take effect only if the sensor is started in a progress thread.
% octl sensor set sample-rate coretemp 10 c[2:1-10]
% octl sensor set sample-rate freq 10 c[2:1-10]
Sensor Sampling Control
The sensor sampling control is designed to enable, disable, or reset sensor sampling for datagroups (plugin names). The orcmd service startup state of the sensor data sampling is controlled by the following set of MCA parameters:
sensor_{sensor-name|"base"}_collect_metrics = "true" | "false"
Using base turns on (true) or off (false) all sensors loaded via the sensor MCA parameter (only loaded sensors can be controlled, not plugins excluded from being loaded). Using the sensor name instead of base overrides the sensor_base_collect_metrics MCA parameter. The default value of an individual datagroup (plugin) is inherited from sensor_base_collect_metrics at orcmd service load time. This octl command requires that orcmsched be running in the cluster.
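For example, the following openmpi-mca-params.conf entries (a sketch following the parameter pattern above) would start orcmd with all sampling off except the coretemp datagroup:
sensor_base_collect_metrics = false
sensor_coretemp_collect_metrics = true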
The command syntax (one-shot or interactive shell) is:
octl sensor {enable|disable|reset} {node-regex|logical-group} {datagroup|"all"}
For example:
% octl sensor disable node01 all
% octl sensor enable node01 errcounts
After these commands, node01 will only be logging data from the errcounts datagroup (plugin). Then:
% octl sensor reset node01 all
restores node01 to its original sampling state (defined by MCA parameters at orcmd service load time).
For finer control the following syntax can be used:
octl sensor {enable|disable|reset} {node-regex|logical-group} {datagroup|"all"}:{sensor-label-name}
where sensor-label-name is an individual label from the datagroup. This effectively gives control over individual sensor data items. For example, in the coretemp datagroup the sensor labels use the naming convention coreN, where N is the zero-based core number. So the following steps will have orcmd sample only core3 from the coretemp datagroup on node01:
% octl sensor disable node01 all
% octl sensor enable node01 coretemp:core3
The string all can also be used for labels, except when the datagroup is specified as all. Using all:all is not legal; instead just use all. Using all:{sensor-label-name} is also not legal, since there is no commonality of sensor label names between datagroups.
Not all current sensor datagroups respond to sensor sampling control. Notable exceptions are:
- heartbeat - This is not a real sensor datagroup.
- dmidata - No control possible, as this doesn't collect periodic data.
- resusage - Only datagroup control. No sensor-level control possible.
- nodepower - Only datagroup control. Only one sensor label is returned.
- mcedata - Only datagroup control. Only one sensor label is returned.
- syslog - Only datagroup control. Only one sensor label is returned.
Storage Control
Storage control is designed to control environmental (raw) and event data. There are a couple of MCA parameters provided to control them: store_raw_data controls environmental data and store_event_data controls event data.
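For example, the startup defaults could be set in openmpi-mca-params.conf as follows (a sketch; boolean values are assumed):
store_raw_data = true
store_event_data = false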
To control these MCA parameters on the fly, the following four octl commands are provided.
To store Environmental data
% octl store raw_data target
Note that this command controls only sensor environmental data. It doesn't turn on/off the other storage policies (like event data)
To store Event data
% octl store event_data target
Note that this command controls only event data. It doesn't turn on/off the other storage policies (like environmental data)
To store both Environmental and Event data
% octl store all target
To disable storing both Environmental and Event data
% octl store none target
For the octl store command, target means the aggregator list.
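For example, to enable storage of both kinds of data on a hypothetical aggregator named aggregator01:
% octl store all aggregator01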
Query
This component is used to query the database in order to obtain basic information about the nodes and their sensors. In order to get the query component working, the following environment is needed:
- Set the following flags in the openmpi-mca-params.conf file:
db_postgres_database=orcmdb
db_postgres_user=orcmuser:orcmpassword
db_postgres_uri=localhost
- Launch an octl console:
% bin/octl
Whenever you need to specify a node list in a query command you can use:
- A simple comma-separated list, e.g. host01,host02,host03
- A Sensys node regular expression, e.g. host[2:1-3] (expands to host01 through host03)
- Sensys node names with wildcards, e.g. host* to specify every node whose name starts with host
- '*' to specify every node
query history
octl> query history [start-date [end-date]] <nodelist>
Returns all the data logged by the provided nodes (<nodelist>) during the specified time.
Date has the format:
YYYY-MM-DD hh:mm:ss
YYYY-MM-DD (00:00:00 will be added)
hh:mm:ss (current day will be added)
EXAMPLE
octl> query history 2015-11-13 15:00:00 2015-11-13 16:00:00 master4
TIME,HOSTNAME,DATA ITEM,VALUE,UNITS
2015-10-21 15:31:47,master4,procstat_orcmd_pid,11960,
2015-10-21 15:31:48,master4,procstat_running_processes,0,
2015-10-21 15:31:48,master4,procstat_zombie_processes,0,
2015-10-21 15:31:49,master4,coretemp_core0,23,degrees C
4 rows were found (0.091 seconds)
query sensor
octl> query sensor <sensor-list> [start-date [end-date]] <upper-bound lower-bound> [nodelist]
Returns the logged data corresponding to the given sensor, time and node list. Additionally, the query can be constrained within a range of values defined by an upper and lower bound. Notice that logs containing the bound values will not be included in the result. Date has the format:
YYYY-MM-DD hh:mm:ss
YYYY-MM-DD (00:00:00 will be added)
hh:mm:ss (current day will be added)
EXAMPLE
octl> query sensor coretemp* 2015-11-13 14:00:00 2015-11-13 16:00:00 0.1 1 master4
TIME,HOSTNAME,DATA ITEM,VALUE,UNITS
2015-10-21 15:31:38,master4,coretemp_core 0,35,degrees C
2015-10-21 15:31:38,master4,coretemp_core 1,35,degrees C
2015-10-21 15:31:38,master4,coretemp_core 2,35,degrees C
2015-10-21 15:31:38,master4,coretemp_core 3,35,degrees C
...
10 rows were found (0.141 seconds)
query log
octl> query log <search word> [start-date [end-date]] [nodelist]
Returns the logged data coming from the syslog, corresponding to the given nodes and search word. The search word accepts the '*' wildcard. Date has the format:
YYYY-MM-DD hh:mm:ss
YYYY-MM-DD (00:00:00 will be added)
hh:mm:ss (current day will be added)
EXAMPLE
octl> query log *access* master4,c01
HOSTNAME,SENSOR_LOG,MESSAGE
master4,syslog_log_message_0,<86>Oct 28 08:30:29 c01: access granted for user root (uid=0)
c01,syslog_log_message_0,<86>Oct 28 08:30:33 master4: access granted for user root (uid=0)
2 rows were found (0.056 seconds)
query idle
octl> query idle [minimum idle time] <nodelist>
Returns the nodes in <nodelist> that have been idle for the given time or more.
Minimum idle time has the format:
#.#U
U = H for hours, M for minutes, S for seconds (default)
hh:mm:ss
EXAMPLE
octl> query idle 60 master4
octl> query idle 60S master4
NODE,IDLE_TIME
master4,03:13:59.024666
1 rows were found (0.016 seconds)
query node status
octl> query node status <nodelist>
Returns the status logged in the database for the nodes in <nodelist>.
EXAMPLE
octl> query node status master4
There is no output for now, as this query returns the status field of the node table, and this field is not being populated yet.
query event data
octl> query event data [start-date [end-date]] <nodelist>
Returns events from the database. Date has the format:
YYYY-MM-DD hh:mm:ss
YYYY-MM-DD (00:00:00 will be added)
hh:mm:ss (current day will be added)
EXAMPLE
octl> query event data 2016-02-16 08:22:00 2016-02-16 08:22:14 master4
EVENT_ID,TIME,SEVERITY,TYPE,HOSTNAME,MESSAGE
66846,2016-02-16 08:22:04,CRITICAL,EXCEPTION,master4,core 0 value 44.00 degrees C,greater than threshold 25.00 degrees C
66847,2016-02-16 08:22:04,CRITICAL,EXCEPTION,master4,core 1 value 42.00 degrees C,greater than threshold 25.00 degrees C
66865,2016-02-16 08:22:09,CRITICAL,EXCEPTION,master4,core 2 value 44.00 degrees C,greater than threshold 25.00 degrees C
66866,2016-02-16 08:22:09,CRITICAL,EXCEPTION,master4,core 3 value 41.00 degrees C,greater than threshold 25.00 degrees C
66881,2016-02-16 08:22:09,CRITICAL,EXCEPTION,master4,core 4 value 45.00 degrees C,greater than threshold 25.00 degrees C
66882,2016-02-16 08:22:09,CRITICAL,EXCEPTION,master4,core 5 value 46.00 degrees C,greater than threshold 25.00 degrees C
6 rows were found (0.396 seconds)
query event sensor-data
octl> query event sensor-data <event-id> [interval] <sensor-list> [nodelist]
Returns the sensor data around an event. Interval has the format:
[before/after] #.#U
U = H for hours, M for minutes (default), S for seconds
[before/after] hh:mm:ss
EXAMPLE
octl> query event sensor-data 1 after 10S coretemp* master4
TIME,HOSTNAME,DATA_ITEM,VALUE,UNITS
2016-02-09 13:53:52,master4,coretemp_core 0,23,degrees C
2016-02-09 13:53:52,master4,coretemp_core 1,23,degrees C
2016-02-09 13:53:52,master4,coretemp_core 2,23,degrees C
3 rows were found (0.396 seconds)
Please notice that, due to the amount of information that could result from the history or sensor subcommands, this tool currently limits those subcommands to return only 100 rows. If users need to inspect more data or require more SQL-oriented functionality, they are advised to access the DB directly by other means.
Notifier
The notifier commands are used for configuring and querying the policies for error and exception notification during runtime.
The following commands are used for setting up severity levels and corresponding actions.
notifier set policy
octl> notifier set policy <severity> <action> <nodelist>
Returns success when the operation completes successfully, or failed with an error message.
EXAMPLE
octl> notifier set policy emerg smtp rack01
octl> notifier set policy alert smtp rack01
octl> notifier set policy crit smtp rack01
octl> notifier set policy error syslog rack01
octl> notifier set policy warn syslog rack01
octl> notifier set policy notice syslog rack01
octl> notifier set policy debug syslog rack01
Notifier set policy on Node:rack01
Success
The following commands are used for retrieving severity levels and corresponding actions:
notifier get policy
octl> notifier get policy <nodelist>
Returns success when the operation completes successfully, or failed with an error message.
EXAMPLE
octl> notifier get policy rack01
Node Severity Action
-----------------------------------
rack01 EMERG smtp
rack01 ALERT syslog
rack01 CRIT syslog
rack01 ERROR syslog
rack01 WARN syslog
rack01 NOTICE syslog
rack01 INFO syslog
rack01 DEBUG syslog
The following commands are used for changing the existing SMTP configuration:
notifier set smtp-policy
octl> notifier set smtp-policy <key> <value> <nodelist>
Returns success when the operation completes successfully, or failed with an error message.
EXAMPLE
octl> notifier set smtp-policy server_name <email-server> rack01
octl> notifier set smtp-policy server_port <portno> rack01
octl> notifier set smtp-policy to_addr <email-addr> rack01
octl> notifier set smtp-policy from_addr <email-addr> rack01
octl> notifier set smtp-policy from_name <name> rack01
octl> notifier set smtp-policy subject <email-subject> rack01
octl> notifier set smtp-policy body_prefix <email-body-prefix> rack01
octl> notifier set smtp-policy body_suffix <email-body-suffix> rack01
octl> notifier set smtp-policy priority <1-high,2-normal,3-low> rack01
Notifier set smtp-policy on Node:rack01
Success
The following commands are used for retrieving SMTP configuration information from a given nodelist:
notifier get smtp-policy
octl> notifier get smtp-policy <nodelist>
Returns success when the operation completes successfully, or failed with an error message.
EXAMPLE
octl> notifier get smtp-policy rack01
NODE SMTP_KEY SMTP_VALUE
-----------------------------------------------
rack01 server_name emailserver.com
rack01 server_port 25
rack01 to_addr admin@email.com
rack01 from_addr system@email.com
rack01 from_name RAS Monitoring System
rack01 subject EVENT-NOTIFICATION
rack01 body_prefix NOTIFICATION MESSAGE BEGIN
rack01 body_suffix NOTIFICATION MESSAGE END
rack01 priority 1
Chassis-id
The chassis-id command is used for enabling or disabling the chassis ID LED on a given node. Every chassis-id action will be stored as an event in the database. The command syntax is:
octl> chassis-id {state|enable [seconds]|disable} <node_list>
Where node_list can be:
- a Sensys regex
- a logical group
- a comma-separated node list
chassis-id state
The state sub-command retrieves the current state of the chassis ID LED on the requested nodes.
Usage:
octl> chassis-id state myNode
Output:
Node Chassis ID LED
-----------------------------------
myNode OFF
chassis-id enable
The enable sub-command turns ON the chassis ID LED on the requested nodes, following these rules:
- The chassis ID LED will be turned ON for the specified number of seconds.
- If no seconds are specified, the chassis ID LED will be turned ON indefinitely.
- The maximum value for seconds is 255.
Usage:
octl> chassis-id enable [seconds] myNode
Output:
myNode: Success!
Or:
myNode: Failed
Failure information
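For example, to light the LED for 60 seconds:
octl> chassis-id enable 60 myNode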
chassis-id disable
The disable sub-command turns OFF the chassis ID LED on the requested nodes.
Usage:
octl> chassis-id disable myNode
Output:
myNode: Success!
Or:
myNode: Failed
Failure information
NOTE: This command is hardware-dependent and might not be supported on all platforms.
NOTE: ipmi support is required for this feature.
NOTE: The ipmi and nodepower sensors may interfere with the chassis-id command due to limitations in the BMC and/or the ipmiutil driver.
NOTE: Both the enable and disable subcommands perform write operations. It is important to configure BMC access with the proper privilege level, otherwise the chassis-id feature might not be able to change the LED state.