octl
Admin-focused tool for interacting with Sensys. The tool can run as an interactive shell or as a single one-shot command. Currently the tool provides information about configured resources and sensors, and allows modifying the workflows of Sensys.
The octl command itself takes the following options:
Usage: octl [OPTIONS]
Open Resilient Cluster Manager "octl" Tool
-am <arg0> Aggregate MCA parameter set file list
-gomca|--gomca <arg0> <arg1>
Pass global MCA parameters that are applicable to
all contexts (arg0 is the parameter name; arg1 is
the parameter value)
-h|--help This help message
-l|-config-file|--config-file <arg0>
Logical group configuration file for this orcm
chain
-omca|--omca <arg0> <arg1>
Pass context-specific MCA parameters; they are
considered global if --gomca is not used and only
one context is specified (arg0 is the parameter
name; arg1 is the parameter value)
-tune <arg0> Application profile options file list
-v|--verbose Be Verbose
-V|--version Show version information
Interactive shell:
Use 'quit' or 'exit' - for exiting the shell
Use '<tab>' or '<?>' for help
Each subcommand may also take arguments specific to that command.
Interactive CLI
The interactive mode of the CLI is invoked by running the command without any subcommands. Optional arguments such as MCA parameters can be specified as well.
% octl
*** WELCOME TO OCTL ***
Possible commands:
resource Resource Information
diag Diagnostics
sensor sensor
notifier notifier
grouping Logical Grouping Information
Workflow Workflow information
store Sensor store Commands
query Query data from DB
chassis-id Enable/Disable chassis identify LED.
quit Exit the shell
octl>
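MCA parameters can be passed at launch with the options listed above; for instance, using the db_postgres_uri parameter described in the Query section below:
% octl -omca db_postgres_uri localhost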
Once in the interactive shell, the <tab> key can be used to either autocomplete unambiguous partial commands or list possible completions for ambiguous partial commands. The <?> key will display more information about all of the commands at the current hierarchy.
For example, pressing <tab> after entering res will autocomplete the resource command, and pressing <tab> after the resource command is fully entered will show:
octl> resource
status add remove drain resume
octl> resource
As another example, pressing <?> after entering sensor will show:
octl> sensor
Possible commands:
set Set Sensor Commands
get Get Sensor Commands
enable Enable sampling for the current datagroup or sensor for a node-list: enable <node-list> <datagroup|"all"[:{sensor_label|"all"}]>
disable Disable sampling for the current datagroup or sensor for a node-list: disable <node-list> <datagroup|"all"[:{sensor_label|"all"}]>
reset Reset sampling to service load defaults for the current datagroup or sensor for a node-list: reset <node-list> <datagroup|"all"[:{sensor_label|"all"}]>
octl> sensor
Notice how in both cases, control is returned to the user to complete the command as desired.
To exit interactive mode, type quit at the command prompt.
In the following sections, examples for each command will be given using normal one-shot execution mode. However, they can also be executed in interactive mode.
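For example, the one-shot invocation
% octl resource status
produces the same result as entering resource status at the octl> prompt.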
Resource
The resource command set (i.e. status, add/remove, drain/resume) is used to display and change information about the resources (nodes) configured in the system. Currently resource add/remove is not supported administratively.
resource status
The current implementation displays the node connection state, either up (U), down (D), or unknown (?), and the job state, allocated or unallocated. The node specification is a Sensys node regex.
Usage:
% octl resource status
Example output:
TOTAL NODES : 10
NODES : STATE SCHED_STATE
-----------------------------------------
node001 : U UNALLOCATED
node[3:2-10] : ? UNDEF
resource drain
This command only takes the nodes out of the available resource pool of the scheduler. It has nothing to do with the running status of the nodes.
Usage:
% octl resource drain <nodelist>
Example:
% octl resource drain c[2:1-10]
resource resume
This command only brings the nodes into the available resource pool of the scheduler. It has nothing to do with the running status of the nodes.
Usage:
% octl resource resume <nodelist>
Example:
% octl resource resume c[2:1-10]
Diag
The diag command set allows running diagnostics on remote Sensys daemons. These commands require a node regex specification to determine which remote daemons to run on. See the Sensys Regex section for details on how to construct the regex.
Run cpu diagnostics on node001 through node010
% octl diag cpu node[3:1-10]
Success
Run ethernet diagnostics on node001 through node010
% octl diag eth node[3:1-10]
Success
Run memory diagnostics on node001 through node010
% octl diag mem node[3:1-10]
Success
Workflow
The workflow command allows the user to add/remove/list workflows from any management node on a cluster.
Adding the workflow
% octl workflow add workflow.xml [target]
After using the above command, the user will see the workflow_id on the console. Here target is the aggregator list and is an optional argument. The user can use the target argument to override the aggregator information provided in the XML file.
List the workflows
% octl workflow list target
After using the above command, the user will see a list of workflows running on a target/aggregator, with workflow names and workflow IDs.
Remove the workflow
% octl workflow remove target workflow_name workflow_id
After using the above command, the user can remove a workflow running on the target. Here, the wildcard character '*' is allowed for workflow_name and workflow_id. Remember that if the wildcard character is used for workflow_name, all the workflows on the target will be removed regardless of their workflow_id.
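A hypothetical end-to-end sequence (the aggregator name aggregator01 and workflow name my_workflow are illustrative):
% octl workflow add workflow.xml aggregator01
% octl workflow list aggregator01
% octl workflow remove aggregator01 my_workflow '*'
The last command removes every workflow named my_workflow on aggregator01, regardless of its workflow_id.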
Sample workflow XML files can be found at section Data Smoothing Algorithms Analytics.
It is mandatory to have a name attribute on the workflow and step tags. In every workflow, it is mandatory to have filter as the first step. The parser will not accept an attribute or element other than name or step in the workflow element.
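Putting those rules together, a minimal workflow file might look like the following sketch (the names and the step contents are assumptions; consult the sample files referenced above for the authoritative format):
<workflow name="example_workflow">
   <step name="filter">
      <!-- filter configuration (assumed) -->
   </step>
   <step name="average">
      <!-- step-specific configuration (assumed) -->
   </step>
</workflow>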
Default loading of a workflow file is supported; for the feature to work, the workflow file should be located at the following location:
<user install directory>/etc/orcm-default-config.xml
If a default workflow is found at the above location, it is loaded regardless of the aggregator tag present in the XML file.
Sensor Inventory Listing
This command is designed to retrieve from the inventory database the list of sensors for given node(s) that can be used in the Analytics plugins (described immediately above in 3.1.1.6). The command syntax is:
% octl sensor get inventory <node-regex|logical-group> [sensor_search_string]
Or in the interactive shell:
octl> sensor get inventory <node-regex|logical-group> [sensor_search_string]
The node-regex is described in section Sensys Regex and the logical grouping is described in section Sensys Logical Grouping.
The sensor_search_string is currently limited to searching the stored plugin names. Omitting the parameter returns all sensors matching the regex or logical name; using '*' for sensor_search_string does the same as omitting it.
Other search syntax includes partial plugin names (NOTE: the search is done as if your string were followed by the wildcard character '*').
Examples:
% octl sensor get inventory node01 ip
This will match all plugins whose names start with ip, such as ipmi. An equivalent would be:
% octl sensor get inventory node01 'ip*'
This makes explicit the wildcard implied by the first example. All wildcard usage must be a starts-with wildcard. In 99% of use cases this is sufficient; if it is not ideal for a specific case, the suggestion is to post-filter with a script in bash, Python, or another language appropriate to the job. An example:
% octl sensor get inventory node01 '*pm*'
This will not match the ipmi plugin and will in fact return an error.
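For a contains-style match, the CSV output can instead be post-filtered in the shell, per the scripting suggestion above (a sketch; it assumes one-shot mode writes its CSV to stdout):
% octl sensor get inventory node01 | grep pm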
The output is in standard CSV format with double quotes, for both machine parsing (it is importable into most spreadsheet programs) and human readability. The first line of each node's output is a CSV header naming the columns. Nodes are output one after another, since each node may have different sensors listed in the database; this feature of the octl tool does not merge results into one set or union of sensors.
Sensor Sample Rate Listing
This command is designed to retrieve the sample rate for an individual sensor for a list of nodes. The command syntax is:
% octl sensor get sample-rate <sensor-name> <node-regex|logical-group>
Or in the interactive shell:
octl> sensor get sample-rate <sensor-name> <node-regex|logical-group>
Examples
% octl sensor get sample-rate base c[2:1-10]
% octl sensor get sample-rate coretemp c[2:1-10]
% octl sensor get sample-rate freq c[2:1-10]
Sensor Sample Rate Setting
This command is designed to set the sample rate for an individual sensor for a list of nodes. Note that for a per-sensor (e.g. coretemp, freq) sample rate, this command takes effect only if the sensor is started in a progress thread. Setting the base sample rate does not have this requirement. The command syntax is:
% octl sensor set sample-rate <sensor-name> <sample-rate> <node-regex|logical-group>
Or in the interactive shell:
octl> sensor set sample-rate <sensor-name> <sample-rate> <node-regex|logical-group>
Examples
% octl sensor set sample-rate base 10 c[2:1-10]
Setting per sensor sample rate in the examples below will take effect only if the sensor is started in a progress thread.
% octl sensor set sample-rate coretemp 10 c[2:1-10]
% octl sensor set sample-rate freq 10 c[2:1-10]
Sensor Sampling Control
The sensor sampling control is designed to enable, disable, or reset sensor sampling for datagroups (plugin names). The orcmd service startup state of the sensor data sampling is controlled by the following set of MCA parameters:
sensor_{sensor-name|"base"}_collect_metrics = "true" | "false"
Using base turns on (true) or off (false) all sensors loaded via the sensor MCA parameter (only loaded sensors can be controlled, not plugins excluded from being loaded). Using the sensor name instead of base overrides the sensor_base_collect_metrics MCA parameter. The default value of an individual datagroup (plugin) is inherited from sensor_base_collect_metrics at orcmd service load time. This octl command requires that orcmsched be running in the cluster.
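For example, the following openmpi-mca-params.conf entries (a sketch following the parameter pattern above) would start orcmd with all sampling off except the coretemp datagroup:
sensor_base_collect_metrics = false
sensor_coretemp_collect_metrics = true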
The command syntax (one-shot or interactive shell) is:
octl sensor {enable|disable|reset} {node-regex|logical-group} {datagroup|"all"}
For example:
% octl sensor disable node01 all
% octl sensor enable node01 errcounts
After these commands, node01 will only be logging data from the errcounts datagroup (plugin). Then:
% octl sensor reset node01 all
restores node01 to its original sampling state (defined by MCA parameters at orcmd service load time).
For finer control the following syntax can be used:
octl sensor {enable|disable|reset} {node-regex|logical-group} {datagroup|"all"}:{sensor-label-name}
where sensor-label-name is an individual label from the datagroup. This effectively gives control over individual sensor data items. For example, in the coretemp datagroup the sensor labels use the naming convention coreN, where N is the zero-based core number. So the following steps will have orcmd sample only core3 from the coretemp datagroup on node01:
% octl sensor disable node01 all
% octl sensor enable node01 coretemp:core3
The string all can also be used for labels, except when the datagroup is specified as all. Using all:all is not legal; instead just use all. Using all:{sensor-label-name} is also not legal, since there is no commonality of sensor label names between datagroups.
Not all current sensor datagroups respond to sensor sampling control. Notable exceptions are:
- heartbeat - This is not a real sensor datagroup.
- dmidata - No control possible, as this doesn't collect periodic data.
- resusage - Only datagroup control. No sensor-level control possible.
- nodepower - Only datagroup control. Only one sensor label is returned.
- mcedata - Only datagroup control. Only one sensor label is returned.
- syslog - Only datagroup control. Only one sensor label is returned.
Storage Control
Storage control is designed to control environmental (raw) and event data. There are a couple of MCA parameters provided to control them: store_raw_data controls environmental data and store_event_data controls event data.
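For example, the startup defaults could be set in openmpi-mca-params.conf as follows (a sketch; boolean values are assumed):
store_raw_data = true
store_event_data = false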
To control these MCA parameters on the fly, the following four octl commands are provided.
To store Environmental data
% octl store raw_data target
Note that this command controls only sensor environmental data. It doesn't turn on/off the other storage policies (like event data)
To store Event data
% octl store event_data target
Note that this command controls only event data. It doesn't turn on/off the other storage policies (like environmental data)
To store both Environmental and Event data
% octl store all target
To disable storing both Environmental and Event data
% octl store none target
For the octl store command, target means the aggregator list.
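For example, to enable storage of both kinds of data on a hypothetical aggregator named aggregator01:
% octl store all aggregator01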
Query
This component is used to query the database in order to obtain basic information about the nodes and their sensors. In order to get the query component working, the following environment is needed:
- Set the following flags in the openmpi-mca-params.conf file:
db_postgres_database=orcmdb
db_postgres_user=orcmuser:orcmpassword
db_postgres_uri=localhost
- Launch an octl console:
% bin/octl
Whenever you need to specify a node list in a query command you can use:
- A simple comma-separated list, e.g. host01,host02,host03
- A Sensys node regular expression, e.g. host[2:1-3] (expands to host01 through host03)
- Sensys node names with wildcards, e.g. host* to specify every node whose name starts with host
- '*' to specify every node
query history
octl> query history [start-date [end-date]] <nodelist>
Returns all the data logged by the provided nodes (<nodelist>) during the specified time.
Date has the format:
YYYY-MM-DD hh:mm:ss
YYYY-MM-DD (00:00:00 will be added)
hh:mm:ss (current day will be added)
EXAMPLE
octl> query history 2015-11-13 15:00:00 2015-11-13 16:00:00 master4
TIME,HOSTNAME,DATA ITEM,VALUE,UNITS
2015-10-21 15:31:47,master4,procstat_orcmd_pid,11960,
2015-10-21 15:31:48,master4,procstat_running_processes,0,
2015-10-21 15:31:48,master4,procstat_zombie_processes,0,
2015-10-21 15:31:49,master4,coretemp_core0,23,degrees C
4 rows were found (0.091 seconds)
query sensor
octl> query sensor <sensor-list> [start-date [end-date]] <upper-bound lower-bound> [nodelist]
Returns the logged data corresponding to the given sensor, time and node list. Additionally, the query can be constrained within a range of values defined by an upper and lower bound. Notice that logs containing the bound values will not be included in the result. Date has the format:
YYYY-MM-DD hh:mm:ss
YYYY-MM-DD (00:00:00 will be added)
hh:mm:ss (current day will be added)
EXAMPLE
octl> query sensor coretemp* 2015-11-13 14:00:00 2015-11-13 16:00:00 0.1 1 master4
TIME,HOSTNAME,DATA ITEM,VALUE,UNITS
2015-10-21 15:31:38,master4,coretemp_core 0,35,degrees C
2015-10-21 15:31:38,master4,coretemp_core 1,35,degrees C
2015-10-21 15:31:38,master4,coretemp_core 2,35,degrees C
2015-10-21 15:31:38,master4,coretemp_core 3,35,degrees C
...
10 rows were found (0.141 seconds)
query log
octl> query log <search word> [start-date [end-date]] [nodelist]
Returns the logged data coming from the syslog, corresponding to the given nodes and search word. The search word accepts the '*' wildcard. Date has the format:
YYYY-MM-DD hh:mm:ss
YYYY-MM-DD (00:00:00 will be added)
hh:mm:ss (current day will be added)
EXAMPLE
octl> query log *access* master4,c01
HOSTNAME,SENSOR_LOG,MESSAGE
master4,syslog_log_message_0,<86>Oct 28 08:30:29 c01: access granted for user root (uid=0)
c01,syslog_log_message_0,<86>Oct 28 08:30:33 master4: access granted for user root (uid=0)
2 rows were found (0.056 seconds)
query idle
octl> query idle [minimum idle time] <nodelist>
Returns the nodes in <nodelist> that have been idle for the given time or more.
Minimum idle time has the format:
#.#U
U = H for hours, M for minutes, S for seconds (default)
hh:mm:ss
EXAMPLE
octl> query idle 60 master4
octl> query idle 60S master4
NODE,IDLE_TIME
master4,03:13:59.024666
1 rows were found (0.016 seconds)
query node status
octl> query node status <nodelist>
Returns the status logged in the database for the nodes in <nodelist>.
EXAMPLE
octl> query node status master4
There is no output for now, as this query returns the status field of the node table, and this field is not being populated yet.
query event data
octl> query event data [start-date [end-date]] <nodelist>
Returns events from the database. Date has the format:
YYYY-MM-DD hh:mm:ss
YYYY-MM-DD (00:00:00 will be added)
hh:mm:ss (current day will be added)
EXAMPLE
octl> query event data 2016-02-16 08:22:00 2016-02-16 08:22:14 master4
EVENT_ID,TIME,SEVERITY,TYPE,HOSTNAME,MESSAGE
66846,2016-02-16 08:22:04,CRITICAL,EXCEPTION,master4,core 0 value 44.00 degrees C,greater than threshold 25.00 degrees C
66847,2016-02-16 08:22:04,CRITICAL,EXCEPTION,master4,core 1 value 42.00 degrees C,greater than threshold 25.00 degrees C
66865,2016-02-16 08:22:09,CRITICAL,EXCEPTION,master4,core 2 value 44.00 degrees C,greater than threshold 25.00 degrees C
66866,2016-02-16 08:22:09,CRITICAL,EXCEPTION,master4,core 3 value 41.00 degrees C,greater than threshold 25.00 degrees C
66881,2016-02-16 08:22:09,CRITICAL,EXCEPTION,master4,core 4 value 45.00 degrees C,greater than threshold 25.00 degrees C
66882,2016-02-16 08:22:09,CRITICAL,EXCEPTION,master4,core 5 value 46.00 degrees C,greater than threshold 25.00 degrees C
6 rows were found (0.396 seconds)
query event sensor-data
octl> query event sensor-data <event-id> [interval] <sensor-list> [nodelist]
Returns the sensor data around an event. Interval has the format:
[before/after] #.#U
U = H for hours, M for minutes (default), S for seconds
[before/after] hh:mm:ss
EXAMPLE
octl> query event sensor-data 1 after 10S coretemp* master4
TIME,HOSTNAME,DATA_ITEM,VALUE,UNITS
2016-02-09 13:53:52,master4,coretemp_core 0,23,degrees C
2016-02-09 13:53:52,master4,coretemp_core 1,23,degrees C
2016-02-09 13:53:52,master4,coretemp_core 2,23,degrees C
3 rows were found (0.396 seconds)
Please notice that, due to the amount of information that could result from the history or sensor subcommands, this tool currently limits those subcommands to return only 100 rows. If users need to inspect more data or require more SQL-oriented functionality, they are advised to access the DB directly by other means.
Notifier
The notifier commands are used for configuring and querying the policies for error and exception notification during runtime.
The following commands are used for setting up severity levels and corresponding actions.
notifier set policy
octl> notifier set policy <severity> <action> <nodelist>
Returns success when the operation completes successfully, or failed with an error message.
EXAMPLE
octl> notifier set policy emerg smtp rack01
octl> notifier set policy alert smtp rack01
octl> notifier set policy crit smtp rack01
octl> notifier set policy error syslog rack01
octl> notifier set policy warn syslog rack01
octl> notifier set policy notice syslog rack01
octl> notifier set policy debug syslog rack01
Notifier set policy on Node:rack01
Success
The following commands are used for retrieving severity levels and corresponding actions:
notifier get policy
octl> notifier get policy <nodelist>
Returns success when the operation completes successfully, or failed with an error message.
EXAMPLE
octl> notifier get policy rack01
Node Severity Action
-----------------------------------
rack01 EMERG smtp
rack01 ALERT syslog
rack01 CRIT syslog
rack01 ERROR syslog
rack01 WARN syslog
rack01 NOTICE syslog
rack01 INFO syslog
rack01 DEBUG syslog
The following commands are used for changing the existing SMTP configuration:
notifier set smtp-policy
octl> notifier set smtp-policy <key> <value> <nodelist>
Returns success when the operation completes successfully, or failed with an error message.
EXAMPLE
octl> notifier set smtp-policy server_name <email-server> rack01
octl> notifier set smtp-policy server_port <portno> rack01
octl> notifier set smtp-policy to_addr <email-addr> rack01
octl> notifier set smtp-policy from_addr <email-addr> rack01
octl> notifier set smtp-policy from_name <name> rack01
octl> notifier set smtp-policy subject <email-subject> rack01
octl> notifier set smtp-policy body_prefix <email-body-prefix> rack01
octl> notifier set smtp-policy body_suffix <email-body-suffix> rack01
octl> notifier set smtp-policy priority <1-high,2-normal,3-low> rack01
Notifier set smtp-policy on Node:rack01
Success
The following commands are used for retrieving SMTP configuration information from a given nodelist:
notifier get smtp-policy
octl> notifier get smtp-policy <nodelist>
Returns success when the operation completes successfully, or failed with an error message.
EXAMPLE
octl> notifier get smtp-policy rack01
NODE SMTP_KEY SMTP_VALUE
-----------------------------------------------
rack01 server_name emailserver.com
rack01 server_port 25
rack01 to_addr admin@email.com
rack01 from_addr system@email.com
rack01 from_name RAS Monitoring System
rack01 subject EVENT-NOTIFICATION
rack01 body_prefix NOTIFICATION MESSAGE BEGIN
rack01 body_suffix NOTIFICATION MESSAGE END
rack01 priority 1
Chassis-id
The chassis-id command is used for enabling or disabling the chassis ID LED on a given node. Every chassis-id action will be stored as an event in the database. The command syntax is:
octl> chassis-id {state|enable [seconds]|disable} <node_list>
Where node_list can be:
- a Sensys regex
- a logical group
- a comma-separated node list
chassis-id state
The state sub-command retrieves the current state of the chassis ID LED on the requested nodes.
Usage:
octl> chassis-id state myNode
Output:
Node Chassis ID LED
-----------------------------------
myNode OFF
chassis-id enable
The enable sub-command turns ON the chassis ID LED on the requested nodes, following these rules:
- The chassis ID LED will be turned ON for the specified number of seconds.
- If no seconds are specified, the chassis ID LED will be turned ON indefinitely.
- The maximum value for seconds is 255.
Usage:
octl> chassis-id enable [seconds] myNode
Output:
myNode: Success!
Or:
myNode: Failed
Failure information
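For example, to light the LED for 60 seconds:
octl> chassis-id enable 60 myNode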
chassis-id disable
The disable sub-command turns OFF the chassis ID LED on the requested nodes.
Usage:
octl> chassis-id disable myNode
Output:
myNode: Success!
Or:
myNode: Failed
Failure information
NOTE: This command is hardware-dependent and might not be supported on all platforms.
NOTE: ipmi support is required for this feature.
NOTE: The ipmi and nodepower sensors may interfere with the chassis-id command due to limitations in the BMC and/or the ipmiutil driver.
NOTE: Both the enable and disable subcommands perform write operations. It is important to configure BMC access with the proper privilege level, otherwise the chassis-id feature might not be able to change the LED state.