- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -
Barreleye
Lustre Server CPU Usage
Each CPU usage state is reported for all servers in a separate table, i.e. measurement. The reported usage states are:
idle
Reported in aggregation.cpu-average.cpu.idle. When there is really nothing the kernel can do, it just has to let this slice of time pass. Technically, when the run queue is empty and no I/O operations are in progress, the CPU time is accounted as idle.
system
Reported in aggregation.cpu-average.cpu.system. The CPU is running kernel code. This includes device drivers and kernel modules.
user
Reported in aggregation.cpu-average.cpu.user. The CPU is running code in user mode. This includes your application code. Note that when an application reads from disk or writes to the network, it goes to sleep while the kernel performs that work and is woken up again afterwards.
steal
Reported in aggregation.cpu-average.cpu.steal. The DDN Lustre servers are virtual machines. In a virtualized environment, the hypervisor may "steal" cycles that were meant for the CPUs of one virtual machine and give them to another, for various reasons. This time is accounted as steal.
nice
Reported in aggregation.cpu-average.cpu.nice. User code can be executed at "normal" priority or at various degrees of "below normal" priority. You can, for example, run a report generation process at a lower priority and interactive processes at normal priority. Nice is the time the CPU spends executing user tasks with below-normal priority.
wait
Reported in aggregation.cpu-average.cpu.wait. Sometimes the CPU has only one thing to do: wait for the results of a disk or network read or write. This is not as uncommon as you might think. A file server, for example, spends nearly all its life waiting for disk reads and network writes to complete. I/O wait is the time during which the CPU had nothing else to run while at least one I/O operation was still outstanding.
interrupt & softirq
interrupt is reported in aggregation.cpu-average.cpu.interrupt, softirq is reported in aggregation.cpu-average.cpu.softirq. In both cases the kernel is servicing interrupt requests.
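To relate the per-state measurements to each other, the values of all states for one server at one point in time can be normalized into shares. The following is a minimal sketch, assuming the values have already been fetched into a dictionary. The helper names and the sample numbers are hypothetical and not part of Barreleye.

```python
# CPU states described above; each one is stored in its own measurement.
CPU_STATES = ("idle", "system", "user", "steal", "nice", "wait", "interrupt", "softirq")


def measurement_name(state):
    """Return the measurement that holds the given CPU state."""
    if state not in CPU_STATES:
        raise ValueError(f"unknown CPU state: {state}")
    return f"aggregation.cpu-average.cpu.{state}"


def state_shares(values):
    """Normalize per-state values of one server and timestamp to fractions of the total."""
    total = sum(values.get(state, 0.0) for state in CPU_STATES)
    if total <= 0:
        raise ValueError("no CPU time recorded")
    return {state: values.get(state, 0.0) / total for state in CPU_STATES}


# Hypothetical sample values for a single server (not measured data).
sample = {"idle": 70.0, "system": 10.0, "user": 5.0, "wait": 15.0}
shares = state_shares(sample)
print(measurement_name("wait"), shares["wait"])
```

States missing from the input are treated as zero, so the shares always add up to one.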
Visualizations
The data collected in the tables are visualized in the Grafana dashboard "CPU usage by Type per Server" on mon-login01.
Table Structure
Each of the tables has the same structure.
Key | Value | Explanation |
---|---|---|
cluster | exafs | |
fqdn | hawk-mds01, hawk-mds02, hawk-oss01, hawk-oss02, hawk-oss03, hawk-oss04, hawk-oss05, hawk-oss06, hawk-oss07, hawk-oss08 | |
Please note that the explanation of the CPU usage states was taken from https://www.opsdash.com/blog/cpu-usage-linux.html
Network Data
The InfiniBand counters of the Lustre servers are collected in four tables, i.e. measurements. Two of them, counters_error and counters_info, report port-based metrics, whereas the other two, hw_counters_error and hw_counters_info, report function-based metrics.
Visualizations
The data collected in the tables are visualized in the Grafana dashboard "Network Metrics by Server (Selectable)" on mon-login01.
Table Structure
The four tables have the same structure with respect to the tag keys. The individual counters are distinguished by the value of the optype tag.
Table counters_error:
Key | Value | Explanation |
---|---|---|
cluster | exafs | |
driver_index | 0, 1 | |
driver_type | mlx5 | |
fqdn | hawk-mds01, hawk-mds02, hawk-oss01, hawk-oss02, hawk-oss03, hawk-oss04, hawk-oss05, hawk-oss06, hawk-oss07, hawk-oss08 | |
optype | VL15_dropped, link_downed, link_error_recovery, local_link_integrity_errors, port_rcv_constraint_errors, port_rcv_remote_physical_errors, port_rcv_switch_relay_errors, port_xmit_constraint_errors, port_xmit_discards, symbol_error | |
port_number | 1 | |
Table counters_info:
Key | Value | Explanation |
---|---|---|
cluster | exafs | |
driver_index | 0, 1 | |
driver_type | mlx5 | |
fqdn | hawk-mds01, hawk-mds02, hawk-oss01, hawk-oss02, hawk-oss03, hawk-oss04, hawk-oss05, hawk-oss06, hawk-oss07, hawk-oss08 | |
optype | excessive_buffer_overrun_errors, port_rcv_data, port_rcv_errors, port_rcv_packets, port_xmit_data, port_xmit_packets | |
port_number | 1 | |
Table hw_counters_error:
Key | Value | Explanation |
---|---|---|
cluster | exafs | |
driver_index | 0, 1 | |
driver_type | mlx5 | |
fqdn | hawk-mds01, hawk-mds02, hawk-oss01, hawk-oss02, hawk-oss03, hawk-oss04, hawk-oss05, hawk-oss06, hawk-oss07, hawk-oss08 | |
optype | duplicate_request, implied_nak_seq_err, local_ack_timeout_err, out_of_buffer, out_of_sequence, packet_seq_err, req_cqe_error, req_cqe_flush_error, req_remote_access_errors, req_remote_invalid_request, resp_cqe_error, resp_cqe_flush_error, resp_local_length_error, resp_remote_access_errors, rnr_nak_retry_err, rx_icrc_encapsulated | |
port_number | 1 | |
Table hw_counters_info:
Key | Value | Explanation |
---|---|---|
cluster | exafs | |
driver_index | 0, 1 | |
driver_type | mlx5 | |
fqdn | hawk-mds01, hawk-mds02, hawk-oss01, hawk-oss02, hawk-oss03, hawk-oss04, hawk-oss05, hawk-oss06, hawk-oss07, hawk-oss08 | |
optype | lifespan, roce_adp_retrans, roce_adp_retrans_to, roce_slow_restart, roce_slow_restart_cnps, roce_slow_restart_trans, rx_atomic_requests, rx_dct_connect, rx_read_requests, rx_write_requests | |
port_number | 1 | |
For more information about InfiniBand counters, please see https://enterprise-support.nvidia.com/s/article/understanding-mlx5-linux-counters-and-status-parameters.
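The tag keys listed above can be used to narrow a query to one server and one counter. The following is a minimal sketch that only builds a query string. It assumes that the measurements live in an InfluxDB database queried with InfluxQL and that the counter is stored in a field named "value". Neither is confirmed on this page for the network tables, and the function names are hypothetical.

```python
def quote_identifier(name):
    """Double-quote an InfluxQL identifier (measurement, tag key or field key)."""
    return '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'


def quote_string(value):
    """Single-quote an InfluxQL string literal such as a tag value."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_query(measurement, tags, field="value", window="1h"):
    """Build a SELECT statement restricted to the given tag values and time window."""
    conditions = [
        f"{quote_identifier(key)} = {quote_string(tags[key])}" for key in sorted(tags)
    ]
    conditions.append(f"time > now() - {window}")
    return (
        f"SELECT {quote_identifier(field)} FROM {quote_identifier(measurement)} "
        f"WHERE {' AND '.join(conditions)}"
    )


# Assumed field name "value"; tag values taken from the counters_error table.
print(build_query("counters_error", {"fqdn": "hawk-oss01", "optype": "symbol_error"}))
```

Sorting the tag keys keeps the generated statement stable, which makes it easier to compare or cache queries.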
Meta Data Server (MDS) and Meta Data Target (MDT) Metrics
While the table cq_md_stats_by_optype reports meta data operations against the full file system, all other measurements collected from the Lustre meta data servers and their meta data targets can in principle be grouped into three categories.
- Metrics by MDS and MDT.
  - md_stats
  - md_stats_max
  - md_stats_min
  - md_stats_sum
  - md_stats_sumsq
  - mdt_filesinfo_free
  - mdt_filesinfo_total
  - mdt_filesinfo_used
  - mdt_kbytesinfo_free
  - mdt_kbytesinfo_total
  - mdt_kbytesinfo_used
- Metrics differentiating between user-, group-, and job-id.
  - cq_mdt_acctuser_samples_by_user_id
  - cq_mdt_jobstats_samples_by_ll_job_gid
  - cq_mdt_jobstats_samples_by_ll_job_id
  - cq_mdt_jobstats_samples_by_ll_job_uid
  - mdt_acctuser_samples
  - mdt_jobstats_max
  - mdt_jobstats_min
  - mdt_jobstats_samples
  - mdt_jobstats_sum
  - mdt_jobstats_sumsq
- Metrics differentiating between clients.
  - exp_md_stats
  - exp_md_stats_max_latency
  - exp_md_stats_min_latency
  - exp_md_stats_sum_latency
  - exp_md_stats_sumsq_latency
Total Meta Data Operations
The table cq_md_stats_by_optype collects the total sum of meta data operations against the complete file system in a continuous query.
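The aggregation can be pictured as a sum over all meta data targets for each operation type. The following is a minimal sketch of that idea, not the actual continuous query, whose definition is not shown here. The input points and their numbers are hypothetical.

```python
from collections import defaultdict


def total_by_optype(points):
    """Sum the values of all points per optype, ignoring which MDT they came from."""
    totals = defaultdict(float)
    for point in points:
        totals[point["optype"]] += point["value"]
    return dict(totals)


# Hypothetical per-MDT points for one time interval (not measured data).
points = [
    {"mdt_index": "MDT0000", "optype": "open", "value": 120.0},
    {"mdt_index": "MDT0001", "optype": "open", "value": 80.0},
    {"mdt_index": "MDT0000", "optype": "close", "value": 115.0},
]
print(total_by_optype(points))
```

The result has one entry per optype, which matches the structure of cq_md_stats_by_optype with optype as its only tag key and sum as its field.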
Visualizations
The Dashboard ws10-barreleye uses the table in the panel "Lustre Aggregated Metadata".
Table Structure
Key | Value | Explanation |
---|---|---|
optype | close, getattr, getxattr, mkdir, mknod, open, rename, rmdir, setattr, setxattr, statfs, unlink | |
sum | float | |
Metrics by MDS and MDT.
Meta Data Operations grouped by MDS and MDT.
The table md_stats collects meta data operations per meta data target. The table shares its structure with
- md_stats_max
- md_stats_min
- md_stats_sum
- md_stats_sumsq
In which ....
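A sample count, a sum and a sum of squares are the usual ingredients for a mean and a standard deviation. The following is a minimal sketch of that calculation. It assumes that md_stats holds the number of samples and that md_stats_sum and md_stats_sumsq hold the sum and the sum of squares over the same samples, which this page does not state explicitly. The function name is hypothetical.

```python
import math


def mean_and_stddev(samples, total, total_squares):
    """Derive mean and population standard deviation from count, sum and sum of squares."""
    if samples <= 0:
        raise ValueError("at least one sample is required")
    mean = total / samples
    # Clamp at zero because rounding can make the difference slightly negative.
    variance = max(total_squares / samples - mean * mean, 0.0)
    return mean, math.sqrt(variance)


# Hypothetical values: eight samples 2, 4, 4, 4, 5, 5, 7, 9.
print(mean_and_stddev(8, 40.0, 232.0))
```

Because all three inputs are additive, they can be summed over several MDTs first and then passed to the same function.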
Visualizations
The Dashboard ws10-barreleye uses the table in the panel "Lustre Aggregated Metadata".
Table Structure
Key | Value | Explanation |
---|---|---|
cluster | exafs | |
fqdn | hawk-mds01, hawk-mds02 | |
fs_name | exafs | |
mdt_index | MDT0000, MDT0001, MDT0002, MDT0003 | |
optype | close, getattr, getxattr, mkdir, mknod, open, rename, rmdir, setattr, setxattr, statfs, unlink | |
value | float | |
Inode and filespace usage by MDS and MDT.
The tables mdt_filesinfo_* collect the number of free, total and used inodes on each MDT, while the tables mdt_kbytesinfo_* collect the free, total and used file space on each MDT.
All six tables share the same structure.
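A typical use of these tables is to express the used inodes or the used file space as a percentage of the total. The following is a minimal sketch, assuming the latest values per mdt_index have already been read from the _used and _total tables. The function names and the numbers are hypothetical.

```python
def usage_percent(used, total):
    """Return the used share of a target in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def fullest_target(used_by_target, total_by_target):
    """Return the target index with the highest usage percentage."""
    return max(
        used_by_target,
        key=lambda index: usage_percent(used_by_target[index], total_by_target[index]),
    )


# Hypothetical inode counts per MDT (not measured data).
used = {"MDT0000": 10.0, "MDT0001": 30.0}
total = {"MDT0000": 100.0, "MDT0001": 100.0}
print(fullest_target(used, total), usage_percent(used["MDT0001"], total["MDT0001"]))
```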
Visualizations
...
Table Structure
Key | Value | Explanation |
---|---|---|
cluster | exafs | |
fqdn | hawk-mds01, hawk-mds02 | |
fs_name | exafs | |
mdt_index | MDT0000, MDT0001, MDT0002, MDT0003 | |
value | float | |
Metrics differentiating between user-, group-, and job-id
Table cq_mdt_acctuser_samples_by_user_id:
Key | Value | Explanation |
---|---|---|
optype | usage_inodes, usage_kbytes | |
user_id | 0, 1001, 1002, 11932, 12266, 12356, 12448, 12499, 13468, 13967, ... | |
sum | float | |
Table cq_mdt_jobstats_samples_by_ll_job_gid:
Key | Value | Explanation |
---|---|---|
ll_job_gid | 0, 0:0, 0:0:, 11142, 12793, 12801, 12803, 12812, 12831, 12833, ... | |
optype | close, crossdir_rename, getattr, getxattr, link, mkdir, mknod, open, punch, read_bytes, rename, rmdir, samedir_rename, setattr, setxattr, statfs, sync, unlink, write_bytes | |
Table cq_mdt_jobstats_samples_by_ll_job_id:
Key | Value | Explanation |
---|---|---|
ll_job_id | 0, 1, 10, 100010.cl1intern__1, 100010.cl1intern__I, 100010.cl1intern__S, 100010.cl1intern__a, 100010.cl1intern__c, 100010.cl1intern__d, 100010.cl1intern__f, ... | |
optype | close, crossdir_rename, getattr, getxattr, link, mkdir, mknod, open, punch, read_bytes, rename, rmdir, samedir_rename, setattr, setxattr, statfs, sync, unlink, write_bytes | |
Table cq_mdt_jobstats_samples_by_ll_job_uid:
Key | Value | Explanation |
---|---|---|
ll_job_uid | 0, 0:, 11932, 12266, 12356, 12448, 12499, 13468, 13967, 14207, ... | |
optype | close, crossdir_rename, getattr, getxattr, link, mkdir, mknod, open, punch, read_bytes, rename, rmdir, samedir_rename, setattr, setxattr, statfs, sync, unlink, write_bytes | |
Table mdt_acctuser_samples:
Key | Value | Explanation |
---|---|---|
cluster | exafs | |
fqdn | hawk-mds01, hawk-mds02 | |
fs_name | exafs | |
mdt_index | MDT0000, MDT0001, MDT0002, MDT0003 | |
optype | usage_inodes, usage_kbytes | |
user_id | 0, 1001, 1002, 11932, 12266, 12356, 12448, 12499, 13468, 13967 | |
Table mdt_jobstats_max:
Key | Value | Explanation |
---|---|---|
cluster | exafs | |
fqdn | hawk-mds01, hawk-mds02 | |
fs_name | exafs | |
job_id | 0:0:2072485.hawk-pbs5, 0:0:2073250.hawk-pbs5, 0:0:2073899.hawk-pbs5, 0:0:2074114.hawk-pbs5, 0:0:2075166.hawk-pbs5, 0:0:2075906.hawk-pbs5, 0:0:2077436.hawk-pbs5, 0:0:2079673.hawk-pbs5, 0:0:2081442.hawk-pbs5, 0:0:2081474.hawk-pbs5 | |
ll_job_gid | 0, 0:0, 0:0:, 11142, 12793, 12801, 12803, 12812, 12831, 12833 | |
ll_job_id | 0, 1, 10, 100010.cl1intern__1, 100010.cl1intern__I, 100010.cl1intern__S, 100010.cl1intern__a, 100010.cl1intern__c, 100010.cl1intern__d, 100010.cl1intern__f | |
ll_job_uid | 0, 0:, 11932, 12266, 12356, 12448, 12499, 13468, 13967, 14207 | |
mdt_index | MDT0000, MDT0001, MDT0002, MDT0003 | |
optype | max_punch, max_read_bytes, max_write_bytes | |
Table mdt_jobstats_min:
Key | Value | Explanation |
---|---|---|
cluster | exafs | |
fqdn | hawk-mds01, hawk-mds02 | |
fs_name | exafs | |
job_id | 0:0:2072485.hawk-pbs5, 0:0:2073250.hawk-pbs5, 0:0:2073899.hawk-pbs5, 0:0:2074114.hawk-pbs5, 0:0:2075166.hawk-pbs5, 0:0:2075906.hawk-pbs5, 0:0:2077436.hawk-pbs5, 0:0:2079673.hawk-pbs5, 0:0:2081442.hawk-pbs5, 0:0:2081474.hawk-pbs5 | |
ll_job_gid | 0, 0:0, 0:0:, 11142, 12793, 12801, 12803, 12812, 12831, 12833 | |
ll_job_id | 0, 1, 10, 100010.cl1intern__1, 100010.cl1intern__I, 100010.cl1intern__S, 100010.cl1intern__a, 100010.cl1intern__c, 100010.cl1intern__d, 100010.cl1intern__f | |
ll_job_uid | 0, 0:, 11932, 12266, 12356, 12448, 12499, 13468, 13967, 14207 | |
mdt_index | MDT0000, MDT0001, MDT0002, MDT0003 | |
optype | min_punch, min_read_bytes, min_write_bytes | |
Table mdt_jobstats_samples:
Key | Value | Explanation |
---|---|---|
cluster | exafs | |
fqdn | hawk-mds01, hawk-mds02 | |
fs_name | exafs | |
job_id | 0:0:2072485.hawk-pbs5, 0:0:2073250.hawk-pbs5, 0:0:2073899.hawk-pbs5, 0:0:2074114.hawk-pbs5, 0:0:2075166.hawk-pbs5, 0:0:2075906.hawk-pbs5, 0:0:2077436.hawk-pbs5, 0:0:2079673.hawk-pbs5, 0:0:2081442.hawk-pbs5, 0:0:2081474.hawk-pbs5 | |
ll_job_gid | 0, 0:0, 0:0:, 11142, 12793, 12801, 12803, 12812, 12831, 12833 | |
ll_job_id | 0, 1, 10, 100010.cl1intern__1, 100010.cl1intern__I, 100010.cl1intern__S, 100010.cl1intern__a, 100010.cl1intern__c, 100010.cl1intern__d, 100010.cl1intern__f | |
ll_job_uid | 0, 0:, 11932, 12266, 12356, 12448, 12499, 13468, 13967, 14207 | |
mdt_index | MDT0000, MDT0001, MDT0002, MDT0003 | |
optype | close, crossdir_rename, getattr, getxattr, link, mkdir, mknod, open, punch, read_bytes, rename, rmdir, samedir_rename, setattr, setxattr, statfs, sync, unlink, write_bytes | |
Table mdt_jobstats_sum:
Key | Value | Explanation |
---|---|---|
cluster | exafs | |
fqdn | hawk-mds01, hawk-mds02 | |
fs_name | exafs | |
job_id | 0:0:2072485.hawk-pbs5, 0:0:2073250.hawk-pbs5, 0:0:2073899.hawk-pbs5, 0:0:2074114.hawk-pbs5, 0:0:2075166.hawk-pbs5, 0:0:2075906.hawk-pbs5, 0:0:2077436.hawk-pbs5, 0:0:2079673.hawk-pbs5, 0:0:2081442.hawk-pbs5, 0:0:2081474.hawk-pbs5 | |
ll_job_gid | 0, 0:0, 0:0:, 11142, 12793, 12801, 12803, 12812, 12831, 12833 | |
ll_job_id | 0, 1, 10, 100010.cl1intern__1, 100010.cl1intern__I, 100010.cl1intern__S, 100010.cl1intern__a, 100010.cl1intern__c, 100010.cl1intern__d, 100010.cl1intern__f | |
ll_job_uid | 0, 0:, 11932, 12266, 12356, 12448, 12499, 13468, 13967, 14207 | |
mdt_index | MDT0000, MDT0001, MDT0002, MDT0003 | |
optype | sum_punch, sum_read_bytes, sum_write_bytes | |
Table mdt_jobstats_sumsq:
Key | Value | Explanation |
---|---|---|
cluster | exafs | |
fqdn | hawk-mds01, hawk-mds02 | |
fs_name | exafs | |
job_id | 0:0:2072485.hawk-pbs5, 0:0:2073250.hawk-pbs5, 0:0:2073899.hawk-pbs5, 0:0:2074114.hawk-pbs5, 0:0:2075166.hawk-pbs5, 0:0:2075906.hawk-pbs5, 0:0:2077436.hawk-pbs5, 0:0:2079673.hawk-pbs5, 0:0:2081442.hawk-pbs5, 0:0:2081474.hawk-pbs5 | |
ll_job_gid | 0, 0:0, 0:0:, 11142, 12793, 12801, 12803, 12812, 12831, 12833 | |
ll_job_id | 0, 1, 10, 100010.cl1intern__1, 100010.cl1intern__I, 100010.cl1intern__S, 100010.cl1intern__a, 100010.cl1intern__c, 100010.cl1intern__d, 100010.cl1intern__f | |
ll_job_uid | 0, 0:, 11932, 12266, 12356, 12448, 12499, 13468, 13967, 14207 | |
mdt_index | MDT0000, MDT0001, MDT0002, MDT0003 | |
optype | sumsq_punch, sumsq_read_bytes, sumsq_write_bytes | |
Metrics differentiating between clients
The five tables exp_md_stats, exp_md_stats_max_latency, exp_md_stats_min_latency, exp_md_stats_sum_latency and exp_md_stats_sumsq_latency share the same structure.
Key | Value | Explanation |
---|---|---|
cluster | exafs | |
exp_client | 0, 10.148.0.32, 10.148.0.33, 10.148.0.34, 10.148.0.36, 10.148.0.37, 10.148.0.38, 10.148.0.39, 10.148.0.40, 10.148.0.41, ... | |
exp_type | lo, o2ib20, o2ib43, o2ib44 | |
fqdn | hawk-mds01, hawk-mds02 | |
fs_name | exafs | |
mdt_index | MDT0000, MDT0001, MDT0002, MDT0003 | |
optype | close, getattr, getxattr, link, mkdir, mknod, open, rename, rmdir, setattr, setxattr, statfs, sync, unlink | |
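The values of exp_client and exp_type look like the two halves of a Lustre network identifier (NID) of the form address@network, for example 0@lo for the loopback interface. The following is a minimal sketch for converting between the two representations. That the tags were derived from NIDs in this way is an assumption, and the function names are hypothetical.

```python
def join_nid(exp_client, exp_type):
    """Combine the exp_client and exp_type tag values into an address@network string."""
    return f"{exp_client}@{exp_type}"


def split_nid(nid):
    """Split an address@network string into (exp_client, exp_type)."""
    address, separator, network = nid.partition("@")
    if not separator or not address or not network:
        raise ValueError(f"not a valid NID: {nid}")
    return address, network


print(join_nid("10.148.0.32", "o2ib20"))
print(split_nid("0@lo"))
```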
Object Storage Server (OSS) and Object Storage Target (OST) Metrics
While the tables
- cq_ost_brw_stats_rpc_bulk_samples_by_size
- cq_ost_kbytesinfo_used_by_fs_name
- cq_ost_stats_bytes_by_optype
report operation and usage statistics of the full file system, all other measurements collected from the Lustre object storage servers and their object storage targets can in principle be grouped into four categories.
- Metrics by OSS
  - ost_io_stats_ost_punch_max
  - ost_io_stats_ost_punch_mean
  - ost_io_stats_ost_punch_mean_square
  - ost_io_stats_ost_punch_min
  - ost_io_stats_ost_punch_samples
  - ost_io_stats_ost_punch_sum
  - ost_io_stats_ost_punch_sum_square
  - ost_io_stats_ost_read_max
  - ost_io_stats_ost_read_mean
  - ost_io_stats_ost_read_mean_square
  - ost_io_stats_ost_read_min
  - ost_io_stats_ost_read_samples
  - ost_io_stats_ost_read_sum
  - ost_io_stats_ost_read_sum_square
  - ost_io_stats_ost_write_max
  - ost_io_stats_ost_write_mean
  - ost_io_stats_ost_write_mean_square
  - ost_io_stats_ost_write_min
  - ost_io_stats_ost_write_samples
  - ost_io_stats_ost_write_sum
  - ost_io_stats_ost_write_sum_square
  - ost_io_stats_req_active_max
  - ost_io_stats_req_active_mean
  - ost_io_stats_req_active_mean_square
  - ost_io_stats_req_active_min
  - ost_io_stats_req_active_samples
  - ost_io_stats_req_active_sum
  - ost_io_stats_req_active_sum_square
  - ost_io_stats_req_qdepth_max
  - ost_io_stats_req_qdepth_mean
  - ost_io_stats_req_qdepth_mean_square
  - ost_io_stats_req_qdepth_min
  - ost_io_stats_req_qdepth_samples
  - ost_io_stats_req_qdepth_sum
  - ost_io_stats_req_qdepth_sum_square
  - ost_io_stats_req_timeout_max
  - ost_io_stats_req_timeout_mean
  - ost_io_stats_req_timeout_mean_square
  - ost_io_stats_req_timeout_min
  - ost_io_stats_req_timeout_samples
  - ost_io_stats_req_timeout_sum
  - ost_io_stats_req_timeout_sum_square
  - ost_io_stats_req_waittime_max
  - ost_io_stats_req_waittime_mean
  - ost_io_stats_req_waittime_mean_square
  - ost_io_stats_req_waittime_min
  - ost_io_stats_req_waittime_samples
  - ost_io_stats_req_waittime_sum
  - ost_io_stats_req_waittime_sum_square
  - ost_io_stats_reqbuf_avail_max
  - ost_io_stats_reqbuf_avail_mean
  - ost_io_stats_reqbuf_avail_mean_square
  - ost_io_stats_reqbuf_avail_min
  - ost_io_stats_reqbuf_avail_samples
  - ost_io_stats_reqbuf_avail_sum
  - ost_io_stats_reqbuf_avail_sum_square
- Metrics by OSS and OST
  - ost_brw_stats_block_discontiguous_rpc_cum
  - ost_brw_stats_block_discontiguous_rpc_percentage
  - ost_brw_stats_block_discontiguous_rpc_samples
  - ost_brw_stats_fragmented_io_cum
  - ost_brw_stats_fragmented_io_percentage
  - ost_brw_stats_fragmented_io_samples
  - ost_brw_stats_io_in_flight_cum
  - ost_brw_stats_io_in_flight_percentage
  - ost_brw_stats_io_in_flight_samples
  - ost_brw_stats_io_size_cum
  - ost_brw_stats_io_size_percentage
  - ost_brw_stats_io_size_samples
  - ost_brw_stats_page_discontiguous_rpc_cum
  - ost_brw_stats_page_discontiguous_rpc_percentage
  - ost_brw_stats_page_discontiguous_rpc_samples
  - ost_brw_stats_rpc_bulk_cum
  - ost_brw_stats_rpc_bulk_percentage
  - ost_brw_stats_rpc_bulk_samples
  - ost_filesinfo_free
  - ost_filesinfo_total
  - ost_filesinfo_used
  - ost_kbytesinfo_free
  - ost_kbytesinfo_total
  - ost_kbytesinfo_used
  - ost_stats_bytes
  - ost_stats_max_latency
  - ost_stats_min_latency
  - ost_stats_samples
  - ost_stats_sum_latency
  - ost_stats_sumsq_latency
- Metrics differentiating between user-, group-, and job-id
  - cq_ost_acctuser_samples_by_user_id
  - cq_ost_jobstats_bytes_by_ll_job_gid
  - cq_ost_jobstats_bytes_by_ll_job_id
  - cq_ost_jobstats_bytes_by_ll_job_uid
  - ost_acctuser_samples
  - ost_jobstats_bytes
  - ost_jobstats_samples
- Metrics differentiating between clients.
  - exp_ost_stats_bytes
  - exp_ost_stats_samples
Metrics by OSS
All measurements that are stored by OSS share the same table structure.
Key | Value | Explanation |
---|---|---|
cluster | exafs | |
fqdn | hawk-oss01, hawk-oss02, hawk-oss03, hawk-oss04, hawk-oss05, hawk-oss06, hawk-oss07, hawk-oss08 | |
value | float | |
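The names of the measurements stored by OSS follow a regular pattern: the prefix ost_io_stats, one of eight metrics and one of seven statistics. The following is a minimal sketch that regenerates the names from these two sets, which can be handy when the same query has to run against all of them. The constant and function names are hypothetical.

```python
from itertools import product

# Metric and statistic parts as they appear in the measurement names by OSS.
METRICS = (
    "ost_punch", "ost_read", "ost_write", "req_active",
    "req_qdepth", "req_timeout", "req_waittime", "reqbuf_avail",
)
STATISTICS = ("max", "mean", "mean_square", "min", "samples", "sum", "sum_square")


def ost_io_stats_measurements():
    """Return all ost_io_stats measurement names, metric by metric."""
    return [
        f"ost_io_stats_{metric}_{statistic}"
        for metric, statistic in product(METRICS, STATISTICS)
    ]


names = ost_io_stats_measurements()
print(len(names), names[0], names[-1])
```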
Metrics by OSS and OST
To be done
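Until this section is written, the names of the ost_brw_stats_* tables give a hint: each histogram appears with the suffixes _samples, _percentage and _cum. The following is a minimal sketch of how percentages and cumulative percentages follow from sample counts. It assumes that the three suffixes mirror the count, "%" and "cum %" columns of Lustre's brw_stats output, which this page does not confirm. The bucket labels and counts are hypothetical.

```python
def percentage_and_cumulative(samples):
    """Turn (bucket, count) pairs in bucket order into (bucket, percent, cumulative percent)."""
    total = sum(count for _, count in samples)
    if total <= 0:
        raise ValueError("histogram is empty")
    rows = []
    running = 0
    for bucket, count in samples:
        running += count
        rows.append((bucket, 100.0 * count / total, 100.0 * running / total))
    return rows


# Hypothetical I/O size histogram (not measured data).
for row in percentage_and_cumulative([("4K", 1), ("8K", 1), ("1M", 2)]):
    print(row)
```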
Metrics differentiating between user-, group-, and job-id
Table cq_ost_acctuser_samples_by_user_id:
Key | Value | Explanation |
---|---|---|
optype | usage_inodes, usage_kbytes | |
user_id | 0, 1001, 1002, 11363, 11932, 12266, 12356, 12448, 12499, 13468, ... | |
sum | float | |
Table cq_ost_jobstats_bytes_by_ll_job_gid:
Key | Value | Explanation |
---|---|---|
ll_job_gid | 0, 00145, 00277, 00279, 00967, 01141, 01142, 01392, 01540, 02073, ... | |
optype | sum_read_bytes, sum_write_bytes | |
sum | float | |
Table cq_ost_jobstats_bytes_by_ll_job_id:
Key | Value | Explanation |
---|---|---|
ll_job_id | ., .hawk-pbs5, .hawk-pbs5__, 0, 0-bin, 00, 01, 01].hawk-pbs, 02, 02].hawk-pbs, ... | |
optype | sum_read_bytes, sum_write_bytes | |
sum | float | |
Table cq_ost_jobstats_bytes_by_ll_job_uid:
Key | Value | Explanation |
---|---|---|
ll_job_uid | .kworker/10, .kworker/101, .kworker/104, .kworker/106, .kworker/107, .kworker/108, .kworker/112, .kworker/113, .kworker/114, .kworker/116, ... | |
optype | sum_read_bytes, sum_write_bytes | |
sum | float | |
Table ost_acctuser_samples:
Key | Value | Explanation |
---|---|---|
cluster | exafs | |
fqdn | hawk-oss01, hawk-oss02, hawk-oss03, hawk-oss04, hawk-oss05, hawk-oss06, hawk-oss07, hawk-oss08 | |
fs_name | exafs | |
optype | usage_inodes, usage_kbytes | |
ost_index | OST0000, OST0001, OST0002, OST0003, OST0004, OST0005, OST0006, OST0007, OST0008, OST0009, OST000a, OST000b, OST000c, OST000d, OST000e, OST000f, OST0010, OST0011, OST0012, OST0013, OST0014, OST0015, OST0016, OST0017, OST0018, OST0019, OST001a, OST001b, OST001c, OST001d, OST001e, OST001f, OST0020, OST0021, OST0022, OST0023, OST0024, OST0025, OST0026, OST0027, OST0028, OST0029, OST002a, OST002b, OST002c, OST002d, OST002e, OST002f | |
user_id | 0, 1001, 1002, 11363, 11932, 12266, 12356, 12448, 12499, 13420, ... | |
value | float | |
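The ost_index values count in hexadecimal: OST0009 is followed by OST000a, and the listed range OST0000 to OST002f covers 48 targets. The following is a minimal sketch for converting between the tag value and a plain number, for example to sort or group targets numerically. The function names are hypothetical.

```python
def ost_number(ost_index):
    """Convert a tag value such as 'OST002f' to its numeric index."""
    prefix, digits = ost_index[:3], ost_index[3:]
    if prefix != "OST" or len(digits) != 4:
        raise ValueError(f"unexpected ost_index: {ost_index}")
    return int(digits, 16)


def ost_label(number):
    """Convert a numeric index back to the four-digit hexadecimal tag value."""
    return f"OST{number:04x}"


print(ost_number("OST002f"), ost_label(10))
```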
Table ost_jobstats_bytes:
Key | Value | Explanation |
---|---|---|
cluster | exafs | |
fqdn | hawk-oss01, hawk-oss02, hawk-oss03, hawk-oss04, hawk-oss05, hawk-oss06, hawk-oss07, hawk-oss08 | |
fs_name | exafs | |
job_id | 00145:31716:2169120.hawk-pbs5, 00277:29017:1955996.hawk-pbs5, 00279:15448:1963897.hawk-pbs5, 00279:15448:2056529.hawk-pbs5, 00279:15448:__ResultCombine.e, 00967:30969:1983785.hawk-pbs5, 00967:34627:2209444.hawk-pbs5__, 00967:34627:2225903.hawk-pbs5__, 00967:34627:2237056.hawk-pbs5__, 01141:32275:2121954.hawk-pbs5, ... | |
ll_job_gid | 0, 00145, 00277, 00279, 00967, 01141, 01142, 01392, 01540, 02073, ... | |
ll_job_id | ., .hawk-pbs5, .hawk-pbs5__, 0, 0-bin, 00, 01, 01].hawk-pbs, 02, 02].hawk-pbs, ... | |
ll_job_uid | .kworker/10, .kworker/101, .kworker/104, .kworker/106, .kworker/107, .kworker/108, .kworker/112, .kworker/113, .kworker/114, .kworker/116, ... | |
optype | sum_read_bytes, sum_write_bytes | |
ost_index | OST0000, OST0001, OST0002, OST0003, OST0004, OST0005, OST0006, OST0007, OST0008, OST0009, OST000a, OST000b, OST000c, OST000d, OST000e, OST000f, OST0010, OST0011, OST0012, OST0013, OST0014, OST0015, OST0016, OST0017, OST0018, OST0019, OST001a, OST001b, OST001c, OST001d, OST001e, OST001f, OST0020, OST0021, OST0022, OST0023, OST0024, OST0025, OST0026, OST0027, OST0028, OST0029, OST002a, OST002b, OST002c, OST002d, OST002e, OST002f | |
value | float | |
Table ost_jobstats_samples:
Key | Value | Explanation |
---|---|---|
cluster | exafs | |
fqdn | hawk-oss01, hawk-oss02, hawk-oss03, hawk-oss04, hawk-oss05, hawk-oss06, hawk-oss07, hawk-oss08 | |
fs_name | exafs | |
job_id | 00145:31716:2169120.hawk-pbs5, 00277:29017:1955996.hawk-pbs5, 00279:15448:1963897.hawk-pbs5, 00279:15448:2056529.hawk-pbs5, 00279:15448:__ResultCombine.e, 00967:30969:1983785.hawk-pbs5, 00967:34627:2209444.hawk-pbs5__, 00967:34627:2225903.hawk-pbs5__, 00967:34627:2237056.hawk-pbs5__, 01141:32275:2121954.hawk-pbs5, ... | |
ll_job_gid | 0, 00145, 00277, 00279, 00967, 01141, 01142, 01392, 01540, 02073, ... | |
ll_job_id | ., .hawk-pbs5, .hawk-pbs5__, 0, 0-bin, 00, 01, 01].hawk-pbs, 02, 02].hawk-pbs, ... | |
ll_job_uid | .kworker/10, .kworker/101, .kworker/104, .kworker/106, .kworker/107, .kworker/108, .kworker/112, .kworker/113, .kworker/114, .kworker/116, ... | |
optype | create, destroy, get_info, getattr, punch, quotactl, read, read_samples, set_info, setattr, statfs, sync, write, write_samples | |
ost_index | OST0000, OST0001, OST0002, OST0003, OST0004, OST0005, OST0006, OST0007, OST0008, OST0009, OST000a, OST000b, OST000c, OST000d, OST000e, OST000f, OST0010, OST0011, OST0012, OST0013, OST0014, OST0015, OST0016, OST0017, OST0018, OST0019, OST001a, OST001b, OST001c, OST001d, OST001e, OST001f, OST0020, OST0021, OST0022, OST0023, OST0024, OST0025, OST0026, OST0027, OST0028, OST0029, OST002a, OST002b, OST002c, OST002d, OST002e, OST002f | |
value | float | |
Metrics differentiating between clients
The two tables exp_ost_stats_bytes and exp_ost_stats_samples share the same structure.
Key | Value | Explanation |
---|---|---|
cluster | exafs | |
exp_client | 10.148.0.32, 10.148.0.33, 10.148.0.34, 10.148.0.36, 10.148.0.37, 10.148.0.38, 10.148.0.39, 10.148.0.40, 10.148.0.41, 10.148.0.42, ... | |
exp_type | o2ib20, o2ib43, o2ib44 | |
fqdn | hawk-oss01, hawk-oss02, hawk-oss03, hawk-oss04, hawk-oss05, hawk-oss06, hawk-oss07, hawk-oss08 | |
fs_name | exafs | |
optype | read, write | |
ost_index | OST0000, OST0001, OST0002, OST0003, OST0004, OST0005, OST0006, OST0007, OST0008, OST0009, OST000a, OST000b, OST000c, OST000d, OST000e, OST000f, OST0010, OST0011, OST0012, OST0013, OST0014, OST0015, OST0016, OST0017, OST0018, OST0019, OST001a, OST001b, OST001c, OST001d, OST001e, OST001f, OST0020, OST0021, OST0022, OST0023, OST0024, OST0025, OST0026, OST0027, OST0028, OST0029, OST002a, OST002b, OST002c, OST002d, OST002e, OST002f | |
value | float | |