- Infos im HLRS Wiki sind nicht rechtsverbindlich und ohne Gewähr -
- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

Barreleye

== Lustre Server CPU Usage ==
Each CPU usage state is reported for all servers in a separate table, i.e. measurement. The reported usage states are:


'''idle'''
is reported in <code>aggregation.cpu-average.cpu.idle</code>. When there is really nothing the kernel can do, it just has to waste away this slice of time. Technically, when the runnable queue is empty and there are no I/O operations going on, the CPU usage is marked as ''idle''.
'''system'''
is reported in <code>aggregation.cpu-average.cpu.system</code>. This means the CPU is running kernel code, including device drivers and kernel modules.
'''user'''
is reported in <code>aggregation.cpu-average.cpu.user</code>. The CPU is running code in user mode. This includes application code. Note that if an application reads from disk or writes to the network, it actually goes to sleep while the kernel performs that work, and is woken up again once the work is done.
'''steal'''
is reported in <code>aggregation.cpu-average.cpu.steal</code>. The DDN Lustre servers are virtual machines. In a virtualized environment, the hypervisor may “steal” cycles that are meant for the virtual CPUs and give them to another virtual machine, for various reasons. This time is accounted for as <em>steal</em>.
'''nice'''
is reported in <code>aggregation.cpu-average.cpu.nice</code>. User code can be executed at “normal” priority or at various degrees of “below normal” priority. You can, for example, run a report generation process at a lower priority and interactive processes at normal priority. ''Nice'' is when the CPU is executing a user task with below-normal priority.
'''wait'''
is reported in <code>aggregation.cpu-average.cpu.wait</code>. Sometimes the CPU has only one thing to do: wait for the results of a disk or network read/write. This is more common than one might think. A file server, for example, may spend nearly all of its time waiting for disk reads and network writes to complete. ''I/O wait'' is time in which the CPU has nothing else to run while an I/O operation it issued is still outstanding.
'''interrupt & softirq'''
are reported in <code>aggregation.cpu-average.cpu.interrupt</code> and <code>aggregation.cpu-average.cpu.softirq</code>, respectively. Both indicate that the kernel is servicing interrupts: ''interrupt'' is time spent in hardware interrupt handlers, ''softirq'' is time spent processing deferred software interrupts.
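The terms measurement, tag key and continuous query used on this page suggest that the data lives in an InfluxDB database queried with InfluxQL. Because the CPU measurement names contain dots and a hyphen, they have to be wrapped in double quotes in InfluxQL, while tag values such as the <code>fqdn</code> are single-quoted string literals. The sketch below builds such a query for one usage state; it assumes that the metric is stored in a field named <code>value</code> and that one-minute averages are wanted, neither of which is stated on this page.

```python
# Minimal sketch: build an InfluxQL query for one CPU usage state.
# Assumptions (not confirmed by this page): the data is stored in InfluxDB
# and each point carries its number in a field named "value".

CPU_STATES = {"idle", "system", "user", "steal", "nice", "wait", "interrupt", "softirq"}


def cpu_state_query(state: str, fqdn: str, minutes: int = 60) -> str:
    """Return an InfluxQL query averaging one CPU state of one server per minute."""
    if state not in CPU_STATES:
        raise ValueError(f"unknown CPU usage state: {state!r}")
    measurement = f"aggregation.cpu-average.cpu.{state}"
    # Identifiers containing '.' or '-' must be double-quoted in InfluxQL;
    # string tag values are single-quoted literals. fqdn is expected to be one
    # of the known server names (e.g. hawk-oss01), not arbitrary user input.
    return (
        f'SELECT mean("value") FROM "{measurement}" '
        f"WHERE \"fqdn\" = '{fqdn}' AND time > now() - {minutes}m "
        f"GROUP BY time(1m)"
    )


print(cpu_state_query("steal", "hawk-oss01"))
```

Validating the state against the known list catches typos early, since InfluxDB silently returns no data for a non-existent measurement.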
=== Visualizations ===
The data collected in the tables are visualized in the Grafana dashboard '''[https://mon-login01.hww.hlrs.de/grafana/d/b53a94c8-03c9-4ec3-ae1b-a268afb590a1/cpu-usage-by-type-per-server?orgId=1 "CPU usage by Type per Server"]''' on [[HPE Hawk#Production Frontend|mon-login01]]. 
=== Table Structure ===
Each of the tables has the same structure.
{| class="wikitable mw-collapsible"
|+Measurement:
aggregation.cpu-average.cpu.<usage_state>
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="10"|fqdn
|hawk-mds01
|
|-
|hawk-mds02
|
|-
|hawk-oss01
|
|-
|hawk-oss02
|
|-
|hawk-oss03
|
|-
|hawk-oss04
|
|-
|hawk-oss05
|
|-
|hawk-oss06
|
|-
|hawk-oss07
|
|-
|hawk-oss08
|
|}
Please note that the explanations of the CPU usage states were taken from https://www.opsdash.com/blog/cpu-usage-linux.html
== Network Data ==
The InfiniBand counters of the Lustre servers are collected in four tables, i.e. measurements. Two of them, counters_error and counters_info, report port-based metrics, whereas the other two, hw_counters_error and hw_counters_info, report function-based metrics.
=== Visualizations ===
The data collected in the tables are visualized in the Grafana dashboard '''[https://mon-login01.hww.hlrs.de/grafana/d/ad10d285-3962-4cb2-b025-153d15496c95/network-metrics-by-server-selectable?orgId=1 Network Metrics by Server (Selectable)]''' on [[HPE Hawk#Production Frontend|mon-login01]]. 
=== Table Structure ===
The four tables have the same structure with respect to the tag keys. The counters are differentiated by <code>optype</code>.
{| class="wikitable mw-collapsible"
|+Measurement: counters_error
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="2" |driver_index
|0
|
|-
|1
|
|-
|driver_type
|mlx5
|
|-
| rowspan="10" |fqdn
|hawk-mds01
|
|-
|hawk-mds02
|
|-
|hawk-oss01
|
|-
|hawk-oss02
|
|-
|hawk-oss03
|
|-
|hawk-oss04
|
|-
|hawk-oss05
|
|-
|hawk-oss06
|
|-
|hawk-oss07
|
|-
|hawk-oss08
|
|-
| rowspan="10" |optype
|VL15_dropped
|
|-
|link_downed
|
|-
|link_error_recovery
|
|-
|local_link_integrity_errors
|
|-
|port_rcv_constraint_errors
|
|-
|port_rcv_remote_physical_errors
|
|-
|port_rcv_switch_relay_errors
|
|-
|port_xmit_constraint_errors
|
|-
|port_xmit_discards
|
|-
|symbol_error
|
|-
|port_number
|1
|
|}
{| class="wikitable"
|+Measurement: counters_info
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="2" |driver_index
|0
|
|-
|1
|
|-
|driver_type
|mlx5
|
|-
| rowspan="10" |fqdn
|hawk-mds01
|
|-
|hawk-mds02
|
|-
|hawk-oss01
|
|-
|hawk-oss02
|
|-
|hawk-oss03
|
|-
|hawk-oss04
|
|-
|hawk-oss05
|
|-
|hawk-oss06
|
|-
|hawk-oss07
|
|-
|hawk-oss08
|
|-
| rowspan="6" |optype
|excessive_buffer_overrun_errors
|
|-
|port_rcv_data
|
|-
|port_rcv_errors
|
|-
|port_rcv_packets
|
|-
|port_xmit_data
|
|-
|port_xmit_packets
|
|-
|port_number
|1
|
|}
{| class="wikitable"
|+Measurement: hw_counters_error
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="2" |driver_index
|0
|
|-
|1
|
|-
|driver_type
|mlx5
|
|-
| rowspan="10" |fqdn
|hawk-mds01
|
|-
|hawk-mds02
|
|-
|hawk-oss01
|
|-
|hawk-oss02
|
|-
|hawk-oss03
|
|-
|hawk-oss04
|
|-
|hawk-oss05
|
|-
|hawk-oss06
|
|-
|hawk-oss07
|
|-
|hawk-oss08
|
|-
| rowspan="16" |optype
|duplicate_request
|
|-
|implied_nak_seq_err
|
|-
|local_ack_timeout_err
|
|-
|out_of_buffer
|
|-
|out_of_sequence
|
|-
|packet_seq_err
|
|-
|req_cqe_error
|
|-
|req_cqe_flush_error
|
|-
|req_remote_access_errors
|
|-
|req_remote_invalid_request
|
|-
|resp_cqe_error
|
|-
|resp_cqe_flush_error
|
|-
|resp_local_length_error
|
|-
|resp_remote_access_errors
|
|-
|rnr_nak_retry_err
|
|-
|rx_icrc_encapsulated
|
|-
|port_number
|1
|
|}
{| class="wikitable"
|+Measurement: hw_counters_info
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="2" |driver_index
|0
|
|-
|1
|
|-
|driver_type
|mlx5
|
|-
| rowspan="10" |fqdn
|hawk-mds01
|
|-
|hawk-mds02
|
|-
|hawk-oss01
|
|-
|hawk-oss02
|
|-
|hawk-oss03
|
|-
|hawk-oss04
|
|-
|hawk-oss05
|
|-
|hawk-oss06
|
|-
|hawk-oss07
|
|-
|hawk-oss08
|
|-
| rowspan="10" |optype
|lifespan
|
|-
|roce_adp_retrans
|
|-
|roce_adp_retrans_to
|
|-
|roce_slow_restart
|
|-
|roce_slow_restart_cnps
|
|-
|roce_slow_restart_trans
|
|-
|rx_atomic_requests
|
|-
|rx_dct_connect
|
|-
|rx_read_requests
|
|-
|rx_write_requests
|
|-
|port_number
|1
|
|}
For more information about InfiniBand counters, please see https://enterprise-support.nvidia.com/s/article/understanding-mlx5-linux-counters-and-status-parameters.
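The counters are cumulative, so a throughput has to be derived from the difference between two samples. In the mlx5 sysfs counters described in the NVIDIA article above, <code>port_xmit_data</code> and <code>port_rcv_data</code> count data in units of 4 octets. Whether the collector already converts them to bytes is not stated here, so the sketch below takes the factor as a parameter and treats a decreasing value (e.g. after a counter reset) as an unknown rate.

```python
from typing import Optional

# Assumption: raw port_xmit_data / port_rcv_data values count 4-octet units,
# as the mlx5 sysfs counters do; pass octets_per_unit=1 if the stored values
# are already in bytes.


def throughput_bytes_per_s(prev_value: float, prev_time_s: float,
                           curr_value: float, curr_time_s: float,
                           octets_per_unit: int = 4) -> Optional[float]:
    """Average byte rate between two samples of a cumulative counter."""
    dt = curr_time_s - prev_time_s
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        # Counter was reset (e.g. reboot or manual clear): rate unknown.
        return None
    return delta * octets_per_unit / dt


# 2.5e9 units of 4 octets within 10 s
print(throughput_bytes_per_s(0, 0, 2.5e9, 10))  # 1000000000.0
```

On the database side, InfluxQL's <code>non_negative_derivative()</code> function performs a similar per-interval rate computation and likewise drops negative differences.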
== Meta Data Server (MDS) and Meta Data Target (MDT) Metrics ==
While the table cq_md_stats_by_optype reports meta data operations against the full file system, all other measurements collected from the Lustre meta data servers and their meta data targets can be grouped into three categories.
# Metrics by MDS and MDT.
#* md_stats
#* md_stats_max
#* md_stats_min
#* md_stats_sum
#* md_stats_sumsq
#* mdt_filesinfo_free
#* mdt_filesinfo_total
#* mdt_filesinfo_used
#* mdt_kbytesinfo_free
#* mdt_kbytesinfo_total
#* mdt_kbytesinfo_used
# Metrics differentiating between user-, group, and job-id.
#* cq_mdt_acctuser_samples_by_user_id
#* cq_mdt_jobstats_samples_by_ll_job_gid
#* cq_mdt_jobstats_samples_by_ll_job_id
#* cq_mdt_jobstats_samples_by_ll_job_uid
#* mdt_acctuser_samples
#* mdt_jobstats_max
#* mdt_jobstats_min
#* mdt_jobstats_samples
#* mdt_jobstats_sum
#* mdt_jobstats_sumsq
# Metrics differentiating between clients.
#* exp_md_stats
#* exp_md_stats_max_latency
#* exp_md_stats_min_latency
#* exp_md_stats_sum_latency
#* exp_md_stats_sumsq_latency 
=== Total Meta Data Operations ===
The table cq_md_stats_by_optype is filled by a continuous query that sums up the meta data operations against the complete file system for each operation type.
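The exact continuous query is not shown on this page. As a rough illustration of what summing "against the complete file system" means, the sketch below adds up hypothetical md_stats points (field <code>value</code>) per <code>optype</code>, collapsing the <code>fqdn</code> and <code>mdt_index</code> tags, so that one total per operation type remains, like the <code>sum</code> field of cq_md_stats_by_optype. The point layout and the numbers are illustrative only.

```python
from collections import defaultdict

# Hypothetical md_stats points for one time interval: tags plus the "value" field.
points = [
    {"fqdn": "hawk-mds01", "mdt_index": "MDT0000", "optype": "open", "value": 120.0},
    {"fqdn": "hawk-mds01", "mdt_index": "MDT0001", "optype": "open", "value": 80.0},
    {"fqdn": "hawk-mds02", "mdt_index": "MDT0002", "optype": "open", "value": 50.0},
    {"fqdn": "hawk-mds02", "mdt_index": "MDT0003", "optype": "close", "value": 30.0},
]


def sum_by_optype(md_points):
    """Collapse fqdn and mdt_index, keeping one total per operation type."""
    totals = defaultdict(float)
    for p in md_points:
        totals[p["optype"]] += p["value"]
    return dict(totals)


print(sum_by_optype(points))  # {'open': 250.0, 'close': 30.0}
```

In InfluxQL, such an aggregation would roughly take the form <code>SELECT sum("value") AS "sum" INTO "cq_md_stats_by_optype" FROM "md_stats" GROUP BY time(&lt;interval&gt;), "optype"</code>; the actual statement and interval used on the monitoring server are not documented here.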
==== Visualizations ====
The [https://mon-login01.hww.hlrs.de/grafana/d/602gi8FVzssdf/ws10-barreleye?orgId=1&refresh=1m&from=1695107019868&to=1695117819868 Dashboard ws10-barreleye] uses this table in the panel "Lustre Aggregated Metadata".
==== Table Structure ====
{| class="wikitable mw-collapsible"
|+Measurement: cq_md_stats_by_optype
!Key
!Value
!Explanation
|-
| rowspan="12"|optype
|close
|
|-
|getattr
|
|-
|getxattr
|
|-
|mkdir
|
|-
|mknod
|
|-
|open
|
|-
|rename
|
|-
|rmdir
|
|-
|setattr
|
|-
|setxattr
|
|-
|statfs
|
|-
|unlink
|
|-
| colspan="2" |sum
|float
|}
=== Metrics by MDS and MDT. ===
==== Meta Data Operations grouped by MDS and MDT. ====
The table md_stats collects meta data operations per meta data target. The table shares its structure with
* md_stats_max
* md_stats_min
* md_stats_sum
* md_stats_sumsq
In which ....
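Lustre's stats files report, for each operation, a sample count together with min, max, sum and sum of squares. Assuming that md_stats holds the sample count and md_stats_sum / md_stats_sumsq the corresponding sum and sum of squares for the same <code>fqdn</code>, <code>mdt_index</code> and <code>optype</code>, the mean and the (population) standard deviation can be derived as sketched below. The unit of the sums (typically microseconds for latencies) is not stated on this page.

```python
import math


def mean_and_stddev(samples: float, total: float, total_sq: float):
    """Derive mean and population standard deviation from count, sum and sum of squares."""
    if samples <= 0:
        return None, None
    mean = total / samples
    # Var = E[x^2] - (E[x])^2; clamp tiny negative values caused by rounding.
    variance = max(total_sq / samples - mean * mean, 0.0)
    return mean, math.sqrt(variance)


# Four operations with latencies 1, 2, 3, 4: sum = 10, sum of squares = 30
print(mean_and_stddev(4, 10, 30))
```

Only the three aggregates are needed, which is why storing sum and sum of squares is sufficient to recover the spread without keeping the individual samples.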
===== Visualizations =====
The [https://mon-login01.hww.hlrs.de/grafana/d/602gi8FVzssdf/ws10-barreleye?orgId=1&refresh=1m&from=1695107019868&to=1695117819868 Dashboard ws10-barreleye] uses this table in the panel "Lustre Aggregated Metadata".
===== Table Structure =====
{| class="wikitable mw-collapsible"
|+Measurement: md_stats
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="2" |fqdn
|hawk-mds01
|
|-
|hawk-mds02
|
|-
|fs_name
|exafs
|
|-
| rowspan="4" |mdt_index
|MDT0000
|
|-
|MDT0001
|
|-
|MDT0002
|
|-
|MDT0003
|
|-
| rowspan="12" |optype
|close
|
|-
|getattr
|
|-
|getxattr
|
|-
|mkdir
|
|-
|mknod
|
|-
|open
|
|-
|rename
|
|-
|rmdir
|
|-
|setattr
|
|-
|setxattr
|
|-
|statfs
|
|-
|unlink
|
|-
| colspan="2" |value
|float
|}
==== Inode and filespace usage by MDS and MDT. ====
The tables mdt_filesinfo_* collect the number of free, total, and used inodes on each MDT, while the tables mdt_kbytesinfo_* collect the free, total, and used file space on each MDT.
All six tables share the same structure.
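With the free, total and used values available per MDT, the fill level can be expressed as a percentage. The sketch below is a minimal illustration; it assumes that the kbytesinfo values are in KiB (1 kbyte = 1024 bytes), which the table names suggest but this page does not state.

```python
def percent_used(used: float, total: float) -> float:
    """Share of inodes (filesinfo) or kbytes (kbytesinfo) in use on one MDT, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def kib_to_tib(kib: float) -> float:
    """Convert a kbytesinfo value to TiB, assuming the value is given in KiB."""
    return kib / 1024**3


print(percent_used(25, 100))   # 25.0
print(kib_to_tib(2 * 1024**3))  # 2.0
```

Comparing used against total (rather than computing total minus free) keeps the result consistent with what the used table reports.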
===== Visualizations =====
...
===== Table Structure =====
{| class="wikitable mw-collapsible"
|+Measurement: mdt_filesinfo_* and mdt_kbytesinfo_* (free, total, used)
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="2" |fqdn
|hawk-mds01
|
|-
|hawk-mds02
|
|-
|fs_name
|exafs
|
|-
| rowspan="4" |mdt_index
|MDT0000
|
|-
|MDT0001
|
|-
|MDT0002
|
|-
|MDT0003
|
|-
| colspan="2" |value
|float
|}
=== Metrics differentiating between user-, group, and job-id ===
{| class="wikitable mw-collapsible"
|+Measurement: cq_mdt_acctuser_samples_by_user_id
!Key
!Value
!Explanation
|-
| rowspan="2"|optype
|usage_inodes
|
|-
|usage_kbytes
|
|-
| rowspan="11" |user_id
|0
|
|-
|1001
|
|-
|1002
|
|-
|11932
|
|-
|12266
|
|-
|12356
|
|-
|12448
|
|-
|12499
|
|-
|13468
|
|-
|13967
|
|-
|...
|
|-
| colspan="2" |sum
|float
|}
{| class="wikitable mw-collapsible"
|+Measurement: cq_mdt_jobstats_samples_by_ll_job_gid
!Key
!Value
!Explanation
|-
| rowspan="11" |ll_job_gid
|0
|
|-
|0:0
|
|-
|0:0:
|
|-
|11142
|
|-
|12793
|
|-
|12801
|
|-
|12803
|
|-
|12812
|
|-
|12831
|
|-
|12833
|
|-
|...
|
|-
| rowspan="19"|optype
|close
|
|-
|crossdir_rename
|
|-
|getattr
|
|-
|getxattr
|
|-
|link
|
|-
|mkdir
|
|-
|mknod
|
|-
|open
|
|-
|punch
|
|-
|read_bytes
|
|-
|rename
|
|-
|rmdir
|
|-
|samedir_rename
|
|-
|setattr
|
|-
|setxattr
|
|-
|statfs
|
|-
|sync
|
|-
|unlink
|
|-
|write_bytes
|
|}
{| class="wikitable mw-collapsible"
|+Measurement: cq_mdt_jobstats_samples_by_ll_job_id
!Key
!Value
!Explanation
|-
| rowspan="11" |ll_job_id
|0
|
|-
|1
|
|-
|10
|
|-
|100010.cl1intern__1
|
|-
|100010.cl1intern__I
|
|-
|100010.cl1intern__S
|
|-
|100010.cl1intern__a
|
|-
|100010.cl1intern__c
|
|-
|100010.cl1intern__d
|
|-
|100010.cl1intern__f
|
|-
|...
|
|-
| rowspan="19"|optype
|close
|
|-
|crossdir_rename
|
|-
|getattr
|
|-
|getxattr
|
|-
|link
|
|-
|mkdir
|
|-
|mknod
|
|-
|open
|
|-
|punch
|
|-
|read_bytes
|
|-
|rename
|
|-
|rmdir
|
|-
|samedir_rename
|
|-
|setattr
|
|-
|setxattr
|
|-
|statfs
|
|-
|sync
|
|-
|unlink
|
|-
|write_bytes
|
|}
{| class="wikitable mw-collapsible"
|+Measurement: cq_mdt_jobstats_samples_by_ll_job_uid
!Key
!Value
!Explanation
|-
| rowspan="11" |ll_job_uid
|0
|
|-
|0:
|
|-
|11932
|
|-
|12266
|
|-
|12356
|
|-
|12448
|
|-
|12499
|
|-
|13468
|
|-
|13967
|
|-
|14207
|
|-
|...
|
|-
| rowspan="19"|optype
|close
|
|-
|crossdir_rename
|
|-
|getattr
|
|-
|getxattr
|
|-
|link
|
|-
|mkdir
|
|-
|mknod
|
|-
|open
|
|-
|punch
|
|-
|read_bytes
|
|-
|rename
|
|-
|rmdir
|
|-
|samedir_rename
|
|-
|setattr
|
|-
|setxattr
|
|-
|statfs
|
|-
|sync
|
|-
|unlink
|
|-
|write_bytes
|
|}
{| class="wikitable mw-collapsible"
|+Measurement: mdt_acctuser_samples
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="2"|fqdn
|hawk-mds01
|
|-
|hawk-mds02
|
|-
|fs_name
|exafs
|
|-
| rowspan="4"|mdt_index
|MDT0000
|
|-
|MDT0001
|
|-
|MDT0002
|
|-
|MDT0003
|
|-
| rowspan="2"|optype
|usage_inodes
|
|-
|usage_kbytes
|
|-
| rowspan="10"|user_id
|0
|
|-
|1001
|
|-
|1002
|
|-
|11932
|
|-
|12266
|
|-
|12356
|
|-
|12448
|
|-
|12499
|
|-
|13468
|
|-
|13967
|
|}
{| class="wikitable mw-collapsible"
|+Measurement: mdt_jobstats_max
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="2"|fqdn
|hawk-mds01
|
|-
|hawk-mds02
|
|-
|fs_name
|exafs
|
|-
| rowspan="10"|job_id
|0:0:2072485.hawk-pbs5
|
|-
|0:0:2073250.hawk-pbs5
|
|-
|0:0:2073899.hawk-pbs5
|
|-
|0:0:2074114.hawk-pbs5
|
|-
|0:0:2075166.hawk-pbs5
|
|-
|0:0:2075906.hawk-pbs5
|
|-
|0:0:2077436.hawk-pbs5
|
|-
|0:0:2079673.hawk-pbs5
|
|-
|0:0:2081442.hawk-pbs5
|
|-
|0:0:2081474.hawk-pbs5
|
|-
| rowspan="10"|ll_job_gid
|0
|
|-
|0:0
|
|-
|0:0:
|
|-
|11142
|
|-
|12793
|
|-
|12801
|
|-
|12803
|
|-
|12812
|
|-
|12831
|
|-
|12833
|
|-
| rowspan="10"|ll_job_id
|0
|
|-
|1
|
|-
|10
|
|-
|100010.cl1intern__1
|
|-
|100010.cl1intern__I
|
|-
|100010.cl1intern__S
|
|-
|100010.cl1intern__a
|
|-
|100010.cl1intern__c
|
|-
|100010.cl1intern__d
|
|-
|100010.cl1intern__f
|
|-
| rowspan="10"|ll_job_uid
|0
|
|-
|0:
|
|-
|11932
|
|-
|12266
|
|-
|12356
|
|-
|12448
|
|-
|12499
|
|-
|13468
|
|-
|13967
|
|-
|14207
|
|-
| rowspan="4"|mdt_index
|MDT0000
|
|-
|MDT0001
|
|-
|MDT0002
|
|-
|MDT0003
|
|-
| rowspan="3"|optype
|max_punch
|
|-
|max_read_bytes
|
|-
|max_write_bytes
|
|}
{| class="wikitable mw-collapsible"
|+Measurement: mdt_jobstats_min
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="2"|fqdn
|hawk-mds01
|
|-
|hawk-mds02
|
|-
|fs_name
|exafs
|
|-
| rowspan="10"|job_id
|0:0:2072485.hawk-pbs5
|
|-
|0:0:2073250.hawk-pbs5
|
|-
|0:0:2073899.hawk-pbs5
|
|-
|0:0:2074114.hawk-pbs5
|
|-
|0:0:2075166.hawk-pbs5
|
|-
|0:0:2075906.hawk-pbs5
|
|-
|0:0:2077436.hawk-pbs5
|
|-
|0:0:2079673.hawk-pbs5
|
|-
|0:0:2081442.hawk-pbs5
|
|-
|0:0:2081474.hawk-pbs5
|
|-
| rowspan="10"|ll_job_gid
|0
|
|-
|0:0
|
|-
|0:0:
|
|-
|11142
|
|-
|12793
|
|-
|12801
|
|-
|12803
|
|-
|12812
|
|-
|12831
|
|-
|12833
|
|-
| rowspan="10"|ll_job_id
|0
|
|-
|1
|
|-
|10
|
|-
|100010.cl1intern__1
|
|-
|100010.cl1intern__I
|
|-
|100010.cl1intern__S
|
|-
|100010.cl1intern__a
|
|-
|100010.cl1intern__c
|
|-
|100010.cl1intern__d
|
|-
|100010.cl1intern__f
|
|-
| rowspan="10"|ll_job_uid
|0
|
|-
|0:
|
|-
|11932
|
|-
|12266
|
|-
|12356
|
|-
|12448
|
|-
|12499
|
|-
|13468
|
|-
|13967
|
|-
|14207
|
|-
| rowspan="4"|mdt_index
|MDT0000
|
|-
|MDT0001
|
|-
|MDT0002
|
|-
|MDT0003
|
|-
| rowspan="3"|optype
|min_punch
|
|-
|min_read_bytes
|
|-
|min_write_bytes
|
|}
{| class="wikitable mw-collapsible"
|+Measurement: mdt_jobstats_samples
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="2"|fqdn
|hawk-mds01
|
|-
|hawk-mds02
|
|-
|fs_name
|exafs
|
|-
| rowspan="10"|job_id
|0:0:2072485.hawk-pbs5
|
|-
|0:0:2073250.hawk-pbs5
|
|-
|0:0:2073899.hawk-pbs5
|
|-
|0:0:2074114.hawk-pbs5
|
|-
|0:0:2075166.hawk-pbs5
|
|-
|0:0:2075906.hawk-pbs5
|
|-
|0:0:2077436.hawk-pbs5
|
|-
|0:0:2079673.hawk-pbs5
|
|-
|0:0:2081442.hawk-pbs5
|
|-
|0:0:2081474.hawk-pbs5
|
|-
| rowspan="10"|ll_job_gid
|0
|
|-
|0:0
|
|-
|0:0:
|
|-
|11142
|
|-
|12793
|
|-
|12801
|
|-
|12803
|
|-
|12812
|
|-
|12831
|
|-
|12833
|
|-
| rowspan="10"|ll_job_id
|0
|
|-
|1
|
|-
|10
|
|-
|100010.cl1intern__1
|
|-
|100010.cl1intern__I
|
|-
|100010.cl1intern__S
|
|-
|100010.cl1intern__a
|
|-
|100010.cl1intern__c
|
|-
|100010.cl1intern__d
|
|-
|100010.cl1intern__f
|
|-
| rowspan="10"|ll_job_uid
|0
|
|-
|0:
|
|-
|11932
|
|-
|12266
|
|-
|12356
|
|-
|12448
|
|-
|12499
|
|-
|13468
|
|-
|13967
|
|-
|14207
|
|-
| rowspan="4"|mdt_index
|MDT0000
|
|-
|MDT0001
|
|-
|MDT0002
|
|-
|MDT0003
|
|-
| rowspan="19"|optype
|close
|
|-
|crossdir_rename
|
|-
|getattr
|
|-
|getxattr
|
|-
|link
|
|-
|mkdir
|
|-
|mknod
|
|-
|open
|
|-
|punch
|
|-
|read_bytes
|
|-
|rename
|
|-
|rmdir
|
|-
|samedir_rename
|
|-
|setattr
|
|-
|setxattr
|
|-
|statfs
|
|-
|sync
|
|-
|unlink
|
|-
|write_bytes
|
|}
{| class="wikitable mw-collapsible"
|+Measurement: mdt_jobstats_sum
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="2"|fqdn
|hawk-mds01
|
|-
|hawk-mds02
|
|-
|fs_name
|exafs
|
|-
| rowspan="10"|job_id
|0:0:2072485.hawk-pbs5
|
|-
|0:0:2073250.hawk-pbs5
|
|-
|0:0:2073899.hawk-pbs5
|
|-
|0:0:2074114.hawk-pbs5
|
|-
|0:0:2075166.hawk-pbs5
|
|-
|0:0:2075906.hawk-pbs5
|
|-
|0:0:2077436.hawk-pbs5
|
|-
|0:0:2079673.hawk-pbs5
|
|-
|0:0:2081442.hawk-pbs5
|
|-
|0:0:2081474.hawk-pbs5
|
|-
| rowspan="10"|ll_job_gid
|0
|
|-
|0:0
|
|-
|0:0:
|
|-
|11142
|
|-
|12793
|
|-
|12801
|
|-
|12803
|
|-
|12812
|
|-
|12831
|
|-
|12833
|
|-
| rowspan="10"|ll_job_id
|0
|
|-
|1
|
|-
|10
|
|-
|100010.cl1intern__1
|
|-
|100010.cl1intern__I
|
|-
|100010.cl1intern__S
|
|-
|100010.cl1intern__a
|
|-
|100010.cl1intern__c
|
|-
|100010.cl1intern__d
|
|-
|100010.cl1intern__f
|
|-
| rowspan="10"|ll_job_uid
|0
|
|-
|0:
|
|-
|11932
|
|-
|12266
|
|-
|12356
|
|-
|12448
|
|-
|12499
|
|-
|13468
|
|-
|13967
|
|-
|14207
|
|-
| rowspan="4"|mdt_index
|MDT0000
|
|-
|MDT0001
|
|-
|MDT0002
|
|-
|MDT0003
|
|-
| rowspan="3"|optype
|sum_punch
|
|-
|sum_read_bytes
|
|-
|sum_write_bytes
|
|}
{| class="wikitable mw-collapsible"
|+Measurement: mdt_jobstats_sumsq
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="2"|fqdn
|hawk-mds01
|
|-
|hawk-mds02
|
|-
|fs_name
|exafs
|
|-
| rowspan="10"|job_id
|0:0:2072485.hawk-pbs5
|
|-
|0:0:2073250.hawk-pbs5
|
|-
|0:0:2073899.hawk-pbs5
|
|-
|0:0:2074114.hawk-pbs5
|
|-
|0:0:2075166.hawk-pbs5
|
|-
|0:0:2075906.hawk-pbs5
|
|-
|0:0:2077436.hawk-pbs5
|
|-
|0:0:2079673.hawk-pbs5
|
|-
|0:0:2081442.hawk-pbs5
|
|-
|0:0:2081474.hawk-pbs5
|
|-
| rowspan="10"|ll_job_gid
|0
|
|-
|0:0
|
|-
|0:0:
|
|-
|11142
|
|-
|12793
|
|-
|12801
|
|-
|12803
|
|-
|12812
|
|-
|12831
|
|-
|12833
|
|-
| rowspan="10"|ll_job_id
|0
|
|-
|1
|
|-
|10
|
|-
|100010.cl1intern__1
|
|-
|100010.cl1intern__I
|
|-
|100010.cl1intern__S
|
|-
|100010.cl1intern__a
|
|-
|100010.cl1intern__c
|
|-
|100010.cl1intern__d
|
|-
|100010.cl1intern__f
|
|-
| rowspan="10"|ll_job_uid
|0
|
|-
|0:
|
|-
|11932
|
|-
|12266
|
|-
|12356
|
|-
|12448
|
|-
|12499
|
|-
|13468
|
|-
|13967
|
|-
|14207
|
|-
| rowspan="4"|mdt_index
|MDT0000
|
|-
|MDT0001
|
|-
|MDT0002
|
|-
|MDT0003
|
|-
| rowspan="3"|optype
|sumsq_punch
|
|-
|sumsq_read_bytes
|
|-
|sumsq_write_bytes
|
|}
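The <code>job_id</code> sample values (e.g. 0:0:2072485.hawk-pbs5) look like three colon-separated fields, and in the OST tables the first field of a job_id such as 00145:31716:2169120.hawk-pbs5 matches the listed <code>ll_job_gid</code> values. Assuming the layout <code>&lt;gid&gt;:&lt;uid&gt;:&lt;job&gt;</code>, which this page does not state explicitly, a job_id could be split as sketched below. Only the first two colons are split, and the parts are kept as strings to preserve leading zeros.

```python
from typing import NamedTuple, Optional


class JobIdParts(NamedTuple):
    gid: str
    uid: str
    job: str


def split_job_id(job_id: str) -> Optional[JobIdParts]:
    """Split '<gid>:<uid>:<job>' (assumed layout); return None if it does not fit."""
    parts = job_id.split(":", 2)
    if len(parts) != 3:
        return None
    return JobIdParts(*parts)


print(split_job_id("00145:31716:2169120.hawk-pbs5"))
# JobIdParts(gid='00145', uid='31716', job='2169120.hawk-pbs5')
```

Entries such as .kworker/10 in <code>ll_job_uid</code> suggest that not every job_id follows this layout cleanly, so the function returns None instead of raising and the split should be treated as a heuristic.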
=== Metrics differentiating between clients ===
{| class="wikitable mw-collapsible"
|+Measurement: exp_md_stats
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="11" |exp_client
|0
|
|-
|10.148.0.32
|
|-
|10.148.0.33
|
|-
|10.148.0.34
|
|-
|10.148.0.36
|
|-
|10.148.0.37
|
|-
|10.148.0.38
|
|-
|10.148.0.39
|
|-
|10.148.0.40
|
|-
|10.148.0.41
|
|-
|...
|
|-
| rowspan="4"|exp_type
|lo
|
|-
|o2ib20
|
|-
|o2ib43
|
|-
|o2ib44
|
|-
| rowspan="2"|fqdn
|hawk-mds01
|
|-
|hawk-mds02
|
|-
|fs_name
|exafs
|
|-
| rowspan="4"|mdt_index
|MDT0000
|
|-
|MDT0001
|
|-
|MDT0002
|
|-
|MDT0003
|
|-
| rowspan="14"|optype
|close
|
|-
|getattr
|
|-
|getxattr
|
|-
|link
|
|-
|mkdir
|
|-
|mknod
|
|-
|open
|
|-
|rename
|
|-
|rmdir
|
|-
|setattr
|
|-
|setxattr
|
|-
|statfs
|
|-
|sync
|
|-
|unlink
|
|}
{| class="wikitable mw-collapsible"
|+Measurement: exp_md_stats_max_latency
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="11" |exp_client
|0
|
|-
|10.148.0.32
|
|-
|10.148.0.33
|
|-
|10.148.0.34
|
|-
|10.148.0.36
|
|-
|10.148.0.37
|
|-
|10.148.0.38
|
|-
|10.148.0.39
|
|-
|10.148.0.40
|
|-
|10.148.0.41
|
|-
|...
|
|-
| rowspan="4" |exp_type
|lo
|
|-
|o2ib20
|
|-
|o2ib43
|
|-
|o2ib44
|
|-
| rowspan="2" |fqdn
|hawk-mds01
|
|-
|hawk-mds02
|
|-
|fs_name
|exafs
|
|-
| rowspan="4" |mdt_index
|MDT0000
|
|-
|MDT0001
|
|-
|MDT0002
|
|-
|MDT0003
|
|-
| rowspan="14" |optype
|close
|
|-
|getattr
|
|-
|getxattr
|
|-
|link
|
|-
|mkdir
|
|-
|mknod
|
|-
|open
|
|-
|rename
|
|-
|rmdir
|
|-
|setattr
|
|-
|setxattr
|
|-
|statfs
|
|-
|sync
|
|-
|unlink
|
|}
{| class="wikitable mw-collapsible"
|+Measurement: exp_md_stats_min_latency
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="11" |exp_client
|0
|
|-
|10.148.0.32
|
|-
|10.148.0.33
|
|-
|10.148.0.34
|
|-
|10.148.0.36
|
|-
|10.148.0.37
|
|-
|10.148.0.38
|
|-
|10.148.0.39
|
|-
|10.148.0.40
|
|-
|10.148.0.41
|
|-
|...
|
|-
| rowspan="4" |exp_type
|lo
|
|-
|o2ib20
|
|-
|o2ib43
|
|-
|o2ib44
|
|-
| rowspan="2" |fqdn
|hawk-mds01
|
|-
|hawk-mds02
|
|-
|fs_name
|exafs
|
|-
| rowspan="4" |mdt_index
|MDT0000
|
|-
|MDT0001
|
|-
|MDT0002
|
|-
|MDT0003
|
|-
| rowspan="14" |optype
|close
|
|-
|getattr
|
|-
|getxattr
|
|-
|link
|
|-
|mkdir
|
|-
|mknod
|
|-
|open
|
|-
|rename
|
|-
|rmdir
|
|-
|setattr
|
|-
|setxattr
|
|-
|statfs
|
|-
|sync
|
|-
|unlink
|
|}
{| class="wikitable mw-collapsible"
|+Measurement: exp_md_stats_sum_latency
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="11" |exp_client
|0
|
|-
|10.148.0.32
|
|-
|10.148.0.33
|
|-
|10.148.0.34
|
|-
|10.148.0.36
|
|-
|10.148.0.37
|
|-
|10.148.0.38
|
|-
|10.148.0.39
|
|-
|10.148.0.40
|
|-
|10.148.0.41
|
|-
|...
|
|-
| rowspan="4" |exp_type
|lo
|
|-
|o2ib20
|
|-
|o2ib43
|
|-
|o2ib44
|
|-
| rowspan="2" |fqdn
|hawk-mds01
|
|-
|hawk-mds02
|
|-
|fs_name
|exafs
|
|-
| rowspan="4" |mdt_index
|MDT0000
|
|-
|MDT0001
|
|-
|MDT0002
|
|-
|MDT0003
|
|-
| rowspan="14" |optype
|close
|
|-
|getattr
|
|-
|getxattr
|
|-
|link
|
|-
|mkdir
|
|-
|mknod
|
|-
|open
|
|-
|rename
|
|-
|rmdir
|
|-
|setattr
|
|-
|setxattr
|
|-
|statfs
|
|-
|sync
|
|-
|unlink
|
|}
{| class="wikitable mw-collapsible"
|+Measurement: exp_md_stats_sumsq_latency
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="11" |exp_client
|0
|
|-
|10.148.0.32
|
|-
|10.148.0.33
|
|-
|10.148.0.34
|
|-
|10.148.0.36
|
|-
|10.148.0.37
|
|-
|10.148.0.38
|
|-
|10.148.0.39
|
|-
|10.148.0.40
|
|-
|10.148.0.41
|
|-
|...
|
|-
| rowspan="4" |exp_type
|lo
|
|-
|o2ib20
|
|-
|o2ib43
|
|-
|o2ib44
|
|-
| rowspan="2" |fqdn
|hawk-mds01
|
|-
|hawk-mds02
|
|-
|fs_name
|exafs
|
|-
| rowspan="4" |mdt_index
|MDT0000
|
|-
|MDT0001
|
|-
|MDT0002
|
|-
|MDT0003
|
|-
| rowspan="14" |optype
|close
|
|-
|getattr
|
|-
|getxattr
|
|-
|link
|
|-
|mkdir
|
|-
|mknod
|
|-
|open
|
|-
|rename
|
|-
|rmdir
|
|-
|setattr
|
|-
|setxattr
|
|-
|statfs
|
|-
|sync
|
|-
|unlink
|
|}
== Object Storage Server (OSS) and Object Storage Target (OST) Metrics ==
While the tables
* cq_ost_brw_stats_rpc_bulk_samples_by_size
* cq_ost_kbytesinfo_used_by_fs_name
* cq_ost_stats_bytes_by_optype
report operation and usage statistics of the full file system, all other measurements collected from the Lustre object storage servers and their object storage targets can be grouped into four categories.
# Metrics by OSS
#* ost_io_stats_ost_punch_max
#* ost_io_stats_ost_punch_mean
#* ost_io_stats_ost_punch_mean_square
#* ost_io_stats_ost_punch_min
#* ost_io_stats_ost_punch_samples
#* ost_io_stats_ost_punch_sum
#* ost_io_stats_ost_punch_sum_square
#*ost_io_stats_ost_read_max
#*ost_io_stats_ost_read_mean
#*ost_io_stats_ost_read_mean_square
#*ost_io_stats_ost_read_min
#*ost_io_stats_ost_read_samples
#*ost_io_stats_ost_read_sum
#*ost_io_stats_ost_read_sum_square
#*ost_io_stats_ost_write_max
#*ost_io_stats_ost_write_mean
#*ost_io_stats_ost_write_mean_square
#*ost_io_stats_ost_write_min
#*ost_io_stats_ost_write_samples
#*ost_io_stats_ost_write_sum
#*ost_io_stats_ost_write_sum_square
#*ost_io_stats_req_active_max
#*ost_io_stats_req_active_mean
#*ost_io_stats_req_active_mean_square
#*ost_io_stats_req_active_min
#*ost_io_stats_req_active_samples
#*ost_io_stats_req_active_sum
#*ost_io_stats_req_active_sum_square
#*ost_io_stats_req_qdepth_max
#*ost_io_stats_req_qdepth_mean
#*ost_io_stats_req_qdepth_mean_square
#*ost_io_stats_req_qdepth_min
#*ost_io_stats_req_qdepth_samples
#*ost_io_stats_req_qdepth_sum
#*ost_io_stats_req_qdepth_sum_square
#*ost_io_stats_req_timeout_max
#*ost_io_stats_req_timeout_mean
#*ost_io_stats_req_timeout_mean_square
#*ost_io_stats_req_timeout_min
#*ost_io_stats_req_timeout_samples
#*ost_io_stats_req_timeout_sum
#*ost_io_stats_req_timeout_sum_square
#*ost_io_stats_req_waittime_max
#*ost_io_stats_req_waittime_mean
#*ost_io_stats_req_waittime_mean_square
#*ost_io_stats_req_waittime_min
#*ost_io_stats_req_waittime_samples
#*ost_io_stats_req_waittime_sum
#*ost_io_stats_req_waittime_sum_square
#*ost_io_stats_reqbuf_avail_max
#*ost_io_stats_reqbuf_avail_mean
#*ost_io_stats_reqbuf_avail_mean_square
#*ost_io_stats_reqbuf_avail_min
#*ost_io_stats_reqbuf_avail_samples
#*ost_io_stats_reqbuf_avail_sum
#*ost_io_stats_reqbuf_avail_sum_square
#Metrics by OSS and OST
#*ost_brw_stats_block_discontiguous_rpc_cum
#*ost_brw_stats_block_discontiguous_rpc_percentage
#*ost_brw_stats_block_discontiguous_rpc_samples
#*ost_brw_stats_fragmented_io_cum
#*ost_brw_stats_fragmented_io_percentage
#*ost_brw_stats_fragmented_io_samples
#*ost_brw_stats_io_in_flight_cum
#*ost_brw_stats_io_in_flight_percentage
#*ost_brw_stats_io_in_flight_samples
#*ost_brw_stats_io_size_cum
#*ost_brw_stats_io_size_percentage
#*ost_brw_stats_io_size_samples
#*ost_brw_stats_page_discontiguous_rpc_cum
#*ost_brw_stats_page_discontiguous_rpc_percentage
#*ost_brw_stats_page_discontiguous_rpc_samples
#*ost_brw_stats_rpc_bulk_cum
#*ost_brw_stats_rpc_bulk_percentage
#*ost_brw_stats_rpc_bulk_samples
#*ost_filesinfo_free
#*ost_filesinfo_total
#*ost_filesinfo_used
#*ost_kbytesinfo_free
#*ost_kbytesinfo_total
#*ost_kbytesinfo_used
#*ost_stats_bytes
#*ost_stats_max_latency
#*ost_stats_min_latency
#*ost_stats_samples
#*ost_stats_sum_latency
#*ost_stats_sumsq_latency
# Metrics differentiating between user-, group, and job-id
#* cq_ost_acctuser_samples_by_user_id
#* cq_ost_jobstats_bytes_by_ll_job_gid
#* cq_ost_jobstats_bytes_by_ll_job_id
#* cq_ost_jobstats_bytes_by_ll_job_uid
#* ost_acctuser_samples
#* ost_jobstats_bytes
#* ost_jobstats_samples
# Metrics differentiating between clients.
#* exp_ost_stats_bytes
#* exp_ost_stats_samples
=== Metrics by OSS ===
All measurements that are stored by OSS share the same table structure.
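The 56 measurement names listed above follow the pattern ost_io_stats_&lt;operation&gt;_&lt;aggregation&gt;, with eight operations and seven aggregations. Generating them, for example to loop over all of them in a script, could look like the sketch below; the two lists are copied from the enumeration above, and the function name is only illustrative.

```python
from itertools import product

# Operations and aggregations as they appear in the measurement list above.
OPERATIONS = ["ost_punch", "ost_read", "ost_write", "req_active",
              "req_qdepth", "req_timeout", "req_waittime", "reqbuf_avail"]
AGGREGATIONS = ["max", "mean", "mean_square", "min", "samples", "sum", "sum_square"]


def ost_io_measurements():
    """All ost_io_stats_<operation>_<aggregation> measurement names."""
    return [f"ost_io_stats_{op}_{agg}" for op, agg in product(OPERATIONS, AGGREGATIONS)]


names = ost_io_measurements()
print(len(names))  # 56
```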
{| class="wikitable mw-collapsible"
|+Measurement: ost_io_stats_<operation>_<aggregation>
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="8"|fqdn
|hawk-oss01
|
|-
|hawk-oss02
|
|-
|hawk-oss03
|
|-
|hawk-oss04
|
|-
|hawk-oss05
|
|-
|hawk-oss06
|
|-
|hawk-oss07
|
|-
|hawk-oss08
|
|-
| colspan="2" |value
|float
|}
=== Metrics by OSS and OST ===
To be done
=== Metrics differentiating between user-, group, and job-id ===
{| class="wikitable mw-collapsible"
|+Measurement: cq_ost_acctuser_samples_by_user_id
!Key
!Value
!Explanation
|-
| rowspan="2"|optype
|usage_inodes
|
|-
|usage_kbytes
|
|-
| rowspan="11" |user_id
|0
|
|-
|1001
|
|-
|1002
|
|-
|11363
|
|-
|11932
|
|-
|12266
|
|-
|12356
|
|-
|12448
|
|-
|12499
|
|-
|13468
|
|-
|...
|
|-
| colspan="2" |sum
|float
|}
{| class="wikitable mw-collapsible"
|+Measurement: cq_ost_jobstats_bytes_by_ll_job_gid
!Key
!Value
!Explanation
|-
| rowspan="11" |ll_job_gid
|0
|
|-
|00145
|
|-
|00277
|
|-
|00279
|
|-
|00967
|
|-
|01141
|
|-
|01142
|
|-
|01392
|
|-
|01540
|
|-
|02073
|
|-
|...
|
|-
| rowspan="2"|optype
|sum_read_bytes
|
|-
|sum_write_bytes
|
|-
| colspan="2" |sum
|float
|}
{| class="wikitable mw-collapsible"
|+Measurement: cq_ost_jobstats_bytes_by_ll_job_id
!Key
!Value
!Explanation
|-
| rowspan="11" |ll_job_id
|.
|
|-
|.hawk-pbs5
|
|-
|.hawk-pbs5__
|
|-
|0
|
|-
|0-bin
|
|-
|00
|
|-
|01
|
|-
|01].hawk-pbs
|
|-
|02
|
|-
|02].hawk-pbs
|
|-
|...
|
|-
| rowspan="2"|optype
|sum_read_bytes
|
|-
|sum_write_bytes
|
|-
| colspan="2" |sum
|float
|}
{| class="wikitable mw-collapsible"
|+Measurement: cq_ost_jobstats_bytes_by_ll_job_uid
!Key
!Value
!Explanation
|-
| rowspan="11" |ll_job_uid
|.kworker/10
|
|-
|.kworker/101
|
|-
|.kworker/104
|
|-
|.kworker/106
|
|-
|.kworker/107
|
|-
|.kworker/108
|
|-
|.kworker/112
|
|-
|.kworker/113
|
|-
|.kworker/114
|
|-
|.kworker/116
|
|-
|...
|
|-
| rowspan="2"|optype
|sum_read_bytes
|
|-
|sum_write_bytes
|
|-
| colspan="2" |sum
|float
|}
{| class="wikitable mw-collapsible"
|+Measurement: ost_acctuser_samples
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="8"|fqdn
|hawk-oss01
|
|-
|hawk-oss02
|
|-
|hawk-oss03
|
|-
|hawk-oss04
|
|-
|hawk-oss05
|
|-
|hawk-oss06
|
|-
|hawk-oss07
|
|-
|hawk-oss08
|
|-
|fs_name
|exafs
|
|-
| rowspan="2"|optype
|usage_inodes
|
|-
|usage_kbytes
|
|-
| rowspan="48"|ost_index
|OST0000
|
|-
|OST0001
|
|-
|OST0002
|
|-
|OST0003
|
|-
|OST0004
|
|-
|OST0005
|
|-
|OST0006
|
|-
|OST0007
|
|-
|OST0008
|
|-
|OST0009
|
|-
|OST000a
|
|-
|OST000b
|
|-
|OST000c
|
|-
|OST000d
|
|-
|OST000e
|
|-
|OST000f
|
|-
|OST0010
|
|-
|OST0011
|
|-
|OST0012
|
|-
|OST0013
|
|-
|OST0014
|
|-
|OST0015
|
|-
|OST0016
|
|-
|OST0017
|
|-
|OST0018
|
|-
|OST0019
|
|-
|OST001a
|
|-
|OST001b
|
|-
|OST001c
|
|-
|OST001d
|
|-
|OST001e
|
|-
|OST001f
|
|-
|OST0020
|
|-
|OST0021
|
|-
|OST0022
|
|-
|OST0023
|
|-
|OST0024
|
|-
|OST0025
|
|-
|OST0026
|
|-
|OST0027
|
|-
|OST0028
|
|-
|OST0029
|
|-
|OST002a
|
|-
|OST002b
|
|-
|OST002c
|
|-
|OST002d
|
|-
|OST002e
|
|-
|OST002f
|
|-
| rowspan="11" |user_id
|0
|
|-
|1001
|
|-
|1002
|
|-
|11363
|
|-
|11932
|
|-
|12266
|
|-
|12356
|
|-
|12448
|
|-
|12499
|
|-
|13420
|
|-
|...
|
|-
| colspan="2" |value
|float
|}
{| class="wikitable mw-collapsible"
|+Measurement: ost_jobstats_bytes
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="8"|fqdn
|hawk-oss01
|
|-
|hawk-oss02
|
|-
|hawk-oss03
|
|-
|hawk-oss04
|
|-
|hawk-oss05
|
|-
|hawk-oss06
|
|-
|hawk-oss07
|
|-
|hawk-oss08
|
|-
|fs_name
|exafs
|
|-
| rowspan="11" |job_id
|00145:31716:2169120.hawk-pbs5
|
|-
|00277:29017:1955996.hawk-pbs5
|
|-
|00279:15448:1963897.hawk-pbs5
|
|-
|00279:15448:2056529.hawk-pbs5
|
|-
|00279:15448:__ResultCombine.e
|
|-
|00967:30969:1983785.hawk-pbs5
|
|-
|00967:34627:2209444.hawk-pbs5__
|
|-
|00967:34627:2225903.hawk-pbs5__
|
|-
|00967:34627:2237056.hawk-pbs5__
|
|-
|01141:32275:2121954.hawk-pbs5
|
|-
|...
|
|-
| rowspan="11" |ll_job_gid
|0
|
|-
|00145
|
|-
|00277
|
|-
|00279
|
|-
|00967
|
|-
|01141
|
|-
|01142
|
|-
|01392
|
|-
|01540
|
|-
|02073
|
|-
|...
|
|-
| rowspan="11" |ll_job_id
|.
|
|-
|.hawk-pbs5
|
|-
|.hawk-pbs5__
|
|-
|0
|
|-
|0-bin
|
|-
|00
|
|-
|01
|
|-
|01].hawk-pbs
|
|-
|02
|
|-
|02].hawk-pbs
|
|-
|...
|
|-
| rowspan="11" |ll_job_uid
|.kworker/10
|
|-
|.kworker/101
|
|-
|.kworker/104
|
|-
|.kworker/106
|
|-
|.kworker/107
|
|-
|.kworker/108
|
|-
|.kworker/112
|
|-
|.kworker/113
|
|-
|.kworker/114
|
|-
|.kworker/116
|
|-
|...
|
|-
| rowspan="2"|optype
|sum_read_bytes
|
|-
|sum_write_bytes
|
|-
| rowspan="48"|ost_index
|OST0000
|
|-
|OST0001
|
|-
|OST0002
|
|-
|OST0003
|
|-
|OST0004
|
|-
|OST0005
|
|-
|OST0006
|
|-
|OST0007
|
|-
|OST0008
|
|-
|OST0009
|
|-
|OST000a
|
|-
|OST000b
|
|-
|OST000c
|
|-
|OST000d
|
|-
|OST000e
|
|-
|OST000f
|
|-
|OST0010
|
|-
|OST0011
|
|-
|OST0012
|
|-
|OST0013
|
|-
|OST0014
|
|-
|OST0015
|
|-
|OST0016
|
|-
|OST0017
|
|-
|OST0018
|
|-
|OST0019
|
|-
|OST001a
|
|-
|OST001b
|
|-
|OST001c
|
|-
|OST001d
|
|-
|OST001e
|
|-
|OST001f
|
|-
|OST0020
|
|-
|OST0021
|
|-
|OST0022
|
|-
|OST0023
|
|-
|OST0024
|
|-
|OST0025
|
|-
|OST0026
|
|-
|OST0027
|
|-
|OST0028
|
|-
|OST0029
|
|-
|OST002a
|
|-
|OST002b
|
|-
|OST002c
|
|-
|OST002d
|
|-
|OST002e
|
|-
|OST002f
|
|-
| colspan="2" |value
|float
|}
{| class="wikitable mw-collapsible"
|+Measurement: ost_jobstats_samples
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="8"|fqdn
|hawk-oss01
|
|-
|hawk-oss02
|
|-
|hawk-oss03
|
|-
|hawk-oss04
|
|-
|hawk-oss05
|
|-
|hawk-oss06
|
|-
|hawk-oss07
|
|-
|hawk-oss08
|
|-
|fs_name
|exafs
|
|-
| rowspan="11" |job_id
|00145:31716:2169120.hawk-pbs5
|
|-
|00277:29017:1955996.hawk-pbs5
|
|-
|00279:15448:1963897.hawk-pbs5
|
|-
|00279:15448:2056529.hawk-pbs5
|
|-
|00279:15448:__ResultCombine.e
|
|-
|00967:30969:1983785.hawk-pbs5
|
|-
|00967:34627:2209444.hawk-pbs5__
|
|-
|00967:34627:2225903.hawk-pbs5__
|
|-
|00967:34627:2237056.hawk-pbs5__
|
|-
|01141:32275:2121954.hawk-pbs5
|
|-
|...
|
|-
| rowspan="11" |ll_job_gid
|0
|
|-
|00145
|
|-
|00277
|
|-
|00279
|
|-
|00967
|
|-
|01141
|
|-
|01142
|
|-
|01392
|
|-
|01540
|
|-
|02073
|
|-
|...
|
|-
| rowspan="11" |ll_job_id
|.
|
|-
|.hawk-pbs5
|
|-
|.hawk-pbs5__
|
|-
|0
|
|-
|0-bin
|
|-
|00
|
|-
|01
|
|-
|01].hawk-pbs
|
|-
|02
|
|-
|02].hawk-pbs
|
|-
|...
|
|-
| rowspan="11" |ll_job_uid
|.kworker/10
|
|-
|.kworker/101
|
|-
|.kworker/104
|
|-
|.kworker/106
|
|-
|.kworker/107
|
|-
|.kworker/108
|
|-
|.kworker/112
|
|-
|.kworker/113
|
|-
|.kworker/114
|
|-
|.kworker/116
|
|-
|...
|
|-
| rowspan="14"|optype
|create
|
|-
|destroy
|
|-
|get_info
|
|-
|getattr
|
|-
|punch
|
|-
|quotactl
|
|-
|read
|
|-
|read_samples
|
|-
|set_info
|
|-
|setattr
|
|-
|statfs
|
|-
|sync
|
|-
|write
|
|-
|write_samples
|
|-
| rowspan="48"|ost_index
|OST0000
|
|-
|OST0001
|
|-
|OST0002
|
|-
|OST0003
|
|-
|OST0004
|
|-
|OST0005
|
|-
|OST0006
|
|-
|OST0007
|
|-
|OST0008
|
|-
|OST0009
|
|-
|OST000a
|
|-
|OST000b
|
|-
|OST000c
|
|-
|OST000d
|
|-
|OST000e
|
|-
|OST000f
|
|-
|OST0010
|
|-
|OST0011
|
|-
|OST0012
|
|-
|OST0013
|
|-
|OST0014
|
|-
|OST0015
|
|-
|OST0016
|
|-
|OST0017
|
|-
|OST0018
|
|-
|OST0019
|
|-
|OST001a
|
|-
|OST001b
|
|-
|OST001c
|
|-
|OST001d
|
|-
|OST001e
|
|-
|OST001f
|
|-
|OST0020
|
|-
|OST0021
|
|-
|OST0022
|
|-
|OST0023
|
|-
|OST0024
|
|-
|OST0025
|
|-
|OST0026
|
|-
|OST0027
|
|-
|OST0028
|
|-
|OST0029
|
|-
|OST002a
|
|-
|OST002b
|
|-
|OST002c
|
|-
|OST002d
|
|-
|OST002e
|
|-
|OST002f
|
|-
| colspan="2" |value
|float
|}
=== Metrics differentiating between clients ===
{| class="wikitable mw-collapsible"
|+Measurement: exp_ost_stats_bytes
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="11" |exp_client
|10.148.0.32
|
|-
|10.148.0.33
|
|-
|10.148.0.34
|
|-
|10.148.0.36
|
|-
|10.148.0.37
|
|-
|10.148.0.38
|
|-
|10.148.0.39
|
|-
|10.148.0.40
|
|-
|10.148.0.41
|
|-
|10.148.0.42
|
|-
|...
|
|-
| rowspan="3"|exp_type
|o2ib20
|
|-
|o2ib43
|
|-
|o2ib44
|
|-
| rowspan="8"|fqdn
|hawk-oss01
|
|-
|hawk-oss02
|
|-
|hawk-oss03
|
|-
|hawk-oss04
|
|-
|hawk-oss05
|
|-
|hawk-oss06
|
|-
|hawk-oss07
|
|-
|hawk-oss08
|
|-
|fs_name
|exafs
|
|-
| rowspan="2"|optype
|read
|
|-
|write
|
|-
| rowspan="48"|ost_index
|OST0000
|
|-
|OST0001
|
|-
|OST0002
|
|-
|OST0003
|
|-
|OST0004
|
|-
|OST0005
|
|-
|OST0006
|
|-
|OST0007
|
|-
|OST0008
|
|-
|OST0009
|
|-
|OST000a
|
|-
|OST000b
|
|-
|OST000c
|
|-
|OST000d
|
|-
|OST000e
|
|-
|OST000f
|
|-
|OST0010
|
|-
|OST0011
|
|-
|OST0012
|
|-
|OST0013
|
|-
|OST0014
|
|-
|OST0015
|
|-
|OST0016
|
|-
|OST0017
|
|-
|OST0018
|
|-
|OST0019
|
|-
|OST001a
|
|-
|OST001b
|
|-
|OST001c
|
|-
|OST001d
|
|-
|OST001e
|
|-
|OST001f
|
|-
|OST0020
|
|-
|OST0021
|
|-
|OST0022
|
|-
|OST0023
|
|-
|OST0024
|
|-
|OST0025
|
|-
|OST0026
|
|-
|OST0027
|
|-
|OST0028
|
|-
|OST0029
|
|-
|OST002a
|
|-
|OST002b
|
|-
|OST002c
|
|-
|OST002d
|
|-
|OST002e
|
|-
|OST002f
|
|-
| colspan="2" |value
|float
|}
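The <code>exp_client</code> and <code>exp_type</code> tags look like the two halves of a Lustre NID (<code>address@network</code>, e.g. <code>10.148.0.32@o2ib20</code>). Under that assumption, the Python sketch below rebuilds the NID and sums <code>exp_ost_stats_bytes</code> values per client across all OSTs and servers for one <code>optype</code>. The record layout (a tag dictionary plus a float) and the function names are hypothetical. The sketch also assumes that all records come from the same time stamp, because the tables do not say whether <code>value</code> is a cumulative counter or a rate.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

# One point: its tag set plus the float field "value" (hypothetical layout).
Record = Tuple[Mapping[str, str], float]


def client_nid(tags: Mapping[str, str]) -> str:
    """Rebuild a Lustre NID such as '10.148.0.32@o2ib20' from the tags."""
    return f"{tags['exp_client']}@{tags['exp_type']}"


def bytes_per_client(records: Iterable[Record], optype: str) -> Dict[str, float]:
    """Sum 'value' per client NID over all OSTs and servers for one optype."""
    totals: Dict[str, float] = defaultdict(float)
    for tags, value in records:
        if tags.get("optype") == optype:
            totals[client_nid(tags)] += value
    return dict(totals)


if __name__ == "__main__":
    base = {"cluster": "exafs", "fs_name": "exafs",
            "exp_client": "10.148.0.32", "exp_type": "o2ib20",
            "fqdn": "hawk-oss01"}
    # Made-up values for illustration only.
    records = [
        ({**base, "optype": "read", "ost_index": "OST0000"}, 100.0),
        ({**base, "optype": "read", "ost_index": "OST0001"}, 50.0),
        ({**base, "optype": "write", "ost_index": "OST0000"}, 7.0),
    ]
    print(bytes_per_client(records, "read"))  # {'10.148.0.32@o2ib20': 150.0}
```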
{| class="wikitable mw-collapsible"
|+Measurement: exp_ost_stats_samples
!Key
!Value
!Explanation
|-
|cluster
|exafs
|
|-
| rowspan="11" |exp_client
|10.148.0.32
|
|-
|10.148.0.33
|
|-
|10.148.0.34
|
|-
|10.148.0.36
|
|-
|10.148.0.37
|
|-
|10.148.0.38
|
|-
|10.148.0.39
|
|-
|10.148.0.40
|
|-
|10.148.0.41
|
|-
|10.148.0.42
|
|-
|...
|
|-
| rowspan="3"|exp_type
|o2ib20
|
|-
|o2ib43
|
|-
|o2ib44
|
|-
| rowspan="8"|fqdn
|hawk-oss01
|
|-
|hawk-oss02
|
|-
|hawk-oss03
|
|-
|hawk-oss04
|
|-
|hawk-oss05
|
|-
|hawk-oss06
|
|-
|hawk-oss07
|
|-
|hawk-oss08
|
|-
|fs_name
|exafs
|
|-
| rowspan="2"|optype
|read
|
|-
|write
|
|-
| rowspan="48"|ost_index
|OST0000
|
|-
|OST0001
|
|-
|OST0002
|
|-
|OST0003
|
|-
|OST0004
|
|-
|OST0005
|
|-
|OST0006
|
|-
|OST0007
|
|-
|OST0008
|
|-
|OST0009
|
|-
|OST000a
|
|-
|OST000b
|
|-
|OST000c
|
|-
|OST000d
|
|-
|OST000e
|
|-
|OST000f
|
|-
|OST0010
|
|-
|OST0011
|
|-
|OST0012
|
|-
|OST0013
|
|-
|OST0014
|
|-
|OST0015
|
|-
|OST0016
|
|-
|OST0017
|
|-
|OST0018
|
|-
|OST0019
|
|-
|OST001a
|
|-
|OST001b
|
|-
|OST001c
|
|-
|OST001d
|
|-
|OST001e
|
|-
|OST001f
|
|-
|OST0020
|
|-
|OST0021
|
|-
|OST0022
|
|-
|OST0023
|
|-
|OST0024
|
|-
|OST0025
|
|-
|OST0026
|
|-
|OST0027
|
|-
|OST0028
|
|-
|OST0029
|
|-
|OST002a
|
|-
|OST002b
|
|-
|OST002c
|
|-
|OST002d
|
|-
|OST002e
|
|-
|OST002f
|
|-
| colspan="2" |value
|float
|}
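The measurement names suggest that <code>exp_ost_stats_bytes</code> holds the bytes transferred and <code>exp_ost_stats_samples</code> holds the number of requests for the same tag set. If so, dividing one by the other gives the mean request size per client, OST and operation. This interpretation is an assumption. The Python sketch below only shows how to join the two measurements on identical tag sets, and its names are hypothetical.

```python
from typing import Dict, FrozenSet, Mapping, Tuple

# A tag set turned into a hashable key, so points from the two
# measurements can be matched on identical tags.
TagKey = FrozenSet[Tuple[str, str]]


def tag_key(tags: Mapping[str, str]) -> TagKey:
    return frozenset(tags.items())


def mean_request_size(
    bytes_by_tags: Mapping[TagKey, float],
    samples_by_tags: Mapping[TagKey, float],
) -> Dict[TagKey, float]:
    """Divide bytes by samples for every tag set present in both inputs."""
    result: Dict[TagKey, float] = {}
    for key, total_bytes in bytes_by_tags.items():
        count = samples_by_tags.get(key, 0.0)
        if count > 0:  # skip missing or zero sample counts
            result[key] = total_bytes / count
    return result


if __name__ == "__main__":
    key = tag_key({"exp_client": "10.148.0.32", "exp_type": "o2ib20",
                   "fqdn": "hawk-oss01", "optype": "write",
                   "ost_index": "OST0000"})
    # Made-up numbers: 8 MiB written in 8 requests.
    print(mean_request_size({key: 8388608.0}, {key: 8.0})[key])  # 1048576.0
```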

Latest revision as of 09:38, 3 April 2024