- Information in the HLRS Wiki is not legally binding and is provided without warranty -

Difference between revisions of "Workspace mechanism"

From HLRS Platforms

Revision as of 11:33, 21 September 2021

This mechanism allows you to keep data outside your home directory, not only during a run but also afterwards. The idea is to allocate disk space for a number of days and give it a name, which lets you identify a workspace and distinguish between several workspaces. It is also possible to allocate workspaces on different filesystems, provided they have been prepared for workspaces on the local host. The toolset is an open source project (see hpc-workspace@github). Further documentation can be found on GitHub as well, e.g. the Workspace user guide@github.

Allocating new workspace:

 MYSCR=`ws_allocate SimulatesomeThing 10` 
    MYSCR will contain the name of a directory which exists for 10 days, is located on a temporary filesystem, and is owned by the caller. The directory is not deleted at the end of the job, but after 10 days of wall-clock time. In a second job, you can use the same line to get the same directory. Please note that the directory in this example will be deleted 10 days after first usage, no matter how often it is used or what duration is specified in subsequent calls. The name may contain only letters and digits; dash, dot, and underscore are also allowed, but not as the first character of the name.
 ws_allocate [-F filesystem] name duration
    The option -F filesystem specifies the filesystem on which your workspace will be located. If this option is omitted, your workspace will be located on a default filesystem.
    If "duration" is omitted, your workspace will have the default lifetime of 1 day.
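In a batch job script it can be worth checking that the allocation actually succeeded before writing to the path. The following is a minimal sketch, assuming that ws_allocate prints the workspace directory on standard output as described above. The function name get_workspace is hypothetical and not part of the workspace tools.

```shell
# Hypothetical helper: allocate (or re-use) a workspace and print its path.
# Assumes ws_allocate prints the workspace directory on stdout.
get_workspace() {
    local name="$1" days="$2" dir
    dir=$(ws_allocate "$name" "$days") || return 1
    # Refuse to continue if the reported path is empty or not a directory.
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "workspace '$name' could not be allocated" >&2
        return 1
    fi
    printf '%s\n' "$dir"
}
```

A job script could then use MYSCR=$(get_workspace SimulatesomeThing 10) || exit 1, so that the job stops early instead of writing to an undefined location.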

Find your existing workspace path:

 MYSCR=`ws_find SimulatesomeThing` 
    MYSCR will contain the name of the directory where your previously allocated workspace is located.

Listing your workspaces and available workspace filesystems:

 ws_list
    Lists all workspaces of the default workspace filesystem, their names and locations as well as the remaining lifetime.
 ws_list -a
    Lists your workspaces in all workspace filesystems.
 ws_list -s
    short output: list names of workspaces only (this is useful for scripting, e.g. the output can be used for ws_find to finally obtain the directories)
 ws_list -F <fstype>
    If several filesystems are in use, you can limit the listing to workspaces of a specific filesystem type. On Laki the filesystem NEC_lustre is available, on Hazelhen the filesystem univ_1. The value "default" is also possible.
 ws_list -l
    list all available workspace filesystems
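The combination of ws_list -s and ws_find mentioned above can be sketched as a small loop. This assumes that ws_list -s prints one workspace name per line, which should be verified on the target system. The function name list_workspace_paths is hypothetical.

```shell
# Print "name<TAB>path" for every workspace on the default filesystem.
# Assumes `ws_list -s` prints one workspace name per line.
list_workspace_paths() {
    ws_list -s | while IFS= read -r name; do
        [ -n "$name" ] || continue            # skip blank lines
        path=$(ws_find "$name") || continue   # skip names ws_find cannot resolve
        printf '%s\t%s\n' "$name" "$path"
    done
}
```

The tab-separated output can be passed on to tools such as cut or awk, for example to run du on each workspace directory.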

Release a workspace:

 ws_release [-F filesystem] name 
    Releases the workspace 'name' on the specified filesystem. If the option '-F filesystem' is omitted, the workspace 'name' will be released on the default workspace filesystem. Users are responsible for releasing the correct workspace. After a workspace has been released or has expired, its directory is moved to a kind of trash can.

Please also see the note on quota under "Notes for expired or released workspaces" before running ws_release!

Register your workspaces:

 ws_register -F filesystem dir
    This command creates or updates symbolic links to your workspaces on the specified 'filesystem' in the directory 'dir/filesystem'. If filesystem = ALL, all of your workspaces on all available filesystems will be registered in the specified 'dir'. The symbolic links can be useful for getting more information about your workspaces, e.g. with find ..., du ..., ls ..., ....

Extend your workspace duration:

Starting with version 3.0 of the workspace tools, an existing workspace can be extended:

 ws_extend [-f <fsname>] <wsname> [<duration>]

or, alternatively, using ws_allocate with the flag -x

 ws_allocate -x [-f <fsname>] <wsname> <duration>

with wsname being the name of an existing workspace. When using ws_extend, the duration may be omitted. A workspace can be extended only a small number of times; this number is displayed by ws_list <wsname>.

Get a reminder of your workspace expiration date:

 ws_send_ical  <WS_name> <your_Email-address>

This little script may help you to keep track of your workspace. It checks the remaining time of a given workspace (first parameter) and sends a calendar invitation to the email address (second parameter). If you are using a mail program with an integrated calendar function, you can accept this event and your calendar / PDA will remind you ...

Quota limits

New mechanism and policy for Hawk and vulcan (ws3 filesystem)

Due to the cache-based structure, the native quota tools are not sufficient to obtain the information. HLRS provides a special command, ws_quota, which displays the limits as well as the current usage of both the user and the group. If you as a user or your group exceed the quota limit, no further jobs will be executed in the batch queues! In this case, you and your group will receive an email informing you about it. The reason for this policy is that the filesystem is also an expensive resource, which should be used as sparingly as possible.

If you cannot run jobs anymore because you have hit the quota limit, please pursue the following strategy:

  • All members of the respective group should check whether they are a member of this group only. You can check the groups you are assigned to by the id command.
  • If (and only if!) all members of the respective group are assigned to this group only, every member of your group should check his/her personal contribution w.r.t. data volume and file count by means of the ws_quota command. If the sum of the results is close to the overall numbers of the group, you can use the individual numbers to figure out who contributes most to the quota and hence should reduce data volume and/or file count.
  • If some members of the respective group are also assigned to other groups, it's not reasonable to rely on the numbers given by ws_quota in order to assess who contributes a large amount of the overall quota! This is due to the fact that users might have files which are assigned to different groups. ws_quota - however - prints the overall numbers only, independent of which group those files are assigned to!
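The first step of the strategy above, checking whether you belong to one group only, can be automated. The following is a minimal sketch based on the standard id command; the function name in_single_group is hypothetical.

```shell
# Succeeds only if the current user belongs to exactly one group.
# `id -Gn` prints all group names of the user, separated by spaces.
in_single_group() {
    set -- $(id -Gn)   # split the group list into positional parameters
    [ "$#" -eq 1 ]
}
```

If in_single_group succeeds for every member of the group, the per-user numbers printed by ws_quota can be compared with the group totals. If it fails for some members, those numbers should be interpreted with the caveat described in the last item of the list.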

Lustre usage on vulcan

Usually, quotas are enforced on a per-user and per-group basis for all filesystems holding workspaces. To check the current usage, use the following command:

        lfs quota <file_system>

Please note that for a Lustre-based filesystem you have to use the lfs command to get the quota information. This differs from e.g. HOME, whose quota is reported by the quota command.

Notes for expired or released workspaces

Expired or released workspace directories are kept for a few days and still count towards the quota! If you want to free up space immediately, delete the files with the "rm" command before running "ws_release".
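The order "rm" first, then "ws_release" can be wrapped in a small helper. This is only a sketch: it assumes that ws_find prints the workspace directory on standard output and that GNU find (with -mindepth and -delete) is available. The function name purge_and_release is hypothetical. Deleted files cannot be recovered, so use it with care.

```shell
# Hypothetical helper: empty a workspace, then release it.
# Assumes ws_find prints the workspace directory on stdout.
purge_and_release() {
    local name="$1" dir
    dir=$(ws_find "$name") || return 1
    # Safety checks: never run the delete on an empty or non-directory path.
    [ -n "$dir" ] && [ -d "$dir" ] || return 1
    # Delete everything inside the workspace but keep the directory itself.
    find "$dir" -mindepth 1 -delete || return 1
    ws_release "$name"
}
```

Emptying the directory before the release means that the released workspace no longer contributes data volume or file count to the quota while it waits in the trash.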

My workspace has expired, what can I do?

After a workspace has expired, a system cleaner moves it for some days to a trash directory which is not accessible to users. Some days later, the trash directory is cleaned for good. Users can list their own expired workspaces which are still in the trash directory with:

        ws_restore -l

Users can also restore expired workspaces themselves. First you need a valid workspace. If you don't have one, please create one (see above). Caution: omitting the "duration" option will result in the default lifetime (1 day). Then you can restore an expired workspace listed by "ws_restore -l" using:

        ws_restore [options] workspace_name target_name

workspace_name is one of the names listed by "ws_restore -l". target_name is one of your valid workspace IDs listed by "ws_list".
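The two steps, creating a target workspace with an explicit lifetime and then restoring into it, might be combined as follows. This is a sketch under the assumptions stated above; the function name restore_expired and its argument order are hypothetical.

```shell
# Hypothetical helper: create a target workspace with an explicit lifetime,
# then restore an expired workspace into it.
#   expired: a name as printed by `ws_restore -l`
#   target:  name of the new target workspace
#   days:    lifetime of the target workspace
restore_expired() {
    local expired="$1" target="$2" days="$3"
    # Pass the duration explicitly to avoid the 1-day default lifetime.
    ws_allocate "$target" "$days" >/dev/null || return 1
    ws_restore "$expired" "$target"
}
```

Passing the duration explicitly guards against the restored data expiring again after one day.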