- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

Workspace migration
Revision as of 11:14, 4 July 2024


Warning: This page describes the necessary steps to migrate workspaces to another workspace filesystem.
In October 2024 all data on the old hawk ws10 filesystems will be deleted. Follow the guide below to transfer your data to the ws11 workspace filesystem.


User migration to new workspaces

On hawk, ws11 will be the new default workspace filesystem. The current default workspace filesystem (ws10) will be shut down in October 2024. The policy settings from ws10 will be transferred to the ws11 filesystem. Users now have to migrate their workspaces located on the ws10 filesystems to the ws11 filesystem. Run the command ws_list -a on a frontend system to display the paths of all your workspaces. If a path matches one of the mount points in the following table, that workspace needs to be migrated to the ws11 filesystem.


File System   mounted on
ws10.0        /lustre/hpe/ws10/ws10.0
ws10.1        /lustre/hpe/ws10/ws10.1
ws10.2        /lustre/hpe/ws10/ws10.2
ws10.3        /lustre/hpe/ws10/ws10.3
ws10.3P       /lustre/hpe/ws10/ws10.3P
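
The check against the mount points can be sketched as a small shell function. This is only an illustration: needs_migration is a hypothetical helper name, and it expects a workspace path as printed by ws_list -a.

```shell
# Mount points of the ws10 filesystems, taken from the table above.
WS10_MOUNTS="/lustre/hpe/ws10/ws10.0 /lustre/hpe/ws10/ws10.1 /lustre/hpe/ws10/ws10.2 /lustre/hpe/ws10/ws10.3 /lustre/hpe/ws10/ws10.3P"

# needs_migration PATH
# Hypothetical helper: returns 0 if PATH is one of the ws10 mount points
# or lies below one of them, and 1 otherwise.
needs_migration() {
  for mount in $WS10_MOUNTS; do
    case $1 in
      "$mount"|"$mount"/*) return 0 ;;
    esac
  done
  return 1
}
```

A call such as needs_migration "$path" && echo "migrate $path" then flags the workspaces that still live on ws10.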

Before you start

Migrating large amounts of data consumes a lot of I/O resources. Please review your data first and remove what is no longer needed, or move it to HPSS.

How to proceed

  • From July 15th 2024 10:00, the default workspace filesystem switches from ws10 to ws11. ws11 has the same policy settings as ws10.
  • From that point on, new workspaces will be allocated on the ws11 filesystem. Existing workspaces on ws10 will still be listed.
  • Workspaces located on ws10 filesystems cannot be extended anymore.
  • If you have to migrate data from a workspace on one filesystem to another, do not use the mv command. For large amounts of data, this will fail due to time limits. For millions of small files or large amounts of data, we currently recommend running the following command inside a single-node batch job: rsync -a --hard-links Old_ws/ new_ws/
  • You can also try the mpifileutils tools dcp or dsync.
  • Take care when you create new batch jobs. Migrating a workspace from an old filesystem to the new location takes time. Do not run any job while the migration is in progress, as this may result in inconsistent data.
  • On October 15th 2024 the “old” workspaces on ws10 will be disconnected from the hawk compute nodes. The filesystems will remain available on the frontend systems for a few more days.
  • In November 2024 all data on the old ws10 filesystems will be deleted.
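
The rsync call recommended above can be wrapped as follows. This is a minimal sketch, not an official HLRS script: migrate_ws is a hypothetical function name, and both arguments are assumed to be existing workspace directories.

```shell
# migrate_ws OLD_WS NEW_WS
# Hypothetical wrapper around the rsync command recommended above.
# The trailing slashes make rsync copy the contents of OLD_WS into NEW_WS
# instead of creating OLD_WS as a subdirectory of NEW_WS.
migrate_ws() {
  if [ ! -d "$1" ] || [ ! -d "$2" ]; then
    echo "usage: migrate_ws OLD_WS NEW_WS (both must be existing directories)" >&2
    return 2
  fi
  rsync -a --hard-links "${1%/}/" "${2%/}/"
}

# Run the transfer only when both directories are given on the command line.
if [ "$#" -eq 2 ]; then
  migrate_ws "$1" "$2"
fi
```

Because rsync only transfers what is missing or changed, the same call can be repeated if a batch job hits its walltime limit before the copy is complete.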

Operation of the workspaces on ws3:

  • No job of any group member will be scheduled for computation as long as the group quota is exceeded.
  • accounting
  • The maximum lifetime of a workspace is currently 60 days.
  • The default lifetime of a workspace is 1 day.
  • Please read the related man pages or the online workspace mechanism document. In particular, note that the workspace tools allow you to address a specific workspace filesystem explicitly using the -F option (e.g. ws_allocate -F ws3 my_workspace 10).
  • To list your available workspace filesystems, use ws_list -l.
  • Users can restore expired workspaces using ws_restore.
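
As an illustration of the -F option and the lifetime limit, the following sketch builds a ws_allocate call and rejects durations above the 60-day maximum. ws_allocate_cmd is a hypothetical helper, and the filesystem name passed to it is an assumption; use a name reported by ws_list -l.

```shell
# Maximum workspace lifetime in days, as stated in the list above.
MAX_LIFETIME_DAYS=60

# ws_allocate_cmd FILESYSTEM NAME DAYS
# Hypothetical helper: prints the ws_allocate call for the given filesystem
# and refuses durations above MAX_LIFETIME_DAYS.
ws_allocate_cmd() {
  if [ "$3" -gt "$MAX_LIFETIME_DAYS" ]; then
    echo "duration $3 exceeds the maximum lifetime of $MAX_LIFETIME_DAYS days" >&2
    return 1
  fi
  echo "ws_allocate -F $1 $2 $3"
}
```

The function only prints the command, so the result can be inspected before it is actually run.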

Please read https://kb.hlrs.de/platforms/index.php/Storage_usage_policy

Using mpifileutils for data transfer

The mpifileutils suite provides MPI-based tools for typical tasks such as copying, removing, and comparing large datasets, with speedups of up to 50x compared to single-process jobs. It can only be run on compute nodes via mpirun.

dcp and dsync are similar to cp -r and rsync: give them a source directory and a destination, and they recursively copy the source directory to the destination in parallel.

dcp and dsync have a number of useful options; use dcp -h or dsync -h to see a description, or consult the [User Guide].

Both tools should be invoked via mpirun.

We highly recommend using dcp / dsync only with an empty ~/.profile and ~/.bashrc! Furthermore, make sure that only the following modules are loaded when using mpifileutils (this can be achieved by logging into the system without modifying the list of modules and then loading only the tools/mpifileutils module):
1) system/pbs/19.1.1(default)
2) system/batchsystem/auto
3) system/site_names
4) system/ws/1.3.5b(default)
5) system/wrappers/1.0(default)
6) mpi/ucx/1.8.1
7) tools/binutils/2.32
8) compiler/gnu/9.2.0(default)
9) mpi/openmpi/4.0.5-gnu-9.2.0(default)
10) tools/mpifileutils
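
A quick way to spot extra modules is to compare the loaded modules against the list above. The sketch below assumes that the module system exports the loaded modules as a colon-separated list in $LOADEDMODULES, as Environment Modules does; unexpected_modules is a hypothetical helper name.

```shell
# Modules expected when running mpifileutils, taken from the list above
# with the "(default)" suffixes removed.
EXPECTED_MODULES="system/pbs/19.1.1 system/batchsystem/auto system/site_names system/ws/1.3.5b system/wrappers/1.0 mpi/ucx/1.8.1 tools/binutils/2.32 compiler/gnu/9.2.0 mpi/openmpi/4.0.5-gnu-9.2.0 tools/mpifileutils"

# unexpected_modules LOADED
# LOADED is a colon-separated module list (assumed format of $LOADEDMODULES).
# Prints every loaded module that is not in EXPECTED_MODULES, one per line.
# Any version of tools/mpifileutils is accepted.
unexpected_modules() {
  echo "$1" | tr ':' '\n' | while read -r mod; do
    [ -z "$mod" ] && continue
    case " $EXPECTED_MODULES " in
      *" $mod "*) ;;
      *)
        case $mod in
          tools/mpifileutils/*) ;;
          *) echo "$mod" ;;
        esac
        ;;
    esac
  done
}
```

Running unexpected_modules "$LOADEDMODULES" before starting dcp or dsync should print nothing if the environment matches the list.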


dcp

Parallel MPI application to recursively copy files and directories.

dcp is a file copy tool in the spirit of cp(1) that evenly distributes the work of scanning the directory tree, and copying file data across a large cluster without any centralized state. It is designed for copying files that are located on a distributed parallel file system, and it splits large file copies across multiple processes.

Run dcp with the -p option to preserve permissions, timestamps, and ownership.

-p  : preserve permissions, timestamps, and ownership

--chunksize C: copy files larger than C bytes in C-byte chunks (default is 4MB)

We highly recommend using the -p option.

dsync

Parallel MPI application to synchronize two files or two directory trees.

dsync makes DEST match SRC, adding entries that are missing from DEST and updating existing entries in DEST as necessary, so that SRC and DEST have identical content, ownership, timestamps, and permissions.

--chunksize C: copy files larger than C bytes in C-byte chunks (default is 4MB)


Job Script example

Here is an example of a job script.

You have to change SOURCEDIR and TARGETDIR according to your setup. The number of nodes and the wallclock time should also be adjusted.


#!/bin/bash
#PBS -N parallel-copy
#PBS -l select=2:node_type=hsw:mpiprocs=20
#PBS -l walltime=00:20:00

module load tools/mpifileutils

SOURCEDIR=<YOUR SOURCE DIRECTORY HERE>
TARGETDIR=<YOUR TARGET DIRECTORY HERE>

sleep 5
# Number of distinct nodes in the job; 20 must match mpiprocs above.
nodes=$(sort -u "$PBS_NODEFILE" | wc -l)
cores=$(( nodes * 20 ))

# Record start and end time as seconds since the epoch.
time_start=$(date +%s)
# Alternative using dcp:
#mpirun -np $cores dcp -p --bufsize 8MB "${SOURCEDIR}/" "${TARGETDIR}/"
mpirun -np "$cores" dsync --bufsize 8MB "$SOURCEDIR" "$TARGETDIR"
time_end=$(date +%s)

echo "Total runtime in seconds: $(( time_end - time_start ))"
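
The number of MPI ranks passed to mpirun in the script is the number of distinct nodes multiplied by the mpiprocs value from the select line. A minimal sketch of that calculation as a reusable function; rank_count is a hypothetical name, and the node file is assumed to contain one host name per line, as $PBS_NODEFILE does.

```shell
# rank_count NODEFILE PROCS_PER_NODE
# Prints the number of distinct hosts in NODEFILE multiplied by
# PROCS_PER_NODE, which should match the mpiprocs value of the job.
rank_count() {
  nodes=$(sort -u "$1" | wc -l)
  echo $(( nodes * $2 ))
}
```

Inside a job this could be used as cores=$(rank_count "$PBS_NODEFILE" 20), so the per-node value is adjusted in one place when the select line changes.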