Workspace migration: Difference between revisions

Latest revision as of 13:39, 26 April 2022

Warning: This page describes the steps necessary to migrate workspaces to the new filesystems.
At the end of 2021, all data on the old vulcan ws2 filesystems will be deleted. Follow the guide below to transfer your data to the new ws3 filesystems.

User migration to new workspaces

A new workspace filesystem (ws3) has been integrated into the vulcan cluster. The existing workspace filesystem (ws2) will be shut down by the end of 2021. Users therefore have to migrate their workspaces from the old filesystem to the new one. Run the command ws_list -a on a frontend system to display the paths of all your workspaces. If a path name matches a mount point in the following table, that workspace needs to be migrated to the new filesystem.

File System	mounted on
NEC_lustre	/lustre/nec/ws2
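
To see quickly which workspaces are affected, the output of ws_list -a can be filtered for the old mount point. The following is a minimal sketch; it assumes only that each workspace path appears somewhere in that output, and the helper name old_ws_paths is made up for illustration.

```shell
#!/bin/bash
# Mount point of the old filesystem, taken from the table above.
OLD_MOUNT=/lustre/nec/ws2

# Hypothetical helper: read ws_list output on stdin and print every
# path that lies below the old mount point.
old_ws_paths() {
    grep -o "${OLD_MOUNT}/[^[:space:]]*" || true
}

# Only call ws_list where it exists (i.e. on a frontend system).
if command -v ws_list >/dev/null 2>&1; then
    ws_list -a | old_ws_paths
fi
```

If the script prints nothing, none of your workspaces were found on the old filesystem.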

Before you start

Migrating large amounts of data consumes a lot of I/O resources. Please review your data and remove what is no longer needed, or move it into HPSS.

How to proceed

From October 4th 2021 10:00 on, new workspaces will be allocated on the replacement filesystem. Existing workspaces will continue to be listed.
Workspaces located on old filesystems cannot be extended anymore.
If you have to migrate data from a workspace on one filesystem to another, do not use the mv command to transfer data. For large amounts of data, this will fail due to time limits. For large amounts of data or, for example, millions of small files, we currently recommend running the following command inside a single-node batch job: rsync -a --hard-links Old_ws/ new_ws/
Alternatively, you can try the mpifileutils tools dcp or dsync.
Take care when you create new batch jobs. Migrating a workspace from an old filesystem to the new location takes time. Do not run any job while the migration process is active, as this may result in inconsistent data.
On November 3rd 2021, the “old” workspaces on ws2 will be disconnected from the vulcan compute nodes. The filesystems will remain available on the frontend systems for a few more days.
In November 2021, all data on the old filesystems will be deleted.
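
The rsync recommendation above could be wrapped in a single-node batch job roughly as follows. This is a sketch, not an official script: the job name, the walltime, the helper migrate_ws, and the variables OLD_WS, NEW_WS and RSYNC are assumptions made for illustration; only the rsync options and the node type come from this page.

```shell
#!/bin/bash
#PBS -N ws-rsync
#PBS -l select=1:node_type=hsw:mpiprocs=1
#PBS -l walltime=04:00:00

# Hypothetical helper: copy the contents of one workspace into another.
# RSYNC can be overridden (e.g. RSYNC=echo) to print the arguments
# instead of copying anything.
migrate_ws() {
    src=${1%/}
    dst=${2%/}
    "${RSYNC:-rsync}" -a --hard-links "$src/" "$dst/"
}

# OLD_WS and NEW_WS are placeholders; pass them at submission time, e.g.
#   qsub -v OLD_WS=/path/to/old_ws,NEW_WS=/path/to/new_ws job.sh
if [ -n "${OLD_WS:-}" ] && [ -n "${NEW_WS:-}" ]; then
    migrate_ws "$OLD_WS" "$NEW_WS"
fi
```

If the job hits its walltime, submitting it again with the same arguments lets rsync skip files that already match in size and modification time.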

Operation of the workspaces on ws3:

No job of any group member will be scheduled for computation as long as the group quota is exceeded.
accounting
The maximum lifetime of a workspace is currently 60 days.
The default lifetime of a workspace is 1 day.
Please read the related man pages or the online workspace mechanism document.

In particular, note that the workspace tools allow you to explicitly address a specific workspace filesystem using the -F option (e.g. ws_allocate -F ws3 my_workspace 10).

To list your available workspace filesystems, use ws_list -l.
Users can restore expired workspaces using ws_restore.
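
As an illustration of the points above, a target workspace on ws3 might be allocated like this. The sketch assumes that ws_allocate prints the path of the new workspace on standard output; the helper clamp_days and the requested 90 days are hypothetical and only show how the 60-day maximum applies.

```shell
#!/bin/bash
MAX_DAYS=60   # maximum workspace lifetime stated above

# Hypothetical helper: limit a requested lifetime to the maximum.
clamp_days() {
    if [ "$1" -gt "$MAX_DAYS" ]; then
        echo "$MAX_DAYS"
    else
        echo "$1"
    fi
}

days=$(clamp_days 90)   # a request for 90 days is reduced to 60

# Only call the workspace tools where they exist.
if command -v ws_allocate >/dev/null 2>&1; then
    ws_list -l                                        # available filesystems
    NEW_WS=$(ws_allocate -F ws3 my_workspace "$days")  # assumed to print the path
    echo "New workspace: $NEW_WS"
fi
```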

Please read https://kb.hlrs.de/platforms/index.php/Storage_usage_policy

Using mpifileutils for data transfer

The mpifileutils suite provides MPI-based tools to handle typical jobs like copy, remove, and compare for large datasets, providing speedups of up to 50x compared to single-process jobs. The tools can only be run on compute nodes via mpirun.

dcp and dsync are similar to cp -r and rsync, respectively; simply give them a source directory and a destination, and they will recursively copy the source directory to the destination in parallel.

dcp and dsync have a number of useful options; use dcp -h or dsync -h to see a description, or consult the User Guide (https://mpifileutils.readthedocs.io/en/v0.11/).

Both tools should be invoked via mpirun.

We highly recommend using dcp / dsync only with an empty ~/.profile and ~/.bashrc! Furthermore, make sure that only the following modules are loaded when using mpifileutils (this can be achieved by logging into the system, leaving the list of modules unmodified, and loading only the tools/mpifileutils module):
1) system/pbs/19.1.1(default)
2) system/batchsystem/auto
3) system/site_names
4) system/ws/1.3.5b(default)
5) system/wrappers/1.0(default)
6) mpi/ucx/1.8.1
7) tools/binutils/2.32
8) compiler/gnu/9.2.0(default)
9) mpi/openmpi/4.0.5-gnu-9.2.0(default)
10) tools/mpifileutils
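
One way to check the module environment before starting a transfer is to compare the loaded modules against the list above. The sketch below assumes that module -t list prints one module name per line (on standard error) and that names match the list exactly. A module loaded with an explicit version suffix would therefore be reported. The helper name unexpected_modules is made up.

```shell
#!/bin/bash
# Module list from the text above, without the "(default)" suffix.
allowed_modules='system/pbs/19.1.1
system/batchsystem/auto
system/site_names
system/ws/1.3.5b
system/wrappers/1.0
mpi/ucx/1.8.1
tools/binutils/2.32
compiler/gnu/9.2.0
mpi/openmpi/4.0.5-gnu-9.2.0
tools/mpifileutils'

# Hypothetical helper: read module names (one per line) on stdin and
# print those that are not in the allowed list.
unexpected_modules() {
    sed -e 's/(default)$//' -e '/^Currently Loaded/d' -e '/^[[:space:]]*$/d' |
        grep -v -x -F -e "$allowed_modules" || true
}

# "module list" writes to stderr, hence the redirection.
if command -v module >/dev/null 2>&1; then
    module -t list 2>&1 | unexpected_modules
fi
```

No output means that nothing beyond the listed modules is loaded.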

dcp

Parallel MPI application to recursively copy files and directories.

dcp is a file copy tool in the spirit of cp(1) that evenly distributes the work of scanning the directory tree, and copying file data across a large cluster without any centralized state. It is designed for copying files that are located on a distributed parallel file system, and it splits large file copies across multiple processes.

Run dcp with the -p option to preserve permissions, timestamps, and ownership.

-p : preserve permissions, timestamps, and ownership

--chunksize C: copy files larger than C bytes in C-byte chunks (default is 4MB)

We highly recommend using the -p option.
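
To get a feeling for the --chunksize option, the number of chunks a single file is split into can be estimated with simple arithmetic. The sketch assumes that 4MB means 4 * 1024 * 1024 bytes; it says nothing about how dcp assigns chunks to processes, and chunk_count is a made-up helper.

```shell
#!/bin/bash
# Hypothetical helper: number of chunks for a file of SIZE bytes,
# given a chunk size in bytes (default: 4 MB, assumed to be 4*1024*1024).
chunk_count() {
    size=$1
    chunk=${2:-$((4 * 1024 * 1024))}
    if [ "$size" -le "$chunk" ]; then
        echo 1                                  # small files are not split
    else
        echo $(( (size + chunk - 1) / chunk ))  # round up
    fi
}

chunk_count $((1024 * 1024 * 1024))                      # 1 GiB file  -> 256
chunk_count $((10 * 1024 * 1024)) $((4 * 1024 * 1024))   # 10 MiB file -> 3
```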

dsync

Parallel MPI application to synchronize two files or two directory trees.

dsync makes DEST match SRC, adding entries that are missing from DEST and updating existing entries in DEST as necessary, so that SRC and DEST have identical content, ownership, timestamps, and permissions.

--chunksize C: copy files larger than C bytes in C-byte chunks (default is 4MB)
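
Since dcp behaves like cp -r and dsync like rsync, one possible convention is to use dcp -p for the first copy into an empty target and dsync for every later pass. The following sketch shows that convention only; the helper parallel_copy, the MPIRUN override and the default of 40 processes are assumptions, not part of mpifileutils.

```shell
#!/bin/bash
# Hypothetical helper: parallel_copy SRC DST [NPROCS]
# Uses dcp -p when DST is empty or missing, otherwise dsync.
# MPIRUN can be overridden (e.g. MPIRUN=echo) to print the arguments only.
parallel_copy() {
    src=${1%/}
    dst=${2%/}
    np=${3:-40}
    if [ -z "$(ls -A "$dst" 2>/dev/null)" ]; then
        "${MPIRUN:-mpirun}" -np "$np" dcp -p "$src/" "$dst/"
    else
        "${MPIRUN:-mpirun}" -np "$np" dsync "$src" "$dst"
    fi
}

# Example call inside a batch job (paths are placeholders):
#   parallel_copy /path/to/old_ws /path/to/new_ws 40
```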

Job Script example

Here is an example of a job script.

You have to change SOURCEDIR and TARGETDIR according to your setup. The number of nodes and the wallclock time should also be adjusted.

#!/bin/bash
#PBS -N parallel-copy
#PBS -l select=2:node_type=hsw:mpiprocs=20
#PBS -l walltime=00:20:00

module load tools/mpifileutils

# Replace the placeholders with your source and target workspace paths.
SOURCEDIR="<YOUR SOURCE DIRECTORY HERE>"
TARGETDIR="<YOUR TARGET DIRECTORY HERE>"

sleep 5
# Number of distinct nodes in this job, times 20 MPI processes per node.
nodes=$(sort -u "$PBS_NODEFILE" | wc -l)
cores=$(( nodes * 20 ))

# Record start and end times as seconds since the epoch
# (independent of the locale-specific date format).
time_start=$(date +%s)
# Alternative: copy with dcp instead of dsync.
#mpirun -np "$cores" dcp -p --bufsize 8MB "${SOURCEDIR}/" "${TARGETDIR}/"
mpirun -np "$cores" dsync --bufsize 8MB "$SOURCEDIR" "$TARGETDIR"
time_end=$(date +%s)

total_time=$(( time_end - time_start ))
echo "Total runtime in seconds: $total_time"
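
After the job has finished, a coarse sanity check is to compare the number of entries in both directory trees before the old workspace is released. This sketch is not a content comparison and can itself take a while on trees with millions of files; the helper names are made up, and SOURCEDIR and TARGETDIR are the same placeholders as in the job script.

```shell
#!/bin/bash
# Hypothetical helper: count all files and directories below a path.
count_entries() {
    find "$1" | wc -l | tr -d ' '
}

# Hypothetical helper: succeed if both trees have the same number of entries.
same_entry_count() {
    [ "$(count_entries "$1")" -eq "$(count_entries "$2")" ]
}

# Only run the check when both placeholders have been set.
if [ -n "${SOURCEDIR:-}" ] && [ -n "${TARGETDIR:-}" ]; then
    if same_entry_count "$SOURCEDIR" "$TARGETDIR"; then
        echo "Entry counts match"
    else
        echo "Entry counts differ, re-run the transfer" >&2
    fi
fi
```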

@@ Line 1: / Line 1: @@
-<!-- {{Warning
+{{Warning
-| text = This page originally describe the necessary steps to migrate workspaces to the new filesystems back in 2017. This page is mostly kept for documentation purposes.<br/>
+| text = This page describe the necessary steps to migrate workspaces to the new filesystems.<br/>
-Long-term, the most valuable information is the description of the utility ''pcp'', which allows copying directory structures in parallel on Lustre filesystems. Some of the original scripts shown on this page have been modified to account for changes in the HLRS environment. They are up-to-date as of July 2018.
+End of 2021 all data on the old vulcan ws2 filesystems will be deleted. Follow the below guide to transfer your data to the new ws3 filesystems.
-}} -->
+}}
@@ Line 11: / Line 12: @@
-With the installation of HPE Apollo 9000 (Hawk) a new fast workspace filesystem was integrated. For a certain transition period the workspace filesystem of the predecessor system Cray XC40 hazelhen (Sonexion ws9) was integrated at the same time.
+On vulcan cluster a new workspace filesystem (ws3) has been integrated. The currently existing workspace file system (ws2) will be shut down by the end of the year 2021.
-Now, users have to migrate their workspaces located on the old filesystem onto the new filesystems. Run the command ''ws_list -a'' on a frontend system to display the path for all your workspaces, if path names match mount points in the following table, these workspaces need to
+Now, users have to migrate their workspaces located on the old filesystems onto the new filesystem. Run the command ''ws_list -a'' on a frontend system to display the path for all your workspaces, if path names match mount points in the following table, these workspaces need to
 be migrated to the new filesystem.
@@ Line 21: / Line 22: @@
 ! mounted on
 |-
-| ws9.0
+| NEC_lustre
-| /lustre/cray/ws9/0
+| /lustre/nec/ws2
-|-
-| ws9.1
-| /lustre/cray/ws9/1
-|-
-| ws9.2
-| /lustre/cray/ws9/2
-|-
-| ws9.3
-| /lustre/cray/ws9/3
-|-
-| ws9.4
-| /lustre/cray/ws9/4
-|-
-| ws9.5
-| /lustre/cray/ws9/5
-|-
-| ws9.6
-| /lustre/cray/ws9/6
-|-
-| ws9.6p
-| /lustre/cray/ws9/6
 |-
 |}
@@ Line 53: / Line 33: @@
 == How to proceed ==
-* from <Font color=red>NovDecember XXth 2020 X0:00</Font> on new workspaces will be allocated on the replacement filesystems. Existing workspaces will be listed further on.
+* from <Font color=red>October 4th 2021 10:00</Font> on new workspaces will be allocated on the replacement filesystem. Existing workspaces will be listed further on.
 * workspaces located on old filesystems can not be extended anymore.
-* if you have to migrate data from workspaces on one of the above listed filesystems, do not use the ''mv'' command to transfer data. For large amount of data, this will fail due to time limits. We recommend using [[Workspace_migration#Using_a_parallel_copy_for_data_transfer | parallel copy programm ''pcp'']] for large amount of data in large files. If this fails for e.g. millions of small files following command may help: ''rsync -a  --hard-links   Old_ws/   new_ws/''
+* if you have to migrate data from workspaces on one to another filesystems, do not use the ''mv'' command to transfer data. For large amount of data, this will fail due to time limits. Currently we recommend for e.g. millions of small files or for large amount of data to use following command inside a single node batch job: ''rsync -a  --hard-links   Old_ws/   new_ws/''
+* You can try to use the [[Workspace_migration#Using_mpifileutils_for_data_transfer | mpifileutils '''dcp''' or '''dsync''']]
 * take care when you create new batch jobs. If you have to migrate your workspace from an old filesystem to the new location, this takes time. Do not run any job while the migration process is active. This may result in inconsistent data.
-* On <Font color=red>January 31th 2021</Font> the “old” workspaces ws9.* will be disconnected from the HPE Apollo 9000 compute nodes. The filesystems will be available on the frontend systems for data migration until February 14th 2021.
+* On <Font color=red>November 3rd 2021</Font> the “old” workspaces on ws2 will be disconnected from the vulcan compute nodes. The filesystems will be available on the frontend systems for some further days.
-* <Font color=red>February 15th 2021</font> all data on the old filesystems will be deleted.
+* <Font color=red>November 2021</font> all data on the old filesystems will be deleted.
-== Operation of the workspaces: ==
+== Operation of the workspaces on ws3: ==
 * No job of any group member will be scheduled for computation as long as the group quota is exceeded.
@@ Line 67: / Line 48: @@
 * default lifetime of a workspace is 1 day
 * please read related man pages or online [[Workspace_mechanism | workspace mechanism document]]<BR>
-: in particular note that the workspace tools allow to explicitly address a specific workspace file system using the <tt>-F</tt> option (e.g. <tt>ws_allocate -F ws14.1 my_workspace 10</tt>)
+: in particular note that the workspace tools allow to explicitly address a specific workspace file system using the <tt>-F</tt> option (e.g. <tt>ws_allocate -F ws3 my_workspace 10</tt>)
 * to list your available workspace file systems use <tt>ws_list -l</tt>
 * users can restore expired workspaces using ''ws_restore''
@@ Line 73: / Line 54: @@
 Please read https://kb.hlrs.de/platforms/index.php/Storage_usage_policy
-== Using a parallel copy for data transfer ==
+== Using mpifileutils for data transfer ==
+This mpifileutils suite provides MPI-based tools to handle typical jobs like copy, remove, and compare for large  datasets, providing speedups of up to 50x compared to single process jobs. It can only be run on compute nodes via mpirun.
-pcp is a python based parallel copy using MPI. It can only be run on compute nodes via mpirun.
+dcp or dsync is similar to cp -r or rsync; simply give it a source directory and destination and dcp / dsync will recursively copy the source directory to the destination in parallel.
-pcp is similar to cp -r; simply give it a source directory and destination and pcp will recursively copy the source directory to the destination in parallel.
+dcp / dsync has a number of useful options; use dcp -h or dsync -h to see a description or use the [[https://mpifileutils.readthedocs.io/en/v0.11/ User Guide]]
-pcp has a number of useful options; use pcp -h to see a description.
+It should be invoked via mpirun.
-This program traverses a directory tree and copies files in the tree in parallel. It does not copy individual files in parallel. It should be invoked via aprun.
+We highly recommend to use dcp / dsync with an empty ~/.profile and ~/.bashrc only! Furthermore, take care that only the following modules are loaded when using mpifileutils (this can be achieved by logging into the system without modifying the list of modules and loading the tools/mpifileutils module only): <br>
+) system/pbs/19.1.1(default) <br>
+) system/batchsystem/auto <br>
+) system/site_names <br>
+) system/ws/1.3.5b(default) <br>
+) system/wrappers/1.0(default) <br>
+) mpi/ucx/1.8.1 <br>
+) tools/binutils/2.32 <br>
+) compiler/gnu/9.2.0(default) <br>
+) mpi/openmpi/4.0.5-gnu-9.2.0(default) <br>
+) tools/mpifileutils <br>
-=== Basic arguments ===
-If run with the '''-l''' flag or '''-lf''' flags pcp will be stripe aware.
-'''-l''' will cause stripe information to be copied from the source files and directories.
+=== dcp ===
+Parallel MPI application to recursively copy files and directories.
-'''-lf''' will cause all files and directories on the destination to be striped, regardless of the striping on the source.
+dcp is a file copy tool in the spirit of cp(1) that evenly distributes the work of scanning the directory tree, and copying file data across a large cluster without any centralized state. It is designed for copying files that are located on a distributed parallel file system, and it splits large file copies across multiple processes.
-Striping behavior can be further modified with -ls and -ld.
+Run '''dcp''' with the '''-p''' option to preserve permissions and timestamps, and ownership.<br>
-'''-ls''' will set a minimum file size.
+'''-p'''  : preserve permissions and timestamps, and ownership
-Files below this size will not be striped, regardless of the source striping.
-'''-ld''' will cause all directories to be unstriped.
+'''--chunksize C''': Copy files larger than C Bytes in C Byte chunks (default ist 4MB)
-'''-b C''': Copy files larger than C Mbytes in C Mbyte chunks
+We highly recommend using the '''-p''' option.
-=== Algorithm ===
+=== dsync ===
+Parallel MPI application to synchronize two files or two directory trees.
-pcp runs in two phases:
+dsync makes DEST match SRC, adding missing entries from DEST, and updating existing entries in DEST as necessary so that SRC and DEST have identical content, ownership, timestamps, and permissions.
-Phase I is a parallel walk of the file tree, involving all MPI ranks in a peer-to-peer algorithm. The walk constructs the list of files to be copied and creates the destination directory hierarchy.
+'''--chunksize C''': Copy files larger than C Bytes in C Byte chunks (default ist 4MB)
-In phase II, the actual files are copied. Phase II uses a master-slave algorithm.
-R0 is the master and dispatches file copy instructions to the slaves (R1...Rn).
 === Job Script example ===
 Here is an example of a job script.
 You have to change the SOURCEDIR and TARGETDIR according to your setup.
 Also the number of nodes and wallclock time should be adjusted.
-Again, pcp does NOT parallelize a single copy operation, but the number of copy operations are distributed over the nodes.
- > cat pcp.qsub
   #!/bin/bash
-  #PBS -N IO_copy_test
+  #PBS -N parallel-copy
-  #PBS -l select=8:mpiprocs=128:node_type=rome
+  #PBS -l select=2:node_type=hsw:mpiprocs=20
-  #PBS -l walltime=0:30:00
+  #PBS -l walltime=00:20:00
- #PBS -joe
- cd $PBS_O_WORKDIR
-  # originally:
+  module load tools/mpifileutils
- # module load tools/python/2.7.8
- # currently:
- module load python-site/2.7
   SOURCEDIR=<YOUR SOURCE DIRECTORY HERE>
   TARGETDIR=<YOUR TARGET DIRECTORY HERE>
   sleep 5
   nodes=$(cat $PBS_NODEFILE | sort -u | wc -l)
-  let cores=nodes*128
+  let cores=nodes*20
- /usr/bin/time -p mpirun -n $cores -N128  pcp -l -ls 1048576  -b 4096 $SOURCEDIR $TARGETDIR
- >
-Output of a run with the script
- R0: All workers have reported in.
- Starting 192 processes.
- Will copy lustre stripe information.
- Files larger than 4096 Mbytes will be copied in parallel chunks.
- Will not stripe files smaller than 1.00 Mbytes
- Starting phase I: Scanning and copying directory structure...
- Phase I done: Scanned 115532 files, 1007 dirs in 00 hrs 00 mins 01 secs (106900 items/sec).
-files will be copied.
-  Starting phase II: Copying files...
+  time_start=$(date "+%c  :: %s")
-  Phase II done.
+  #mpirun -np $cores dcp -p --bufsize 8MB ${SOURCEDIR}/ ${TARGETDIR}/
+ mpirun -np $cores dsync --bufsize 8MB $SOURCEDIR $TARGETDIR
+ time_end=$(date "+%c  :: %s")
-  Copy Statisics:
+  tt_start=$(echo $time_start | awk {'print $9'})
- Rank 1 copied 7.00 Gbytes in 839 files (38.17 Mbytes/s)
+  tt_end=$(echo $time_end | awk {'print $9'})
-  Rank 2 copied 6.37 Gbytes in 825 files (34.75 Mbytes/s)
+  (( total_time=$tt_end-$tt_start ))
-  ...
+  echo "Total runtime in seconds: $total_time"
- Rank 190 copied 7.84 Gbytes in 495 files (42.78 Mbytes/s)
- Rank 191 copied 7.25 Gbytes in 784 files (39.63 Mbytes/s)
-  Total data copied: 1.47 Tbytes in 115606 files (7.09 Gbytes/s)
- Total Time for copy: 00 hrs 03 mins 31 secs
- Warnings 0
- Application 6324961 resources: utime ~4257s, stime ~4005s, Rss ~33000, inblocks ~3148863732, outblocks ~3148201243
- real 259.90
- user 0.02
- sys 0.01

