- Information in the HLRS Wiki is not legally binding and is provided without guarantee -

Workspace migration

From HLRS Platforms

Latest revision as of 09:48, 14 May 2021


Warning: This page describes the necessary steps to migrate workspaces to the new filesystems.
On July 31st 2021 all data on the old ws9 filesystems will be deleted. Follow the guide below to transfer your data to the new ws10 filesystems.


User migration to new workspaces

With the installation of the HPE Apollo 9000 (Hawk), a new fast workspace filesystem was integrated. For a transition period, the workspace filesystem of the predecessor system, the Cray XC40 Hazel Hen (Sonexion ws9), remained integrated as well. Users now have to migrate their workspaces located on the old filesystems onto the new filesystem. Run the command ws_list -a on a frontend system to display the paths of all your workspaces. Workspaces whose paths match one of the mount points in the following table need to be migrated to the new filesystem.


File system   Mounted on
ws9.0         /lustre/cray/ws9/0
ws9.1         /lustre/cray/ws9/1
ws9.2         /lustre/cray/ws9/2
ws9.3         /lustre/cray/ws9/3
ws9.4         /lustre/cray/ws9/4
ws9.5         /lustre/cray/ws9/5
ws9.6         /lustre/cray/ws9/6
ws9.6p        /lustre/cray/ws9/6
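A minimal sketch for spotting the workspaces that need to be migrated. It assumes only that ws_list -a prints each workspace path somewhere in its output; it does not depend on the exact output layout. The helper name old_ws9_paths is hypothetical, and the /lustre/cray/ws9/ prefix is taken from the table above.

```shell
#!/bin/bash
# Hypothetical helper: read ws_list -a output on stdin and print every
# path located on one of the old ws9 filesystems (prefix from the table).
# Assumption: paths contain no whitespace.
old_ws9_paths() {
    grep -o '/lustre/cray/ws9/[^[:space:]]*' | sort -u
}

# Usage on a frontend system:
#   ws_list -a | old_ws9_paths
```

An empty result means none of your workspaces is on a ws9 filesystem.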

Before you start

Migrating large amounts of data consumes a lot of I/O resources. Please review your data first: remove data that is no longer needed, or move it into HPSS.
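To decide what to delete or move to HPSS, it can help to see which top-level entries of a workspace use the most space. A minimal sketch with standard du and sort; largest_entries is a hypothetical helper name, and entries whose names start with a dot are not included.

```shell
#!/bin/bash
# Hypothetical helper: show the n largest top-level entries of a directory.
# du -sk reports the disk usage of each entry in KiB; sort -rn puts the
# biggest first. Hidden entries (".name") are not matched by the glob.
largest_entries() {
    local dir=${1%/} n=${2:-10}
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$n"
}

# Usage (placeholder path):
#   largest_entries /lustre/cray/ws9/2/ws/<your-workspace> 20
```

Running du over a very large tree generates metadata load itself, so it is best run once per workspace rather than repeatedly.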

How to proceed

  • From May 18th 2021, 10:00, new workspaces will be allocated on the replacement filesystem. Existing workspaces will continue to be listed.
  • Workspaces located on the old filesystems can no longer be extended.
  • Do not use the mv command to migrate data from workspaces on one of the filesystems listed above: for large amounts of data it will fail due to time limits. For large amounts of data in large files we recommend the parallel copy program pcp. If this fails, e.g. for millions of small files, the following command may help: rsync -a --hard-links Old_ws/ new_ws/
  • Take care when you create new batch jobs. Migrating a workspace from an old filesystem to the new location takes time. Do not run any job while the migration is in progress; this may result in inconsistent data.
  • On July 10th 2021 the “old” workspaces ws9.* will be disconnected from the HPE Apollo 9000 compute nodes. They will remain available on the frontend systems for data migration until July 30th 2021.
  • On July 31st 2021 all data on the old filesystems will be deleted.
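The rsync fallback from the list above can be wrapped in a small function. This is a sketch, not an HLRS tool: migrate_ws is a hypothetical name and the paths in the usage line are placeholders. The trailing slash on the source copies the contents of the old workspace into the new one instead of creating an extra subdirectory.

```shell
#!/bin/bash
# Hypothetical wrapper around the rsync command recommended above.
# -a             archive mode (recursive, keeps permissions, times, links)
# --hard-links   recreate hard links in the target instead of duplicating data
migrate_ws() {
    local old_ws=${1%/} new_ws=${2%/}
    # Trailing "/" on the source: copy the directory *contents*.
    rsync -a --hard-links "$old_ws/" "$new_ws/"
}

# Usage (placeholders):
#   migrate_ws /lustre/cray/ws9/2/ws/<old-workspace> <new-workspace-path>
```

Because rsync skips files that are already up to date, an interrupted run can simply be started again.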

Operation of the workspaces:

  • No job of any group member will be scheduled for computation as long as the group quota is exceeded.
  • accounting
  • The maximum lifetime of a workspace is currently 60 days.
  • The default lifetime of a workspace is 1 day.
  • Please read the related man pages or the online workspace mechanism document.
In particular, note that the workspace tools let you explicitly address a specific workspace filesystem with the -F option (e.g. ws_allocate -F ws10.1 my_workspace 10).
  • To list the workspace filesystems available to you, use ws_list -l.
  • Users can restore expired workspaces using ws_restore.
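A minimal sketch of allocating the target workspace on an explicitly chosen filesystem and keeping its path, e.g. as TARGETDIR for a pcp job. It assumes that ws_allocate prints the new workspace path on stdout, as in the common pattern dir=$(ws_allocate name days); new_workspace is a hypothetical helper, not one of the workspace tools.

```shell
#!/bin/bash
# Hypothetical helper: allocate a workspace with -F and print its path.
# Assumption: ws_allocate writes the workspace path to stdout.
new_workspace() {
    local fs=$1 name=$2 days=$3 path
    path=$(ws_allocate -F "$fs" "$name" "$days") || return 1
    if [[ -z $path ]]; then
        echo "ws_allocate returned no path" >&2
        return 1
    fi
    echo "$path"
}

# Usage:
#   TARGETDIR=$(new_workspace ws10.1 my_workspace 10)
```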

Please read https://kb.hlrs.de/platforms/index.php/Storage_usage_policy

Using a parallel copy for data transfer

pcp is a Python-based parallel copy program using MPI. It can only be run on compute nodes via mpirun.

pcp is similar to cp -r; simply give it a source directory and destination and pcp will recursively copy the source directory to the destination in parallel.

pcp has a number of useful options; use pcp -h to see a description.

This program traverses a directory tree and copies the files in the tree in parallel. Unless the -b option is used, it does not copy individual files in parallel.

Basic arguments

Run pcp with the -p option to preserve permissions, timestamps, and ownership.

-p : preserve permissions, timestamps, and ownership

-b C: Copy files larger than C Mbytes in C Mbyte chunks
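A short worked example of -b. Assuming that a file of S Mbytes with S > C is split into C-Mbyte chunks plus a smaller final chunk (our reading of the option text, not checked against the pcp source), it turns into ceil(S/C) copy operations; smaller files stay a single operation. chunks_for_file is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: number of copy operations for one file with -b C,
# under the chunking assumption stated above. Sizes in whole Mbytes.
chunks_for_file() {
    local size_mb=$1 chunk_mb=$2
    if (( size_mb <= chunk_mb )); then
        echo 1
    else
        # Integer ceiling of size_mb / chunk_mb.
        echo $(( (size_mb + chunk_mb - 1) / chunk_mb ))
    fi
}

# chunks_for_file 10000 4096   # prints 3
```

With -b 4096, as in the job script example, a file of exactly 4096 Mbytes remains one copy operation.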

Algorithm

pcp runs in two phases, or in three phases if the -p option is used:

Phase I is a parallel walk of the source file tree, involving all MPI ranks in a peer-to-peer algorithm. The walk constructs the list of files to be copied and creates the destination directory hierarchy.

In phase II, the actual files are copied. Phase II uses a master-slave algorithm. R0 is the master and dispatches file copy instructions to the slaves (R1...Rn).

In phase III (only with the -p option), permissions, timestamps, and ownership are set to match the source directory.

We highly recommend using the -p option.
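To illustrate why the phases come in this order, here is a single-node sketch that mimics them with standard GNU tools (find, xargs -P, cp, touch). It is not pcp, uses no MPI, and in phase III only restores directory timestamps; three_phase_copy is a hypothetical name. Directory timestamps must be set last because copying files into a directory updates its modification time.

```shell
#!/bin/bash
# Illustration only: the three pcp phases on ONE node with GNU tools.
# Assumes dst already exists and file names contain no newlines issues
# (null-separated lists are used throughout).
three_phase_copy() {
    local src=${1%/} dst=${2%/} nproc=${3:-4}
    # Phase I: walk the source tree, recreate the directory hierarchy.
    (cd "$src" && find . -type d -print0) |
        (cd "$dst" && xargs -0 mkdir -p)
    # Phase II: copy regular files; whole files are distributed over
    # nproc workers, a single file is never split.
    (cd "$src" && find . -type f -print0) |
        xargs -0 -P "$nproc" -I{} cp -p "$src/{}" "$dst/{}"
    # Phase III: copy directory timestamps after all files are in place.
    (cd "$src" && find . -type d -print0) |
        xargs -0 -I{} touch -r "$src/{}" "$dst/{}"
}

# Usage: three_phase_copy /path/to/source /path/to/existing/target 8
```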

Job Script example

Here is an example of a job script.

Set SOURCEDIR and TARGETDIR for your setup, and adjust the number of nodes and the wallclock time.

Again, apart from splitting large files into chunks with -b, pcp does NOT parallelize a single copy operation; instead, the copy operations are distributed over the nodes.

#!/bin/bash
#PBS -N parallel-copy
#PBS -l select=2:mpiprocs=128
#PBS -l walltime=00:20:00

module load pcp/2.0.0-39-ge19b

# Set both paths for your setup (keep the quotes).
SOURCEDIR="<YOUR SOURCE DIRECTORY HERE>"
TARGETDIR="<YOUR TARGET DIRECTORY HERE>"

sleep 5
# One MPI rank per core: 128 ranks on each allocated node.
nodes=$(sort -u "$PBS_NODEFILE" | wc -l)
cores=$(( nodes * 128 ))

# Record start and end as seconds since the epoch.
time_start=$(date +%s)
mpirun -np "$cores" pcp -p -b 4096 "$SOURCEDIR" "$TARGETDIR"
time_end=$(date +%s)

total_time=$(( time_end - time_start ))
echo "Total runtime in seconds: $total_time"

Output of a run with the script

R0: All workers have reported in.
Starting 256 processes.
Files larger than 4096 Mbytes will be copied in parallel chunks.

Starting phase I: Scanning and copying directory structure...
Phase I done: Scanned 168 files, 4 dirs in 00 hrs 00 mins 00 secs (6134 items/sec).
168 files will be copied.

Starting phase II: Copying files...
Phase II done.

Copy Statisics:
Rank 1 copied 4.00 Gbytes in 1 files (15.85 Mbytes/s)
Rank 2 copied 4.00 Gbytes in 1 files (15.69 Mbytes/s)
Rank 3 copied 4.00 Gbytes in 1 files (15.72 Mbytes/s)
...
Rank 253 copied 4.00 Gbytes in 2 files (24.91 Mbytes/s)
Rank 254 copied 4.00 Gbytes in 2 files (24.94 Mbytes/s)
Rank 255 copied 4.00 Gbytes in 2 files (7.34 Mbytes/s)
Total data copied: 1.05 Tbytes in 433 files (1.81 Gbytes/s)
Total Time for copy: 00 hrs 09 mins 50 secs
Warnings 0

Starting phase III: Setting directory timestamps...
Phase III Done. 00 hrs 00 mins 00 secs
Total runtime in seconds: 601
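The run above copied about 1.05 Tbytes at about 1.81 Gbytes/s. A sketch for turning such an observed throughput into a walltime request for your own data volume. The helper name estimate_walltime and the default safety factor of 2 are our own assumptions, not HLRS recommendations; the real throughput depends on file sizes, node count, and filesystem load.

```shell
#!/bin/bash
# Hypothetical helper: walltime (hh:mm:ss) for a data volume in Gbytes at
# an observed rate in Mbytes/s, multiplied by an assumed safety factor.
estimate_walltime() {
    local gbytes=$1 mbytes_per_s=$2 safety=${3:-2}
    # Integer ceiling of (gbytes * 1024) / rate, then apply the factor.
    local secs=$(( (gbytes * 1024 + mbytes_per_s - 1) / mbytes_per_s * safety ))
    printf '%02d:%02d:%02d\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

# estimate_walltime 1075 1853 1   # prints 00:09:55
```

With a factor of 1, 1075 Gbytes at 1853 Mbytes/s gives 00:09:55, close to the 9 min 50 s copy time reported above.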