- Information in the HLRS Wiki is not legally binding and is provided without warranty -

Workspace migration


Latest revision as of 17:13, 7 February 2019



User migration to new workspaces

In February 2019 the workspaces installed in 2014 with the Cray XC30 system Hornet will be replaced. This is necessary to prepare for the installation of the successor system Hawk. Before the replacement, users have to migrate their workspaces to the replacement filesystems. Run the command ws_list -a on a frontend system to display the paths of all your workspaces. If a path name matches one of the mount points in the following table, that workspace needs to be migrated.

File System   Mounted on
ws7           /lustre/cray/ws7
ws8           /lustre/cray/ws8
ws78          /lustre/cray/ws7 or /lustre/cray/ws8
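
The check against the table can be scripted. The following is a minimal sketch, assuming that every workspace path printed by ws_list -a appears on a line containing the full mount point. The helper name needs_migration is hypothetical and not part of the workspace tools.

```shell
#!/bin/bash
# Hypothetical helper: print only those input lines that mention a path
# below one of the old mount points (/lustre/cray/ws7 or /lustre/cray/ws8).
needs_migration() {
    grep -E '/lustre/cray/ws[78](/|$)' || true
}

# Intended use on a frontend system (not executed here):
#   ws_list -a | needs_migration
```

An empty result would suggest that none of your workspaces is located on the old filesystems, but the output of ws_list -a itself remains the authoritative reference.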

Before you start

Migrating large amounts of data consumes a lot of I/O resources. Please review your data and remove whatever is no longer needed, or move it into HPSS.

How to proceed

  • From February 18th 2019, 10:00 on, new workspaces will be allocated on the replacement filesystems. Existing workspaces will continue to be listed.
  • Workspaces located on old filesystems cannot be extended anymore.
  • If you have to migrate data from workspaces on one of the filesystems listed above, do not use the mv command to transfer data. For large amounts of data, this will fail due to time limits. We recommend using the parallel copy program pcp for large amounts of data in large files. If this fails, e.g. for millions of small files, the following command may help: rsync -a --hard-links Old_ws/ new_ws/
  • Take care when you create new batch jobs. Migrating a workspace from an old filesystem to the new location takes time. Do not run any job while the migration process is active; doing so may result in inconsistent data.
  • On April 29th 2019 the “old” workspaces ws7, ws8, ws78 will be disconnected from the Cray compute nodes. The filesystems will remain available on the frontend systems for data migration until May 13th 2019.
  • On May 14th 2019 all data on the old filesystems will be deleted.
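
Since the old filesystems are deleted after the deadline, it can be worth checking a copy before relying on it. The following is a minimal sketch of one such sanity check, comparing the number of regular files in both trees. The function names and the OLD_WS and NEW_WS variables are placeholders chosen for illustration; a matching count does not prove that the file contents are identical.

```shell
#!/bin/bash
# Count regular files below a directory (assumes no newlines in file names).
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Succeed only if both directory trees hold the same number of regular files.
same_file_count() {
    [ "$(count_files "$1")" -eq "$(count_files "$2")" ]
}

# Intended use after the copy (OLD_WS and NEW_WS are placeholders):
#   rsync -a --hard-links "$OLD_WS/" "$NEW_WS/"
#   same_file_count "$OLD_WS" "$NEW_WS" && echo "file counts match"
```

Note the trailing slash on the rsync source: it copies the contents of the old workspace into the new one instead of creating a subdirectory named after the old workspace.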

Operation of the workspaces:

  • Because performance drops when quota usage is high, no job of any group member will be scheduled for computation as long as the group's quota usage exceeds 80%. All blocked group members are notified by e-mail (if a valid address is registered).
  • accounting
  • max. lifetime of a workspace is currently 60 days
  • default lifetime of a workspace is 1 day
  • please read the related man pages or the online workspace mechanism document. In particular, note that the workspace tools allow you to explicitly address a specific workspace file system using the -F option (e.g. ws_allocate -F ws9 my_workspace 10)
  • to list your available workspace file systems use ws_list -l
  • users can restore expired workspaces using ws_restore
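
Because the default lifetime is only 1 day, it is usually sensible to pass the lifetime explicitly. The following is a minimal sketch, assuming that the last argument of ws_allocate is the lifetime in days, as in the example above. The helper ws_allocate_cmd is hypothetical; it only prints the command line and rejects lifetimes above the maximum of 60 days stated above, which may change.

```shell
#!/bin/bash
MAX_DAYS=60   # maximum lifetime stated above; may change

# Hypothetical helper: print a ws_allocate command line for the given
# file system, workspace name and lifetime in days, or fail if the
# lifetime is outside the allowed range.
ws_allocate_cmd() {
    local fs=$1 name=$2 days=$3
    if [ "$days" -lt 1 ] || [ "$days" -gt "$MAX_DAYS" ]; then
        echo "lifetime must be between 1 and $MAX_DAYS days" >&2
        return 1
    fi
    echo "ws_allocate -F $fs $name $days"
}

ws_allocate_cmd ws9 my_workspace 10
```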

Please read https://kb.hlrs.de/platforms/index.php/Storage_usage_policy

Using a parallel copy for data transfer

pcp is a Python-based parallel copy tool that uses MPI. It can only be run on compute nodes via aprun.

pcp is similar to cp -r; simply give it a source directory and destination and pcp will recursively copy the source directory to the destination in parallel.

pcp has a number of useful options; use pcp -h to see a description.

This program traverses a directory tree and copies files in the tree in parallel. It does not copy individual files in parallel. It should be invoked via aprun.

Basic arguments

If run with the -l or -lf flag, pcp is stripe aware.

-l will cause stripe information to be copied from the source files and directories.

-lf will cause all files and directories on the destination to be striped, regardless of the striping on the source.

Striping behavior can be further modified with -ls and -ld.

-ls will set a minimum file size. Files below this size will not be striped, regardless of the source striping.

-ld will cause all directories to be unstriped.

-b C: Copy files larger than C Mbytes in C Mbyte chunks
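
The units of the size options are easy to mix up. The job script example on this page passes -ls 1048576 and pcp then reports that files smaller than 1.00 Mbytes are not striped, which suggests that -ls is given in bytes, while -b is given in Mbytes. The following is a minimal sketch under that assumption; mb_to_bytes and pcp_opts are hypothetical helper names.

```shell
#!/bin/bash
# Convert a size in Mbytes to bytes.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Hypothetical helper: assemble pcp options from a minimum striping size
# and a chunk size, both given in Mbytes.
pcp_opts() {
    echo "-l -ls $(mb_to_bytes "$1") -b $2"
}

pcp_opts 1 4096
```

Use pcp -h to confirm the units on your system before relying on this.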

Algorithm

pcp runs in two phases:

Phase I is a parallel walk of the file tree, involving all MPI ranks in a peer-to-peer algorithm. The walk constructs the list of files to be copied and creates the destination directory hierarchy.

In phase II, the actual files are copied. Phase II uses a master-slave algorithm. R0 is the master and dispatches file copy instructions to the slaves (R1...Rn).
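
One consequence of this scheme is that R0 does not copy files itself, so the number of copying ranks is one less than the number of MPI ranks. The following is a small sketch of that arithmetic, assuming 24 ranks per node as in the job script example on this page; copy_workers is a hypothetical helper name.

```shell
#!/bin/bash
# Number of ranks that copy files: all ranks except the master R0.
copy_workers() {
    local nodes=$1 ranks_per_node=$2
    echo $(( nodes * ranks_per_node - 1 ))
}

copy_workers 8 24
```

For 8 nodes this gives 191, which is consistent with the sample output on this page, where 192 processes are started and the copy statistics list ranks 1 to 191.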

Job Script example

Here is an example of a job script.

Change SOURCEDIR and TARGETDIR according to your setup, and adjust the number of nodes and the wallclock time.

Again, pcp does NOT parallelize a single copy operation; instead, the individual copy operations are distributed over the nodes.

> cat pcp.qsub
#!/bin/bash
#PBS -N IO_copy_test
#PBS -l nodes=8
#PBS -l walltime=0:30:00
#PBS -j oe

cd "$PBS_O_WORKDIR"

# originally:
# module load tools/python/2.7.8
# currently:
module load python-site/2.7

SOURCEDIR=<YOUR SOURCE DIRECTORY HERE>
TARGETDIR=<YOUR TARGET DIRECTORY HERE>

sleep 5
# determine the number of nodes allocated to this job
nodes=$(qstat -f "$PBS_JOBID" | awk -F: '/Resource_List.nodes/ {print $1 }' | awk -F= '{print $2}')
# 24 MPI ranks per node (matches -N24 below)
let cores=nodes*24

/usr/bin/time -p aprun -n $cores -N24 -d1  pcp -l -ls 1048576  -b 4096 "$SOURCEDIR" "$TARGETDIR"
>

Output of a run with the script

R0: All workers have reported in.
Starting 192 processes.
Will copy lustre stripe information.
Files larger than 4096 Mbytes will be copied in parallel chunks.
Will not stripe files smaller than 1.00 Mbytes 

Starting phase I: Scanning and copying directory structure...
Phase I done: Scanned 115532 files, 1007 dirs in 00 hrs 00 mins 01 secs (106900 items/sec).
115532 files will be copied.

Starting phase II: Copying files...
Phase II done.

Copy Statisics:
Rank 1 copied 7.00 Gbytes in 839 files (38.17 Mbytes/s)
Rank 2 copied 6.37 Gbytes in 825 files (34.75 Mbytes/s)
...
Rank 190 copied 7.84 Gbytes in 495 files (42.78 Mbytes/s)
Rank 191 copied 7.25 Gbytes in 784 files (39.63 Mbytes/s)
Total data copied: 1.47 Tbytes in 115606 files (7.09 Gbytes/s)
Total Time for copy: 00 hrs 03 mins 31 secs
Warnings 0
Application 6324961 resources: utime ~4257s, stime ~4005s, Rss ~33000, inblocks ~3148863732, outblocks ~3148201243
real 259.90
user 0.02
sys 0.01
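
To compare runs, the copy time reported by pcp can be extracted from the job output. The following is a minimal sketch, assuming the "Total Time for copy" line has exactly the format shown above; copy_seconds is a hypothetical helper name.

```shell
#!/bin/bash
# Read pcp output on stdin and print the reported copy time in seconds.
copy_seconds() {
    awk '/^Total Time for copy:/ { print $5 * 3600 + $7 * 60 + $9 }'
}

echo "Total Time for copy: 00 hrs 03 mins 31 secs" | copy_seconds
```

For the run above this gives 211 seconds, compared with the 259.90 seconds of wall-clock time reported by time; the output does not break down the remainder.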