- Infos im HLRS Wiki sind nicht rechtsverbindlich und ohne Gewähr -
- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

Data Transfer with GridFTP

From HLRS Platforms

Revision as of 09:50, 19 April 2012

This page describes how to set up a Globus GridFTP client on your Linux system to perform high-data-rate transfers of large amounts of data from the HPC systems. At the moment (April 2012), GridFTP is set up and ready for HERMIT. For Laki, it will be available by June 2012.

Introduction

For transferring large amounts of data, the plain FTP protocol cannot make full use of high-bandwidth channels. For this task, an extension has been defined: GridFTP supports parallel TCP streams and multi-node transfers to achieve high data rates over high-bandwidth connections. Furthermore, transfers can be restarted, and third-party transfers can be set up, in which a third party initiates and mediates a transfer between two other end hosts.

GridFTP has a typical client/server architecture, in which the server stores the data or has access to it. A simple GridFTP client, globus-url-copy, is provided by the Globus Toolkit.

Requirements

Installation & Configuration

  • Since version 5.2 of the Globus Toolkit, the GridFTP client is also available as an RPM or DEB package. Install the GridFTP client by following the instructions on this page. Make sure that "globus-proxy-utils" is installed as well (if the client is compiled from source, it is already included).
  • Create a directory .globus/ in your home directory and place both your certificate and your key file in it.
  • Inside this directory, create a subdirectory certificates/ and place all the CA files there. These files can be found, e.g., here as a tarball; just untar them into certificates/. They are needed later to verify your certificate against the Certificate Authority.
  • Run
 grid-proxy-init 

This tool verifies the validity of your certificate and creates a proxy, which the GridFTP client needs internally. The proxy is only valid for a limited time, so this step has to be repeated before using the client once the proxy has expired. If something like

Your identity: <YourDNhere>
Creating proxy ............................................... Done
Your proxy is valid until: Wed Apr 18 22:25:32 2012

shows up, everything is installed correctly.
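
The configuration steps and the proxy check above can be sketched as a small shell script. This is an illustration under assumptions that do not come from this page: the function names setup_globus_dir and ensure_proxy are hypothetical, the certificate and key are copied to usercert.pem and userkey.pem (the default file names the Globus tools look for in ~/.globus/), the CA tarball is assumed to be gzip-compressed with its files at the top level, and the proxy is renewed when it is valid for less than one hour.

```shell
# Sketch only: function names and the 1-hour threshold are assumptions.

# setup_globus_dir <home-dir> <cert-file> <key-file> [ca-tarball]
# Creates <home-dir>/.globus/ and .globus/certificates/, copies the
# certificate and key there under the default Globus file names and,
# if given, unpacks the CA tarball into certificates/.
setup_globus_dir() {
    globus_dir="$1/.globus"
    mkdir -p "$globus_dir/certificates" || return 1
    # -f: replace an existing (possibly read-only) copy
    cp -f "$2" "$globus_dir/usercert.pem" || return 1
    cp -f "$3" "$globus_dir/userkey.pem" || return 1
    chmod 644 "$globus_dir/usercert.pem"
    # The private key should be readable by you only.
    chmod 400 "$globus_dir/userkey.pem"
    if [ -n "$4" ]; then
        # Assumes a gzip tarball with the CA files at its top level.
        tar -xzf "$4" -C "$globus_dir/certificates" || return 1
    fi
}

# ensure_proxy [HH:MM]
# Runs grid-proxy-init only if there is no proxy or the current one is
# valid for less than HH:MM (default 1:00, an assumed threshold).
ensure_proxy() {
    if grid-proxy-info -exists -valid "${1:-1:00}" 2>/dev/null; then
        return 0
    fi
    grid-proxy-init
}
```

For example, run setup_globus_dir "$HOME" ~/usercert.pem ~/userkey.pem ~/ca-certs.tar.gz once (the input file names are hypothetical), and ensure_proxy before each transfer session.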

Usage

See http://www.globus.org/toolkit/docs/5.0/5.0.0/data/gridftp/pi/#globus-url-copy for details on the globus-url-copy tool.

The basic syntax is:

globus-url-copy [optional command line switches] source destination

where source and destination take the form

globus-url-copy [optional command line switches] [gsiftp://<server address>:<port> | file://]<absolute path> [gsiftp://<server address>:<port> | file://]<absolute path>

Files on remote systems are referenced with gsiftp://, whereas local files are referenced with file://. Always specify absolute paths.
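
Since both URL forms differ only in their prefix, building them can be scripted. The following sketch (the function names gsiftp_url and file_url are hypothetical) composes the URLs according to the syntax above and rejects relative paths. Because the absolute path itself starts with a slash, a local file URL ends up with three slashes, e.g. file:///home/foo/file.

```shell
# Sketch: compose globus-url-copy URLs; function names are hypothetical.

# gsiftp_url <server> <port> <absolute-path>
gsiftp_url() {
    case $3 in
        /*) printf 'gsiftp://%s:%s%s\n' "$1" "$2" "$3" ;;
        *)  echo "gsiftp_url: not an absolute path: $3" >&2
            return 1 ;;
    esac
}

# file_url <absolute-path>  (empty host part, hence file:///...)
file_url() {
    case $1 in
        /*) printf 'file://%s\n' "$1" ;;
        *)  echo "file_url: not an absolute path: $1" >&2
            return 1 ;;
    esac
}
```

For example, gsiftp_url gridftp-fr1.hww.de 2812 /univ_1/ws1/ws/foo-test-0/file prints gsiftp://gridftp-fr1.hww.de:2812/univ_1/ws1/ws/foo-test-0/file.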

To access files on HERMIT, use:

  • server address: gridftp-fr1.hww.de
  • port: 2812

To access files on LAKI, use (GridFTP on LAKI will be available in June 2012):

  • server address: gridftp-fr2.hww.de
  • port: 2812
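
To keep the two endpoints in one place in a script, a small lookup function is enough. The name gridftp_endpoint is hypothetical; the host names and the port are the ones listed above.

```shell
# Sketch: map an HLRS system name to its GridFTP endpoint (host:port).
# gridftp_endpoint <system>   (case-insensitive: hermit, laki)
gridftp_endpoint() {
    case $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]') in
        hermit) echo "gridftp-fr1.hww.de:2812" ;;
        laki)   echo "gridftp-fr2.hww.de:2812" ;;  # available from June 2012
        *)      echo "gridftp_endpoint: unknown system: $1" >&2
                return 1 ;;
    esac
}
```

A remote URL can then be written as gsiftp://$(gridftp_endpoint hermit)/univ_1/ws1/ws/foo-test-0/file.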


For the remote paths, you have to specify the absolute path to your workspace. When logged into HERMIT, you can look up your workspaces with the command

ws_list

which lists all your available workspaces. A workspace resides in a directory like

/univ_1/ws1/ws/<username-name>

Suppose your workspace directory is

/univ_1/ws1/ws/foo-test-0

and you want to copy a file from this workspace to your home directory on the machine you are currently logged into; then run these commands:

grid-proxy-init

globus-url-copy -tcp-bs 4000000 -p 8 gsiftp://gridftp-fr1.hww.de:2812/univ_1/ws1/ws/foo-test-0/file  file:///home/foo/file

where the parameter

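  • -p specifies the number of parallel data streams (8 in the example above). The optimal value depends, among other things, on the round trip time between the source and destination sites.
  • -tcp-bs specifies the size (in bytes) of the TCP buffer used by the underlying FTP data channels. The optimal size is roughly the bandwidth-delay product of the connection (bandwidth × round trip time); in general, about 4 MB (4000000 bytes, as in the example) should be fine for fast connections of more than 1 Gbit/s (1GE). Note that while larger buffers generally yield better performance, the buffer memory grows with the number of streams times the buffer size, so many parallel streams (high -p) combined with large buffer sizes can drive the systems out of memory.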
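
The example transfer and the two tuning parameters can be wrapped in a small sketch. The function names and defaults are assumptions: fetch_from_hermit reuses the values from the example above (8 streams, 4000000-byte buffer), tcp_bs_for computes the bandwidth-delay product in bytes from a bandwidth in Mbit/s and a round trip time in milliseconds, and stream_memory gives a rough estimate of the buffer memory per host (streams × buffer size). For a 1 Gbit/s link with an illustrative 30 ms round trip time, the bandwidth-delay product is 1000 × 30 × 125 = 3750000 bytes, close to the 4 MB suggested above.

```shell
# Sketch only: function names and defaults are hypothetical.

# tcp_bs_for <bandwidth-Mbit/s> <rtt-ms>
# Bandwidth-delay product in bytes:
#   Mbit/s * 10^6 / 8 bytes/s * ms / 1000 s  =  Mbit/s * ms * 125
tcp_bs_for() {
    echo $(( $1 * $2 * 125 ))
}

# stream_memory <streams> <tcp-bs>
# Rough buffer memory needed on each host for one transfer.
stream_memory() {
    echo $(( $1 * $2 ))
}

# fetch_from_hermit <remote-abs-path> <local-abs-path> [streams] [tcp-bs]
# Copies a file from HERMIT to the local machine, as in the example above.
fetch_from_hermit() {
    streams=${3:-8}
    bs=${4:-4000000}
    globus-url-copy -tcp-bs "$bs" -p "$streams" \
        "gsiftp://gridftp-fr1.hww.de:2812$1" "file://$2"
}
```

With 8 streams of 4000000 bytes each, stream_memory reports 32000000 bytes (about 32 MB) per host, which illustrates why raising -p and -tcp-bs at the same time can exhaust memory.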


Support

Björn Schembera schembera@hlrs.de