- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

Data Transfer with GridFTP


Revision as of 11:53, 5 May 2017

Introduction

For transferring large amounts of data, the plain FTP protocol cannot utilize high-bandwidth channels. For this task, an extension has been defined: GridFTP supports parallel TCP streams and multi-node transfers to achieve high data rates over high-bandwidth connections. Furthermore, transfers can be restarted, and third-party transfers can be established, meaning that a transfer between two end hosts can be initiated and mediated by a third party.

GridFTP has a typical client/server architecture, where the server stores the data or has access to the data. A simple GridFTP client - globus-url-copy - is provided by the Globus Toolkit.

At HLRS, a dedicated GridFTP server is running that has access to the relevant file systems. This server cannot be accessed directly but has to be controlled by a GridFTP client; in other words, it has to be used via third-party transfers.

There are two places from which you can conduct third-party transfers with the GridFTP client: either you use the GridFTP client that is pre-installed on our Hazelhen frontend nodes, or you install the GridFTP client outside the HLRS network, for example at your home institution.

Requirements

openssl x509 -subject -in <USERCERT> -noout | sed -e 's/subject= //'
  • A Linux system with the GridFTP client installed (the client is already available on the HLRS frontend nodes)

Pre-installed GridFTP client on the Hazelhen frontend nodes

  • Create a directory .globus/ in your home directory and place both your certificate and your key file into this directory.
  • In the above directory, create another directory certificates/ and place all the CA files there. These files are available as a tarball, for example at https://winnetou.surfsara.nl/prace/certs/globuscerts.tar.gz; just untar it into the above directory. The CA files are needed later to verify your certificate against the Certificate Authority.
  • Load the module tools/globus-gridftp-client on the Hazelhen frontend node you are currently logged in to.
module load tools/globus-gridftp-client

   **** Globus GridFTP Client ****

   Initialisation of the Globus GridFTP client for fast data transfer.

   Usage

   When the module is loaded, you can use the Globus GridFTP client with the command 
      globus-url-copy
   For data transfer, simply issue the command 
      globus-url-copy <optional command line switches> <gsiftp://<server adress>:<port> | file://><absolute path> <gsiftp://<server adress>:<port> | file://><absolute path>
   With the parameter -p <p> you can specify the degree of parallelism of your transfer
 
   Before data transfer, please issue the command
      grid-proxy-init
   to get a proxy certificate

   For more information on the GridFTP client and HPSS see:

   https://kb.hlrs.de/platforms/index.php/Data_Transfer_with_GridFTP
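
The directory setup from the list above can be sketched as a short shell script. The file names usercert.pem and userkey.pem are the defaults that grid-proxy-init looks for in .globus/. The three source paths (CERT_SRC, KEY_SRC, CA_TARBALL) are hypothetical placeholders, and the script assumes that the tarball holds the CA files at its top level, so check the result after unpacking.

```shell
#!/bin/sh
# Sketch only: the three source paths below are hypothetical placeholders.
CERT_SRC="${CERT_SRC:-$HOME/usercert.pem}"
KEY_SRC="${KEY_SRC:-$HOME/userkey.pem}"
CA_TARBALL="${CA_TARBALL:-$HOME/globuscerts.tar.gz}"

GLOBUS_DIR="$HOME/.globus"

# Create ~/.globus and the certificates/ subdirectory for the CA files.
mkdir -p "$GLOBUS_DIR/certificates"

# Copy the certificate, if it is present at the assumed location.
if [ -f "$CERT_SRC" ]; then
    cp -f "$CERT_SRC" "$GLOBUS_DIR/usercert.pem"
    chmod 644 "$GLOBUS_DIR/usercert.pem"
fi

# Copy the private key; it must be readable by its owner only.
if [ -f "$KEY_SRC" ]; then
    cp -f "$KEY_SRC" "$GLOBUS_DIR/userkey.pem"
    chmod 400 "$GLOBUS_DIR/userkey.pem"
fi

# Unpack the CA files, if the tarball has been downloaded.
# Assumption: the CA files sit at the top level of the tarball. If it
# contains its own certificates/ directory, unpack into $GLOBUS_DIR instead.
if [ -f "$CA_TARBALL" ]; then
    tar -xzf "$CA_TARBALL" -C "$GLOBUS_DIR/certificates"
fi

ls -l "$GLOBUS_DIR"
```

The script skips any file that is not found, so it can be re-run after the certificate, the key, or the tarball has been put in place.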

Installing the GridFTP client at your home institution

Installation & Configuration

  • Since version 5.2, the GridFTP client is also available as an rpm or deb package. Install the GridFTP client by following the instructions on this page. Be sure to install "globus-proxy-utils" as well (if the client is compiled from source, this is included).
  • Create a directory .globus/ in your home directory and place both your certificate and your key file into this directory.
  • In the above directory, create another directory certificates/ and place all the CA files there. These files are available as a tarball, for example at https://winnetou.surfsara.nl/prace/certs/globuscerts.tar.gz; just untar it into the above directory. The CA files are needed later to verify your certificate against the Certificate Authority.
  • Run
 grid-proxy-init 

This tool verifies the validity of your certificate and creates a proxy that the GridFTP client needs internally. This step has to be repeated before each use. If something like

Your identity: <YourDNhere>
Creating proxy ............................................... Done
Your proxy is valid until: Wed Apr 18 22:25:32 2012

shows up, everything is installed correctly.


Firewall issues

Because there is a distinction between control and data connection, some ports of the firewall on the client side have to be opened:

  • Port 2812 for the control channel to the frontend node gridftp-fr1.hww.de (HAZELHEN) or gridftp-fr2.hww.de (LAKI)
  • Ports 20000-20500 for data channels to the backend node (hostnames on request). These ports have to be opened for both incoming and outgoing connections.

Moreover, you have to set the following environment variables to instruct your client to use the specified ports:

export GLOBUS_TCP_PORT_RANGE=20000,20500
export GLOBUS_TCP_SOURCE_RANGE=20000,20500

Please refer to this document for further information: http://www.globus.org/toolkit/docs/latest-stable/gridftp/user/#gridftp-user-config-client-firewall
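
Before starting a transfer, it can help to confirm that both variables are actually set in the current shell and follow the min,max format. The following is a minimal sketch; the helper name valid_port_range is made up for this illustration and is not part of the Globus Toolkit.

```shell
#!/bin/sh
# valid_port_range "min,max": succeeds if both parts are integers,
# min <= max, and max is a valid TCP port. Hypothetical helper.
valid_port_range() {
    case "$1" in
        *[!0-9,]*|*,*,*|,*|*,|'') return 1 ;;   # stray characters or empty parts
    esac
    case "$1" in
        *,*) ;;                                 # exactly one comma is required
        *) return 1 ;;
    esac
    min=${1%,*}
    max=${1#*,}
    [ "$min" -le "$max" ] && [ "$max" -le 65535 ]
}

# Report the state of both variables in the current environment.
for var in GLOBUS_TCP_PORT_RANGE GLOBUS_TCP_SOURCE_RANGE; do
    eval "value=\${$var:-}"
    if valid_port_range "$value"; then
        echo "$var=$value looks fine"
    else
        echo "$var is unset or malformed: '$value'" >&2
    fi
done
```

This only checks the variables; it does not test whether the firewall actually lets the ports through.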

Usage

First, run

 grid-proxy-init 

This tool verifies the validity of your certificate and creates a proxy that the GridFTP client needs internally. This step has to be repeated before each use. If something like

Your identity: <YourDNhere>
Creating proxy ............................................... Done
Your proxy is valid until: Wed Apr 18 22:25:32 2012

shows up, a proxy certificate has been set up properly.
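
Since the proxy has a limited lifetime (see the "valid until" line in the output above), a script can check the remaining time and only ask for a new proxy when needed. The following is a sketch under assumptions: it relies on grid-proxy-info, which is part of the same proxy utilities as grid-proxy-init, and the one-hour threshold MIN_SECONDS as well as the helper name needs_renewal are arbitrary choices for this illustration.

```shell
#!/bin/sh
MIN_SECONDS=3600   # hypothetical threshold: renew if less than one hour is left

# needs_renewal <seconds_left> <minimum>: succeeds if the proxy should be renewed.
needs_renewal() {
    [ "$1" -lt "$2" ]
}

if command -v grid-proxy-info >/dev/null 2>&1; then
    # -timeleft prints the remaining proxy lifetime in seconds.
    left=$(grid-proxy-info -timeleft 2>/dev/null) || left=0
    # Treat empty or non-numeric output as "no valid proxy".
    case "$left" in
        ''|*[!0-9-]*) left=0 ;;
    esac
    if needs_renewal "$left" "$MIN_SECONDS"; then
        grid-proxy-init
    else
        echo "Proxy is still valid for $left seconds"
    fi
else
    echo "grid-proxy-info not found; load the module or install globus-proxy-utils" >&2
fi
```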

Then, transfers can be started.

See http://toolkit.globus.org/toolkit/docs/6.0/appendices/commands/index.html#globus-url-copy for details of the globus-url-copy tool.

The basic syntax is:

globus-url-copy [optional command line switches] source destination

where source and destination can be further resolved to

globus-url-copy [optional command line switches] [gsiftp://<server address>:<port> | file://]<absolute path> [gsiftp://<server address>:<port> | file://]<absolute path>

Files on remote systems are referenced by gsiftp://, whereas local files are referenced by file://. Always use absolute paths.

To access files on HAZELHEN, use:

  • server address: gridftp-fr1.hww.de
  • port: 2812

To access files on LAKI, use:

  • server address: gridftp-fr2.hww.de
  • port: 2812
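
To illustrate how server address, port, and absolute path combine into the two URL forms, here is a small sketch. The helper names gsiftp_url and file_url are made up for this example; they only build strings and reject relative paths, and they do not contact any server.

```shell
#!/bin/sh
# gsiftp_url <server address> <port> <absolute path>
# Prints a gsiftp:// URL; fails if the path is not absolute.
gsiftp_url() {
    case "$3" in
        /*) ;;
        *) echo "error: '$3' is not an absolute path" >&2; return 1 ;;
    esac
    printf 'gsiftp://%s:%s%s\n' "$1" "$2" "$3"
}

# file_url <absolute path>
# Prints a file:// URL; fails if the path is not absolute.
file_url() {
    case "$1" in
        /*) ;;
        *) echo "error: '$1' is not an absolute path" >&2; return 1 ;;
    esac
    printf 'file://%s\n' "$1"
}

# Example with the HAZELHEN server address and port listed above.
gsiftp_url gridftp-fr1.hww.de 2812 /univ_1/ws1/ws/foo-test-0/file
file_url /home/foo/file
```

Note that a file:// URL for an absolute path ends up with three slashes, because the path itself starts with one.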


For the referenced directories, you have to specify the absolute path to your workspace. If you are logged into HAZELHEN, you can find out about your workspace with the command

ws_list

that lists all your available workspaces. Your workspace will reside in a directory like

/univ_1/ws1/ws/<username-name>

Suppose your workspace directory is

/univ_1/ws1/ws/foo-test-0

and you want to copy a file from this workspace to your home directory on the machine you are currently logged in to, run these commands:

grid-proxy-init

globus-url-copy -tcp-bs 4000000 -p 8 gsiftp://gridftp-fr1.hww.de:2812/univ_1/ws1/ws/foo-test-0/file  file:///home/foo/file

If you want to copy files from this workspace to another machine that also runs a GridFTP server, say in the PRACE network, run these commands:

grid-proxy-init

globus-url-copy -tcp-bs 4000000 -p 8 gsiftp://gridftp-fr1.hww.de:2812/univ_1/ws1/ws/foo-test-0/file  gsiftp://juqueen1p.fz-juelich.de:2812/~

It may be necessary to experiment with the parameters to achieve optimal performance.
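
One way to experiment is to repeat the same transfer with different settings and compare the rates reported by -vb. The following is a sketch, not a recommendation: the candidate values for -p and -tcp-bs are arbitrary, the source and destination are the example paths from above, and the helper name try_transfer is made up. By default the script only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Example endpoints taken from the text above; adjust to your own paths.
SRC="gsiftp://gridftp-fr1.hww.de:2812/univ_1/ws1/ws/foo-test-0/file"
DST="file:///home/foo/file"
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually start the transfers

# try_transfer <parallel streams> <TCP buffer size in bytes>
try_transfer() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "globus-url-copy -vb -tcp-bs $2 -p $1 $SRC $DST"
    else
        echo "== -p $1 -tcp-bs $2 =="
        globus-url-copy -vb -tcp-bs "$2" -p "$1" "$SRC" "$DST"
    fi
}

# Arbitrary candidate values, chosen only for illustration.
for p in 2 4 8 16; do
    for bs in 1000000 4000000; do
        try_transfer "$p" "$bs"
    done
done
```

A valid proxy (grid-proxy-init) is still required before running the real transfers.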

Some important parameters

Parameter   Description
-help       Prints a detailed list of parameters and their descriptions.
-vb         Verbose mode; shows more information: the number of bytes transferred, the performance since the last update (every 5 seconds), and the average performance for the whole transfer.
-dbg        Debug mode; gives detailed information for debugging.
-p          Specifies the number of parallel streams.
-tcp-bs     Specifies the size (in bytes) of the TCP buffer to be used by the underlying FTP data channels. Please note that while higher values yield better performance, many parallel streams (high -p) together with large buffer sizes could drive the systems out of memory.
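
As a rough illustration of the memory note for -tcp-bs, the product of the number of streams and the buffer size gives an order-of-magnitude estimate of the buffer memory used by one transfer. This assumes one buffer of the requested size per stream, which is a simplification, since the operating system may allocate more or less than requested. The helper name buffer_total is made up for this sketch.

```shell
#!/bin/sh
# buffer_total <parallel streams> <TCP buffer size in bytes>
# Prints streams * buffer size, a rough estimate in bytes.
buffer_total() {
    echo $(( $1 * $2 ))
}

# The -p 8 -tcp-bs 4000000 setting from the examples above.
buffer_total 8 4000000   # prints 32000000, i.e. about 32 MB
```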

Further Information

Support

Björn Schembera schembera@hlrs.de