- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

Data Transfer with GridFTP: Difference between revisions

From HLRS Platforms
This page describes how to set up a Globus GridFTP client on your Linux system to perform high data rate transfers.

== Introduction ==
 
For transferring large amounts of data, the simple FTP protocol cannot fully exploit high-bandwidth connections, especially when they have high latencies, like intra- or international Wide Area Networks (WANs). For this task, an extension has been defined: GridFTP. It supports parallel TCP streams and multi-node transfers (also known as ''Striping'') to achieve a high data rate on high-bandwidth connections, even with high latencies. Furthermore, transfers can be restarted and third-party transfers can be established, which means users can initiate transfers between two GridFTP servers that are controlled by a third party (i.e. the user).
 
GridFTP has a typical client/server architecture: the server stores the data or has access to it, and the client downloads/uploads data or controls a server-to-server transfer in a third-party transfer as described above. The Globus Toolkit includes a simple GridFTP client - <code>globus-url-copy</code> - which is described in more detail below. On top of that there is <code>gtransfer</code>, a more user-friendly tool with additional features, which is also described in more detail below.
 
At HLRS, dedicated GridFTP servers are available for use which have access to the high-performance file systems of the Hazelhen and Laki supercomputers at HLRS. These servers can be used with a GridFTP client. Usually these GridFTP servers are used in third-party transfers, where users download/upload data from/to another GridFTP server e.g. at their home institution. There are two ways to conduct third-party transfers with our GridFTP servers: Either you use the pre-installed GridFTP clients on our Hazelhen frontend nodes or you install GridFTP clients somewhere else outside the HLRS network, for example at your home institution.
 
 
== Prerequisites for using our GridFTP servers ==
 
* '''A personal X.509 certificate.''' For accessing our GridFTP servers and performing your data transfers with GridFTP you need a GSI proxy credential (GPC) signed by your personal X.509 certificate. Please see [http://toolkit.globus.org/toolkit/docs/6.0/gsic/key/index.html "Key concepts of GSI security"] for more information about GSI proxy certificates. This means that you first need a personal X.509 certificate signed by your organization or institute. In addition, the source and destination GridFTP services must be able to verify your GPC to enable the data transfer. By default, data transfers require a GPC derived from a personal X.509 certificate issued by one of the grid certificate authorities (CAs) that are members of the [https://www.igtf.net/ IGTF], or by one of their affiliated registration authorities (RAs). Please contact your IT department on how to acquire such a personal X.509 certificate.
 
* '''The distinguished name (DN) of your X.509 certificate.''' After receiving your personal X.509 certificate you need to forward the certificate's DN to the HLRS personnel in order to activate access to our GridFTP servers. To determine the DN you can use the following openssl command on your personal X.509 certificate:
 
<pre>
$ openssl x509 -noout -subject -in <YOUR_PERSONAL_X509_CERTIFICATE_FILE>
</pre>
 
* '''A Linux System with a GridFTP client installed''' (e.g. one of the Hazelhen frontend nodes)
 
 
=== Further information on X.509 certificates ===
 
* http://en.wikipedia.org/wiki/X.509
* http://www.eugridpma.org/members/worldmap/
* http://www.prace-project.eu/Certificates-FAQ?lang=en
 
 
== Pre-installed GridFTP client on the Hazelhen frontend nodes ==
 
* Create a GSI proxy credential (GPC) locally at your workstation with either <code>grid-proxy-init</code> (requires installation of Globus packages or manual compilation and installation of the Globus Toolkit, see below) or [https://github.com/HLRS/genproxy <code>genproxy</code>] (just requires the Bash shell and OpenSSL). Afterwards copy the resulting GPC (usually named "x509up_u<UID>") to your home directory at HLRS with scp and configure the environment variable <code>X509_USER_PROXY</code> with the path to your GPC (<code>$</code> denotes a user prompt, user and host names are symbolic!):
<pre>
user@local:~$ genproxy
Your identity: /C=DE/O=GridGermany/OU=Universitaet Stuttgart/OU=[..]/CN=[...]
Enter pass phrase for /home/user/.globus/userkey.pem:
Your proxy `/tmp/x509up_p13706.fileQNqstU.1' is valid until: Fri May 19 11:16:36 CEST 2017
 
user@local:~$ scp /tmp/x509up_p13706.fileQNqstU.1 user@hazelhen.hww.de:X509_USER_PROXY
 
user@local:~$ ssh user@hazelhen.hww.de
 
user@hazelhen:~$ export X509_USER_PROXY="$HOME/X509_USER_PROXY"
</pre>
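Before exporting <code>X509_USER_PROXY</code> it can help to confirm that the copied proxy file is readable and accessible by its owner only, since the Globus tools are, to our understanding, strict about proxy file permissions. The following is a minimal sketch under stated assumptions: <code>check_proxy_file</code> is a hypothetical helper name (not part of Globus or gtransfer), and <code>stat -c</code> assumes GNU coreutils as found on typical Linux systems.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a copied GSI proxy credential before using it.
# check_proxy_file is a hypothetical helper, not a Globus or gtransfer command.
# Assumes GNU coreutils `stat` (typical on Linux).

check_proxy_file() {
    local proxy="$1"
    # The file must exist and be readable by the current user.
    if [ ! -r "$proxy" ]; then
        echo "proxy file not readable: $proxy" >&2
        return 1
    fi
    # Accept only owner-only permissions (assumption: 600 or 400 is required).
    local mode
    mode=$(stat -c '%a' "$proxy")
    if [ "$mode" != "600" ] && [ "$mode" != "400" ]; then
        echo "unsafe permissions ($mode) on $proxy, expected 600" >&2
        return 1
    fi
    return 0
}

# Usage on the frontend node (path as in the scp example above):
#   check_proxy_file "$HOME/X509_USER_PROXY" && export X509_USER_PROXY="$HOME/X509_USER_PROXY"
```

If the check fails because of the permissions, <code>chmod 600</code> on the proxy file is usually sufficient.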
 
* To use <code>gtransfer</code>, load the <code>tools/gtransfer</code> module (which automatically loads all prerequisite modules) on the Hazelhen frontend node you are currently logged in to (<code>$</code> denotes a user prompt, user and host names are symbolic!):
<pre>
user@hazelhen:~$ module load tools/gtransfer
load globus-gridftp-client gt-6.0.1478289945 (PATH, MANPATH, GLOBUS_LOCATION, GLOBUS_TCP_PORT_RANGE, GLOBUS_TCP_SOURCE_RANGE, X509_CERT_DIR, LD_LIBRARY_PATH)
 
To make use of the Globus GridFTP client (GGC) you need a GSI proxy credential (GPC)
that authenticates you against the involved GridFTP servers.
 
Create your GPC at your local workstation and copy it to this system (e.g. via scp).
Then make it known to the Globus tools with ($ is the prompt and not part of the
command!):
 
```
$ export X509_USER_PROXY="/path/to/gpc"
```
 
Although you can use the GGC alone to transfer files via GridFTP, we strongly
recommend to use gtransfer - a more advanced GridFTP client on top of GGC, tgftp and
uberftp - instead. To use it, simply load its modulefile with:
 
```
$ module load tools/gtransfer
```
 
load tgftp 0.7.0 (PATH, MANPATH)
In addition to the manual pages (man {tgftp|tgftp_log}), there is also a longer README file available (less /sw/hazelhen/hlrs/tools/tgftp/0.7.0/share/doc/README).
load gtransfer 0.8.1 (PATH, MANPATH)
Bash completion loaded: press the TAB key for completion.
In addition to the manual pages (man {gtransfer|gt|dparam|dpath|halias|gcat|gls|gmkdir|gmv|grm}), there is also a longer README file available (less /sw/hazelhen/hlrs/tools/gtransfer/0.8.1/README.md).
</pre>
 
* To use <code>globus-url-copy</code> alone, load the module <code>tools/globus-gridftp-client</code> on the Hazelhen frontend node you are currently logged in to (<code>$</code> denotes a user prompt, user and host names are symbolic!):
<pre>
user@hazelhen:~$ module load tools/globus-gridftp-client
load globus-gridftp-client gt-6.0.1478289945 (PATH, MANPATH, GLOBUS_LOCATION, GLOBUS_TCP_PORT_RANGE, GLOBUS_TCP_SOURCE_RANGE, X509_CERT_DIR, LD_LIBRARY_PATH)
 
To make use of the Globus GridFTP client (GGC) you need a GSI proxy credential (GPC)
that authenticates you against the involved GridFTP servers.
 
Create your GPC at your local workstation and copy it to this system (e.g. via scp).
Then make it known to the Globus tools with ($ is the prompt and not part of the
command!):
 
```
$ export X509_USER_PROXY="/path/to/gpc"
```
 
Although you can use the GGC alone to transfer files via GridFTP, we strongly
recommend to use gtransfer - a more advanced GridFTP client on top of GGC, tgftp and
uberftp - instead. To use it, simply load its modulefile with:
 
```
$ module load tools/gtransfer
```
</pre>
 
== Installing the GridFTP client at your home institution ==
 
* Since version 5.2 of the Globus Toolkit, the GridFTP client is also available as a pre-compiled RPM (for '''Red Hat Enterprise Linux 6 and 7''', '''CentOS 6 and 7''', '''Scientific Linux 6 and 7''' and possibly others) or DEB (for '''Debian GNU/Linux 7, 8 and 9''' and '''Ubuntu Linux 14.04 LTS, 16.04 LTS, 16.10 and 17.04''') package. Install the GridFTP client by following the instructions in the [http://toolkit.globus.org/toolkit/docs/6.0/admin/install/index.html Globus Toolkit 6.0 documentation]. If a pre-compiled package is available, it is usually named <code>globus-gass-copy-progs</code>; for source installs, <code>make gridftp</code> will include it. Be sure to also install the <code>grid-proxy-init</code> tool - included in the <code>globus-proxy-utils</code> package or in an installation from source with <code>make gridftp</code> - or just use the <code>genproxy</code> tool mentioned above. Only one of these tools is required for the creation of GSI proxy credentials.
 
* Create a directory <code>.globus</code> in your home directory and place both your personal X.509 certificate (as <code>usercert.pem</code>) and your private key file (as <code>userkey.pem</code>) there. To create these files from a PKCS#12 keystore follow these [[X.509-SSH#CONFIGURATION_2|instructions]] but use the names from above for the destination files. When using <code>grid-proxy-init</code> to create a GSI proxy credential, you can also place a PKCS#12 keystore (as <code>usercred.p12</code>) there - the Firefox web browser for example exports user certificates and keys as PKCS#12 keystore.
 
* Additionally create another directory named <code>certificates</code> in <code>.globus</code> and place all the trusted CA certificates there. A collection suitable for use with the Globus Toolkit is provided by SURFsara as a [https://winnetou.surfsara.nl/prace/certs/globuscerts.tar.gz tarball] - download and untar it into the above directory. The included files are needed to authenticate remote entities (i.e. GridFTP servers).
 
* Run <code>grid-proxy-init</code> or <code>genproxy</code> to verify the validity of your personal X.509 certificate and to create a GSI proxy credential signed by your personal X.509 certificate with a default lifetime of 12 hours (for <code>grid-proxy-init</code>) and 24 hours (for <code>genproxy</code>). This step has to be repeated after the created GSI proxy credential has expired.
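
The directory layout described in the steps above can be sketched as a small shell function. This is only an illustration under stated assumptions: <code>setup_globus_dir</code> is a hypothetical helper name, the permission values (644 for the certificate, 400 for the private key) are commonly used settings rather than something this page prescribes, and the internal layout of the CA tarball is not known here, so the extraction step is shown as a comment only.

```shell
#!/usr/bin/env bash
# Sketch: create ~/.globus with certificate, key and CA directory.
# setup_globus_dir is a hypothetical helper; arguments are:
#   $1 = home directory, $2 = certificate file (PEM), $3 = private key file (PEM)

setup_globus_dir() {
    local home_dir="$1" cert="$2" key="$3"
    local globus_dir="$home_dir/.globus"

    # .globus holds the user credentials, .globus/certificates the trusted CAs.
    mkdir -p "$globus_dir/certificates" || return 1

    # Use the file names expected by grid-proxy-init and genproxy.
    cp -f "$cert" "$globus_dir/usercert.pem" || return 1
    cp -f "$key" "$globus_dir/userkey.pem" || return 1

    # Assumed permissions: certificate world-readable, key readable by owner only.
    chmod 644 "$globus_dir/usercert.pem" || return 1
    chmod 400 "$globus_dir/userkey.pem" || return 1
}

# Usage (file names on the right are placeholders):
#   setup_globus_dir "$HOME" ./my-cert.pem ./my-key.pem
# Afterwards unpack the CA tarball into the certificates directory, e.g.:
#   tar -xzf globuscerts.tar.gz -C "$HOME/.globus/certificates"
# (check the tarball layout first; it may contain a top-level directory)
```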
 
== Usage ==
 
 
=== Workspaces ===
 
The paths to your workspaces are identical on supercomputers and GridFTP servers. To get the path of a specific workspace, first login to the respective supercomputer frontend(s), then determine the workspace name of the workspace you want to use and then enter <code>ws_find <WORKSPACE_NAME></code> to get the actual path to this specific workspace. More information about workspaces at HLRS can be found in the [https://kb.hlrs.de/platforms/index.php/Workspace_mechanism platforms wiki].
 
 
=== gtransfer (gt) ===
 
* Type <code>gt</code> and hit the ENTER/RETURN key to get a brief usage message. Use <code>gt --help</code> and <code>man gt</code> to get a description of all gt options.
* To start a transfer, enter <code>gt</code>, hit the SPACE key and then hit the TAB key three times to make use of the gt bash completion. You'll get a listing of all available options. Start with <code>-s</code> to enter the source address. The <code>-</code> character was already provided by the gt bash completion. After entering <code>s</code> hit the SPACE key and enter your source address, e.g. <code>gsiftp://gridftp.domain.tld:2811</code>. You can also hit the TAB key two times to get the preconfigured GridFTP source server addresses or [https://github.com/fr4nk5ch31n3r/gtransfer/blob/master/share/doc/host-aliases.md host aliases]. Add the path to your desired workspace just like on the supercomputer frontends (e.g. <code>/lustre/cray/ws8/ws/user-workspace/</code>) and then hit the TAB key two to three times to get a listing of the files and directories in your workspace directory on the remote server. Depending on the latency and the number of files present there, it can take a few seconds until you see results and this will only work if your GSI proxy certificate is considered valid by the remote GridFTP server and you are trying to list a directory where you have <code>rx</code> (read and execute) permissions. Type in the beginning of your desired file or directory and hit the TAB key to complete the name. If you want to copy all files in a directory, add <code>/*</code> or just <code>/</code> to the end of the path. Now continue with the destination address. Add <code>-d</code> to the command line, hit the SPACE key and continue with the destination address just like you entered the source address. Enter a <code>/</code> at the end of the destination path.
* To recursively copy all files and directories below a given directory, add the <code>-r</code> option to the gt command line.
 
Example:
<pre>
$ gt <TAB>
 
$ gt -
 
$ gt -<TAB><TAB>
 
$ gt -
--                      --configfile            --gt-max-retries        -m                      -s                      --verbose
-a                      -d                      --gt-progress-indicator  --metric                --source                --version
--auto-clean            --destination            --guc-max-retries        --no-sync                --sync-level           
--auto-optimize          -e                      --help                  -o                      --transfer-list         
-c                      --encrypt-data-channel  -l                      -r                      -v                     
--checksum-data-channel  -f                      --logfile                --recursive              -V
 
$ gt -s <TAB><TAB>
 
$ gt -s
hazelhen:  laki:
 
$ gt -s h<TAB>
 
$ gt -s hazelhen:
 
$ gt -s hazelhen:/lustre/cray/ws8/ws/user-workspace/<TAB>
 
$ gt -s hazelhen:/lustre/cray/ws8/ws/user-workspace/file
 
$ gt -s hazelhen:/lustre/cray/ws8/ws/user-workspace/file<TAB><TAB>
 
$ gt -s hazelhen:/lustre/cray/ws8/ws/user-workspace/file
hazelhen:/lustre/cray/ws8/ws/user-workspace/file1  hazelhen:/lustre/cray/ws8/ws/user-workspace/file2  hazelhen:/lustre/cray/ws8/ws/user-workspace/file3
 
$ gt -s hazelhen:/lustre/cray/ws8/ws/user-workspace/file* -d gsiftp://gridftp.domain.tld:2811/~/
</pre>
 
==== Hints ====
 
 
===== I have multiple user accounts at a remote GridFTP server. How can I choose a specific account? =====
 
This can be done by inserting a <code><USER>@</code> portion into your GridFTP URLs or prefixing host aliases with <code><USER>@</code>. Replace <code><USER></code> with your desired username on the remote site.
 
Examples:
 
* GridFTP URL:
<code>gsiftp://gridftp.domain.tld:2811/[...]/files/*</code> => <code>gsiftp://user1@gridftp.domain.tld:2811/[...]/files/*</code>
* Host alias: 
<code>my-gridftp:/[...]/files/</code> => <code>user1@my-gridftp:/[...]/files/</code>
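
The rewriting shown in the two examples above is purely textual. As a hedged illustration, the following sketch applies it to either form; <code>with_user</code> is a hypothetical helper name and not part of gtransfer.

```shell
#!/usr/bin/env bash
# Sketch: add a "<USER>@" portion to a GridFTP URL or a host alias.
# with_user is a hypothetical helper, not a gtransfer command.

with_user() {
    local user="$1" target="$2"
    case "$target" in
        gsiftp://*)
            # GridFTP URL: insert the user right after the scheme.
            printf 'gsiftp://%s@%s\n' "$user" "${target#gsiftp://}"
            ;;
        *)
            # Host alias: simply prefix the alias with the user.
            printf '%s@%s\n' "$user" "$target"
            ;;
    esac
}

# Usage:
#   gt -s "$(with_user user1 'my-gridftp:/path/to/files/')" -d ...
```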


===== Can gtransfer automatically create non-existing directories on the destination side? =====

Yes, this is possible and activated by default. Just enter the desired name or path in your destination URL and gtransfer will automatically create non-existing directories on the destination side (with the help of [http://toolkit.globus.org/toolkit/docs/latest-stable/gridftp/user/#globus-url-copy globus-url-copy]).


===== Use host aliases for your GridFTP servers =====

There are already two host aliases defined which point to the two GridFTP servers at HLRS:

* <code>hazelhen:</code>
* <code>laki:</code>

You can use them instead of the longer host part of a GridFTP URL in the source and destination URLs, e.g. you can use:

* <code>hazelhen:/lustre/cray/ws8/ws/user-workspace</code> instead of
* <code>gsiftp://gridftp-fr1.hww.de:2812/lustre/cray/ws8/ws/user-workspace</code>

To create your own host aliases, please refer to the host aliases documentation linked below.
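
The two predefined aliases stand for the HLRS server addresses and port given on this page. For tools that do not understand host aliases, the mapping can be spelled out by hand. The sketch below is an illustration only: <code>expand_alias</code> is a hypothetical helper name, and it does not reflect how gtransfer itself resolves aliases.

```shell
#!/usr/bin/env bash
# Sketch: expand the two predefined HLRS host aliases into full GridFTP URLs.
# expand_alias is a hypothetical helper; gtransfer resolves aliases on its own.

expand_alias() {
    local target="$1"
    case "$target" in
        hazelhen:*)
            printf 'gsiftp://gridftp-fr1.hww.de:2812%s\n' "${target#hazelhen:}"
            ;;
        laki:*)
            printf 'gsiftp://gridftp-fr2.hww.de:2812%s\n' "${target#laki:}"
            ;;
        *)
            # Anything else (e.g. a full gsiftp:// URL) is passed through unchanged.
            printf '%s\n' "$target"
            ;;
    esac
}

# Usage:
#   expand_alias hazelhen:/lustre/cray/ws8/ws/user-workspace
```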


===== What if the gtransfer command fails during a data transfer? =====

[http://toolkit.globus.org/toolkit/docs/latest-stable/gridftp/user/#globus-url-copy Globus-url-copy] - the tool gtransfer actually uses through [https://github.com/fr4nk5ch31n3r/tgftp tgftp] to transfer data - is configured by gtransfer to retry the transfer of files that failed to transfer successfully to the destination GridFTP server. If that fails, gtransfer will retry the whole process three times before giving up on the transfer. Even then, you can later continue a failed or interrupted transfer by simply issuing the very same gtransfer command. Gtransfer stores state information about a transfer in your home directory below <code>.gtransfer</code>, so this mechanism only works in the same home directory, with the same user account, and as long as the state files are not modified in between.
 
===== What if I need to interrupt a data transfer? =====
   
You can always interrupt a gtransfer data transfer by hitting CTRL+C during a data transfer, which effectively sends a <code>SIGINT</code> to the gtransfer process group and interrupts the data transfer. You can continue the transfer from where it was interrupted by issuing the very same gtransfer command - as with failed transfers described above. The same restrictions - same host, same user account, no fiddling with the state files in between - apply here.
 
 
==== Documentation ====
 
 
===== General =====
 
* [https://github.com/fr4nk5ch31n3r/gtransfer/ gtransfer GitHub repository and README]
 
 
===== Man pages =====
 
Man(ual) pages are also available locally on the Hazelhen frontends. Simply enter <code>man</code> and the name of the man page (e.g. <code>gtransfer</code> or <code>dpath</code>) to read a specific page. If man pages with the same name exist in different sections, you also have to specify the section number after the <code>man</code> command but before the name of the man page. E.g. to read the <code>dparam(5)</code> man page - which contains the file format description for dparams - you would enter <code>man 5 dparam</code>.
 
 
====== Section 1 ======
 
* [https://github.com/fr4nk5ch31n3r/gtransfer/blob/master/share/doc/gtransfer.1.md gtransfer(1)]
* [https://github.com/fr4nk5ch31n3r/gtransfer/blob/master/share/doc/dparam.1.md dparam(1)]
* [https://github.com/fr4nk5ch31n3r/gtransfer/blob/master/share/doc/dpath.1.md dpath(1)]
* [https://github.com/fr4nk5ch31n3r/gtransfer/blob/master/share/doc/halias.1.md halias(1)]
 


====== Section 5 ======

* [https://github.com/fr4nk5ch31n3r/gtransfer/blob/master/share/doc/dparam.5.md dparam(5)]
* [https://github.com/fr4nk5ch31n3r/gtransfer/blob/master/share/doc/dpath.5.md dpath(5)]


===== Special functionality =====

* [https://github.com/fr4nk5ch31n3r/gtransfer/blob/master/share/doc/host-aliases.md Host aliases]


=== globus-url-copy (aka Globus GridFTP client (GGC)) ===

* Type <code>globus-url-copy</code> and hit the ENTER/RETURN key to get a brief usage message. Use <code>globus-url-copy -help</code> and <code>man globus-url-copy</code> to get a description of all globus-url-copy options.

* The basic syntax is:
<pre>
globus-url-copy [optional command line switches] source destination
</pre>

* Source and destination can be further resolved to:
<pre>
globus-url-copy [optional command line switches] {gsiftp://<server address>:<port> | file://}<absolute path> {gsiftp://<server address>:<port> | file://}<absolute path>
</pre>

* Files on remote systems can be referenced by <code>gsiftp://</code> URLs whereas local files have to be referenced by <code>file://</code> URLs. The usage of gtransfer host aliases is not supported by globus-url-copy, hence you need to enter the server addresses and ports manually. Use the following table for reference:

{| class="wikitable"
!Host
!Server address
!Port
|-
|Hazelhen
|gridftp-fr1.hww.de
|2812
|-
|Laki
|gridftp-fr2.hww.de
|2812
|}

Example:
<pre>
$ globus-url-copy -cc 2 -tcp-bs 4M -p 2 -cd gsiftp://gridftp-fr1.hww.de:2812/lustre/cray/ws8/ws/user-workspace/file* gsiftp://gridftp.domain.tld:2811/~/
</pre>
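
When the same kind of transfer is run repeatedly, it can be convenient to assemble the command line from a few parameters and inspect it before executing it. The following is a minimal sketch, not a recommended wrapper: <code>build_guc_command</code> is a hypothetical helper name, and the defaults simply mirror the example above (<code>-cc</code> sets the number of concurrent connections, <code>-p</code> the number of parallel streams, <code>-tcp-bs</code> the TCP buffer size, and <code>-cd</code> creates missing destination directories). The function only prints the command; it does not start a transfer.

```shell
#!/usr/bin/env bash
# Sketch: print a globus-url-copy command line for a given source and destination.
# build_guc_command is a hypothetical helper; it performs a dry run only.
# Arguments: $1 = source URL, $2 = destination URL,
#            $3 = concurrency (default 2), $4 = parallel streams (default 2),
#            $5 = TCP buffer size (default 4M)

build_guc_command() {
    local src="$1" dst="$2"
    local concurrency="${3:-2}" streams="${4:-2}" tcp_bs="${5:-4M}"
    local n

    # Concurrency and stream count must be plain non-negative integers.
    for n in "$concurrency" "$streams"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "not a number: $n" >&2
                return 1
                ;;
        esac
    done

    printf 'globus-url-copy -cc %s -tcp-bs %s -p %s -cd %s %s\n' \
        "$concurrency" "$tcp_bs" "$streams" "$src" "$dst"
}

# Usage:
#   build_guc_command 'gsiftp://gridftp-fr1.hww.de:2812/lustre/cray/ws8/ws/user-workspace/file*' \
#                     'gsiftp://gridftp.domain.tld:2811/~/'
```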


The <code>-p</code> option specifies the number of parallel streams. The optimal value depends, among other things, on the round-trip time between the source and destination sites. The <code>-tcp-bs</code> option specifies the size (in bytes) of the TCP buffer used by the underlying FTP data channels; in general, about 4 MB should be adequate for fast connections with more than 1 GE. Please note that while higher values can yield better performance, many parallel streams (high <code>-p</code>) together with large buffer sizes could drive the systems out of memory.


==== Documentation ====

See the [http://toolkit.globus.org/toolkit/docs/6.0/gridftp/user/index.html#globus-url-copy Globus Toolkit documentation on globus-url-copy] for more details about this tool.


== Further Information ==

* http://www.globus.org/toolkit/docs/latest-stable/gridftp/ - Official documentation
* http://www.prace-ri.eu/Data-Transfer-with-GridFTP-Details - Intended for PRACE users, but may also be helpful to others
* http://www.globus.org/toolkit/docs/latest-stable/gridftp/user/#gridftp-user-config-client-firewall - Firewall issues


== Support ==

* [http://www.hlrs.de/about-us/organization/people/person/schembera/ Björn Schembera] [mailto:schembera@hlrs.de schembera@hlrs.de]

Revision as of 15:44, 2 November 2017

Introduction

For transferring large amounts of data, the simple FTP protocol can not fully exploit high bandwidth connections (especially when they have high latencies, like intra- or international Wide Area Networks (WANs)). For this task, an extension has been definied: GridFTP. It supports parallel TCP streams and multi-node transfers (also known as Striping) to achieve a high data rate on high bandwidth connections (even with high latencies). Furthermore, transfers can be restarted and third-party transfers can be established, which means users can initiate transfers between two GridFTP servers that are controlled by a third party (i.e. the user).

GridFTP has a typical client/server architecture, where the server stores the data or has access to the data and where the client downloads/uploads data or controls a server to server transfer in a third-party transfer as described above. The Globus Toolkit includes a simple GridFTP client - globus-url-copy - which is described in more detail below. On top of that there exists gtransfer a more user-friendly tool with additional features which is also described in more detail below.

At HLRS, dedicated GridFTP servers are available for use which have access to the high-performance file systems of the Hazelhen and Laki supercomputers at HLRS. These servers can be used with a GridFTP client. Usually these GridFTP servers are used in third-party transfers, where users download/upload data from/to another GridFTP server e.g. at their home institution. There are two ways to conduct third-party transfers with our GridFTP servers: Either you use the pre-installed GridFTP clients on our Hazelhen frontend nodes or you install GridFTP clients somewhere else outside the HLRS network, for example at your home institution.


Prerequirements for using our GridFTP servers

  • A personal X509 certificate. For accessing our GridFTP servers and performing your data transfers with GridFTP you need a GSI proxy credential (GPC) signed by your personal X.509 certificate. Please see "Key concepts of GSI security" for more information about GSI proxy certificates. This means that you first need a personal X.509 certificate signed by your organization or institute. In addition the source and destination GridFTP services must be able to verify your GPC to enable the data transfer. By default a GPC derived from a personal X.509 certificate issued by one of the grid certificate authorities (CAs) that are member of the IGTF or their affiliated registration authorities (RAs) is required for data transfers. Please contact your IT department on how to acquire such a personal X.509 certificate.
  • The distinguished name (DN) of your X.509 certificate. After receiving your personal X.509 certificate you need to forward the certificate's DN to the HLRS personnel in order to activate access to our GridFTP servers. To determine the DN you can use the following openssl command on your personal X.509 certificate:
$ openssl x509 -noout -subject -in <YOUR_PERSONAL_X509_CERTIFICATE_FILE>
  • A Linux System with a GridFTP client installed (e.g. one of the Hazelhen frontend nodes)


Further information on X.509 certificates


Pre-installed GridFTP client on the Hazelhen frontend nodes

  • Create a GSI proxy credential (GPC) locally at your workstation with either grid-proxy-init (requires installation of Globus packages or manual compilation and installation of the Globus Toolkit, see below) or genproxy (just requires the Bash shell and OpenSSL). Afterwards copy the resulting GPC (usually named "x509up_u<UID>") to your home directory at HLRS with scp and configure the environment variable X509_USER_PROXY with the path to your GPC ($ denotes a user prompt, user and host names are symbolic!):
user@local:~$ genproxy
Your identity: /C=DE/O=GridGermany/OU=Universitaet Stuttgart/OU=[..]/CN=[...]
Enter pass phrase for /home/user/.globus/userkey.pem:
Your proxy `/tmp/x509up_p13706.fileQNqstU.1' is valid until: Fri May 19 11:16:36 CEST 2017

user@local:~$ scp /tmp/x509up_p13706.fileQNqstU.1 user@hazelhen.hww.de:X509_USER_PROXY

user@local:~$ ssh user@hazelhen.hww.de

user@hazelhen:~$ export X509_USER_PROXY="$HOME/X509_USER_PROXY"
  • To use gtransfer, load the tools/gtransfer module (which automatically loads all pre-required modules) on the Hazelhen frontend node you are currently logged in ($ denotes a user prompt, user and host names are symbolic!):
user@hazelhen:~$ module load tools/gtransfer
load globus-gridftp-client gt-6.0.1478289945 (PATH, MANPATH, GLOBUS_LOCATION, GLOBUS_TCP_PORT_RANGE, GLOBUS_TCP_SOURCE_RANGE, X509_CERT_DIR, LD_LIBRARY_PATH)

To make use of the Globus GridFTP client (GGC) you need a GSI proxy credential (GPC)
that authenticates you against the involved GridFTP servers.

Create your GPC at your local workstation and copy it to this system (e.g. via scp).
Then make it known to the Globus tools with ($ is the prompt and not part of the
command!):

```
$ export X509_USER_PROXY="/path/to/gpc"
```

Although you can use the GGC alone to transfer files via GridFTP, we strongly
recommend to use gtransfer - a more advanced GridFTP client on top of GGC, tgftp and
uberftp - instead. To use it, simply load its modulefile with:

```
$ module load tools/gtransfer
```

load tgftp 0.7.0 (PATH, MANPATH)
In addition to the manual pages (man {tgftp|tgftp_log}), there is also a longer README file available (less /sw/hazelhen/hlrs/tools/tgftp/0.7.0/share/doc/README).
load gtransfer 0.8.1 (PATH, MANPATH)
Bash completion loaded: press the TAB key for completion.
In addition to the manual pages (man {gtransfer|gt|dparam|dpath|halias|gcat|gls|gmkdir|gmv|grm}), there is also a longer README file available (less /sw/hazelhen/hlrs/tools/gtransfer/0.8.1/README.md).
  • To use globus-url-copy alone, load the module tools/globus-gridftp-client on the Hazelhen frontend node you are currently logged in ($ denotes a user prompt, user and host names are symbolic!):
user@hazelhen:~$ module load tools/globus-gridftp-client
load globus-gridftp-client gt-6.0.1478289945 (PATH, MANPATH, GLOBUS_LOCATION, GLOBUS_TCP_PORT_RANGE, GLOBUS_TCP_SOURCE_RANGE, X509_CERT_DIR, LD_LIBRARY_PATH)

To make use of the Globus GridFTP client (GGC) you need a GSI proxy credential (GPC)
that authenticates you against the involved GridFTP servers.

Create your GPC at your local workstation and copy it to this system (e.g. via scp).
Then make it known to the Globus tools with ($ is the prompt and not part of the
command!):

```
$ export X509_USER_PROXY="/path/to/gpc"
```

Although you can use the GGC alone to transfer files via GridFTP, we strongly
recommend to use gtransfer - a more advanced GridFTP client on top of GGC, tgftp and
uberftp - instead. To use it, simply load its modulefile with:

```
$ module load tools/gtransfer
```

Installing the GridFTP client at your home institution

  • Since version 5.2 of the Globus Toolkit, the GridFTP client is also available as pre-compiled RPM (for Red Hat Enterprise Linux 6 and 7, CentOS 6 and 7, Scientific Linux 6 and 7 and possibly others) or DEB (for Debian GNU/Linux 7, 8 and 9 and Ubuntu Linux 14.04 LTS, 16.04 LTS, 16.10 and 17.04) package. Install the GridFTP client - if a pre-compiled package is available it's usually named globus-gass-copy-progs, make grdiftp will include it for source installs - by following the instructions in the Globus Tookit 6.0 documentation. Be sure to also install the grid-proxy-init tool - included in the globus-proxy-utils package or in an installation from source with make gridftp - or just use the genproxy tool mentioned above. Only one of these tools is required for the creation of GSI proxy credentials.
  • Create a directory .globus in your home directory and place both your personal X.509 certificate (as usercert.pem) and your private key file (as userkey.pem) there. To create these files from a PKCS#12 keystore follow these instructions but use the names from above for the destination files. When using grid-proxy-init to create a GSI proxy credential, you can also place a PKCS#12 keystore (as usercred.p12) there - the Firefox web browser for example exports user certificates and keys as PKCS#12 keystore.
  • Additionally create another directory named certificates in .globus and place all the trusted CA certificates there. A collection suitable for use with the Globus Toolkit is provided by SURFsara as a tarball - download and untar it into the above directory. The included files are needed to authenticate remote entities (i.e. GridFTP servers).
  • Run grid-proxy-init or genproxy to verify the validity of your personal X.509 certificate and to create a GSI proxy credential signed by your personal X.509 certificate. The default lifetime is 12 hours for grid-proxy-init and 24 hours for genproxy. This step has to be repeated after the created GSI proxy credential has expired.
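The directory layout from the steps above can be prepared with a few shell commands. The following is a minimal sketch: prepare_globus_dir is a hypothetical helper, and the permission values are a cautious assumption (private directory, private key readable by the owner only), not something the steps above prescribe. If you use a PKCS#12 keystore (usercred.p12) instead, the "missing" messages for the two PEM files can be ignored.

```shell
# Sketch: create the ~/.globus layout described above and tighten permissions.
# prepare_globus_dir is a hypothetical helper; pass a directory as the first
# argument to try it somewhere other than ~/.globus.
prepare_globus_dir() {
    globus_dir="${1:-$HOME/.globus}"
    # .globus holds the user credentials, certificates holds the trusted CAs.
    mkdir -p "$globus_dir/certificates" || return 1
    # Assumption: keep the directory private and the key owner-only.
    chmod 700 "$globus_dir"
    if [ -f "$globus_dir/userkey.pem" ]; then chmod 600 "$globus_dir/userkey.pem"; fi
    if [ -f "$globus_dir/usercert.pem" ]; then chmod 644 "$globus_dir/usercert.pem"; fi
    # Report which of the expected credential files are still missing.
    for f in usercert.pem userkey.pem; do
        [ -f "$globus_dir/$f" ] || echo "missing: $f"
    done
    return 0
}
```

Calling prepare_globus_dir without arguments operates on ~/.globus; the trusted CA certificates still have to be unpacked into the certificates subdirectory by hand.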

Usage

Workspaces

The paths to your workspaces are identical on supercomputers and GridFTP servers. To get the path of a specific workspace, first log in to the respective supercomputer frontend, determine the name of the workspace you want to use, and then enter ws_find <WORKSPACE_NAME> to get the actual path to this workspace. More information about workspaces at HLRS can be found in the platforms wiki.
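Because the workspace path is the same on the frontends and on the GridFTP servers, the output of ws_find can be combined directly with a host alias. The sketch below illustrates this; ws_address is a hypothetical helper name, and it assumes ws_find prints the workspace path on standard output.

```shell
# Sketch: turn a workspace name into a gtransfer address via ws_find.
# ws_address is a hypothetical helper; run it where ws_find is available.
ws_address() {
    alias_name="$1"   # host alias, e.g. hazelhen
    ws_name="$2"      # name of the workspace
    # Assumption: ws_find prints the path of the workspace on stdout.
    ws_path=$(ws_find "$ws_name") || return 1
    [ -n "$ws_path" ] || return 1
    # Strip a trailing slash if present, then add exactly one.
    printf '%s:%s/\n' "$alias_name" "${ws_path%/}"
}
```

On a frontend, ws_address hazelhen <WORKSPACE_NAME> prints an address that can be passed to the -s or -d option of gt.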


gtransfer (gt)

  • Type gt and hit the ENTER/RETURN key to get a brief usage message. Use gt --help and man gt to get a description of all gt options.
  • To start a transfer, enter gt, hit the SPACE key and then hit the TAB key three times to make use of the gt bash completion. You'll get a listing of all available options.
  • Start with -s to enter the source address (the - character is already provided by the gt bash completion). After entering s, hit the SPACE key and enter your source address, e.g. gsiftp://gridftp.domain.tld:2811. You can also hit the TAB key twice to get the preconfigured GridFTP source server addresses or host aliases.
  • Add the path to your desired workspace just like on the supercomputer frontends (e.g. /lustre/cray/ws8/ws/user-workspace/) and then hit the TAB key two to three times to get a listing of the files and directories in your workspace directory on the remote server. Depending on the latency and the number of files present there, it can take a few seconds until you see results. This only works if your GSI proxy credential is considered valid by the remote GridFTP server and you have rx (read and execute) permissions on the directory you are trying to list.
  • Type the beginning of the desired file or directory name and hit the TAB key to complete it. If you want to copy all files in a directory, add /* or just / to the end of the path.
  • Now add -d to the command line, hit the SPACE key and enter the destination address just as you entered the source address. Enter a / at the end of the destination path.
  • To recursively copy all files and directories below a given directory, add the -r option to the gt command line.

Example:

$ gt <TAB>

$ gt -

$ gt -<TAB><TAB>

$ gt -
--                       --configfile             --gt-max-retries         -m                       -s                       --verbose
-a                       -d                       --gt-progress-indicator  --metric                 --source                 --version
--auto-clean             --destination            --guc-max-retries        --no-sync                --sync-level             
--auto-optimize          -e                       --help                   -o                       --transfer-list          
-c                       --encrypt-data-channel   -l                       -r                       -v                       
--checksum-data-channel  -f                       --logfile                --recursive              -V

$ gt -s <TAB><TAB>

$ gt -s
hazelhen:  laki:

$ gt -s h<TAB>

$ gt -s hazelhen:

$ gt -s hazelhen:/lustre/cray/ws8/ws/user-workspace/<TAB>

$ gt -s hazelhen:/lustre/cray/ws8/ws/user-workspace/file

$ gt -s hazelhen:/lustre/cray/ws8/ws/user-workspace/file<TAB><TAB>

$ gt -s hazelhen:/lustre/cray/ws8/ws/user-workspace/file
hazelhen:/lustre/cray/ws8/ws/user-workspace/file1  hazelhen:/lustre/cray/ws8/ws/user-workspace/file2  hazelhen:/lustre/cray/ws8/ws/user-workspace/file3

$ gt -s hazelhen:/lustre/cray/ws8/ws/user-workspace/file* -d gsiftp://gridftp.domain.tld:2811/~/

Hints

I have multiple user accounts at a remote GridFTP server. How can I choose a specific account?

This can be done by inserting a <USER>@ portion into your GridFTP URLs or prefixing host aliases with <USER>@. Replace <USER> with your desired username on the remote site.

Examples:

  • GridFTP URL:

gsiftp://gridftp.domain.tld:2811/[...]/files/* => gsiftp://user1@gridftp.domain.tld:2811/[...]/files/*

  • Host alias:

my-gridftp:/[...]/files/ => user1@my-gridftp:/[...]/files/
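The rewriting shown in the two examples above is mechanical, so it can be sketched as a small shell function. with_user is a hypothetical helper name; it only manipulates the address string and does not contact any server.

```shell
# Sketch: insert a <USER>@ portion into a GridFTP URL or host alias address.
# with_user is a hypothetical helper name.
with_user() {
    user="$1"
    address="$2"
    case "$address" in
        # Full GridFTP URL: the user goes right after the scheme.
        gsiftp://*) printf 'gsiftp://%s@%s\n' "$user" "${address#gsiftp://}" ;;
        # Host alias: the user is simply prefixed.
        *)          printf '%s@%s\n' "$user" "$address" ;;
    esac
}
```

For example, with_user user1 my-gridftp:/data/ prints user1@my-gridftp:/data/ (the path here is purely illustrative). Quote addresses that contain * so the shell does not expand them.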


Can gtransfer automatically create non-existing directories on the destination side?

Yes, this is possible and activated by default. Just enter the desired name or path in your destination URL and gtransfer will automatically create non-existing directories on the destination side (with the help of globus-url-copy).


Use host aliases for your GridFTP servers

There are already two host aliases defined which point to the two GridFTP servers at HLRS:

  • hazelhen:
  • laki:

You can use them instead of the longer host part of a GridFTP URL in the source and destination URLs, e.g. you can use:

  • hazelhen:/lustre/cray/ws8/ws/user-workspace instead of
  • gsiftp://gridftp-fr1.hww.de:2812/lustre/cray/ws8/ws/user-workspace

To create your own host aliases, please refer to the host aliases documentation linked below.

What if the gtransfer command fails during a data transfer?

globus-url-copy, the tool gtransfer actually uses (through tgftp) to transfer data, is configured by gtransfer to retry the transfer of files that failed to reach the destination GridFTP server. If that fails, gtransfer retries the whole process three times before giving up on the transfer. Even then, you can later continue a failed or interrupted transfer by simply issuing the very same gtransfer command. gtransfer stores state information about a transfer in your home directory below .gtransfer, so this mechanism works as long as you use the same home directory and the same user account and the state files are not modified in between.
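Since a transfer is continued by issuing the very same command again, the re-issuing can be automated with a small wrapper. The following is a sketch only: rerun_until_ok is a hypothetical helper, and it assumes that the wrapped command (e.g. gt) returns a non-zero exit status when a transfer did not complete.

```shell
# Sketch: re-issue the very same command until it succeeds or attempts run out.
# rerun_until_ok is a hypothetical helper name.
# Assumption: the wrapped command exits non-zero on a failed transfer.
rerun_until_ok() {
    max_attempts="$1"
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Run the command exactly as given, with all its arguments.
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}
```

A call could look like rerun_until_ok 3 gt -s hazelhen:/lustre/cray/ws8/ws/user-workspace/ -d gsiftp://gridftp.domain.tld:2811/~/ with the addresses replaced by your own.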

What if I need to interrupt a data transfer?

You can always interrupt a gtransfer data transfer by hitting CTRL+C during a data transfer, which effectively sends a SIGINT to the gtransfer process group and interrupts the data transfer. You can continue the transfer from where it was interrupted by issuing the very same gtransfer command - as with failed transfers described above. The same restrictions - same host, same user account, no fiddling with the state files in between - apply here.


Documentation

General


Man pages

Man(ual) pages are also available locally on the Hazelhen frontends. Simply enter man and the name of the man page (e.g. gtransfer or dpath) to read a specific page. If man pages with the same name exist in different sections, you also have to specify the section number after the man command but before the name of the man page. E.g. to read the dparam(5) man page - which contains the file format description for dparams - you would enter man 5 dparam.


Section 1


Section 5


Special functionality


globus-url-copy (aka Globus GridFTP client (GGC))

  • Type globus-url-copy and hit the ENTER/RETURN key to get a brief usage message. Use globus-url-copy -help and man globus-url-copy to get a description of all globus-url-copy options.
  • The basic syntax is:
globus-url-copy [optional command line switches] source destination
  • Source and destination can be further resolved to:

globus-url-copy [optional command line switches] {gsiftp://<server address>:<port> | file://}<absolute path> {gsiftp://<server address>:<port> | file://}<absolute path>

  • Files on remote systems can be referenced by gsiftp:// URLs whereas local files have to be referenced by file:// URLs. The usage of gtransfer host aliases is not supported by globus-url-copy, hence you need to enter the server addresses and ports manually. Use the following table for reference:
Host      Server address      Port
Hazelhen  gridftp-fr1.hww.de  2812
Laki      gridftp-fr2.hww.de  2812

Example:

$ globus-url-copy -cc 2 -tcp-bs 4M -p 2 -cd gsiftp://gridftp-fr1.hww.de:2812/lustre/cray/ws8/ws/user-workspace/file* gsiftp://gridftp.domain.tld:2811/~/
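Because globus-url-copy does not understand host aliases, the lookup from the table above can be done by a small helper that prints the full gsiftp:// URL. This is a sketch: hlrs_gsiftp_url is a hypothetical function name, and the server addresses and port are copied from the table.

```shell
# Sketch: build a gsiftp:// URL for an HLRS system from the reference table.
# hlrs_gsiftp_url is a hypothetical helper name.
hlrs_gsiftp_url() {
    host="$1"   # hazelhen or laki
    path="$2"   # absolute path on the remote system
    case "$host" in
        hazelhen) server="gridftp-fr1.hww.de" ;;
        laki)     server="gridftp-fr2.hww.de" ;;
        *) echo "unknown host: $host" >&2; return 1 ;;
    esac
    # Both servers listen on port 2812 according to the table.
    printf 'gsiftp://%s:2812%s\n' "$server" "$path"
}
```

The result can be used as source or destination, e.g. globus-url-copy "$(hlrs_gsiftp_url hazelhen /lustre/cray/ws8/ws/user-workspace/file1)" gsiftp://gridftp.domain.tld:2811/~/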


Documentation

See the Globus Toolkit documentation on globus-url-copy for more details about this tool.

Further Information


Support