- Information contained in the HLRS Wiki is not legally binding and is provided without warranty; HLRS is not responsible for any damages that might result from its use -

Data Transfer with GridFTP

From HLRS Platforms
{{Note
| text = Be aware that by default the data transfer is not encrypted. Only the authentication step is encrypted; the subsequent data transfers are unencrypted. You can enable encryption with the "-dcpriv" or "-data-channel-private" command line option. When transferring sensitive data, you should activate encryption regardless of the expected performance reduction. GridFTP falls back to an unencrypted data channel if an involved server does not support encryption (e.g. dCache).
| type = warn
}}


== Introduction ==

For transferring large amounts of data, the simple FTP protocol cannot fully exploit high-bandwidth connections, especially when these have high latencies, as is the case for national or international Wide Area Networks (WANs). For this task, an extension has been defined: GridFTP. It supports parallel TCP streams and multi-node transfers (also known as ''striping'') to achieve high data rates on high-bandwidth connections, even with high latencies. Furthermore, transfers can be restarted, and third-party transfers can be established: a user acting as a third party can initiate and control a transfer between two GridFTP servers.

GridFTP has a typical client/server architecture: the server stores the data or has access to it, and the client either downloads/uploads data or, in a third-party transfer as described above, controls a server-to-server transfer. The Globus Toolkit includes a simple GridFTP client, <code>globus-url-copy</code>. On top of that, there is <code>gtransfer</code>, a more user-friendly tool with additional features. Both tools are described in more detail below.

At HLRS, dedicated GridFTP servers are available which have access to the high-performance file system of the Hawk supercomputer. These servers can be used with a GridFTP client, which needs to be installed by the user.


== Prerequisites for using our GridFTP servers ==

* '''A personal X.509 grid certificate.''' To access our GridFTP servers and perform data transfers with GridFTP, you need a GSI proxy credential (GPC) signed by your personal X.509 grid certificate. Please see [https://gridcf.org/gct-docs/latest/gsic/key/index.html "Key concepts of GSI security"] for more information about GSI proxy certificates. This means that you first need a personal X.509 grid certificate signed by your organization or institute. In addition, the source and destination GridFTP services must be able to verify your GPC to enable the data transfer. By default, data transfers require a GPC derived from a personal X.509 certificate issued by one of the grid certificate authorities (CAs) that are members of the [https://www.igtf.net/ IGTF] or by their affiliated registration authorities (RAs). Please contact your IT department about how to acquire such a personal X.509 certificate.
* '''The distinguished name (DN) of your X.509 grid certificate.''' After receiving your personal X.509 grid certificate, you need to send the certificate's DN to HLRS staff so that access to our GridFTP servers can be activated. To determine the DN, you can run the following openssl command on your personal X.509 grid certificate:
<pre>
$ openssl x509 -noout -subject -in <YOUR_PERSONAL_X509_CERTIFICATE_FILE>
</pre>
* '''A Linux system with a GridFTP client installed.'''
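Depending on the OpenSSL version, the command above prints the subject either in the traditional slash-separated form (<code>/C=DE/O=.../CN=...</code>, as used by the Globus tools) or, since OpenSSL 1.1.0, as comma-separated fields. The following minimal sketch assumes you want the slash-separated form: it adds <code>-nameopt compat</code> and strips the <code>subject=</code> prefix. The helper name <code>cert_dn</code> is hypothetical and not part of any HLRS or Globus tooling.

```sh
#!/bin/sh
# Sketch, not an official HLRS tool: print the DN of an X.509 certificate
# in the slash-separated form (e.g. /C=DE/O=Example/CN=Jane Doe).
# cert_dn is a hypothetical helper name.

cert_dn() {
    # -nameopt compat selects the traditional slash-separated name format;
    # sed strips the leading "subject=" (older OpenSSL versions add a space).
    openssl x509 -noout -subject -nameopt compat -in "$1" |
        sed -e 's/^subject= *//'
}

# Example: cert_dn "$HOME/.globus/usercert.pem"
```

For a certificate with the subject C=DE, O=Example, CN=Jane Doe, the function prints <code>/C=DE/O=Example/CN=Jane Doe</code> on a single line.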
 
 
=== Further information on X.509 certificates ===
 
* http://en.wikipedia.org/wiki/X.509
* http://www.eugridpma.org/members/worldmap/


== Installing the GridFTP client at your home institution ==

* Since version 5.2 of the Globus Toolkit, the GridFTP client is also available as a pre-compiled RPM package (for '''Red Hat Enterprise Linux 6 and 7''', '''CentOS 6 and 7''', '''Scientific Linux 6 and 7''' and possibly others) or DEB package (for '''Debian GNU/Linux 7, 8 and 9''' and '''Ubuntu Linux 14.04 LTS, 16.04 LTS, 16.10 and 17.04'''). Install the GridFTP client by following the instructions in the [https://gridcf.org/gct-docs/latest/gridftp/admin/index.html#gridftp-admin-installing Grid Community Toolkit documentation]. If a pre-compiled package is available, it is usually named <code>globus-gass-copy-progs</code>; for source installs, <code>make gridftp</code> includes it. Be sure to also install the <code>grid-proxy-init</code> tool, which is included in the <code>globus-proxy-utils</code> package and in a source installation with <code>make gridftp</code>, or use the <code>genproxy</code> tool instead. Only one of these two tools is required to create GSI proxy credentials.
* Create a directory <code>.globus</code> in your home directory and place both your personal X.509 certificate (as <code>usercert.pem</code>) and your private key file (as <code>userkey.pem</code>) there. To create these files from a PKCS#12 keystore, follow these [[X.509-SSH#CONFIGURATION_2|instructions]], but use the names above for the destination files. When using <code>grid-proxy-init</code> to create a GSI proxy credential, you can also place a PKCS#12 keystore (as <code>usercred.p12</code>) there; the Firefox web browser, for example, exports user certificates and keys as PKCS#12 keystores.
* Additionally, create a directory named <code>certificates</code> inside <code>.globus</code> and place all trusted CA certificates there. A collection suitable for use with the Globus Toolkit is available for download as a [http://www.eugridpma.info/distribution/igtf/current/accredited/igtf-preinstalled-bundle-classic.tar.gz tarball]; download it and untar it into this directory. The included files are needed to authenticate remote entities (i.e. GridFTP servers).
* Run <code>grid-proxy-init</code> or <code>genproxy</code> to verify the validity of your personal X.509 certificate and to create a GSI proxy credential signed by it, with a default lifetime of 12 hours (<code>grid-proxy-init</code>) or 24 hours (<code>genproxy</code>). Repeat this step whenever the GSI proxy credential has expired.
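The steps above can be scripted. The following sketch makes several assumptions that are not part of this page: passwords may be passed through the hypothetical environment variables <code>P12_PASS</code> and <code>KEY_PASS</code> (OpenSSL prompts otherwise), the function names <code>setup_globus_dir</code> and <code>ensure_proxy</code> are made up, and the CA certificates still have to be unpacked into <code>certificates</code> by hand. It uses <code>openssl pkcs12</code> as an alternative to the linked conversion instructions, and <code>grid-proxy-info -exists -valid</code> to skip creating a new proxy while the current one is still valid long enough.

```sh
#!/bin/sh
# Sketch of the setup steps above; not an official HLRS script.
# Assumptions: OpenSSL is installed; the keystore password is read from
# $P12_PASS and the new private-key pass phrase from $KEY_PASS when these
# are set (otherwise OpenSSL prompts). Function names are hypothetical.

# Split a PKCS#12 keystore into usercert.pem and userkey.pem in DIR.
setup_globus_dir() {
    local p12="$1" dir="${2:-$HOME/.globus}"
    mkdir -p "$dir/certificates" || return 1
    chmod 700 "$dir"

    # Reuse the positional parameters as a list of optional arguments.
    set --
    if [ -n "${P12_PASS:-}" ]; then set -- -passin env:P12_PASS; fi
    # Certificate only, without the private key.
    openssl pkcs12 -in "$p12" -clcerts -nokeys "$@" \
        -out "$dir/usercert.pem" || return 1

    if [ -n "${KEY_PASS:-}" ]; then set -- "$@" -passout env:KEY_PASS; fi
    # Private key only; it stays encrypted with a pass phrase.
    openssl pkcs12 -in "$p12" -nocerts "$@" \
        -out "$dir/userkey.pem" || return 1

    # The Globus tools reject private keys that others can read.
    chmod 644 "$dir/usercert.pem"
    chmod 400 "$dir/userkey.pem"
}

# Create a new proxy only if none exists or the current one expires within
# the given time (H:MM, default 1:00).
ensure_proxy() {
    if grid-proxy-info -exists -valid "${1:-1:00}" 2>/dev/null; then
        return 0
    fi
    grid-proxy-init
}
```

For example, run <code>setup_globus_dir ~/usercred.p12</code> once (hypothetical keystore path), and <code>ensure_proxy 2:00</code> before each transfer session.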


== Usage ==


=== Workspaces ===


The paths to your workspaces are identical on the supercomputers and the GridFTP servers. To get the path of a specific workspace, first log in to the respective supercomputer frontend, determine the name of the workspace you want to use, and then enter <code>ws_find <WORKSPACE_NAME></code> to get the actual path to this workspace. More information about workspaces at HLRS can be found in the [https://kb.hlrs.de/platforms/index.php/Workspace_mechanism platforms wiki].
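Because the paths are identical, a workspace URL for the GridFTP servers can be derived directly from <code>ws_find</code>. A minimal sketch, assuming the server <code>gridftp-fr1.hww.hlrs.de</code> on port 2812 from the globus-url-copy table on this page; <code>workspace_url</code> and the <code>GRIDFTP_SERVER</code>/<code>GRIDFTP_PORT</code> variables are hypothetical names:

```sh
#!/bin/sh
# Sketch: build the gsiftp:// URL of an HLRS workspace; run it on a
# supercomputer frontend where ws_find is available. Server and port come
# from the globus-url-copy table on this page; workspace_url and the
# GRIDFTP_SERVER/GRIDFTP_PORT variables are hypothetical names.

GRIDFTP_SERVER="${GRIDFTP_SERVER:-gridftp-fr1.hww.hlrs.de}"
GRIDFTP_PORT="${GRIDFTP_PORT:-2812}"

workspace_url() {
    local path
    # ws_find prints the absolute path of the named workspace.
    path="$(ws_find "$1")" || return 1
    [ -n "$path" ] || return 1
    # Workspace paths are identical on frontends and GridFTP servers, so the
    # URL is server, port and path (with exactly one trailing slash).
    printf 'gsiftp://%s:%s%s/\n' "$GRIDFTP_SERVER" "$GRIDFTP_PORT" "${path%/}"
}
```

On a frontend, <code>workspace_url my-workspace</code> (hypothetical workspace name) prints a URL of the form <code>gsiftp://gridftp-fr1.hww.hlrs.de:2812/<workspace path>/</code>, which can serve as a source or destination for gt or globus-url-copy.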


=== gtransfer (gt) ===

* Type <code>gt</code> and hit the ENTER/RETURN key to get a brief usage message. Use <code>gt --help</code> and <code>man gt</code> to get a description of all gt options.
* To start a transfer, enter <code>gt</code>, hit the SPACE key and then hit the TAB key three times to make use of the gt bash completion. You will get a listing of all available options.
** Start with <code>-s</code> to enter the source address (the <code>-</code> character has already been provided by the gt bash completion). After entering <code>s</code>, hit the SPACE key and enter your source address, e.g. <code>gsiftp://gridftp.domain.tld:2811</code>. You can also hit the TAB key twice to get the preconfigured GridFTP source server addresses or [https://github.com/fr4nk5ch31n3r/gtransfer/blob/master/share/doc/host-aliases.md host aliases].
** Add the path to your desired workspace just like on the supercomputer frontends (e.g. <code>/lustre/hpe/ws10/ws10.3/ws/user-workspace/</code>) and then hit the TAB key two or three times to get a listing of the files and directories in your workspace directory on the remote server. Depending on the latency and the number of files present there, it can take a few seconds until you see results. This only works if the remote GridFTP server considers your GSI proxy certificate valid and you are listing a directory where you have <code>rx</code> (read and execute) permissions.
** Type the beginning of your desired file or directory name and hit the TAB key to complete it. If you want to copy all files in a directory, add <code>/*</code> or just <code>/</code> to the end of the path.
** Now continue with the destination address: add <code>-d</code> to the command line, hit the SPACE key and enter the destination address just like the source address. End the destination path with a <code>/</code>.
* To recursively copy all files and directories below a given directory, add the <code>-r</code> option to the gt command line.
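For scripts or batch use, the interactive steps above can be condensed into a single command line. This is a sketch only: <code>gt_copy_tree</code> is a hypothetical helper, and the slash handling simply follows the hints above (a trailing <code>/</code> on the source copies the directory's contents, and the destination path should end with <code>/</code>).

```sh
#!/bin/sh
# Sketch: a non-interactive equivalent of the tab-completion steps above
# that copies a whole directory tree with gtransfer. gt_copy_tree is a
# hypothetical name; the slash handling follows the hints above.

gt_copy_tree() {
    local src="$1" dst="$2"
    # Trailing "/" on the source: copy the directory's contents.
    case "$src" in */) ;; *) src="$src/" ;; esac
    # Trailing "/" on the destination, as recommended above.
    case "$dst" in */) ;; *) dst="$dst/" ;; esac
    # -s source, -d destination, -r recursive.
    gt -s "$src" -d "$dst" -r
}
```

Example: <code>gt_copy_tree gsiftp://gridftp-fr1.hww.hlrs.de:2812/lustre/hpe/ws10/ws10.3/ws/user-workspace/results gsiftp://gridftp.domain.tld:2811/~/results</code>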
 
==== Hints ====




===== I have multiple user accounts at a remote GridFTP server. How can I choose a specific account? =====


This can be done by inserting a <code><USER>@</code> portion into your GridFTP URLs or by prefixing host aliases with <code><USER>@</code>. Replace <code><USER></code> with your desired username on the remote site.

Examples:
* GridFTP URL: <code>gsiftp://gridftp.domain.tld:2811/[...]/files/*</code> => <code>gsiftp://user1@gridftp.domain.tld:2811/[...]/files/*</code>
* Host alias: <code>my-gridftp:/[...]/files/</code> => <code>user1@my-gridftp:/[...]/files/</code>
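When URLs are assembled in scripts, the <code><USER>@</code> insertion shown above can be done with plain parameter expansion. A sketch with the hypothetical helper name <code>add_user</code>; it handles the two cases from the examples (gsiftp:// URLs and host aliases) and does not check whether a user is already present.

```sh
#!/bin/sh
# Sketch: select a remote account by inserting "<USER>@" as described above.
# Works for gsiftp:// URLs and for gtransfer host aliases; add_user is a
# hypothetical helper name.

add_user() {
    local target="$1" user="$2"
    case "$target" in
        gsiftp://*)
            # gsiftp://host:port/path -> gsiftp://user@host:port/path
            printf 'gsiftp://%s@%s\n' "$user" "${target#gsiftp://}" ;;
        *)
            # alias:/path -> user@alias:/path
            printf '%s@%s\n' "$user" "$target" ;;
    esac
}
```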


===== Can gtransfer automatically create non-existing directories on the destination side? =====


Yes, this is possible and activated by default. Just enter the desired name or path in your destination URL and gtransfer will automatically create non-existing directories on the destination side (with the help of [http://toolkit.globus.org/toolkit/docs/latest-stable/gridftp/user/#globus-url-copy globus-url-copy]).


===== Use host aliases for your GridFTP servers =====


To create your own host aliases, please refer to the host aliases documentation linked below.

===== What if the gtransfer command fails during a data transfer? =====
[https://gridcf.org/gct-docs/latest/gridftp/user/index.html#globus-url-copy Globus-url-copy], the tool that gtransfer actually uses (through [https://github.com/fr4nk5ch31n3r/tgftp tgftp]) to transfer data, is configured by gtransfer to retry the transfer of files that could not be transferred successfully to the destination GridFTP server. If that fails, gtransfer retries the whole process three times before giving up on the transfer. Even then, you can later continue a failed or interrupted transfer by simply issuing the very same gtransfer command. Gtransfer stores state information about a transfer in your home directory below <code>.gtransfer</code>. This mechanism therefore only works with the same home directory and the same user account, and as long as the state files are not modified in between.


===== What if I need to interrupt a data transfer? =====

You can always interrupt a gtransfer data transfer by hitting CTRL+C, which sends a <code>SIGINT</code> to the gtransfer process group and interrupts the data transfer. You can continue the transfer from where it was interrupted by issuing the very same gtransfer command, as with the failed transfers described above. The same restrictions apply here: same host, same user account, and no modifications to the state files in between.
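Since re-issuing the very same command resumes a transfer, a small wrapper can repeat it a limited number of times, for example for unattended transfers. This is a sketch under assumptions: <code>gt_retry</code>, the attempt limit and the pause are illustrative, and the wrapper must run on the same host and with the same user account so that the state files in <code>.gtransfer</code> are found. Exit status 130 (the conventional status after SIGINT) is treated as a deliberate interruption and is not retried.

```sh
#!/bin/sh
# Sketch: because re-issuing the very same gtransfer command continues a
# failed or interrupted transfer, a wrapper can simply repeat it. gt_retry,
# the attempt limit and the pause are illustrative assumptions.

gt_retry() {
    local max_attempts="$1" pause="$2" attempt=1 rc
    shift 2
    while :; do
        gt "$@"
        rc=$?
        [ "$rc" -eq 0 ] && return 0
        # 130 usually means "terminated by SIGINT" (CTRL+C): do not retry.
        [ "$rc" -eq 130 ] && return 130
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "gt_retry: giving up after $attempt attempts (status $rc)" >&2
            return "$rc"
        fi
        attempt=$((attempt + 1))
        sleep "$pause"
    done
}
```

For example, <code>gt_retry 5 60 -s <source URL> -d <destination URL> -r</code> runs the same gt command at most five times with a one-minute pause between attempts.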


==== Documentation ====

===== General =====


* [https://github.com/fr4nk5ch31n3r/gtransfer/ gtransfer GitHub repository and README]


===== Man pages =====


Man(ual) pages are available as part of the software installation. Simply enter <code>man</code> followed by the name of the man page (e.g. <code>gtransfer</code> or <code>dpath</code>) to read a specific page. If man pages with the same name exist in different sections, you also have to specify the section number between the <code>man</code> command and the name of the man page. For example, to read the <code>dparam(5)</code> man page, which contains the file format description for dparams, enter <code>man 5 dparam</code>.


====== Section 1 ======

* [https://github.com/fr4nk5ch31n3r/gtransfer/blob/master/share/doc/gtransfer.1.md gtransfer(1)]
* [https://github.com/fr4nk5ch31n3r/gtransfer/blob/master/share/doc/dparam.1.md dparam(1)]
* [https://github.com/fr4nk5ch31n3r/gtransfer/blob/master/share/doc/dpath.1.md dpath(1)]
* [https://github.com/fr4nk5ch31n3r/gtransfer/blob/master/share/doc/halias.1.md halias(1)]


====== Section 5 ======


* [https://github.com/fr4nk5ch31n3r/gtransfer/blob/master/share/doc/dparam.5.md dparam(5)]
* [https://github.com/fr4nk5ch31n3r/gtransfer/blob/master/share/doc/dpath.5.md dpath(5)]


===== Special functionality =====


* [https://github.com/fr4nk5ch31n3r/gtransfer/blob/master/share/doc/host-aliases.md Host aliases]

=== globus-url-copy (aka Globus GridFTP client (GGC)) ===


* Type <code>globus-url-copy</code> and hit the ENTER/RETURN key to get a brief usage message. Use <code>globus-url-copy -help</code> and <code>man globus-url-copy</code> to get a description of all globus-url-copy options.
* The basic syntax is:
<pre>
globus-url-copy [optional command line switches] source destination
</pre>
* Source and destination can be further resolved to:
<pre>
globus-url-copy [optional command line switches] {gsiftp://<server address>:<port> | file://}<absolute path> {gsiftp://<server address>:<port> | file://}<absolute path>
</pre>


* Files on remote systems can be referenced by <code>gsiftp://</code> URLs, whereas local files have to be referenced by <code>file://</code> URLs. gtransfer host aliases are not supported by globus-url-copy, so you need to enter the server addresses and ports manually. Use the following table for reference:


{| class="wikitable"
|-
!Server address
!Port
|-
|gridftp-fr1.hww.hlrs.de
|2812
|-
|gridftp-fr2.hww.hlrs.de
|2812
|}


Example:
<pre>
$ globus-url-copy -cc 2 -tcp-bs 4M -p 2 -cd gsiftp://gridftp-fr1.hww.hlrs.de:2812/lustre/hpe/ws10/ws10.3/ws/user-workspace/file* gsiftp://gridftp.domain.tld:2811/~/
</pre>
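The note at the top of this page recommends data channel encryption for sensitive data. Below is a sketch of the example above with <code>-dcpriv</code> added. The helper name <code>guc_private</code> and the <code>GUC_CC</code>, <code>GUC_PARALLEL</code> and <code>GUC_TCP_BS</code> variables are hypothetical; the defaults simply mirror the example's <code>-cc 2 -p 2 -tcp-bs 4M</code>. Expect lower throughput with encryption enabled.

```sh
#!/bin/sh
# Sketch: the example above with data channel encryption (-dcpriv) enabled.
# guc_private and the GUC_* variables are hypothetical; the defaults mirror
# the example (-cc 2 -p 2 -tcp-bs 4M, -cd creates destination directories).

guc_private() {
    local src="$1" dst="$2"
    globus-url-copy -dcpriv -cd \
        -cc "${GUC_CC:-2}" -p "${GUC_PARALLEL:-2}" -tcp-bs "${GUC_TCP_BS:-4M}" \
        "$src" "$dst"
}
```

Quote source URLs that contain wildcards, e.g. <code>guc_private 'gsiftp://gridftp-fr1.hww.hlrs.de:2812/lustre/hpe/ws10/ws10.3/ws/user-workspace/file*' gsiftp://gridftp.domain.tld:2811/~/</code>, so that the local shell passes <code>*</code> to globus-url-copy unchanged.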
 
 
==== Documentation ====
 
See the [https://gridcf.org/gct-docs/latest/gridftp/user/index.html#gridftp-user-basic Grid Community Toolkit documentation on globus-url-copy] for more details.


Manual pages: <code>man globus-url-copy</code>


== Further Information ==


* https://gridcf.org/gct-docs/latest/gridftp/index.html - Official documentation
* https://gridcf.org/gct-docs/latest/gridftp/user/index.html#gridftp-user-quickstart-config - Firewall issues

Latest revision as of 10:57, 9 February 2023
