- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

Data Transfer with UFTP


UFTP Service

UFTP is a data streaming library and file transfer tool. It is integrated into UNICORE, allowing data transfer from client to server (and vice versa) as well as data staging between UFTP-enabled UNICORE sites. UFTP can also be used independently of UNICORE; this requires an authentication server and a standalone UFTP client. You can install the standalone client on your desktop and use it to copy data from or to the file systems of the supercomputers.

Authentication server: gridftp-fr1.hww.hlrs.de

Port: 9000

Currently the HOME file system is not connected to the data transfer nodes, so to use UFTP we need to manually add your SSH public key to the UFTP server configuration. Please create a request in our Trouble Ticket system containing your username, your public SSH key, and the email address registered with your account.
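
If you are unsure what to send, you can print your public SSH key on your local machine, for example (assuming an RSA or ed25519 key in the default location):

cat ~/.ssh/id_rsa.pub
cat ~/.ssh/id_ed25519.pub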

Commonly used command examples (unfinished)

All commands you execute are sent to the Data Transfer Frontend (FE), while the actual data transfers are initiated between the client and the Data Transfer Backends (BE). The basic command looks like the following:

uftp <subcommand> <user identification> <URL> <additional options>

uftp <subcommand> -h gives you more information about the details of this subcommand

The following "subcommands" examples are useful, the subcommand are similar to the normal linux commands


"info": Get information about the used Frontend

uftp info -u <username> https://gridftp-fr1.hww.hlrs.de:9000/rest/auth/

The result shows your identity, the BEs connected to the FE in use, and their status. The main piece of information needed later is the provided URL base for a logical server.


For all other subcommands, you need the full URL, including the logical name of the transfer server and the directory path:

  • <username> is your username on the HLRS resources
  • <URL base> is retrieved from the info command and includes the ":"
  • "HLRS" in the URL points to the transfer cluster, where several backends are used alternately. You do not have to worry if a single transfer server is temporarily down.


"ls": List your remote home directory

Syntax: uftp ls -u <username> <URL base>:<directory>

Example: uftp ls -u <username> https://gridftp-fr1.hww.hlrs.de:9000/rest/auth/HLRS:/lustre/hpe/ws10/ws10.3/ws/hpcbuch-test


"mkdir": Create remote directories

Syntax: uftp mkdir -u <username> <URL base>:<new directory path>

Example: uftp mkdir -u <username> https://gridftp-fr1.hww.hlrs.de:9000/rest/auth/HLRS:/lustre/hpe/ws10/ws10.3/ws/hpcbuch-test/testdir


"cp": copy files from local to remote of from remote to local (the trailing . is important, alternative is an absolute or relative path)

Syntax: uftp cp -u <username> <source: local path, or base URL + ":" + absolute/relative path> <target: base URL + ":" + absolute/relative path, or local path>

Example: uftp cp -u <username> <testfile> https://gridftp-fr1.hww.hlrs.de:9000/rest/auth/HLRS:/lustre/hpe/ws10/ws10.3/ws/hpcbuch-test/

Example: uftp cp -u <username> https://gridftp-fr1.hww.hlrs.de:9000/rest/auth/HLRS:/lustre/hpe/ws10/ws10.3/ws/hpcbuch-test ./testfile2
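
The standalone client also accepts wildcards in the source (see the UFTP client section below). A sketch, assuming files matching *.dat exist in the illustrative workspace path (the quotes prevent your local shell from expanding the pattern):

Example: uftp cp -u <username> 'https://gridftp-fr1.hww.hlrs.de:9000/rest/auth/HLRS:/lustre/hpe/ws10/ws10.3/ws/hpcbuch-test/*.dat' .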


"rm": Delete remote files or directories

Syntax: uftp rm -u <username> <URL base>:<absolute path>

Example: uftp rm -u <username> https://gridftp-fr1.hww.hlrs.de:9000/rest/auth/HLRS:/lustre/hpe/ws10/ws10.3/ws/hpcbuch-test/testfile

File/directory path

For transfers you need to use the same file path as used on the clusters, i.e.:

  • WS10: /lustre/hpe/ws10/ws10.3/ws/hpcbuch-test
  • WS9: /lustre/cray/ws9/6/ws/hpcbuch-test
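
If you do not know the full path of a workspace, you can usually look it up on the cluster frontend with the HLRS workspace tools, assuming they are available there (the workspace name hpcbuch-test is illustrative):

ws_find hpcbuch-test
ws_list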

Additional command line options (unfinished)

  • -t,--threads <arg> Use the specified number of UFTP connections (threads). This allows parallel threads, removing the limitation to a single core and thus giving higher throughput (see the examples after this list).
  • -n,--streams <Streams> Number of TCP streams per connection/thread; may increase the bandwidth within one thread.
  • -a,--archive Tell the server to interpret the data as a tar/zip stream and unpack it. This feature is relatively new and supports the transfer of huge numbers of small files; each individual file would otherwise produce additional overhead (similar to new connections), so creating a tar archive and piping the result into uftp is an option. Please use this with caution, and note that you can only use it to push data to HLRS, since you cannot run tar/pipe for pulling data without a local login.
  • -e, --encrypt Encrypt data connections (default is unencrypted)
  • -c, --compress Compress data for transfer (default is uncompressed)
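
As a sketch of how these options can be combined (thread/stream counts and paths are purely illustrative), a parallel, encrypted upload could look like:

uftp cp -u <username> -t 4 -n 2 -e <testfile> https://gridftp-fr1.hww.hlrs.de:9000/rest/auth/HLRS:/lustre/hpe/ws10/ws10.3/ws/hpcbuch-test/

For many small files, the tar/pipe approach mentioned for -a could look like the following sketch, assuming the client accepts "-" as source for reading from standard input (check uftp cp -h):

tar cf - mydir | uftp cp -u <username> -a - https://gridftp-fr1.hww.hlrs.de:9000/rest/auth/HLRS:/lustre/hpe/ws10/ws10.3/ws/hpcbuch-test/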

Setup at HLRS

By design you need to have console access to a system where either the source or destination filesystem is directly mounted/accessible. The other side needs to have a running UFTPD server with access to the destination/source filesystem.

Here at HLRS we have 5 data transfer nodes (data transfer backends/BE) available and two authentication servers (data transfer frontends/FE). The main system is gridftp-fr1, which uses the majority of the BEs for transfers and increased bandwidth.

Currently the HOME file system is not connected to the data transfer nodes, so to use UFTP we need to manually add your SSH public key to the UFTP server configuration (see above).

Possible scenarios (unfinished)

Since both pushing and pulling data are possible, it does not matter on which side of the transfer you are logged in.

  • Upload/download from/to your home site (normally not running UFTP servers) to/from HLRS
    • Install the UFTP client or use a preinstalled client at your site
  • Upload/download from/to a UFTP-enabled site (like the GCS centers) to/from HLRS

You then have two options:

    • Initiating the transfer from the remote site
      • Same as for your home site, but the client is already installed
    • Initiating the transfer from HLRS (not yet ready) to the remote site
      • Possible options, not implemented yet:
      • Log in to our data transfer node, where the client tools are already installed
      • Run the client from the cluster frontend

UFTP client

The UFTP standalone client is a Java-based client for UFTP and runs under Linux. It allows you to list remote directories, copy files (with many options such as wildcards), and synchronize single files. It supports username/password authentication and SSH key authentication to a UFTP Authentication Server.
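
A sketch of synchronizing a single file with the "sync" subcommand (the paths are illustrative; check uftp sync -h for the exact argument order):

uftp sync -u <username> ./localfile https://gridftp-fr1.hww.hlrs.de:9000/rest/auth/HLRS:/lustre/hpe/ws10/ws10.3/ws/hpcbuch-test/remotefile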

Installation

First you need a working Java 8 Runtime Environment. The UNICORE download page contains all UNICORE components. Unless you need the full UNICORE functionality, the UFTP client alone is sufficient, and you can use the direct link to SourceForge. The directory for a version contains three installation methods:

  • RPM based version (for Redhat based systems, also CentOS, Scientific Linux, ...)
  • DEB based version (for Debian based systems, also Ubuntu, ...)
  • zip Archive (for all systems including above mentioned ones)

You can use any of the listed installation methods. Since RPM and DEB should be self-explanatory, we only explain the manual installation (detailed instructions can be found here).

  • Download the newest client uftp-client-x.x.x-all.zip file
  • Extract it wherever you want
    • unzip uftp-client-<uftp_client_version>-all.zip -d <UFTP_CLIENT_INSTALL_DIR>
  • Run the client to verify a proper environment
    • <UFTP_CLIENT_INSTALL_DIR>/bin/uftp -h
  • If this works, you can start to transfer data
  • For easier use you can create either an alias or adjust your $PATH variable in your ~/.bashrc
    • alias uftp=<UFTP_CLIENT_INSTALL_DIR>/bin/uftp
    • export PATH=$PATH:<UFTP_CLIENT_INSTALL_DIR>/bin/
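
After installation you can verify the connection to the HLRS authentication server with the info command described above:

uftp info -u <username> https://gridftp-fr1.hww.hlrs.de:9000/rest/auth/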

UFTP detailed

Features

  • dynamic firewall port opening using a pseudo FTP connection. UFTPD requires only a single open port.
  • optional encryption of the data streams using a symmetric key algorithm
  • optional compression of the data streams (using gzip)
  • partial reads/writes to a file. If supported by the filesystem, multiple UFTP processes can thus read/write a file in parallel (striping)
  • supports efficient synchronization of single local and remote files using the rsync algorithm
  • integrated into UNICORE clients for fast file upload and download
  • integrated with UNICORE servers for fast data staging and server-to-server file transfers
  • standalone (non-UNICORE) client available

How does UFTP work?

The server part, called uftpd, listens on two ports (which may be on two different network interfaces):

  • the command port receives control commands (for connections from authentication server)
  • the listen port accepts data connections from clients.

The uftpd server is "controlled" (usually by UNICORE/X) via the command port, and receives/sends data directly from/to a user’s client machine or another UFTP-enabled UNICORE server. Data connections are made to the "listen" port, which has to be accessible from external machines. Firewalls have to treat the "listen" port as an FTP port. A UFTP file transfer works as follows:

  • the UNICORE/X server (or authentication server) sends a request to the command port. This request notifies the UFTPD server about the upcoming transfer and contains the following information
    • the client’s IP address
    • the source/target file name
    • whether to send or receive data
    • a "secret", i.e. a string the client will send to authenticate itself
    • how many data connections will be opened
    • the user and group ID for whom to create the file (in case of send mode)
    • an optional key to encrypt/decrypt the data
  • the UFTPD server will now accept an incoming connection from the announced IP address, provided the supplied "secret" matches the expectation.
  • if everything is OK, the requested number of data connections from the client can be opened. Firewall traversal will be negotiated using a pseudo FTP protocol.
  • the file is sent/received using the requested number of data connections
  • to access the requested file, uftpd attempts to switch its user id to the requested one prior to reading/writing the file. This uses a C library which is accessed from Java via the Java native interface (JNI). See also the installation section below.