- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

Data Encryption

From HLRS Platforms
Revision as of 16:16, 24 April 2024 by Hpcbuch

Overview

In general, the importance or necessity of data encryption depends strongly on individual user requirements. If you use only public data, there is normally no need to encrypt at all. If you use confidential data, or you are not sure about the confidentiality level of your data, encrypting it is strongly recommended.

Looking at a typical HPC workflow, the following steps can be identified.

| Action | Step in HPC process | Suggested encryption method | Additional notes |
|---|---|---|---|
| Transfer | Moving input data from your system/organization to HLRS | Use encrypted data transfer (SSH, GridFTP, UFTP) | In case transport encryption is not feasible, data en-/decryption prior to/after transport might be an option. |
| Setup | Configure jobs, install software, ... | | Normally the data is untouched, thus no data encryption handling is necessary. |
| Submission | Submitting your configured job to the batch system | gpg, ... | Since the job is executed non-interactively, this is the last step where you can decrypt your data interactively for processing. |
| Computing | Executing the HPC job, including pre/post-processing if executed via the batch system | gpg, ... | During computation the data needs to be available unencrypted in most cases. In rare situations, like single-node jobs or piped output, you might be able to directly encrypt output data asymmetrically on each compute node. |
| Transfer | Moving results, logs, ... back to your organization | Use encrypted data transfer (SSH, GridFTP, UFTP) | In case transport encryption is not feasible, data en-/decryption prior to/after transport might be an option. |
| Storing | Store data between other steps without interaction | gpg, ... | Many other options are possible, depending on your requirements. |

Unless your application can run directly on encrypted data, there is no way to keep your data encrypted permanently. The following sections deal with the most important aspects that we currently support; the list will be extended over time. At the end we will also list a few additional tools which are currently not supported but might be of interest for further data encryption issues.

Transfer encryption

We offer mainly three ways to transfer data.

  • SSH: Most users simply use SSH to copy their files to HLRS. In this case you do not have to do anything, since SSH always encrypts your communication. For details please visit the corresponding transfer pages.
  • GridFTP/Unicore FTP: For very large transfers we offer GridFTP and Unicore UFTP. Neither protocol encrypts by default. Both protocols have similar options, especially for setting up parallel processes and streams to increase transfer rates, and both allow encryption to be enabled:
    • GridFTP: globus-url-copy -dcpriv or globus-url-copy -data-channel-private
    • UFTP: set the environment variable and pass the encryption option: export UFTP_ENCRYPTION_ALGORITHM="AES"; uftp -e or export UFTP_ENCRYPTION_ALGORITHM="AES"; uftp --encrypt
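
Where transport encryption cannot be enabled, the data itself can be encrypted before the transfer and decrypted afterwards. The following is a minimal sketch of that idea, not a supported HLRS tool: the helper names pack_encrypted and unpack_encrypted are hypothetical, and it assumes that gpg and tar are available on both sides and that gpg can prompt for the passphrase on a terminal.

```shell
#!/bin/sh
# Sketch: encrypt a directory before sending it over an unencrypted channel.
# pack_encrypted and unpack_encrypted are hypothetical helper names.

# Pack a directory into a tar stream and encrypt it symmetrically.
# gpg asks for the passphrase interactively.
pack_encrypted() {
    src_dir=$1
    out_file=$2
    tar -C "$(dirname "$src_dir")" -cf - "$(basename "$src_dir")" |
        gpg -c --no-symkey-cache --output "$out_file"
}

# Decrypt the archive and unpack it below the given target directory.
unpack_encrypted() {
    in_file=$1
    target_dir=$2
    mkdir -p "$target_dir"
    gpg -d --no-symkey-cache "$in_file" | tar -C "$target_dir" -xf -
}
```

A call such as pack_encrypted ./input input.tar.gpg on your own system, followed by unpack_encrypted input.tar.gpg ./work after the transfer, would restore the directory as ./work/input.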

Encryption while stored

There are many options for encrypting data that will be stored for a certain time without being used. One widely used tool is gpg (GNU Privacy Guard), which supports various encryption, decryption and signing actions. We start with the fairly simple symmetric encryption and then cover the more complex asymmetric encryption, which allows extended encryption under certain conditions.

Symmetric encryption with gpg

gpg -c --no-symkey-cache <Filename>
gpg -d --no-symkey-cache --output <Target cleartext file> <encrypted file>


Overview of used options:

-c --symmetric              Encrypt with a symmetric key
-d --decrypt                Decrypt a file by providing the secret key
-o --output <filename>      Filename where the output should be redirected to. By default the output is printed to the console
--no-symkey-cache           Prevents caching keys in the gpg-agent. Without this option the agent will automatically use the
                            stored keys instead of asking for them. Even when you open the encrypted file in vi, the gpg-agent
                            steps in and automatically decrypts the file. This behaviour is quite convenient, but not always wanted.
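
Before deleting a cleartext file it can be worth confirming that the encrypted copy really decrypts to the same content. The sketch below shows one way to do this with the two commands above; the helper name encrypt_and_verify is hypothetical, and it assumes sha256sum (GNU coreutils) is installed. Because of --no-symkey-cache, gpg prompts for the passphrase twice, once for encryption and once for the check.

```shell
#!/bin/sh
# Sketch: symmetric encryption round trip with an integrity check.
# encrypt_and_verify is a hypothetical helper name.
encrypt_and_verify() {
    plain=$1
    # gpg -c writes <plain>.gpg next to the input and prompts for a passphrase.
    gpg -c --no-symkey-cache "$plain" || return 1
    # Decrypt to stdout and compare checksums before removing anything.
    sum_plain=$(sha256sum < "$plain")
    sum_roundtrip=$(gpg -d --no-symkey-cache "$plain.gpg" | sha256sum)
    [ "$sum_plain" = "$sum_roundtrip" ]
}
```

The function returns success only if both checksums match, so it can guard a later removal of the cleartext file.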

The secret key used for encryption should obviously not be stored along with the data. If you decrypt the data before submitting the job, you can enter the secret key interactively. If you decrypt the data within the HPC job, you have to store the secret somewhere, preferably in a configuration file for your job which is read by your script. The config (with the secret key) is then at least separated from the script and the data, but it is stored in cleartext in your HOME. If you consider using an external service that provides the key directly to your job, you need authentication credentials for that service, and these again have to be stored in cleartext. This makes the setup more complex but not more secure.
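
Decryption inside a job could look like the following sketch. It is an illustration under assumptions, not a recommended HLRS procedure: the helper names check_secret_file and decrypt_in_job and all file names are hypothetical, stat -c is the GNU coreutils syntax, and the gpg options --batch, --pinentry-mode loopback and --passphrase-file require GnuPG 2.1 or newer.

```shell
#!/bin/sh
# Sketch: decrypt input inside a batch job using a passphrase kept in a
# separate file. File names and helper names are hypothetical.

# Refuse to use a passphrase file that group or others can access.
check_secret_file() {
    secret_file=$1
    [ -f "$secret_file" ] || return 1
    perms=$(stat -c '%a' "$secret_file") || return 1
    case $perms in
        600|400) return 0 ;;
        *) echo "refusing $secret_file: mode $perms, expected 600 or 400" >&2
           return 1 ;;
    esac
}

# Non-interactive decryption: --batch suppresses prompts and
# --pinentry-mode loopback lets gpg read the passphrase from the file.
decrypt_in_job() {
    secret_file=$1
    encrypted=$2
    cleartext=$3
    check_secret_file "$secret_file" || return 1
    gpg --batch --pinentry-mode loopback --passphrase-file "$secret_file" \
        --no-symkey-cache --output "$cleartext" -d "$encrypted"
}
```

In a job script this might be called as decrypt_in_job "$HOME/job.secret" input.dat.gpg input.dat. The permission check does not change the basic limitation described above: the secret still sits in cleartext in your HOME.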

Asymmetric encryption

The challenge in using asymmetric encryption only for your individual usage is the need to manage keys. By design the encryption is made for a larger number of participants and keys are managed in a keyring. In our scenario this would not be necessary, but gpg implements the full scenario and you more or less have to use it that way.

Asymmetric encryption can be used in the same way as symmetric encryption, but then both the private and the public key need to be available on the target system, so the main advantage of having two keys is lost. Asymmetric encryption is therefore only feasible in the following situations, where it might be of interest to you:

  • your data files are separated by node
  • you do not need to read the job output before transferring it back to your secured organization space
  • the output can be piped directly into a file
  • you can use an in-memory filesystem

In these situations you might be able to encrypt the data before it is stored on the global disk space.
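
For the piped-output case, a sketch could look as follows. The helper name run_encrypted is hypothetical. The sketch assumes that the public key has already been imported on the HLRS system and that its fingerprint was verified at import time, which is why it passes --trust-model always; without that option gpg may refuse an unvalidated key in batch mode.

```shell
#!/bin/sh
# Sketch: encrypt job output with a public key before it reaches the
# global file system. run_encrypted is a hypothetical helper name.
#   $1      recipient (key ID or name whose public key is in the keyring)
#   $2      encrypted output file
#   $3...   the command producing the output
run_encrypted() {
    recipient=$1
    out_file=$2
    shift 2
    # Only the public key is needed here; cleartext exists only in the pipe.
    # Assumption: the key was verified on import, hence --trust-model always.
    # Note: the pipeline reports gpg's exit status, not the command's.
    "$@" | gpg --batch --trust-model always -e \
        --recipient "$recipient" --output "$out_file"
}
```

A call might look like run_encrypted "<ID or name>" result.gpg ./my_app --input data.in, where my_app stands for your own application. The private key never has to leave your secured system; the result is decrypted there after the transfer.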

The following commands allow asymmetric encryption:

gpg --full-gen-key
    Create a key pair and store it in the keyring on your local computer. Later you may need to copy the private key to the system where you want to decrypt.
gpg -a --output <pubkey file> --export <ID or name>
    Export the public key only, then copy it to the HLRS system where you want to encrypt data.
gpg --show-keys --with-fingerprint <pubkey file>
    Verify the key fingerprint before importing; always recommended for foreign keys.
gpg --import <pubkey file>
    Import the public key on the system where you want to encrypt.
gpg --no-default-keyring --keyring publickeys.gpg --list-keys
    Optionally create an additional keyring.
gpg --no-default-keyring --keyring publickeys.gpg --import <pubkey file>
    Import the public key into that keyring, then copy the keyring file to the target system.
gpg -e --armor --recipient <ID or name> <file>
    Encrypt the file. The encrypted file ends with .asc if -a (ASCII) is used, otherwise with .gpg.
gpg -d --output <decrypted file> <encrypted file>
    Decrypt after copying the encrypted file to your local secured storage.
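
The fingerprint check before an import can also be scripted. The sketch below is one possible way, assuming a GnuPG version that provides --show-keys (2.2 series or newer); the helper name import_if_fingerprint_matches is hypothetical. The expected fingerprint must be given as 40 hexadecimal characters in upper case without spaces, as printed by gpg --with-colons.

```shell
#!/bin/sh
# Sketch: import a public key only if its fingerprint matches the expected one.
# import_if_fingerprint_matches is a hypothetical helper name.
import_if_fingerprint_matches() {
    key_file=$1
    expected=$2
    # --with-colons gives machine-readable output; field 10 of the first
    # "fpr" record is the primary key fingerprint.
    actual=$(gpg --show-keys --with-colons "$key_file" |
        awk -F: '$1 == "fpr" { print $10; exit }')
    if [ "$actual" != "$expected" ]; then
        echo "fingerprint mismatch: got $actual" >&2
        return 1
    fi
    gpg --import "$key_file"
}
```

The expected value should come from a trusted source, for example the output of gpg --fingerprint on the system where the key pair was created.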

Overview of used options:

-K --list-secret-keys                      List all private keys in the keyring
-k --list-public-keys                      List all public keys in the keyring
   --delete-keys <ID or name>              Delete the corresponding public key from the keyring
   --delete-secret-keys <...>              Delete the corresponding private key (asks for confirmation four times)
   --delete-secret-and-public-keys <...>   Combination of both
-a --armor                                 ASCII text instead of binary
   --export <ID or name>                   Export the public key
-o --output <filename>                     Filename where the output should be redirected to. By default the output is printed to the console
   --fingerprint <ID or string>            Print the fingerprint of a key to easily identify or compare keys
   --full-gen-key                          Create a new key pair with the full dialog (name + mail + comment + key options)
   --gen-key                               Create a new key pair with a short dialog (only name + mail); a comment may be useful to distinguish keys
-e --encrypt                               Encrypt
-r --recipient                             Recipient; the corresponding key is used for encryption
-d --decrypt                               Decrypt

Errors encountered:

  • When generating key pairs, a hint about the entropy needed is shown, but the command seems to stall and no passphrase prompt appears. After a few minutes, even after you have terminated the hanging command, a passphrase prompt is shown where none is expected. This is probably caused by a timeout of the asynchronous pinentry process: by default a graphical input dialog is tried first, which may fail, especially via SSH without a local X server. Once it has failed, the fallback console pinentry is used. To solve this issue, you can add "pinentry-program /usr/bin/pinentry-curses" to your .gnupg/gpg-agent.conf file.
  • When you have several private keys with the same tag in your keyring and you try to delete one by tag, you might get an error saying that there is no such key. This seems to happen when you have already deleted one key with this tag. Simply use another tag or the ID. For public keys this does not seem to cause any problems.
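
The pinentry setting from the first item can be applied with a small script. This is a sketch only: the helper name set_pinentry is hypothetical, and the path /usr/bin/pinentry-curses is the one quoted above and may differ on your system.

```shell
#!/bin/sh
# Sketch: make gpg-agent use the curses pinentry without duplicating the line.
# set_pinentry is a hypothetical helper name.
set_pinentry() {
    conf_file=$1
    pinentry=$2
    mkdir -p -m 700 "$(dirname "$conf_file")"
    touch "$conf_file"
    # Drop any existing pinentry-program line, then append the new one.
    # grep exits non-zero when nothing is left to print, which is fine here.
    grep -v '^pinentry-program' "$conf_file" > "$conf_file.tmp" || true
    echo "pinentry-program $pinentry" >> "$conf_file.tmp"
    mv "$conf_file.tmp" "$conf_file"
}
```

A typical call would be set_pinentry "$HOME/.gnupg/gpg-agent.conf" /usr/bin/pinentry-curses. A running agent only picks up the change after a restart, for example with gpgconf --kill gpg-agent.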