Data Encryption
Overview
In general, the importance or necessity of data encryption depends strongly on individual user requirements. If you use only public data, there is normally no need to encrypt at all. If you use confidential data, or you are not sure about the confidentiality level of your data, encryption is strongly recommended.
Looking at a typical HPC workflow, the following steps can be identified.
Action | Step in HPC process | Suggested encryption method | Additional notes |
---|---|---|---|
Transfer | Moving input data from your system/organization to HLRS | Use encrypted data transfer (SSH, GridFTP, UFTP) | In case transport encryption is not feasible, data en-/decryption prior/after transport might be an option. |
Setup | Configure jobs, install software, ... | Data is normally untouched, so no encryption handling is necessary | |
Submission | Submitting your configured job to the batch system | gpg, ... | Since the job is executed non-interactively, this is the last step where you can decrypt your data for processing |
Computing | Executing the HPC job, including pre-/post-processing if executed via the batch system | gpg, ... | During computation the data needs to be available unencrypted in most cases. In rare situations, such as single-node jobs or piped output, you might be able to encrypt output data asymmetrically directly on each compute node. |
Transfer | Moving results, logs, ... back to your organization | Use encrypted data transfer (SSH, GridFTP, UFTP) | In case transport encryption is not feasible, data en-/decryption prior/after transport might be an option. |
Storing | Store data between other steps without interaction | gpg, ... | Many other options are possible, depending on your requirements. |
Unless your application can run directly on encrypted data, there is no way to keep your data encrypted permanently. The following sections deal with the most important aspects that we currently support; the list will be extended over time. At the end we will also list a few additional tools that are currently not supported but might be of interest for further data encryption issues.
Transfer encryption
We mainly offer three ways to transfer data.
- SSH: Most users simply use SSH to copy their files to HLRS. In this case you do not have to do anything, since SSH always encrypts your communication. For details please visit the corresponding transfer pages.
- GridFTP/Unicore FTP: For very large transfers we offer GridFTP and Unicore UFTP. Neither protocol encrypts by default. Both have similar options, especially for setting up parallel processes and streams to increase transfer rates, and both allow encryption to be switched on:
  - GridFTP: globus-url-copy -dcpriv or globus-url-copy -data-channel-private
  - UFTP: uftp -e or uftp --encrypt, in both cases after setting the environment variable with export UFTP_ENCRYPTION_ALGORITHM="AES"
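Where transport encryption is not feasible, the overview table suggests encrypting before and decrypting after the transfer. The following is a minimal sketch of that fallback, not an HLRS-prescribed procedure: it packs a directory with tar, encrypts the stream symmetrically with gpg, and records a SHA-256 checksum so the archive can be verified on arrival. All paths, the passphrase file, and the use of tar and sha256sum are assumptions made for illustration; the transfer step itself is left out.

```shell
#!/bin/sh
# Sketch: encrypt a directory before an unencrypted transfer, verify and unpack afterwards.
# Paths and passphrase handling are illustrative assumptions.
set -eu

WORK=$(mktemp -d)                     # scratch area standing in for your local disk
export GNUPGHOME="$WORK/gnupg"        # throwaway gpg home so the demo leaves ~/.gnupg alone
mkdir -m 700 "$GNUPGHOME"
trap 'gpgconf --kill gpg-agent 2>/dev/null || true; rm -rf "$WORK"' EXIT

mkdir "$WORK/input"
echo "confidential input" > "$WORK/input/case01.dat"

# In real use gpg would prompt for the passphrase; a file keeps this demo non-interactive.
( umask 077; echo "replace-with-a-strong-passphrase" > "$WORK/pass.txt" )

# 1. Pack and encrypt in one stream, so no cleartext archive is written to disk.
tar -C "$WORK" -cf - input \
  | gpg --batch --pinentry-mode loopback --passphrase-file "$WORK/pass.txt" \
        -c -o "$WORK/input.tar.gpg"

# 2. Record a checksum that travels with the encrypted archive.
( cd "$WORK" && sha256sum input.tar.gpg > input.tar.gpg.sha256 )

# ... transfer input.tar.gpg and input.tar.gpg.sha256 with the tool of your choice ...

# 3. Receiving side: verify integrity, then decrypt and unpack.
mkdir "$WORK/received"
( cd "$WORK" && sha256sum -c input.tar.gpg.sha256 )
gpg --batch --pinentry-mode loopback --passphrase-file "$WORK/pass.txt" \
    -d "$WORK/input.tar.gpg" | tar -C "$WORK/received" -xf -

cat "$WORK/received/input/case01.dat"
```

The checksum only detects accidental corruption of the encrypted archive; confidentiality rests entirely on the passphrase, which should reach the other side through a separate channel.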
Encryption while stored
There are many options for encrypting data that will be stored for a certain time without being used. One widely used tool is gpg (GNU Privacy Guard), which supports various encryption, decryption, and signing operations. We start with the fairly simple symmetric encryption and then turn to the more complex asymmetric encryption, which allows extended encryption scenarios under certain conditions.
Symmetric encryption with gpg
Overview of used options:
Option | Description |
---|---|
-c, --symmetric | Encrypt with a symmetric key |
-d, --decrypt | Decrypt a file by providing the secret key |
-o, --output <filename> | File to which the output is written. By default the output is printed to the console |
--no-symkey-cache | Prevents caching of keys in the gpg-agent. Without this option the agent automatically uses the stored key instead of asking for it. Even when you open the encrypted file in vi, the gpg-agent steps in and decrypts the file automatically. This behaviour is quite convenient, but not always wanted. |
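As a minimal sketch, the options above could be combined into a symmetric round trip as follows. The file names, the temporary gpg home, and the helper function gpg_demo are hypothetical and exist only to make the example runnable without a passphrase prompt; in interactive use the two commands shown in the comment are sufficient.

```shell
#!/bin/sh
# Sketch: symmetric encryption and decryption with gpg.
# File names and the passphrase file are illustrative assumptions.
set -eu

WORK=$(mktemp -d)
export GNUPGHOME="$WORK/gnupg"        # throwaway gpg home for the demo
mkdir -m 700 "$GNUPGHOME"
trap 'gpgconf --kill gpg-agent 2>/dev/null || true; rm -rf "$WORK"' EXIT

echo "simulation parameters" > "$WORK/params.txt"
( umask 077; echo "replace-with-a-strong-passphrase" > "$WORK/pass.txt" )

# Interactive use would simply be:
#   gpg -c --no-symkey-cache -o params.txt.gpg params.txt
#   gpg -d --no-symkey-cache -o params.txt params.txt.gpg
# The helper below only replaces the passphrase prompt for this demo.
gpg_demo() {
  gpg --batch --pinentry-mode loopback --passphrase-file "$WORK/pass.txt" "$@"
}

# Encrypt, then decrypt into a second file so both can be compared.
gpg_demo --no-symkey-cache -c -o "$WORK/params.txt.gpg" "$WORK/params.txt"
gpg_demo --no-symkey-cache -d -o "$WORK/params.decrypted.txt" "$WORK/params.txt.gpg"

cmp "$WORK/params.txt" "$WORK/params.decrypted.txt" && echo "round trip OK"
```

Note that --no-symkey-cache is only available in reasonably recent gpg 2.2 releases; older versions reject the option.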
The secret key used for encryption should obviously not be stored along with the data. If you decrypt the data before submitting the job, you can enter the secret key interactively. If you decrypt the data within the HPC job, you have to store the secret somewhere, preferably in a configuration file for your job that is read by your script. The config (with the secret key) is then at least separated from the script and the data, but it is still stored in cleartext in your HOME. If you consider using an external service that provides the key directly to your job, you need authentication credentials for that service, which in turn have to be stored in cleartext. This makes the setup more complex but not more secure.
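The decryption step inside a job could look roughly like the sketch below. It assumes the secret is kept in a separate file (here the hypothetical job-secret.conf) and adds a permission check that is not part of the original description but is a common precaution. The demo setup at the top only creates the files that would already exist in real use; stat -c is the GNU coreutils form.

```shell
#!/bin/sh
# Sketch: decrypt input data inside a batch job using a secret from a separate config file.
# File names, locations, and the permission check are illustrative assumptions.
set -eu

# --- demo setup (in real use these files already exist) ---
WORK=$(mktemp -d)
export GNUPGHOME="$WORK/gnupg"
mkdir -m 700 "$GNUPGHOME"
trap 'gpgconf --kill gpg-agent 2>/dev/null || true; rm -rf "$WORK"' EXIT
CONFIG="$WORK/job-secret.conf"        # would live in your HOME, apart from script and data
( umask 077; echo "replace-with-a-strong-passphrase" > "$CONFIG" )
echo "mesh data" > "$WORK/input.dat"
gpg --batch --pinentry-mode loopback --passphrase-file "$CONFIG" \
    -c -o "$WORK/input.dat.gpg" "$WORK/input.dat"
rm "$WORK/input.dat"

# --- the part that would go into the job script ---
# Refuse to run if the secret is accessible to group or others.
perms=$(stat -c %a "$CONFIG")
case "$perms" in
  600|400) ;;
  *) echo "refusing to use $CONFIG with permissions $perms" >&2; exit 1 ;;
esac

# Decrypt non-interactively; the passphrase is read from the first line of the config file.
gpg --batch --pinentry-mode loopback --passphrase-file "$CONFIG" \
    -d -o "$WORK/input.dat" "$WORK/input.dat.gpg"

cat "$WORK/input.dat"                 # placeholder for the actual application run
```

The permission check does not change the fact that the secret is stored in cleartext; it only guards against the file being readable by other users by accident.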
Asymmetric encryption
The challenge in using asymmetric encryption only for your individual usage is the need to manage keys. By design the encryption is made for a larger number of participants and keys are managed in a keyring. In our scenario this would not be necessary, but gpg implements the full scenario and you more or less have to use it that way.
Asymmetric encryption can be used in the same way as symmetric encryption, but then both the private and the public key need to be available on the target system, so the main advantage of having two keys is lost. Asymmetric encryption is therefore only worthwhile in the following situations:
- your data files are separated by node
- you do not want to read the job output before transferring it back to your secured organization space
- the output can be piped directly into a file
- you can use an in-memory filesystem
In these situations you might be able to encrypt data before it is stored on the global disk space.
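The piped-output case can be sketched as follows, assuming only the public key is present on the compute node. The throwaway key pair, the address demo@example.org, and the function my_app are placeholders for your own key and application. The batch-mode --quick-generate-key call is used here instead of the interactive key generation dialog purely to keep the example non-interactive.

```shell
#!/bin/sh
# Sketch: encrypt application output on the fly with a public key.
# Key identity, paths, and my_app are illustrative assumptions.
set -eu

WORK=$(mktemp -d)
cleanup() {
  for home in "$WORK"/home-*; do
    GNUPGHOME="$home" gpgconf --kill gpg-agent 2>/dev/null || true
  done
  rm -rf "$WORK"
}
trap cleanup EXIT

# --- demo setup: a throwaway key pair stands in for the one created at your organization ---
export GNUPGHOME="$WORK/home-org"
mkdir -m 700 "$GNUPGHOME"
gpg --batch --quiet --pinentry-mode loopback --passphrase '' \
    --quick-generate-key "Demo User <demo@example.org>" default default never
gpg --batch --armor -o "$WORK/demo-pub.asc" --export demo@example.org

# --- compute node: only the public key is available here ---
export GNUPGHOME="$WORK/home-node"
mkdir -m 700 "$GNUPGHOME"
gpg --batch --quiet --import "$WORK/demo-pub.asc"

my_app() { printf 'step %s done\n' 1 2 3; }   # placeholder for the real application

# Output is encrypted before it reaches the disk.
# --trust-model always skips the trust check; use it only for keys you verified yourself.
my_app | gpg --batch --trust-model always -e -r demo@example.org -o "$WORK/result.log.gpg"

# --- back at your organization: decrypt with the private key ---
export GNUPGHOME="$WORK/home-org"
DECRYPTED=$(gpg --batch --quiet --pinentry-mode loopback --passphrase '' -d "$WORK/result.log.gpg")
printf '%s\n' "$DECRYPTED"
```

Because the private key never leaves the first gpg home, the encrypted log cannot be read on the compute node, which is exactly the property that makes this setup attractive.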
The following steps allow asymmetric encryption:
- Create a key pair and store it in the keyring on your local computer. Later you may need to copy the private key to the system where you want to decrypt.
- Export the public key only, then copy it to the HLRS system where you want to encrypt data.
- Verify the key fingerprint before importing. This is always recommended for foreign keys.
- Import the public key on the system where you want to encrypt.
- Optionally, create an additional keyring, import the public key into it, and copy the keyring file to the target system.
- Encrypt the data. The encrypted file ends with .asc if -a (ASCII) is used, otherwise with .gpg.
- Decrypt after copying the encrypted file to your local secured storage.
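The steps above can be sketched end to end as follows. Two temporary gpg homes stand in for the local computer and the target system; the key identity demo@example.org and all file names are hypothetical. Key generation uses the batch-mode --quick-generate-key with an empty passphrase so the example runs unattended; interactively you would use gpg --full-gen-key or gpg --gen-key and protect the key with a passphrase.

```shell
#!/bin/sh
# Sketch: asymmetric workflow with a local and a target gpg home.
# Key identity and paths are illustrative assumptions.
set -eu

WORK=$(mktemp -d)
cleanup() {
  for home in "$WORK"/home-*; do
    GNUPGHOME="$home" gpgconf --kill gpg-agent 2>/dev/null || true
  done
  rm -rf "$WORK"
}
trap cleanup EXIT
first_fpr() { awk -F: '$1 == "fpr" { print $10; exit }'; }   # primary key fingerprint

# 1. Local computer: create a key pair (interactively: gpg --full-gen-key).
export GNUPGHOME="$WORK/home-local"
mkdir -m 700 "$GNUPGHOME"
gpg --batch --quiet --pinentry-mode loopback --passphrase '' \
    --quick-generate-key "Demo User <demo@example.org>" default default never

# 2. Export the public key only and note its fingerprint.
gpg --batch -a -o "$WORK/demo-pub.asc" --export demo@example.org
FPR_LOCAL=$(gpg --batch --with-colons --fingerprint demo@example.org | first_fpr)

# 3. Target system: inspect the key file WITHOUT importing it and compare fingerprints.
export GNUPGHOME="$WORK/home-target"
mkdir -m 700 "$GNUPGHOME"
FPR_FILE=$(gpg --batch --with-colons --import-options show-only \
               --import "$WORK/demo-pub.asc" | first_fpr)
[ "$FPR_LOCAL" = "$FPR_FILE" ] || { echo "fingerprint mismatch" >&2; exit 1; }

# 4. Import the public key.
gpg --batch --quiet --import "$WORK/demo-pub.asc"
#    Variant with an additional keyring file (path is an assumption):
#    gpg --no-default-keyring --keyring "$WORK/extra-keyring.kbx" --import "$WORK/demo-pub.asc"

# 5. Encrypt; with -a the output is written to result.txt.asc.
echo "job output" > "$WORK/result.txt"
gpg --batch --trust-model always -a -e -r demo@example.org "$WORK/result.txt"

# 6. Local computer again: decrypt with the private key.
export GNUPGHOME="$WORK/home-local"
gpg --batch --quiet --pinentry-mode loopback --passphrase '' \
    -d -o "$WORK/result.decrypted.txt" "$WORK/result.txt.asc"
cat "$WORK/result.decrypted.txt"
```

In practice the fingerprint comparison in step 3 is done by a human over a trusted channel; the automated check here only illustrates which two values have to match.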
Overview of used options:
Option | Description |
---|---|
-K, --list-secret-keys | List all private keys in the keyring |
-k, --list-public-keys | List all public keys in the keyring |
--delete-keys <ID or name> | Delete the corresponding public key from the keyring |
--delete-secret-keys <...> | Delete the corresponding private key (requires confirmation four times) |
--delete-secret-and-public-keys <...> | Combination of both |
-a, --armor | ASCII text output instead of binary |
--export <ID or name> | Export the public key |
-o, --output <filename> | File to which the output is written. By default the output is printed to the console |
--fingerprint <ID or string> | Print the fingerprint of a key to easily identify or compare keys |
--full-gen-key | Create a new key pair with the full dialog (name + mail + comment + key options) |
--gen-key | Create a new key pair with the short dialog (only name + mail), useful to distinguish keys |
-e, --encrypt | Encrypt |
-r, --recipient | Recipient; the corresponding key is used for encryption |
-d, --decrypt | Decrypt |
Experienced errors:
- When generating key pairs, a hint about needed entropy is shown, but the command seems to stall and no passphrase prompt appears. After a few minutes, even if you have already terminated the hanging command, a passphrase prompt is shown where none is expected. This is probably caused by a timeout of the asynchronous pinentry process: by default a graphical input dialog is tried first, which might fail, especially via SSH without a local X server. Once it has failed, the fallback console pinentry is used. To solve this issue, you can add "pinentry-program /usr/bin/pinentry-curses" to your .gnupg/gpg-agent.conf file.
- When you have several private keys in your keyring with the same tag and you try to delete one by tag, you might get an error that there is no such key. This seems to happen when you have already deleted one key with this tag. Simply use another tag or the ID. For public keys this does not seem to cause any problems.
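The pinentry workaround from the first item could be applied with a short script like the one below. This is a sketch under the assumption that pinentry-curses is installed at the path given above, which may differ on your system; the variable names are illustrative.

```shell
#!/bin/sh
# Sketch: make gpg-agent use the console pinentry.
# The pinentry path is taken from the text above and may differ on your system.
set -eu

GNUPG_DIR="${GNUPGHOME:-$HOME/.gnupg}"
CONF="$GNUPG_DIR/gpg-agent.conf"
LINE="pinentry-program /usr/bin/pinentry-curses"

mkdir -p "$GNUPG_DIR"
chmod 700 "$GNUPG_DIR"
touch "$CONF"

# Append the setting only if it is not already present, so the script can be run repeatedly.
grep -qxF "$LINE" "$CONF" || echo "$LINE" >> "$CONF"

# Stop the running agent so the new setting takes effect; it is restarted on demand.
if command -v gpgconf >/dev/null 2>&1; then
  gpgconf --kill gpg-agent || true
fi
```

If the file already contains a different pinentry-program line, it should be edited by hand rather than appended to, since the script does not resolve conflicting entries.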