- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

Data Encryption

Latest revision as of 17:52, 24 April 2024

Overview

In general the importance or necessity of data encryption strongly depends on the individual user requirements. In cases where you use only public data, there is normally no need to encrypt at all. When you use confidential data or you are not sure about the confidentiality level of your data, it is generally recommended to use encryption for your data.

Looking at the process in HPC computing, the following steps can be identified.

{|
! Action !! Step in HPC process !! Suggested encryption method !! Additional notes
|-
| Transfer || Moving input data from your system/organization to HLRS || Use encrypted data transfer (SSH, GridFTP, UFTP) || In case transport encryption is not feasible, data en-/decryption prior/after transport might be an option.
|-
| Setup || Configure jobs, install software, ... || || Normally data is untouched, thus no data encryption handling is necessary
|-
| Submission || Submitting your configured job to the batch system || gpg, ... || Since the job is executed non-interactively, this is the last step where you can decrypt your data for processing
|-
| Computing || Executing HPC job including pre/post processing if executed via batch system. || gpg, ... || During computation the data needs to be available unencrypted in most cases. In rare situations like single node jobs or piped output you might be able to directly encrypt output data asymmetrically on each compute node or temporarily store unencrypted data in ramdisk file systems before encrypting it.
|-
| Transfer || Moving results, logs, ... back to your organization || Use encrypted data transfer (SSH, GridFTP, UFTP) || In case transport encryption is not feasible, data en-/decryption prior/after transport might be an option.
|-
| Storing || Store data between other steps without interaction || gpg, ... || Many other options are possible, depending on your requirements.
|}

Unless your application can run directly on encrypted data, there is no way to keep your data encrypted permanently. In the following sections we deal with the most important aspects that we currently support; the list will be extended over time. At the end we also list a few additional tools which are currently not supported but might be of interest for further data encryption issues.

Transfer encryption

We offer mainly three ways to transfer data.

  • SSH: Most users simply use SSH/scp to copy their files to HLRS. In this case you do not have to do anything, since SSH always encrypts your communication. For details please visit the corresponding wiki pages about data transfer solutions.
  • GridFTP/Unicore FTP: For very large transfers we offer GridFTP and Unicore UFTP. Neither protocol encrypts by default. Both have similar options, especially for setting up parallel processes and streams to increase transfer rates, and both support encryption:
    • globus-url-copy -dcpriv or globus-url-copy -data-channel-private
    • export UFTP_ENCRYPTION_ALGORITHM="AES"; uftp -e or export UFTP_ENCRYPTION_ALGORITHM="AES"; uftp --encrypt (the environment variable has to be set in both cases)
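
As a rough illustration of the SSH and GridFTP cases above, the following sketch wraps both transfers in small shell functions. The user name, host names and paths are placeholders invented for this example and are not actual HLRS endpoints; please take the real values from the data transfer pages.

```shell
#!/bin/sh
# Placeholder endpoints (assumptions for illustration only).
SSH_TARGET="${SSH_TARGET:-myuser@login.example.org}"
GRIDFTP_HOST="${GRIDFTP_HOST:-gridftp.example.org}"

# Copy a file or directory via scp; SSH encrypts the transfer by itself.
# $1: local path, $2: remote path
upload_ssh() {
    scp -r "$1" "${SSH_TARGET}:$2"
}

# Copy a file via GridFTP with an encrypted data channel (-dcpriv).
# $1: absolute local path, $2: absolute remote path
upload_gridftp() {
    globus-url-copy -dcpriv "file://$1" "gsiftp://${GRIDFTP_HOST}$2"
}
```

A call could then look like <code>upload_ssh ./input_data /path/on/target/</code>. Keeping the encryption flag inside the function makes it harder to forget it for the protocols that do not encrypt by default.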

Encryption while stored

There are many options to encrypt data which will be stored for a certain time without being used. One widely used tool is gpg (GNU Privacy Guard), which supports different encryption, decryption and signing actions. We will start with the fairly simple symmetric encryption and then cover the more complex asymmetric encryption, which allows more advanced encryption under certain conditions.

Symmetric encryption with gpg

gpg -c --no-symkey-cache <Filename>
gpg -d --no-symkey-cache --output <Target cleartext file> <encrypted file>


Overview of used options:

-c --symmetric              Encrypt with a symmetric key
-d --decrypt                Decrypt a file by providing the secret key
-o --output <filename>      Filename where the output should be redirected to. By default the output is printed to the console
--passphrase                Optionally provide the passphrase directly to avoid the interactive prompt
--passphrase-file           Optionally provide the passphrase in a file, e.g. for scripted usage
--no-symkey-cache           Prevents caching keys in the gpg-agent. Without this option the agent will automatically use the
                            stored keys instead of asking for them. Even when you open the encrypted file in vi, the gpg-agent
                            jumps in and automatically decrypts the file. The behavior is quite convenient, but not always wanted.

The secret key used for encryption should obviously not be stored along with the data. If you decrypt the data prior to submitting the job, you can enter the secret key interactively. If you decrypt the data within the HPC job, you have to store the secret somewhere, preferably in a configuration file for your job which is read by your script. That way the config (with the secret key) is at least separated from the script and the data, but it is still stored in cleartext in your $HOME.

If you consider using an external service which provides your key directly to your job, you need authentication credentials for that service, which in turn have to be stored in cleartext. This makes the setup more complex but not more secure.
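
A minimal sketch of the scripted variant described above, assuming GnuPG 2.x and a passphrase file that is kept apart from the data and the job script. The file location and the function names are hypothetical; <code>--batch</code> and <code>--pinentry-mode loopback</code> are added here so that gpg reads the passphrase file without trying to open a prompt.

```shell
#!/bin/sh
# Passphrase file kept apart from data and job script (hypothetical location).
# It should be readable by you only, e.g. chmod 600.
PASSFILE="${PASSFILE:-$HOME/.config/myjob/passphrase}"

# Encrypt $1 symmetrically into $2 without any interactive prompt.
encrypt_sym() {
    gpg --batch --yes --pinentry-mode loopback --no-symkey-cache \
        --passphrase-file "$PASSFILE" --output "$2" --symmetric "$1"
}

# Decrypt $1 into $2 using the same passphrase file.
decrypt_sym() {
    gpg --batch --yes --pinentry-mode loopback --no-symkey-cache \
        --passphrase-file "$PASSFILE" --output "$2" --decrypt "$1"
}
```

Inside a job script this could be used as <code>decrypt_sym input.dat.gpg input.dat</code> before the computation starts. As stated above, the passphrase still lies in cleartext in your $HOME, so this only separates it from the data and does not protect it.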

Asymmetric encryption

The challenge in using asymmetric encryption only for your individual usage is the need to manage keys. By design the encryption is made for a larger number of participants and keys are managed in a keyring. In our scenario this would not be necessary, but gpg implements the full scenario and you more or less have to use it that way.

Asymmetric encryption is usable in the same way as symmetric encryption, but then private and public key need to be available on the target system. Thus the main advantage of two keys is no longer relevant. This asymmetric encryption is only feasible in the following situations where it might be of interest for you:

  • your data files are separated by node
  • you do not want to read the job output prior to transferring it back to your secured organization space
  • the output can be piped directly into a file
  • you can use an in-memory filesystem

In these situations you might be able to encrypt data before it is stored on the global disk space.
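
For the piped-output case, a sketch could look as follows. It assumes that your public key has already been imported on the HLRS system; the function name and the program <code>./my_solver</code> used in the usage note are hypothetical. <code>--trust-model always</code> is used so that gpg does not ask an interactive trust question inside a batch job, which is only reasonable if you verified the key fingerprint yourself when importing it.

```shell
#!/bin/sh
# Encrypt everything read from stdin for the given recipient.
# $1: key ID or e-mail of an already imported public key, $2: output file
# --trust-model always skips the interactive trust question; this assumes
# the key fingerprint was verified by hand when the key was imported.
encrypt_stream() {
    gpg --batch --yes --trust-model always \
        --recipient "$1" --output "$2" --encrypt
}
```

A job could then run <code>./my_solver | encrypt_stream me@example.org result.gpg</code>, so that only encrypted output reaches the global file system. If the application insists on writing a file, the same function can read it from an in-memory file system, for example <code>encrypt_stream me@example.org result.gpg < /dev/shm/result.dat</code>, assuming /dev/shm is available on the compute node.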

The following commands allow asymmetric encryption:

Create a key pair and store it in the keyring on your local computer. Later you may need to copy the private key to the system where you want to decrypt.

gpg --full-gen-key

Export public key only, then copy it to the HLRS system where you want to encrypt data

gpg -a --output <pubkey file> --export <ID or name>

Verify the key fingerprint before importing; this is always recommended for foreign keys

gpg --show-keys --with-fingerprint <pubkey file>

Import the public key on the system where you want to encrypt

gpg --import <pubkey file>

Optionally create an additional keyring

gpg --no-default-keyring --keyring publickeys.gpg --list-keys

Import the public key into this keyring, then copy the keyring file to the target system

gpg --no-default-keyring --keyring publickeys.gpg --import <pubkey file>

Encrypted files end with .asc if -a (ASCII armor) is used, otherwise with .gpg.

gpg -e --armor --recipient <ID or name> <file>

Decrypt after copying encrypted file back to your local secured storage

gpg -d --output <decrypted file> <encrypted file>


Overview of used options:

-K --list-secret-keys                       List all private keys in keyring
-k --list-public-keys                       List all public keys in keyring
   --delete-keys <ID or name>               Delete the corresponding public key from keyring
   --delete-secret-keys <...>               Delete the corresponding private key (asks for confirmation 4 times)
   --delete-secret-and-public-key <...>     Combination of both

-a --armor                                  ASCII text instead of binary
   --export <ID or name>                    Export public key
-o --output <filename>                      Filename where the output should be redirected to. 
                                            By default the output is printed to the console
   --fingerprint <ID or string>             Print the fingerprint of a key to easily identify or compare keys
   --full-gen-key                           Create new keypair with the full dialog (name + mail + comment + key options)
   --gen-key                                Create new keypair with short dialog (only name + mail), useful to distinguish keys
-e --encrypt                                Encrypt
-r --recipient                              Recipient, the corresponding key is used for encryption
-d --decrypt                                Decrypt

Important Notes:

  • Don't forget to make sure the unencrypted data is removed once the encryption is done. We recommend the usage of shred -fu <FILENAME>, but we also refer to the disclaimer in the man page, which states that it still cannot be guaranteed that the data is in fact overwritten (although shred at least attempts to do so).
  • It is up to the user to keep the keys in a safe place, which means both aspects, confidentiality and availability. Everyone who has access to the (private) key is able to decrypt the data. On the other hand if the key is lost, there is no way to recover the encrypted data, not even for the admin users.
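
The first note can be sketched as a small helper that only shreds the cleartext once an encrypted counterpart exists. The function name is hypothetical, and the check is deliberately simple: a non-empty encrypted file does not prove that it can be decrypted, so a test decryption is still advisable for important data.

```shell
#!/bin/sh
# Remove the cleartext file $1, but only if the encrypted file $2 exists
# and is not empty. Returns non-zero and keeps the cleartext otherwise.
remove_cleartext() {
    if [ -s "$2" ]; then
        shred -fu -- "$1"
    else
        echo "refusing to remove $1: $2 is missing or empty" >&2
        return 1
    fi
}
```

Example: <code>remove_cleartext result.dat result.dat.gpg</code>.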

Experienced errors:

  • When generating key pairs, a hint about needed entropy is shown, but the command seems to stall and no passphrase prompt appears. After a few minutes, even after you have terminated the hanging command, a passphrase prompt is shown where none is expected. This is probably caused by a timeout in the asynchronous pinentry process: by default a graphical input dialog is tried first, which may fail, especially via SSH without a local X server. Once it has failed, the fallback console pinentry is used. To solve this issue, you can add "pinentry-program /usr/bin/pinentry-curses" to your .gnupg/gpg-agent.conf file.
  • When you have several private keys in your keyring with the same tag and you try to delete one by tag it might happen that you get the error that there is no such key. This seems to happen when you already deleted one with this tag. Simply use another tag or the ID. For public keys this seems to not cause any problems.
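
The pinentry workaround from the first item could be applied with a helper like the one below. This is a sketch under the assumption that pinentry-curses is installed at the path given above; the function name is hypothetical. If gpg-agent.conf already contains a different pinentry-program line, that line should be edited by hand instead.

```shell
#!/bin/sh
# Append the pinentry setting to gpg-agent.conf unless it is already there,
# then stop the running agent so that the next gpg call picks it up.
use_curses_pinentry() {
    conf_dir="${GNUPGHOME:-$HOME/.gnupg}"
    conf_file="$conf_dir/gpg-agent.conf"
    line="pinentry-program /usr/bin/pinentry-curses"

    mkdir -p "$conf_dir" && chmod 700 "$conf_dir"
    touch "$conf_file"
    grep -qxF "$line" "$conf_file" || echo "$line" >> "$conf_file"

    # gpgconf ships with GnuPG 2.x; skip quietly if it is not installed.
    if command -v gpgconf >/dev/null 2>&1; then
        gpgconf --kill gpg-agent || true
    fi
}
```

Calling <code>use_curses_pinentry</code> more than once is harmless, since the line is only appended if it is missing.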