- Information contained in the HLRS Wiki is not legally binding, is provided without warranty, and HLRS is not responsible for any damages that might result from its use -
Data Encryption
Latest revision as of 17:52, 24 April 2024
Overview
How important or necessary data encryption is depends strongly on your individual requirements. If you work only with public data, there is normally no need to encrypt at all. If you work with confidential data, or you are not sure about the confidentiality level of your data, encrypting it is generally recommended.
The HPC workflow can be broken down into the following steps.
Action | Step in HPC process | Suggested encryption method | Additional notes |
---|---|---|---|
Transfer | Moving input data from your system/organization to HLRS | Use encrypted data transfer (SSH, GridFTP, UFTP) | In case transport encryption is not feasible, data en-/decryption prior/after transport might be an option. |
Setup | Configure jobs, install software, ... | | Normally the data is untouched, so no encryption handling is necessary. |
Submission | Submitting your configured job to the batch system | gpg, ... | Since the job is executed non-interactively, this is the last step where you can decrypt your data for processing. |
Computing | Executing the HPC job, including pre/post-processing if executed via the batch system | gpg, ... | During computation the data needs to be available unencrypted in most cases. In rare situations, such as single-node jobs or piped output, you might be able to encrypt output data asymmetrically on each compute node, or temporarily store unencrypted data in ramdisk file systems before encrypting it. |
Transfer | Moving results, logs, ... back to your organization | Use encrypted data transfer (SSH, GridFTP, UFTP) | In case transport encryption is not feasible, data en-/decryption prior/after transport might be an option. |
Storing | Store data between other steps without interaction | gpg, ... | Many other options are possible, depending on your requirements. |
Unless your application can run directly on encrypted data, there is no way to keep your data encrypted permanently. The following sections cover the most important methods we currently support; the list will be extended over time. At the end we will also list a few additional tools that are currently not supported but might be of interest for further data encryption needs.
Transfer encryption
We offer mainly three ways to transfer data.
- SSH: Most users simply use SSH/scp to copy their files to HLRS. In this case you do not have to do anything, since SSH always encrypts your communication. For details, please visit the corresponding wiki pages about data transfer solutions.
- GridFTP/Unicore FTP: For very large transfers we offer GridFTP and Unicore UFTP. Neither protocol encrypts data by default. Both have similar options, especially for setting up parallel processes and streams to increase transfer rates, and both allow encryption to be switched on:
  - GridFTP: `globus-url-copy -dcpriv` or `globus-url-copy -data-channel-private`
  - UFTP: `export UFTP_ENCRYPTION_ALGORITHM="AES"; uftp -e` or `export UFTP_ENCRYPTION_ALGORITHM="AES"; uftp --encrypt` (the option plus the environment variable)
Encryption while stored
There are many ways to encrypt data that is stored for some time without being used. One widely used tool is gpg (GNU Privacy Guard), which supports various encryption, decryption and signing operations. We start with the fairly simple symmetric encryption and then cover the more complex asymmetric encryption, which allows more advanced setups under certain conditions.
Symmetric encryption with gpg
Overview of used options:
    -c --symmetric               Encrypt with a symmetric key
    -d --decrypt                 Decrypt a file by providing the secret key
    -o --output <filename>       File to write the output to.
                                 By default the output is printed to the console
       --passphrase <string>     Optionally provide the passphrase directly to avoid the interactive prompt
       --passphrase-file <file>  Optionally provide the passphrase in a file, e.g. for scripted usage
       --no-symkey-cache         Prevents caching of keys in the gpg-agent. Without this option the agent
                                 automatically reuses the cached key instead of asking for it. Even when the
                                 encrypted file is opened in vi, the gpg-agent steps in and decrypts it
                                 automatically. This behavior is quite convenient, but not always wanted.

With gpg 2.1 and newer, --passphrase and --passphrase-file only take effect together with --batch and --pinentry-mode loopback.
The secret key (passphrase) used for encryption should obviously not be stored along with the data. If you decrypt the data before submitting the job, you can enter the secret key interactively. If you decrypt the data within the HPC job, you have to store the secret somewhere, preferably in a configuration file for your job that is read by your script. That way the config (with the secret key) is at least separated from the script and the data, but it is still stored in cleartext in your $HOME.
If you are thinking of using an external service that provides your key directly to your job, you would need authentication credentials for that service, and those would again have to be stored in cleartext. This makes the setup more complex but not more secure.
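The separation of passphrase, data and script described above could look roughly like the following sketch. It assumes gpg 2.1 or newer; the file names (job.secret, input.dat) and the throwaway GnuPG home are hypothetical and only serve the demonstration.

```shell
# Sketch: symmetric encryption with the passphrase kept in a separate file.
# Assumes gpg 2.1+; all paths and file names are made up for this demo.
set -eu

workdir=$(mktemp -d)
export GNUPGHOME="$workdir/gnupg"   # throwaway GnuPG home, not your real one
mkdir -m 700 "$GNUPGHOME"

# Passphrase file, readable by the owner only. In a real job it would live
# under $HOME, apart from the data and the job script.
umask 077
printf '%s\n' 'correct horse battery staple' > "$workdir/job.secret"

printf 'confidential input\n' > "$workdir/input.dat"

# Encrypt: --batch and --pinentry-mode loopback make gpg read the passphrase
# from the file instead of opening an interactive prompt.
gpg --batch --quiet --pinentry-mode loopback \
    --passphrase-file "$workdir/job.secret" \
    --output "$workdir/input.dat.gpg" --symmetric "$workdir/input.dat"

# Decrypt, as a non-interactive batch job would have to do it.
gpg --batch --quiet --pinentry-mode loopback \
    --passphrase-file "$workdir/job.secret" \
    --output "$workdir/input.restored" --decrypt "$workdir/input.dat.gpg"

# Compare the decrypted copy with the original.
roundtrip=failed
cmp -s "$workdir/input.dat" "$workdir/input.restored" && roundtrip=ok
echo "roundtrip: $roundtrip"

gpgconf --kill gpg-agent 2>/dev/null || true   # stop the agent of the demo home
```

Anyone who can read the passphrase file can decrypt the data, so this only separates the secret from the data; it does not hide it from other processes running under your account.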
Asymmetric encryption
The challenge in using asymmetric encryption purely for your own use is the need to manage keys. By design, asymmetric encryption is meant for a larger number of participants, whose keys are managed in a keyring. Our scenario would not need this, but gpg implements the full scheme and you more or less have to use it that way.
Asymmetric encryption can be used in the same way as symmetric encryption, but then both the private and the public key need to be available on the target system, so the main advantage of having two keys is lost. Asymmetric encryption is therefore only worthwhile in situations such as the following:
- your data files are separated by node
- you do not need to read the job output before transferring it back to your secured organization space
- the output can be piped directly into a file
- you can use an in-memory filesystem
In these situations you might be able to encrypt data before it is stored on the global disk space.
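The following commands set up and use asymmetric encryption.
Create a key pair and store it in the keyring on your local computer. Later you may need to copy the private key to the system where you want to decrypt.
    gpg --full-gen-key
Export the public key only, then copy it to the HLRS system where you want to encrypt data.
    gpg -a --output <pubkey file> --export <ID or name>
Verify the key fingerprint before importing; this is always recommended for foreign keys. (--show-keys prints the keys contained in a file without importing them; it requires a reasonably recent gpg 2.2.)
    gpg --show-keys --with-fingerprint <pubkey file>
Import the public key on the system where you want to encrypt.
    gpg --import <pubkey file>
Optionally create an additional keyring.
    gpg --no-default-keyring --keyring publickeys.gpg --list-keys
Import the public key into that keyring, then copy the keyring file to the target system.
    gpg --no-default-keyring --keyring publickeys.gpg --import <pubkey file>
Encrypt a file. The encrypted file ends with .asc if -a (ASCII armor) is used, otherwise with .gpg.
    gpg -e --armor --recipient <ID or name> <file>
Decrypt after copying the encrypted file back to your local secured storage.
    gpg -d --output <decrypted file> <encrypted file>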
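As a rough end-to-end illustration of the piped-output case, the sketch below encrypts a program's output on the "HPC side" using only the public key and decrypts it on the "local side". Several things are assumptions made for the demo: it needs gpg 2.1.13 or newer, the key is a throwaway key without passphrase (do not do this for real data), produce_output is a hypothetical stand-in for your application, and --trust-model always replaces the manual fingerprint check recommended above.

```shell
# Sketch: encrypt piped output with a public key only (gpg 2.1.13+ assumed).
set -eu
workdir=$(mktemp -d)

# "Local" side: GnuPG home holding the private key.
export GNUPGHOME="$workdir/local"
mkdir -m 700 "$GNUPGHOME"
# Demo-only key with an EMPTY passphrase; real keys should be protected.
gpg --batch --quiet --pinentry-mode loopback --passphrase '' \
    --quick-generate-key 'Demo User <demo@example.org>' default default never
gpg --batch --quiet --armor --output "$workdir/demo.pub.asc" \
    --export 'demo@example.org'

# "HPC" side: a second GnuPG home that only ever sees the public key.
hpc_home="$workdir/hpc"
mkdir -m 700 "$hpc_home"
gpg --homedir "$hpc_home" --batch --quiet --import "$workdir/demo.pub.asc"

# Hypothetical stand-in for the application writing results to stdout.
produce_output() { printf 'result line %s\n' 1 2 3; }

# Pipe the output straight into gpg, so no cleartext file is written.
# --trust-model always skips the trust check for this demo key.
produce_output | gpg --homedir "$hpc_home" --batch --quiet \
    --trust-model always --recipient 'demo@example.org' \
    --output "$workdir/result.gpg" --encrypt

# Back on the local side: decrypt with the private key.
decrypted=$(gpg --batch --quiet --pinentry-mode loopback --passphrase '' \
    --decrypt "$workdir/result.gpg")
printf '%s\n' "$decrypted"

gpgconf --kill gpg-agent 2>/dev/null || true   # stop the agent of the demo home
```

The point of the two separate GnuPG homes is that the encrypting side never holds the private key, which is exactly what distinguishes this setup from symmetric encryption.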
Overview of used options:
    -K --list-secret-keys                     List all private keys in the keyring
    -k --list-public-keys                     List all public keys in the keyring
       --delete-keys <ID or name>             Delete the corresponding public key from the keyring
       --delete-secret-keys <...>             Delete the corresponding private key (asks for approval four times)
       --delete-secret-and-public-keys <...>  Combination of both
    -a --armor                                ASCII text instead of binary
       --export <ID or name>                  Export the public key
    -o --output <filename>                    File to write the output to.
                                              By default the output is printed to the console
       --fingerprint <ID or string>           Print the fingerprint of a key to easily identify or compare keys
       --full-gen-key                         Create a new key pair with the full dialog (name + mail + comment + key options)
       --gen-key                              Create a new key pair with the short dialog (only name + mail), useful to distinguish keys
    -e --encrypt                              Encrypt
    -r --recipient                            Recipient; the corresponding key is used for encryption
    -d --decrypt                              Decrypt
Important Notes:
- Make sure the unencrypted data is removed once the encryption is done. We recommend `shred -fu <FILENAME>`, but note the disclaimer in its man page: it still cannot be guaranteed that the data is in fact overwritten (although shred at least attempts to do so).
- It is up to you to keep the keys in a safe place, in terms of both confidentiality and availability. Anyone who has access to the (private) key can decrypt the data. On the other hand, if the key is lost, there is no way to recover the encrypted data, not even for the admin users.
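One way to make sure the cleartext is only removed after a successful encryption is a small wrapper like the sketch below. The function name encrypt_and_shred, the file names and the throwaway GnuPG home are hypothetical; gpg 2.1 or newer and shred from GNU coreutils are assumed.

```shell
# Sketch: shred the cleartext only if gpg reported success.
set -eu

# encrypt_and_shred <passphrase-file> <file>   (hypothetical helper)
encrypt_and_shred() {
    secret=$1
    plain=$2
    if gpg --batch --quiet --pinentry-mode loopback \
           --passphrase-file "$secret" \
           --output "$plain.gpg" --symmetric "$plain" \
       && [ -s "$plain.gpg" ]
    then
        # -f: change permissions if needed, -u: remove the file after overwriting
        shred -fu "$plain"
    else
        echo "encryption failed, keeping $plain" >&2
        return 1
    fi
}

# Demo in a temporary directory with a throwaway GnuPG home.
demo=$(mktemp -d)
export GNUPGHOME="$demo/gnupg"
mkdir -m 700 "$GNUPGHOME"
umask 077
printf '%s\n' 'demo passphrase' > "$demo/job.secret"
printf 'job output\n' > "$demo/output.dat"

encrypt_and_shred "$demo/job.secret" "$demo/output.dat"
ls "$demo"

gpgconf --kill gpg-agent 2>/dev/null || true   # stop the agent of the demo home
```

As the shred man page warns, overwriting is not guaranteed on every file system, so this reduces the risk of leftover cleartext rather than eliminating it.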
Errors we have encountered:
- When generating key pairs, a hint about needed entropy is shown, but the command seems to stall and no passphrase prompt appears. After a few minutes, even if you have already terminated the hanging command, a passphrase prompt is shown where none is expected. This is probably caused by a timeout of the asynchronous pinentry process: by default a graphical input dialog is tried first, which may fail, especially via ssh without a local X server. Once it has failed, the fallback console pinentry is used. To solve this issue, add `pinentry-program /usr/bin/pinentry-curses` to your .gnupg/gpg-agent.conf file.
- When you have several private keys in your keyring with the same tag and try to delete one by tag, you may get an error that there is no such key. This seems to happen when you have already deleted one key with this tag. Simply use another tag or the ID. For public keys this does not seem to cause any problems.
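The pinentry workaround from the first item can be applied with a few lines of shell, sketched below. The script honours GNUPGHOME if it is set and otherwise uses ~/.gnupg; the pinentry path is the one mentioned above and may differ on your system, which is why the sketch only warns if it is missing.

```shell
# Sketch: add the curses pinentry to gpg-agent.conf (idempotent).
set -eu

conf_dir=${GNUPGHOME:-$HOME/.gnupg}
conf_file=$conf_dir/gpg-agent.conf
line='pinentry-program /usr/bin/pinentry-curses'

# The path is taken from the text above; it may be different on your system.
[ -x /usr/bin/pinentry-curses ] || \
    echo "warning: /usr/bin/pinentry-curses not found, adjust the path" >&2

mkdir -p "$conf_dir"
chmod 700 "$conf_dir"
touch "$conf_file"

# Append the line only if it is not already present.
grep -qxF "$line" "$conf_file" || printf '%s\n' "$line" >> "$conf_file"

# Stop a running agent so that the next gpg call starts it with the new config.
gpgconf --kill gpg-agent 2>/dev/null || true
```

Running the script a second time leaves the file unchanged, so it is safe to include in a setup script.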