Portable GELI

A portable version of GELI on GNU/Linux (with write support)

Although most of my work belongs to the companies I work for, this repository is one of my recent personal projects that I am actually proud of. This is portable-geli. GELI (see geli(8)) is FreeBSD's disk encryption framework. It consists of a kernel side, which performs the actual encryption and decryption of data blocks, and a user-space side, which lets the user control various aspects of the cryptography such as defining keys, choosing encryption algorithms and so on.

The primary goal of this project is to make it possible to attach GELI-encrypted block devices on other operating systems such as GNU/Linux while staying 100% compatible with the FreeBSD implementation. Currently only GNU/Linux is supported, but support for other operating systems such as OpenBSD (my daily operating system) is already under serious consideration. I am also planning to add GELI support to the Linux kernel, but that work has not started yet and I'm not sure whether it would be acceptable to the upstream maintainers.


Before explaining the internals of the project, I should mention that after completing it I rewrote almost the entire project and replaced most of it with FreeBSD's actual code (kernel and user space) for two reasons. The main reason was to avoid mistakes that leave cryptographic vulnerabilities, which are very hard to find and debug. The other was that this decision will (hopefully) help me improve the code faster and maybe lead to some contributions to FreeBSD's code as well. So most of the descriptions below, except the NBD part, are identical to the FreeBSD implementation. I hope it helps others interested in cryptography as well as FreeBSD enthusiasts.

Data structures:

GELI uses the last sector of the block device to store its metadata (struct eli_metadata), which describes the properties of the encrypted device. (Note that this is the opposite of LUKS, which keeps its header in the first sectors.) Although the shape of the metadata structure is fixed, the stored values and their interpretation depend on the version of geli (md_version) that created the device. There are 7 different versions, of which the latest is currently supported by portable-geli. Some of the most notable fields in this metadata are the encryption algorithm (md_ealgo), the authentication algorithm (md_aalgo), the random salt (md_salt) and finally the encryption master keys (md_mkeys).

struct eli_metadata {
    char        md_magic[16];
    uint32_t    md_version;
    uint32_t    md_flags;
    uint16_t    md_ealgo;
    uint16_t    md_keylen;
    uint16_t    md_aalgo;
    uint64_t    md_provsize;
    uint32_t    md_sectorsize;
    uint8_t     md_keys;
    int32_t     md_iterations;
    uint8_t     md_salt[ELI_SALTLEN];
    uint8_t     md_mkeys[ELI_MAXMKEYS * ELI_MKEYLEN];
    uint8_t     md_hash[MD5_DIGEST_LENGTH];
} __attribute__((packed));
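
As a rough illustration of the layout described above, the sketch below computes where the metadata sector starts and checks that the packed structure fits into a single 512-byte sector. The value of ELI_SALTLEN and the helper name eli_metadata_offset are assumptions made for this example and are not taken from the project's headers; the other constants follow from the two 192-byte slots described in this document.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for this sketch; the real ones live in the project's headers. */
#define ELI_SALTLEN        64   /* assumption */
#define ELI_MAXMKEYS       2    /* two master-key slots */
#define ELI_MKEYLEN        192  /* bytes per master-key slot */
#define MD5_DIGEST_LENGTH  16

struct eli_metadata {
    char        md_magic[16];
    uint32_t    md_version;
    uint32_t    md_flags;
    uint16_t    md_ealgo;
    uint16_t    md_keylen;
    uint16_t    md_aalgo;
    uint64_t    md_provsize;
    uint32_t    md_sectorsize;
    uint8_t     md_keys;
    int32_t     md_iterations;
    uint8_t     md_salt[ELI_SALTLEN];
    uint8_t     md_mkeys[ELI_MAXMKEYS * ELI_MKEYLEN];
    uint8_t     md_hash[MD5_DIGEST_LENGTH];
} __attribute__((packed));

/* With the assumed constants the packed structure must fit in one sector. */
_Static_assert(sizeof(struct eli_metadata) <= 512,
               "metadata must fit in one 512-byte sector");

/* Hypothetical helper: byte offset of the metadata, i.e. the last sector. */
uint64_t eli_metadata_offset(uint64_t mediasize, uint32_t sectorsize)
{
    assert(sectorsize != 0 && mediasize >= sectorsize);
    return mediasize - sectorsize;
}
```

For a 1 MiB device with 512-byte sectors, this places the metadata at byte offset 1048064.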

Encrypting the master keys:

The master-keys field (md_mkeys) can hold up to two master keys. Each master-key slot holds an IV field and a DATA field, which are filled with random data, and a HASH field, which is calculated using the user's passphrase.

The IV field is the initialization vector and the DATA field is the actual key used to encrypt the disk; both are randomly generated.

  64      64     64     64     64     64
│  IV    DATA │ HASH │  IV    DATA │ HASH │
   First Key’s slot       Second Key’s slot
     (192 bytes)            (192 bytes)
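
The slot layout in the diagram can be sketched in C as plain pointer arithmetic over md_mkeys. The struct mkey_slot_view, the helper eli_mkey_slot and the name ELI_HASHLEN are hypothetical names used only for this example; the sizes are the ones shown in the diagram.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ELI_IVKEYLEN    64   /* IV field */
#define ELI_DATAKEYLEN  64   /* DATA field */
#define ELI_HASHLEN     64   /* HASH field (hypothetical name) */
#define ELI_MKEYLEN     (ELI_IVKEYLEN + ELI_DATAKEYLEN + ELI_HASHLEN)
#define ELI_MAXMKEYS    2

/* Hypothetical view of the three fields of one slot inside md_mkeys. */
struct mkey_slot_view {
    uint8_t *iv;
    uint8_t *data;
    uint8_t *hash;
};

/* Return pointers to the IV, DATA and HASH fields of the given slot. */
struct mkey_slot_view eli_mkey_slot(uint8_t *md_mkeys, unsigned slot)
{
    struct mkey_slot_view v;
    uint8_t *base;

    assert(slot < ELI_MAXMKEYS);
    base = md_mkeys + (size_t)slot * ELI_MKEYLEN;
    v.iv = base;
    v.data = base + ELI_IVKEYLEN;
    v.hash = base + ELI_IVKEYLEN + ELI_DATAKEYLEN;
    return v;
}
```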

The above diagram shows the unencrypted layout of the master keys. The master keys (md_mkeys) are always encrypted on disk using a symmetric-key algorithm such as AES, and the components of a master key can only be retrieved with the correct key. This field is protected by the user's passphrase, but instead of using the passphrase directly as an encryption key, a derived key is first calculated using a key derivation function (KDF), which produces a much stronger key of the desired length (512 bits) than the passphrase itself. GELI uses HMAC with the SHA-512 hash function as its key derivation function, and also applies the PBKDF2 algorithm to add computational cost to brute-force attacks.

# pseudocode:
derived-key=HMAC(k,m) = H((k'⊕opad) || H((k'⊕ipad) || m))
          m=PKCS5v2(salted-passphrase, iteration) or salted-password if iteration is 0
          The || symbol means string concatenation
          The ⊕ symbol means XOR
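
To make the inner and outer padded keys of the HMAC formula above more concrete, here is a minimal sketch that builds the two key blocks for a hash with a 128-byte block size such as SHA-512. The hash function H itself is left out, keys longer than one block (which HMAC would hash first) are not handled, and the name hmac_make_pads is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HMAC_BLOCK_LEN 128  /* SHA-512 block size in bytes */

/*
 * Build k' XOR ipad and k' XOR opad, where k' is the key padded with
 * zeros to the hash block size. Only keys up to one block are handled.
 */
void hmac_make_pads(const uint8_t *key, size_t key_len,
                    uint8_t ipad[HMAC_BLOCK_LEN],
                    uint8_t opad[HMAC_BLOCK_LEN])
{
    uint8_t kprime[HMAC_BLOCK_LEN];
    size_t i;

    assert(key_len <= HMAC_BLOCK_LEN);
    memset(kprime, 0, sizeof(kprime));
    if (key_len > 0)
        memcpy(kprime, key, key_len);

    for (i = 0; i < HMAC_BLOCK_LEN; i++) {
        ipad[i] = kprime[i] ^ 0x36;  /* inner pad byte */
        opad[i] = kprime[i] ^ 0x5c;  /* outer pad byte */
    }
}
```

The full HMAC would then hash the inner block followed by the message, and hash the outer block followed by that result.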

Call graph:

The derived key is then used to calculate the HMAC of the IV and DATA fields together, and the result is stored in the HASH field of the master-key slot.

|                 IV-DATA                 |           HASH          |
             Unencrypted Master Key slot (192 bytes)

# pseudocode:
hmac-key=HMAC(derived-key, '\x00')
HASH=HMAC(hmac-key, IV || DATA)

Call graph:

Then the whole master-key slot is encrypted using the AES-CBC algorithm.

|                     ENCRYPTED-MASTER-KEY                          |
               Encrypted Master Key slot (192 bytes)

# pseudocode:
encryption-key=HMAC(derived-key, '\x01')
encrypted-master-key=AES-CBC-ENCRYPT(encryption-key, master-key)

Call graph:

Now the master-key slot is fully encrypted indirectly using the user's passphrase. Remember that the only way to access the master-key’s components is to decrypt the master-key slot using the correct passphrase.

|                    ENCRYPTED-MASTER-KEY                   |
                Encrypted Master Key slot (192 bytes)

|         IV              DATA          |       HASH        |
                 Decrypted Master Key slot (192 bytes)

# pseudocode:
decrypted-master-key=AES-CBC-DECRYPT(encryption-key, encrypted-master-key)
    encryption-key=HMAC(derived-key, '\x01')

Call graph:

After decrypting the master-key slot, the HASH field of the decrypted slot should equal the hash recalculated over its IV and DATA sections using the passphrase. If they are equal, the given passphrase is the correct one. For any given passphrase, this process is tried on both master-key slots before giving up.

|        IV                DATA         |         HASH      |
              Decrypted Master Key slot (192 bytes)

                                        | RECALCULATED-HASH |

# pseudocode:
hmac-key=HMAC(derived-key, '\x00')
recalculated-hash=HMAC(hmac-key, IV || DATA)
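
The comparison step can be sketched as follows, assuming the hash of each decrypted slot has already been recalculated elsewhere. Comparing without an early exit is a design choice of this example and not necessarily what the FreeBSD code does; eli_hash_equal and eli_find_valid_slot are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ELI_HASHLEN 64  /* HASH field length in bytes */

/* Return 1 if both buffers are equal; every byte is always examined. */
int eli_hash_equal(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t diff = 0;
    size_t i;

    for (i = 0; i < len; i++)
        diff |= (uint8_t)(a[i] ^ b[i]);
    return diff == 0;
}

/*
 * stored and recalculated each hold nslots hashes back to back.
 * Return the index of the first slot whose hashes match, or -1 if the
 * passphrase did not verify against any slot.
 */
int eli_find_valid_slot(const uint8_t *stored, const uint8_t *recalculated,
                        int nslots)
{
    int i;

    assert(nslots >= 0);
    for (i = 0; i < nslots; i++) {
        size_t off = (size_t)i * ELI_HASHLEN;
        if (eli_hash_equal(stored + off, recalculated + off, ELI_HASHLEN))
            return i;
    }
    return -1;
}
```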

Call graph:

If the passphrase is verified, this metadata is then processed and stored in a more efficient structure in memory at run-time (struct eli_softc). Only the IV and DATA sections of the unencrypted master key components are needed and will be stored in memory until the encrypted device is finally detached.

struct eli_softc {
    u_int        sc_version;
    u_int        sc_crypto;
    uint8_t      sc_mkey[ELI_DATAIVKEYLEN];
    uint8_t      sc_ekey[ELI_DATAKEYLEN];
    TAILQ_HEAD(, eli_key) sc_ekeys_queue;
    uint64_t     sc_ekeys_total;
    uint64_t     sc_ekeys_allocated;
    u_int        sc_ealgo;
    u_int        sc_ekeylen;
    uint8_t      sc_akey[ELI_AUTHKEYLEN];
    u_int        sc_aalgo;
    u_int        sc_akeylen;
    u_int        sc_alen;
    SHA256_CTX   sc_akeyctx;
    uint8_t      sc_ivkey[ELI_IVKEYLEN];
    SHA256_CTX   sc_ivctx;
    int          sc_nkey;
    uint32_t     sc_flags;
    int          sc_inflight;
    off_t        sc_mediasize;
    size_t       sc_sectorsize;
    u_int        sc_bytes_per_sector;
    u_int        sc_data_per_sector;
} eli_sc;

Encrypting and decrypting blocks of data:

The decrypted block device is at least one sector smaller than the encrypted block device, since the metadata sector is not exposed. GELI uses a different derived key for every 2^20 sectors (unless it is configured to use only a single key) and a unique initialization vector (IV) for each sector, equal to its sector offset. That means there is a unique key for every 2^20 blocks of data, starting from key number zero. Calculating the right key every time a block is requested would be quite expensive. To avoid this, a number of keys can be calculated beforehand and inserted into a sorted list of calculated keys. The process of decrypting and encrypting blocks of data is as follows:

To read the decrypted sector m1, the content of sector b1 on the encrypted device is read and decrypted using the algorithm below. To write sector m1, its content is encrypted and replaces sector b1 on disk.

| b0   b1   b2   b3   b4   b5   b6   b7        bi   metadata
          Encrypted block device (on physical disk)
| m0   m1   m2   m3   m4   m5   m6   m7        mi
          Decrypted block device (virtual block device)

# pseudocode:
hmac-data={'ekey', keyno}
encryption-key is IV-DATA if md_flags has the ENC_IVKEY flag set, otherwise DATA.
data-encryption-key=HMAC(encryption-key, hmac-data)
ivi=i || padded-zero

mi=AES-XTS-DECRYPT(data-encryption-key, ivi, bi)
bi=AES-XTS-ENCRYPT(data-encryption-key, ivi, mi)
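
The non-cryptographic parts of the pseudocode above can be sketched in C: picking the key number for a sector, building hmac-data and building the zero-padded IV. The little-endian byte order, the 16-byte IV length and all function names are assumptions of this example; the HMAC and AES-XTS calls are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ELI_KEY_SHIFT 20  /* one data key per 2^20 sectors */
#define ELI_IVLEN     16  /* assumed IV length (AES block size) */

/* Store v in little-endian byte order (the byte order is an assumption). */
void put_le64(uint8_t out[8], uint64_t v)
{
    int i;

    for (i = 0; i < 8; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Key number for a given sector index: sectors 0..2^20-1 use key 0. */
uint64_t eli_key_number(uint64_t sector)
{
    return sector >> ELI_KEY_SHIFT;
}

/* hmac-data = 'ekey' followed by the 64-bit key number (12 bytes). */
void eli_hmac_data(uint64_t keyno, uint8_t out[12])
{
    memcpy(out, "ekey", 4);
    put_le64(out + 4, keyno);
}

/* IV = sector offset followed by zero padding. */
void eli_make_iv(uint64_t offset, uint8_t iv[ELI_IVLEN])
{
    memset(iv, 0, ELI_IVLEN);
    put_le64(iv, offset);
}
```

The 12-byte hmac-data buffer would then be fed to HMAC together with the encryption key to obtain the data-encryption-key for that range of sectors.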

Virtual block device:

The final part is to provide the user with a virtual block device. The Linux Network Block Device (NBD) makes it possible to provide a virtual block device whose requests are handled by a user-space program such as portable-geli. So when geli attaches an encrypted block device, it first verifies the user's passphrase and then creates an NBD device of the proper size (md_provsize). Requests to this virtual block device are received by the geli process, which takes the appropriate action according to the command type, which can be either read or write.
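
A minimal sketch of that dispatch step is shown below. The command and action enums are hypothetical stand-ins (a real server would use the request structure and command codes from the Linux NBD headers), and rejecting requests that are not sector-aligned is an assumption of this example rather than documented behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical command codes for this sketch. */
enum vdev_cmd { VDEV_READ, VDEV_WRITE };

/* What the user-space process would do with the request. */
enum vdev_action {
    ACT_DECRYPT_AND_REPLY,  /* read: fetch sectors, decrypt, send back */
    ACT_ENCRYPT_AND_STORE,  /* write: encrypt payload, write to disk */
    ACT_REJECT              /* malformed or out-of-range request */
};

enum vdev_action vdev_dispatch(enum vdev_cmd cmd, uint64_t offset,
                               uint32_t length, uint64_t provsize,
                               uint32_t sectorsize)
{
    assert(sectorsize != 0);

    /* Assumption: only whole sectors can be encrypted or decrypted. */
    if (offset % sectorsize != 0 || length % sectorsize != 0)
        return ACT_REJECT;
    /* The request must lie inside the decrypted device (md_provsize). */
    if (offset > provsize || length > provsize - offset)
        return ACT_REJECT;

    switch (cmd) {
    case VDEV_READ:
        return ACT_DECRYPT_AND_REPLY;
    case VDEV_WRITE:
        return ACT_ENCRYPT_AND_STORE;
    }
    return ACT_REJECT;
}
```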

Who am I?

I'm a software engineer interested in kernel programming, and a Senior System Developer and project manager of the router project at Zharfpouyan. I'm currently looking forward to working with bigger teams on more exciting projects. You can find me on GitHub.