Encryption Core Concepts: Adding Crypto to your System

Are you developing an application that you think would benefit from encryption? Do you know it will save money in a data breach, but don’t know where to go from there? Encryption core concepts are not rocket science, but they rely on complex math and software life-cycle issues that make encryption challenging in practice. Read on to understand more about encryption core concepts before you add crypto to your app.

This article is part of our Security Guide series – Encryption for Developers. Read more in that series of in-depth technical articles on getting encryption right in your application.

In this article, we’ll revisit and expand on encryption core concepts and the cryptography abstraction stack. This is the idea that encryption applications are built on a deep layer of systems and tools. When you sit down to “implement encryption” in your application, it’s very useful to be clear about which part of the stack you’re working in. Our general advice is to stay as far to the right of that stack (that is, at the highest level of abstraction) as possible.

Throughout this explanation, we’re going to touch on the Signal protocol. This is a cryptographic chat system that lets two end users chat without sending the plain text to an intermediate server. It’s therefore “end-to-end encrypted”. Signal is a great example because it makes a lot of decisions for the developer of a chat application so that there are fewer things to get wrong. It moves the bar to the right.

The stack of cryptographic layers is:

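[Figure: cryptography layers, from left to right: The Math, The Algorithm, Algorithm Library, The Protocol, Protocol Library, The Application]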

The Math

Underlying all cryptography is some pretty specific mathematics. This is how we generate public / private key pairs and how we demonstrate that encryption is strong “enough” to withstand a determined adversary’s brute-force attacks. Over the last few years, quantum computers have become a long-term concern for the viability of the math behind modern crypto. There’s some serious work in the “math” layer to build efficient “post-quantum” encryption algorithms; that is, algorithms that are not vulnerable to quantum computers.

Example: Elliptic curve cryptography is based on some visual and elegant math. Here’s a fun explainer article on ECC.

The Algorithm

This layer includes the alphabet soup of symmetric and asymmetric ciphers and hashing algorithms (e.g. AES, RSA, ECC, SHA2). They all work very differently and apply to different problems. I’m also including related algorithms in this layer, like how keys themselves are generated (e.g. password-based key derivation functions, pseudo-random number generators) and the cipher “modes” (e.g. ECB, GCM, CCM). These low-level ciphers are awesome, but they are hard to use right, particularly in combination with various ancillary functions and modes. In fact, they are merciless: they will let you make encrypted-looking stuff that’s not secure at all.
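
To make “encrypted-looking but not secure” concrete, here is a minimal sketch of the pattern leak you get when every block is encrypted independently, which is what ECB mode does. Python’s standard library has no AES, so the sketch assumes a stand-in: `toy_block_encrypt` (a hypothetical helper built from HMAC-SHA256) plays the role of the block cipher. It is not a real cipher and cannot be decrypted; it only reproduces the one property that matters here, namely that the same key and the same block always give the same output.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 16  # bytes; the same block size AES uses


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for a block cipher (NOT a real cipher, cannot be decrypted).

    It is keyed and deterministic, so equal input blocks give equal outputs,
    which is the only property this demonstration relies on.
    """
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK_SIZE]


def ecb_like_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Process each block independently, the way ECB mode does."""
    if len(plaintext) % BLOCK_SIZE != 0:
        raise ValueError("plaintext must be a multiple of the block size")
    blocks = [plaintext[i:i + BLOCK_SIZE]
              for i in range(0, len(plaintext), BLOCK_SIZE)]
    return b"".join(toy_block_encrypt(key, block) for block in blocks)


key = secrets.token_bytes(32)

# Three 16-byte blocks; the first two are identical.
plaintext = b"ATTACK AT DAWN!!" * 2 + b"RETREAT AT DUSK!"
ciphertext = ecb_like_encrypt(key, plaintext)

ct_blocks = [ciphertext[i:i + BLOCK_SIZE]
             for i in range(0, len(ciphertext), BLOCK_SIZE)]

# The output looks random, yet an observer can see that blocks 0 and 1 match.
repeated_blocks_visible = ct_blocks[0] == ct_blocks[1]
```

The ciphertext is indistinguishable from noise at a glance, but anyone watching can tell which plaintext blocks repeat without ever touching the key. Modes that mix in a nonce or IV exist to remove exactly this kind of leak.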

Example: The Signal protocol uses a variety of ciphers, including AES.

Algorithm Library

Now we get to the programming bit! Each programming language has a set of (potentially non-overlapping) combinations and implementations of algorithms, modes, and ancillary functions, so it’s sometimes hard to get code in different languages to interoperate. Java, for example, has its own crypto libraries that include all the algorithms above, as implemented by e.g. Bouncy Castle. At this level, you have to understand the algorithm itself in order to use it correctly. For instance, without some help, you will probably use AES wrong in Java.

Example: The Signal protocol’s Java library uses javax.crypto.Cipher’s spec for AES (and I bet they used it right).

The Protocol

HTTPS, S/MIME, SAML, OAuth, Signal, etc. The protocol is the application of a set of algorithms to a type of problem. Notice that this is a type of problem, not necessarily a specific application. Plus, “protocols” span a very wide range of abstraction levels since they build on each other. Protocols are often specified in Requests for Comments (RFCs), which are the way some crypto protocols are agreed upon. They can be a little hard to consume because they often either contain far too much detail about the protocol, or not quite enough (often both).

Example: WhatsApp is a very popular chat application, and to implement security between end users, it relies on the Signal protocol.

Protocol Library

You can’t actually use a protocol until someone writes a library for it. OpenSSL is a great example here because it’s a very widely deployed implementation of TLS, the protocol that secures HTTPS. Many people use OpenSSL correctly, and many people use it badly. Many people also use it for things besides HTTPS, because it doubles as an algorithm library, which is why it belongs in more than one layer of this stack.

Example: libsignal-protocol-java, which anyone can use to make a chat app.

The Application

This is the use of the protocol library to secure a part of the whole system. Note that at this layer you must be sure that you’re actually using the protocol right, or you’ll get security bugs. And sadly, with today’s tools you usually have to understand each layer down to the algorithm itself. That’s because your specific problem isn’t actually addressed by any of the layers between here and the algorithm.

Example: WhatsApp and Signal both use the Signal Protocol.

The Questions You Need to Answer

In the discussion above, we touched on a few key questions that make cryptography a lot harder than it looks. In this section, we’ll dive into a bit more detail about what all of that means and how to work through the answers.

Selecting the type of cipher you’re looking for:

  • Symmetric (shared key): Fast and efficient, these algorithms are usually your baseline for encrypting data. AES is usually what you want. Symmetric encryption suffers from challenges with key management. You need a way to get the shared key to both parties, which is why you need:
  • Asymmetric (public / private key): Slower and more complex than symmetric encryption, these algorithms are typically used for exchanging symmetric keys. RSA is the “classic” choice here; ECC is more modern and efficient, and almost as widely supported.
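
To show how an asymmetric exchange ends up producing a shared symmetric key, here is a toy sketch of classic (finite-field) Diffie-Hellman followed by a hash to get a 256-bit key. Everything about the parameters is an assumption made for illustration: `P`, `G`, `make_keypair`, and `shared_symmetric_key` are hypothetical names, the 127-bit prime is far too small for real use, and there is no authentication of the other party. In practice you would get this from a vetted library and a standard curve or group.

```python
import hashlib
import secrets

# Toy parameters, for illustration only: a 127-bit Mersenne prime and a small
# base. Real systems use vetted groups or elliptic curves through a library.
P = 2**127 - 1
G = 3


def make_keypair() -> tuple[int, int]:
    """Return (private, public), where public = G^private mod P."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_symmetric_key(my_private: int, their_public: int) -> bytes:
    """Combine my private value with their public value, then hash to 32 bytes."""
    shared_secret = pow(their_public, my_private, P)
    # P is below 2^127, so the shared secret always fits in 16 bytes.
    return hashlib.sha256(shared_secret.to_bytes(16, "big")).digest()


alice_private, alice_public = make_keypair()
bob_private, bob_public = make_keypair()

# Only the public values cross the wire; each side computes the key locally.
alice_key = shared_symmetric_key(alice_private, bob_public)
bob_key = shared_symmetric_key(bob_private, alice_public)
```

Both sides arrive at the same 32 bytes, which can then be used as the key for a fast symmetric cipher. That division of labor (slow asymmetric math once, fast symmetric cipher afterwards) is the pattern the two bullets above describe.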

Selecting an Algorithm and Mode:

  • The core algorithm (AES, RSA, ECC with various curves, ChaCha20, etc.) will determine performance, security, and compatibility.
  • Symmetric multi-block modes (ECB, GCM, CBC, SIV, etc.) vary in their confidentiality and integrity properties, and some work better with different types of data or different system constraints (such as a lack of a random number generator).
  • Hashing / tagging / MAC algorithms (MD5, SHA1, SHA2, Poly1305, GCM, etc.) are required to add integrity to your mode. Many people think that encryption implies integrity, but it does not. For instance, AES doesn’t provide integrity by default.
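
Since encryption alone does not give you integrity, here is a minimal sketch of adding a MAC over a ciphertext and checking it before doing anything else with the message. It assumes HMAC-SHA256 as the MAC and a separate `mac_key`; the function names are hypothetical, and the `ciphertext` is just random placeholder bytes standing in for the output of whatever cipher you use.

```python
import hashlib
import hmac
import secrets


def make_tag(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the ciphertext."""
    return hmac.new(mac_key, ciphertext, hashlib.sha256).digest()


def verify_tag(mac_key: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = make_tag(mac_key, ciphertext)
    return hmac.compare_digest(expected, tag)


mac_key = secrets.token_bytes(32)     # kept separate from the encryption key
ciphertext = secrets.token_bytes(48)  # placeholder for real cipher output
tag = make_tag(mac_key, ciphertext)

# An attacker flips a single bit in transit.
tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]

original_ok = verify_tag(mac_key, ciphertext, tag)
tampered_ok = verify_tag(mac_key, tampered, tag)
```

Here `original_ok` is true and `tampered_ok` is false. The comparison uses `hmac.compare_digest` rather than `==` so that the check does not leak, through timing, how many leading bytes of a forged tag were correct. Authenticated modes such as GCM fold this step into the mode itself.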

Managing Keys, because you’re not done with your crypto until you decide how to manage keys:

  • Generation: Random keys, key sizes, symmetric vs. asymmetric, etc.
  • Storage: Whether to derive keys from a user-generated password or store them for later lookup.
  • Communication: How to agree on a key between a client and server or two users.
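
The generation and storage choices above can be sketched with nothing more than the standard library. This is a minimal illustration, not a recommendation for your system: the function names are hypothetical, and the key size, salt size, and PBKDF2 iteration count are assumed values that you should check against current guidance and your own hardware.

```python
import hashlib
import secrets

KEY_BYTES = 32               # 256-bit key (assumed size)
SALT_BYTES = 16              # assumed salt size
PBKDF2_ITERATIONS = 600_000  # illustrative work factor; tune for your system


def generate_random_key() -> bytes:
    """Generation: draw a key from the operating system's CSPRNG."""
    return secrets.token_bytes(KEY_BYTES)


def derive_key_from_password(password: str, salt: bytes,
                             iterations: int = PBKDF2_ITERATIONS) -> bytes:
    """Storage alternative: re-derive the key from a password when needed.

    The salt is not secret, but it must be stored so that the same key can
    be derived again later.
    """
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                               salt, iterations, dklen=KEY_BYTES)


random_key = generate_random_key()

salt = secrets.token_bytes(SALT_BYTES)
password_key = derive_key_from_password("correct horse battery staple", salt)
```

A random key is as strong as its length, but you have to store it somewhere safe. A password-derived key needs no key storage, only the salt and parameters, but it is only as strong as the password, which is why the derivation is made deliberately slow.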

Read our discussion of key management for more information about this.

Miscellaneous practical considerations:

  • Beyond key material, there are other elements of randomness or uniqueness that are associated with an encrypted message. Initialization vectors (IVs), salts, and nonces fall in this category. These need to be communicated to the decrypting party as well, so they need to be stored or transmitted. Typically, it’s safe to transmit these unencrypted along with the ciphertext, but you should be careful not to let the attacker modify them.
  • You also need to pad, encode, serialize, and sign your messages. Believe it or not, even bad padding can undermine the confidentiality of the encrypted message. For signing of structured data like a JSON object or HTTP headers, you need an identical way for both sides to serialize and deserialize the data, or the signatures won’t match.
  • If you’ve done all of this right, you now have an encrypted and signed message. It’s likely at this point that you’ll want to send this message to another party, who will check the signature and decrypt the message. That means you need to communicate all of your choices: key id, size, cipher, mode, IV, hashing algorithm, etc. This communication itself is a fraught weakness in many cryptography systems. For instance, attackers have been able to trick some symmetric systems into behaving like asymmetric systems and sending their shared key directly to the attacker. Oops.
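
Two of the points above lend themselves to a short sketch: serializing structured data identically on both sides, and communicating your choices without letting the attacker make them for you. The envelope layout, field names, and the `"AES-256-GCM"` label below are all hypothetical, and HMAC-SHA256 stands in for the signing step. The ciphertext is treated as opaque bytes produced elsewhere.

```python
import base64
import hashlib
import hmac
import json
import secrets

# Receiver-side allowlist: the message may say which algorithm was used,
# but the receiver decides which answers it is willing to accept.
ALLOWED_ALGORITHMS = {"AES-256-GCM"}


def canonical_bytes(obj: dict) -> bytes:
    """Serialize so that both sides get byte-identical output."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=True).encode("utf-8")


def b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def build_envelope(mac_key: bytes, key_id: str, nonce: bytes,
                   ciphertext: bytes) -> dict:
    """Bundle the ciphertext with its metadata and a MAC over all of it."""
    body = {"alg": "AES-256-GCM", "key_id": key_id,
            "nonce": b64(nonce), "ciphertext": b64(ciphertext)}
    mac = hmac.new(mac_key, canonical_bytes(body), hashlib.sha256).hexdigest()
    return {**body, "mac": mac}


def check_envelope(mac_key: bytes, envelope: dict) -> bool:
    """Reject unknown algorithms, then verify the MAC over the metadata."""
    body = {k: v for k, v in envelope.items() if k != "mac"}
    if body.get("alg") not in ALLOWED_ALGORITHMS:
        return False
    expected = hmac.new(mac_key, canonical_bytes(body), hashlib.sha256).hexdigest()
    received = str(envelope.get("mac", ""))
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


mac_key = secrets.token_bytes(32)
envelope = build_envelope(mac_key, "key-2024-01", secrets.token_bytes(12),
                          b"opaque ciphertext bytes")
envelope_ok = check_envelope(mac_key, envelope)
```

Because the MAC covers the nonce, key id, and algorithm label along with the ciphertext, an attacker cannot swap any of them without detection, and because the serialization sorts keys and fixes separators, the check does not depend on the order in which a JSON library happens to emit fields.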

FIPS compliance:

  • Some industries, like federal contracting and banking, may need to comply with NIST recommendations, which limits the selection of algorithms and implementation libraries.
  • Before you settle on your algorithms and libraries, check to see if you will be required to comply with FIPS. It will impact all of the choices that we outlined above.
  • Even if you play it safe and choose FIPS-compliant ciphers and use them everywhere, that won’t necessarily make you FIPS compliant.
  • With FIPS, you need to select an approved library. Open source libraries like OpenSSL have FIPS-approved variants, but that doesn’t mean that OpenSSL itself is approved. You have to use the right variant, and in some cases the right binary.
  • A deep analysis of FIPS 140-3 is beyond the scope of this guide.

General guidelines for getting things right

Cryptography is very specific to your application. A trained cryptographer can help you understand the strengths and weaknesses of your approach, and no “how-to” document can tell you what’s right or wrong. There are, however, a few choices you can make that will get you closer to “good cryptography”, and you can often safely use them.

A few recommendations we have, particularly if you need to or want to stick with the NIST / FIPS ciphers:

    • Symmetric encryption: AES-GCM is a nice mode of operation because it provides multi-block confidentiality (unlike ECB) and authentication / integrity (unlike CBC). It’s broadly available, so you can usually count on it being there when you need it. You have to be very careful with the GCM nonce, though, because reusing a nonce under the same key (or letting the attacker choose it) can leak key material. That is no good.
    • Authentication: This verifies that the data was encrypted by someone holding the key and has not been modified since. Very important. Our recommendation is the same as above: use the tag added by GCM.
    • Key Exchange: Elliptic Curve Diffie-Hellman (ECDH) over curve P-384 is a good choice.
    • Hashing: SHA-256 is pretty standard across the board nowadays.
    • Don’t use old/broken stuff: While this is not an exhaustive list, the most commonly used “old or broken” stuff is DES, MD4, MD5, SHA1, RC4, and AES-ECB. (RSA is old, but not broken. It’s fine to use if that’s what’s available, but prefer ECC if you can.)
    • libsodium: If you don’t need NIST / FIPS compliance, you should definitely look into libsodium. It’s very well regarded and the libraries are typically easier to use than the libraries that implement similar FIPS ciphers.
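
The warning about the GCM nonce is easier to take seriously once you see what reuse does. GCM encrypts by XORing the plaintext with a counter-mode keystream, so the sketch below assumes a toy keystream built from SHA-256 (the `keystream` and `toy_encrypt` helpers are hypothetical and are not AES-GCM) and shows only the confidentiality half of the problem. With real GCM, nonce reuse also exposes the authentication key, which this sketch does not model.

```python
import hashlib
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream: hash(key || nonce || counter) per block."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with the keystream (decryption is the same operation)."""
    return xor_bytes(plaintext, keystream(key, nonce, len(plaintext)))


key = secrets.token_bytes(32)
nonce = secrets.token_bytes(12)

m1 = b"transfer $100 to alice"
m2 = b"transfer $999 to bobby"

# The mistake: two messages encrypted under the same key AND the same nonce.
c1 = toy_encrypt(key, nonce, m1)
c2 = toy_encrypt(key, nonce, m2)

# The keystream cancels out, leaving the XOR of the two plaintexts.
leaked = xor_bytes(c1, c2)

# An attacker who knows (or guesses) m1 recovers m2 without the key.
recovered_m2 = xor_bytes(leaked, m1)
```

With a fresh nonce for the second message the keystreams differ and the XOR of the two ciphertexts tells the attacker nothing useful. This is one reason a higher-level library that manages nonces for you is usually the safer choice.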