The RSA Algorithm
  |   Source

The RSA Algorithm

Ron Rivest, Adi Shamir and Leonard Adleman shared their encryption scheme with the world in 1977. The algorithm that bears their name was independently discovered by the British GCHQ a few years ealier, but they deemed it too sensitive to share with the world. This is another great example of Arnold's Principle. My goal is to finally work through an example of the full algorithm. Please see my previous three posts on the subject if you need more details on the individual steps. I am not picking on this article in particular; it just happens to be the most recent one I have read. Most articles on the topic involve quite a bit of hand waving as in Beyond The Illusion -Breaking RSA Encryption by Max Van Der Horst. They are absolutely correct in asserting that you should not roll your own cryptography, but I hope to explain the steps with a bit more clarity. At a high level, RSA isn't the most complicated algorithm in the world, but breaking it is non-trivial. It would not be so ubiquitous otherwise.

Key selection

  1. The receiver chooses two prime numbers $p$ and $q$ and calculates their product $N$.
  2. Compute $\phi(N)$. Since $N$ is the product of two primes, $\phi(N) = (p − 1)(q − 1)$.
  3. Choose a number $e$ that is relatively prime to both $N$ and $\phi(N)$. 65,537 is frequently used due to its desirable binary properties.
  4. $(e, N)$ is the public key.
  5. Calculate the modular inverse $d$ of $e$ $(\pmod \phi(N))$. $(d, N)$ is the private key.

Key selection - example

  1. Choose $p = 331$ and $q = 283$. Compute $N = 331 · 283 = 93673$.
  2. Compute $\phi(N) = (p − 1)(q − 1) = 330 \cdot 282 = 93060$.
  3. Choose $e = 307$.
  4. $(307, 93673)$ is the public key.
  5. Calculate the modular inverse $d$ of $307 (\pmod \phi(93673)) = 307 (\pmod 93060) = 73963$.

This step is demonstrated using the Extended Euclidean Algorithm that I wrote about previously. $(73963, 93673)$ is the private key where the first value is the modular inverse d and the second is the product N computed in step 1.

Message Encryption and Decryption

Let's encrypt "ET TU, BRUTE". As a side note, real-world implmentations encrypt/decrypt blocks of bytes, not a character at a time. Begin by converting the letters to numbers using a simple map where A=0, B=1, ..., Z=25. This results in $[4, 19, 26, 19, 20, 27, 26, 1, 17, 20, 19, 4]$. Using the public key $(307, 93673)$, compute $c^{307} (\pmod 93673)$ for each character. These numbers can be quite large, so be mindful of the limitations of the software you are using as many programming languages fail to warn of overflow errors. Python has native support for large integers which makes this calculation easy to perform without precision issues.

In [1]:
et_tu = [4 , 19 , 26 , 19 , 20 , 27 , 26 , 1 , 17 , 20 , 19 , 4]
ciph = list (map (lambda n: pow (n , 307 , 93673) , et_tu ))
print(ciph)
[43857, 26421, 21480, 26421, 44114, 11186, 21480, 1, 54440, 44114, 26421, 43857]

These are the integers that someone would send over an insecure medium to the receiver. On the other end, the receiver uses the private key $(73963, 93673)$ to compute $i^{73963} (\pmod 93673)$ for each integer received.

In [2]:
msg = list (map( lambda n: pow (n , 73963 ) % 93673 , ciph))
print(msg)
[4, 19, 26, 19, 20, 27, 26, 1, 17, 20, 19, 4]

A simple review quickly validates that this is indeed our original message! It is worth highlighting once more that the private key is never shared. The RSA algorithm is not paricularly fast, especially for handheld devices like cellular phones. It is typically used to exchange a randomly generated key that is unique to each HTTPS session between a client and server. The remainder of the data can be sent using much faster symmetric encryption schemes using session keys.

Intermezzo - Multiplying Large Integers

The encryption and decryption steps of the RSA Algorithm require computing the products of very large integers. 93,673 only requires 17 bits of storage, or less than 3 bytes. What about a number such as $26^{93673}$? The Python programming language will gladly attempt to compute it and succeeds on a modern laptop.

>>> pow(26, 93673).bit_length()
440305

That is a massive number over 134,000 digits long! Thankfully, this particular result is not needed, but the result modulus another number. For example, using a number easily computed by hand, what is $26^{307}\pmod{47}$? Start by converting the exponent to binary. 307 in base-2 is equal to 100110011. Exponentiation works in $\mathbb{Z}_n$ because of the properties listed earlier. An exponent is simply repeated multiplication. If $b \equiv a^x \pmod{n}$ and $c\equiv a^y\pmod{n}$, then $bc \equiv a^{x+y}\pmod{n}$. $$ \begin{aligned} 26^{307} &= 26^{2^0+2^1+2^4+2^5+2^8} \\ &\equiv 307^{2^0}+307^{2^1}+307^{2^4}+307^{2^5}+307^{2^8}\pmod{47} \\ \end{aligned} $$ Calculate $2^i$ where $i=0, 1, 4, 5, 8$. $$26^{2^1}=676\equiv18 \pmod{47}$$ This can be squared to find $26^{2^2}\pmod{47}$. $$\begin{aligned} 26^{2^2}&\equiv (26^{2^1})^2\pmod{47} \\ &\equiv (18)^2\pmod{47} \\ &\equiv 324\pmod{47} \\ &\equiv 42\pmod{47} \\ \end{aligned}$$ Use the fact that $(a^{2^n})^2\equiv a^{2\cdot2^n}\equiv a^{2^{n+1}}\pmod{n}$ \parencite{abstract}. $$\begin{aligned} 26^{2^4}&\equiv 14\pmod{47} \\ 26^{2^5}&\equiv 8\pmod{47} \\ 26^{2^8}&\equiv 2\pmod{47} \\ 26^{307}&\equiv 26*18*14*8*2 \pmod{47}\\ &\equiv 104832 \pmod{47}\\ &\equiv 22 \pmod{47}\\ \end{aligned}$$

Other than speed, this method saves memory as well since the largest number the computer must store is $2\log_2{exp}$ bits. In general, an $n$-bit number multiplied by an $n$-bit number is $2n$-bits long. There are two approaches known as left-to-right (LR) and right-to-left (RL) depending on the direction that the bits of the exponent are scanned. The method above uses the RL method, but LR is more common in hardware implementations of the calculation.

Euler’s theorem

The RSA algorithm works because of Euler’s theorem. This is consistently the most hand-wavy aspect of most articles I read on this subject. It appears magical, but it's a beautiful result in mathematics if you dig into it a bit!

Theorem

Let $a$ and $n$ be integers such that $n > 0$ and $gcd(a, n) = 1$. Then $a^{\phi(n)} \equiv 1 (\pmod n)$.

Recall that step 5 of the key selection requires finding

$$td\equiv 1\pmod{\phi(N)}.$$

Using the definition of modular congruence,

$$td = 1 + k\cdot \phi(N)$$

for some integer $k$. The message is encrypted by finding $a^t \pmod{N}$ and decrypted by raising the encrypted message to the power of $d\pmod{N}$.

$$(a^t)^d \pmod{N}$$

This gives back $a$, the original message.

$$\begin{aligned} (a^t)^d &= a^{td} \\ &= a^{1+k\cdot\phi(N)} \\ &= a\cdot \left( a^{\phi(N)} \right)^k \\ &\equiv a\cdot1^k \\ &\equiv a\pmod{N} \\ \end{aligned}$$

Euler's theorem may appear a bit magical at first. Consider the Caley table of integers $\pmod{9}$.

Caley table

Take note of the entries with a value of 1. These only occur when a number is coprime with 9. The number of 1 entries in the Caley table is the same value as $\phi(n)$. For example, 11 is coprime with 9.

$$\begin{aligned} 2 \equiv 11 \pmod{9} \\ 11^2 \equiv 4 \pmod{9} \\ 11^3 \equiv 8 \pmod{9} \\ 11^4 \equiv 7 \pmod{9} \\ 11^5 \equiv 5 \pmod{9} \\ 11^6 \equiv 1 \pmod{9} \\ \end{aligned}$$

This process works the same even if you start with a value of 2. Increasing the power will always work towards a value of 1 as long as the GCD of $a$ and $n$ is 1.

Conclusion

I hope my previous writing on this subject coupled with this explanation will give you a better idea of how RSA works under the hood. I think it rests on a beautiful set of results in mathematics, and the genius of the original creators is how they crafted these ideas into a useful protocol. Quantum computing has already made an impact on cryptography, but I trust this algorithm will be around for many more years as we transition to new, post-quantum technologies.