About a week ago, a story hit the wires about a newly discovered coded message from the Civil War. It had been sealed in a vial, in the Museum of the Confederacy, for years, and was only recently unfolded and decoded. The story was relayed to me with the challenge “extract the key,” so I did.
Actually, it wasn’t quite that easy, but upon looking at the photograph of the message, I was quite surprised to see what I understood to be a major error: the writer of the message had left word breaks intact in the ciphertext. That gave me a significant leg up on trying to break the code.
The first thing I did was try to guess the key length. It appeared to be a Vigenère cipher, in which the key is repeated multiple times along the length of the message. If the key is longer, it repeats less frequently, and the cipher is (theoretically) more secure. If the key is as long as the text, truly random, and never reused, then it’s actually a one-time pad, and is provably unbreakable. In this case, I noticed a repeated three-letter sequence, 210 letters apart, which told me there’s a good likelihood that the key length is some factor of 210. I also noticed four singleton letters near the end of the message, and two of them were M, 30 characters apart. Since 30 is a nice factor of 210, I started work with that as my key length.
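For anyone who wants to try the same reasoning on another message, here is a minimal Python sketch of the key-length guess. It is not the tooling I used, just an illustration: the function names are made up for this example, and it assumes that every repeat you feed it really is caused by the key lining up again rather than by coincidence.

```python
from collections import defaultdict
from functools import reduce
from math import gcd


def repeat_distances(ciphertext, n=3):
    """Distances between successive repeats of each n-letter sequence."""
    text = "".join(c for c in ciphertext.upper() if c.isalpha())
    positions = defaultdict(list)
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i)
    distances = []
    for pos in positions.values():
        # Only sequences seen more than once contribute a distance.
        for a, b in zip(pos, pos[1:]):
            distances.append(b - a)
    return distances


def candidate_key_lengths(distances):
    """Key lengths (2 or more) that divide every observed distance."""
    if not distances:
        return []
    common = reduce(gcd, distances)
    return [d for d in range(2, common + 1) if common % d == 0]


# The two distances mentioned above: a trigram repeat and the repeated M.
print(candidate_key_lengths([210, 30]))  # [2, 3, 5, 6, 10, 15, 30]
```

Both 30 and 15 show up in that list, so a wrong first guess at the length can still leave you on the right family of candidates.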
To make a long story short (the long story can be read on my personal blog), I basically followed these simple steps:
- Make an assumption for any given letter of the plaintext (that a single-letter word is “I”, say, or that “O?” is “ON”).
- Determine what the key letter, in that position, would have to be for that assumption to be true.
- Repeat that new key letter, forwards and backwards along the text, according to the current key length assumption (in this case, every 30 letters).
- See what new plaintext letters those key changes reveal.
- Review the new state of the text, and look for new words that are nearly complete. This brings you back to the first step.
Just repeat that process until it’s all done. I went down a couple of short blind alleys, then reduced the key length to 15 (half of the original 30, but still a factor of both repeat distances I’d seen). That gave me many more partially completed words to test out, and after trying another couple of blind alleys I had enough text cleared that I knew I was on the right track.
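The loop described above can be sketched in a few lines of Python. This is a toy illustration rather than my actual working: the ciphertext and the five-letter key length are a textbook example, not the Civil War message, the function names are invented here, and it assumes the usual Vigenère convention that each ciphertext letter is the plaintext letter shifted forward by the key letter (A = 0).

```python
A = ord("A")


def key_letter(cipher_ch, plain_ch):
    """Key letter that turns plain_ch into cipher_ch (uppercase A-Z)."""
    return chr((ord(cipher_ch) - ord(plain_ch)) % 26 + A)


def assume(ciphertext, start, guess, known, key_len):
    """Record the key letters implied by guessing plaintext at a letter offset.

    start counts letters only, ignoring spaces. Returns a new dict that maps
    a position within the key to its key letter.
    """
    letters = [c for c in ciphertext if c.isalpha()]
    updated = dict(known)
    for offset, plain_ch in enumerate(guess):
        i = start + offset
        updated[i % key_len] = key_letter(letters[i], plain_ch)
    return updated


def decrypt_partial(ciphertext, known, key_len):
    """Decrypt wherever the key letter is known; show '?' elsewhere."""
    out, i = [], 0
    for ch in ciphertext:
        if not ch.isalpha():
            out.append(ch)  # keep the word breaks
            continue
        k = known.get(i % key_len)
        out.append("?" if k is None else chr((ord(ch) - ord(k)) % 26 + A))
        i += 1
    return "".join(out)


ciphertext = "LXFOPV EF RNHR"  # toy message, assumed key length 5
known = assume(ciphertext, 6, "AT", {}, 5)  # guess: the two-letter word is "AT"
print(known)                                  # {1: 'E', 2: 'M'}
print(decrypt_partial(ciphertext, known, 5))  # ?TT??? AT ???N
```

One guess at a two-letter word fills in letters across the whole message, and the half-finished words it leaves behind suggest the next guess. A wrong guess shows up as gibberish in the other positions, which is what a blind alley looks like.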
The rest of the message fell into place rather quickly. There were several errors in the ciphertext, but not enough that the message can’t be read.
I’m convinced the main reason this message was easy for me to break was the word breaks in the ciphertext, which gave me quite a bit of context to work with. Normally, a message like this is written out in 4- or 5-letter blocks, and the reader has to reassemble the words after decoding the message. If there’d been no breaks, I probably wouldn’t have been able to break it, certainly not as quickly.
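Regrouping a ciphertext into fixed-size blocks is trivial to do. Here is a short sketch in Python, with a made-up function name and a toy ciphertext, assuming five-letter blocks:

```python
def to_blocks(ciphertext, size=5):
    """Drop word breaks and regroup the letters into fixed-size blocks."""
    letters = "".join(c for c in ciphertext.upper() if c.isalpha())
    return " ".join(letters[i:i + size] for i in range(0, len(letters), size))


print(to_blocks("LXFOPV EF RNHR"))  # LXFOP VEFRN HR
```

The two-letter word that would have invited a guess is no longer visible in the output.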
On the other hand, if I’d had more context for the message (like who it was from, who it was addressed to, and most importantly, what conventions a typical message followed), then it might have been easy to break even without the word breaks. For this message, if I’d known that it was sent to a General, and that such messages often start “GENL,” then I could have broken it almost as easily.
Of course, even though I’m an amateur, this is also 147 years after the message was sent. Surely today, even hobbyists have access to much better code-breaking theory than Civil War cryptanalysts? Perhaps. But any Civil War cryptanalyst attacking this message would have been wise to first try the same key that the South had been using for ages. In that case, it would have been broken in minutes, as this message used that same key. To someone familiar with the keys in use at the time, it might as well not have been encrypted at all.
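To show how little work the “try the old key first” attack takes, here is a rough Python sketch. Everything in it is illustrative: the candidate keys are made-up stand-ins rather than real Confederate keys, the ciphertext is a toy example, the scoring is a crude count of common English words, and it assumes the usual Vigenère convention that each ciphertext letter is the plaintext letter shifted forward by the key letter.

```python
COMMON_WORDS = {"AT", "THE", "AND", "OF", "TO", "IN", "ON", "IS"}


def decrypt(ciphertext, key):
    """Standard Vigenère decryption; non-letters pass through unchanged."""
    out, i = [], 0
    for ch in ciphertext:
        if ch.isalpha():
            k = key[i % len(key)]
            out.append(chr((ord(ch) - ord(k)) % 26 + ord("A")))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def score(plaintext):
    """Count words in the trial plaintext that look like common English."""
    return sum(word in COMMON_WORDS for word in plaintext.split())


def best_key(ciphertext, candidates):
    """Try every previously seen key and keep the most English-looking result."""
    return max(candidates, key=lambda k: score(decrypt(ciphertext, k)))


previously_seen_keys = ["ORANGE", "LEMON", "APPLE"]  # hypothetical key list
ciphertext = "LXFOPV EF RNHR"                        # toy message
key = best_key(ciphertext, previously_seen_keys)
print(key, "->", decrypt(ciphertext, key))  # LEMON -> ATTACK AT DAWN
```

The scoring leans on the intact word breaks, so this version of the attack benefits from the same mistake I did.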
As it turns out, this particular message never got delivered (which is why it was still sealed in that glass vial for over a century), as the recipient surrendered at Vicksburg the day it was written.
In the end, the writers of the message made three errors, all of which are still worth avoiding today:
- Don’t provide any context to the attacker. Remove all word breaks and present the message as short blocks of text.
- Don’t reward the attacker for good guesses. Ensure the message doesn’t start with a predictable word.
- Don’t use the same key day after day after day.
If this message had followed those rules, it would have been much more secure. Certainly not against today’s tools and methods, but very probably against those of the 19th century. Of course, these rules are still true today, especially the point about keys (although modern ciphers make word breaks and cribs less of an issue).
If you’re interested in puzzles, cryptography, or battlefield communications, I recommend you check out the full post.