The 2009 Verizon Data Breach Investigations Report
In 2009, the Verizon Business Risk Team released their first public Data Breach Investigations Report. I saw it reasonably soon after release, and noticed a whole bunch of binary numbers in the background on the cover. “Cool,” I thought, but I didn’t bother trying to decode it. A week or so later, I learned that there’d been a contest, and I missed out. :(
In 2010, I was ready, and tried to solve the puzzle, but failed. That story comes later.
But now, on the eve of the release of the 2011 DBIR, I’m finally documenting the method needed to solve these puzzles. Here’s a quick, fresh look at the 2009 puzzle.
As always, if you’d like to try to solve this yourself, then STOP now, as the rest of this post is full of spoilers. If you’d like a copy of just the raw data (in this case, two ciphertexts), click here.
So, I vaguely remembered how this worked. And also that it was a very simple puzzle. Let’s see how quickly I can solve it, without digging too deep into my memory for what needed to get done. First, I pulled down the original PDF. And there, all over the background of the cover, is a whole bunch of binary numbers. Highlight, copy, and paste out into a file.
First, how do we break up the numbers? 8 bits? 7 bits? I removed the line breaks, counted the digits, and divided by 8, but didn’t get a whole number of bytes. I found some stray text in the middle and removed that. Now count again – ah, better. Looks like it’s 900 8-bit characters.
Next up – a simple script to decode the binary. Doesn’t take more than a few minutes, and now I’ve got a big block of ciphertext.
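For anyone following along, here’s a minimal sketch of what such a decoding script might look like. It isn’t the script I used at the time; the function name and the sample input are made up for illustration, and it assumes the stray text contains no 0 or 1 digits of its own.

```python
def decode_binary(raw: str, width: int = 8) -> str:
    """Turn pasted binary digits into text, `width` bits per character."""
    # Keep only the binary digits; line breaks, spaces and stray text go away.
    bits = "".join(ch for ch in raw if ch in "01")
    if len(bits) % width:
        raise ValueError(f"{len(bits)} bits is not a multiple of {width}")
    chunks = (bits[i:i + width] for i in range(0, len(bits), width))
    return "".join(chr(int(chunk, 2)) for chunk in chunks)


# Hypothetical sample: the first three ciphertext letters, written as 8-bit ASCII.
sample = "01000101 01010110\n01001110"
print(decode_binary(sample))  # EVN
```

The length check is the same sanity test as counting the digits and dividing by 8: if it fails, there’s probably still junk mixed in with the data.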
EVNTXIGYIMWSNEHEIEFOTXBSCWYHRQMWGUZABVYCBBFREYFBVEDKEVMFRIFN
GFNRBFGVKSFPNBUFZJGCEEEWAKHPXEBTZJCZOWGTBSQGTMIAYDPYDRIRYETK
CJRPYHEPWKUOAEKNVTVZHSMZNTTIVIKMMRYSNUIAKBRKQMSTYCGCCRLRRIIR
EFGYTJUBUXHEYSGLEYRVHIYXDEYZCJKVTOSOIXJEHOXEVMWJBNZMTKWZEFOF
CNBWNCUWMYFIUVBKWNPWTYOEYQTIRRYRCMNVFVLRSBNTPWPAOCZPEKHLFCEE
RRVWVUYBVJPUVPOAYMIKQQNSWZGHZKDGYLAEGWPKESGCYZFVJDMEPQKSSLNV
SVPUVVRVYERHDTUTYYMQGEVWRMQSZFNPNRJIGGWAJNNJLKOEQHNETRPUQYDF
ZWCZKVJEXLMCKCSIFTCTSUTLDRRMIKQTNINPGRPQQXPTZDPAIOTCEUAZFEWD
QLLPZRHXLXQGSLRJTBLZRIRVISNZIWLMVYADVOHFEVNAKKGORRXSYGXPUMVG
BOMRJLCREFCMRQVXTMIYMJJVHXNBTSZMTJEFKFGKURFLNHXPKCWLEXMIYLGY
NNRWAKSEWTHPKGZKKXGAZELLUTAYCIEKWISHUNDKEKWARGBYZFGKEPKQGZZS
RIMFLGKARTURAINSNGEEUMEXRVEELZXTISUWVZKOYLTPBHZWEOQWNXNPXPKS
SXJHPANCVFPRYADRLROEWEBQEWHZRGATZDGUCEKLFYHZJNNZIJRGNZRVBOCA
UYEZGKPSJXJIASMVFTDWFXBIDHQZEYKDRTDRIOPPKJRPISSKMCZJFZTBVBJU
GEYANJIGJTDCPTZDEOGUTLZPEKHTNIHTGGUMVGBOMRJLCREFSWFZOCROHEAU
Okay, what kind of encryption did they use? A quick test of ROT-13 and such doesn’t get me anywhere. It’s awfully long, so I really don’t want to try a substitution cipher if I can avoid it. Then I remember that there was a clue somewhere in the report. Skimming through, I found a footnote on page 48:
yr puvsser vaqrpuvssenoyr
Let’s run that through ROT-13, and sure enough, we get a hint:
le chiffre indechiffrable
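If you don’t have a ROT-13 tool handy, Python’s standard library will do it. This is just one way to check the footnote, not necessarily how I did it:

```python
import codecs

# The footnote text from the report.
hint = "yr puvsser vaqrpuvssenoyr"

# "rot_13" is a text-to-text codec in the standard library.
print(codecs.encode(hint, "rot_13"))  # le chiffre indechiffrable
```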
Aha! That’s French. And one of the most commonly found ciphers, it seems, for hacker crypto challenges is named after a Frenchman. And I also know, because of how often I’ve run up against this cipher, that it has long been known as “le chiffre indechiffrable” (the indecipherable cipher). So which cipher it is has been decided: it’s a Vigenère.
But how long is the key? I found an online applet that would do a Kasiski analysis, which looks for repeated trigraphs in the ciphertext and measures the distance between them. If you can find a common factor amongst a bunch of repeated trigraph distances, that could very well be the key length. I found 10 repeated trigraphs, but their distances are all over the map, and I can’t see anything that’s a clear common factor.
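The idea behind the applet can be sketched in a few lines. This is my own rough approximation, not the applet’s code; the function names and the cutoff of 30 for candidate key lengths are assumptions.

```python
from collections import Counter, defaultdict


def repeat_distances(text: str, n: int = 3) -> dict:
    """Map each repeated n-gram to the gaps between its successive occurrences."""
    positions = defaultdict(list)
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i)
    return {
        gram: [b - a for a, b in zip(pos, pos[1:])]
        for gram, pos in positions.items()
        if len(pos) > 1
    }


def factor_counts(distances, max_len: int = 30) -> Counter:
    """Count how many of the distances each candidate key length divides evenly."""
    counts = Counter()
    for d in distances:
        for k in range(2, max_len + 1):
            if d % k == 0:
                counts[k] += 1
    return counts


# Toy example: "ABC" repeats six positions apart.
print(repeat_distances("ABCDEFABC"))  # {'ABC': [6]}
```

To use it on the real thing, flatten every list from `repeat_distances(ciphertext)` into one list and pass that to `factor_counts`. A candidate length that divides most of the distances is worth trying first, though some repeats are pure coincidence and will muddy the picture.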
Next up, the index of coincidence, which is a way of looking at the ciphertext in varying keylengths to see which one seems to have “slices” that are the most internally consistent. That’s a simplification. Truth is, I don’t understand it much beyond a zen-like vagueness, so I’m not going to try to explain it here.
Anyway, the IC applet makes 9 characters look like a good potential key length, though it’s far from certain. But at least one of the Kasiski distances was 72, which is a multiple of 9, and so this is as good a place to start as any.
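For the curious, the calculation itself is short, even if the intuition takes a while to sink in. The sketch below is the textbook formula, not whatever the applet does internally, and the function names are mine. The index of coincidence is the chance that two letters picked at random from a text are the same: roughly 0.066 for English and about 0.038 for evenly random letters. Slice the ciphertext by a candidate key length, and if the length is right, each slice was shifted by a single key letter and should look English-like.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn from `text` (without replacement) match."""
    n = len(text)
    if n < 2:
        return 0.0
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def average_ic(ciphertext: str, key_len: int) -> float:
    """Average IC over the key_len slices (every key_len-th letter)."""
    slices = [ciphertext[i::key_len] for i in range(key_len)]
    return sum(index_of_coincidence(s) for s in slices) / key_len


# Toy example: a 2-letter period shows up as a perfect score at key length 2.
toy = "ABABABAB"
print(average_ic(toy, 2))  # 1.0
```

With only 900 characters, long candidate lengths leave very few letters per slice, so the scores get noisy. That may be part of why the result was far from certain.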
Next up, I stick the ciphertext into a nice interactive Vigenère app, set it to a 9-character key, and start sliding the alphabets around to see if anything pops out. Not having anywhere better to start, I make a guess that the plaintext starts with “CONGRATULATIONS.” As I adjust the various alphabets to make this happen, the key starts to appear. C-H-A-N-G-I-N-E. Hm. So close. Let’s change it to CHANGING…and now it’s looking a lot more real. Here’s the beginning of the plaintext:
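Sliding the alphabets around is really just subtraction: with a Vigenère, key letter = ciphertext letter minus plaintext letter, mod 26. Here’s a small sketch of that crib step (the function name is mine, and it assumes uppercase A-Z only):

```python
def key_from_crib(ciphertext: str, crib: str) -> str:
    """Key letters implied by assuming `crib` is the plaintext at the start."""
    return "".join(
        chr((ord(c) - ord(p)) % 26 + ord("A"))
        for c, p in zip(ciphertext, crib)
    )


cipher_start = "EVNTXIGYIMWSNEH"  # first 15 ciphertext letters

# The original guess gives a key that is almost a word...
print(key_from_crib(cipher_start, "CONGRATULATIONS")[:8])  # CHANGINE

# ...and the shorter crib gives a real one.
print(key_from_crib(cipher_start, "CONGRATS"))  # CHANGING
```

A key that reads as English is a strong sign the crib is right, or nearly right, up to that point.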
CONGRATSGFWFHWUYGXFBNPOMAPYULIZQENZNVNLWZUFEYQSVTXDXYNZZPBFA
AXALZYGIEKSJLUUSTBTWCXEJUCUJVXBGTBPTMPGGVKDARFINSVCSBKIESWGE
ACRCSZRJUDUBUWXHTMVMBKZTLMTVPAXGKKYFHMVUIURXKEFNWVGPWJYLPBIE
YXTSRCUOOPUYWLGYYQEPFBYKXWLTACKINGFIGQJRBGKYTFWWVFMGRDWMYXBZ
Except that this is the only plaintext I see. If it were a 9-character key, then a key of “CHANGINGA” would at least give me 8 characters of real text, repeated down the length of the output, with a junk character in between each. At this point, I could think back, remember that the key was actually present in the text, find the two instances of “changing” in the report and have the puzzle solved in less than 30 minutes total. But that’d be cheating. So let’s try something new.
It’s looking pretty likely that the key starts with “CHANGING.” But I don’t know how many characters come next. I didn’t see a repeat at 8 or 9 characters, so let’s add another A, and another, and another, until I see things repeat. Once I get to 26 characters it happens. Now I’ve got plaintext that starts like this:
CONGRATSIMWSNEHEIEFOTXBSCW
WARDGOTOZABVYCBBFREYFBVEDK
COMSLASHGFNRBFGVKSFPNBUFZJ
EVERYONEHPXEBTZJCZOWGTBSQG
RFINSVCSDRIRYETKCJRPYHEPWK
SHAREFINVZHSMZNTTIVIKMMRYS
LNINETEEQMSTYCGCCRLRRIIREF
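Here’s a rough sketch of that padding trick in code, in case the app you’re using can’t do it. The helper names are my own, and I’m only feeding it the first 60 letters of the ciphertext to keep things short. Padding with A works because A is a shift of zero, so the unknown key positions just pass the ciphertext through untouched.

```python
def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """Decrypt uppercase A-Z ciphertext with an uppercase A-Z key."""
    out = []
    for i, c in enumerate(ciphertext):
        shift = ord(key[i % len(key)]) - ord("A")
        out.append(chr((ord(c) - ord("A") - shift) % 26 + ord("A")))
    return "".join(out)


def rows(text: str, width: int) -> list:
    """Break text into rows of `width`, so each column shares one key letter."""
    return [text[i:i + width] for i in range(0, len(text), width)]


FIRST_LINE = "EVNTXIGYIMWSNEHEIEFOTXBSCWYHRQMWGUZABVYCBBFREYFBVEDKEVMFRIFN"

key = "CHANGING".ljust(26, "A")  # known prefix, padded out to 26 with A
for row in rows(vigenere_decrypt(FIRST_LINE, key), 26):
    print(row)
# CONGRATSIMWSNEHEIEFOTXBSCW
# WARDGOTOZABVYCBBFREYFBVEDK
# COMSLASH
```

Looping the pad length from 9 upward and eyeballing the first eight columns of each row is the automated version of adding another A, and another, and another.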
So now, let’s start changing the letters after CHANGING and see what happens. A is no good, neither is B, nor C, but D – that seems to extend the cleartext words properly. In fact, the ZAB after GOTO is probably supposed to be WWW. To make that happen, my key now starts with “CHANGING DEF”, which gives me this:
CONGRATSFIRSNEHEIEFOTXBSCW
WARDGOTOWWWVYCBBFREYFBVEDK
COMSLASHDBIRBFGVKSFPNBUFZJ
EVERYONEELSEBTZJCZOWGTBSQG
RFINSVCSANDRYETKCJRPYHEPWK
SHAREFINSVCSMZNTTIVIKMMRYS
LNINETEENINTYCGCCRLRRIIREF
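The WWW guess is the same crib subtraction as before, just applied in the middle of the text. Here’s a sketch under the same assumptions (uppercase A-Z, a 26-letter key, my own function name). The three letters after GOTO sit at index 34 of the ciphertext, which is column 8 of the second 26-letter row.

```python
def key_letters(ciphertext: str, guess: str, start: int, key_len: int) -> dict:
    """Key letters implied by assuming `guess` is the plaintext at `start`."""
    found = {}
    for offset, p in enumerate(guess):
        c = ciphertext[start + offset]
        found[(start + offset) % key_len] = chr((ord(c) - ord(p)) % 26 + ord("A"))
    return found


FIRST_LINE = "EVNTXIGYIMWSNEHEIEFOTXBSCWYHRQMWGUZABVYCBBFREYFBVEDKEVMFRIFN"

found = key_letters(FIRST_LINE, "WWW", start=34, key_len=26)
print(found)  # {8: 'D', 9: 'E', 10: 'F'}

# Drop the new letters into the padded key.
key = list("CHANGING".ljust(26, "A"))
for index, letter in found.items():
    key[index] = letter
print("".join(key))  # CHANGINGDEFAAAAAAAAAAAAAAA
```

Every good guess anywhere in the grid fills in more key positions, and each filled position cleans up a whole column of plaintext.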
From here, it’s a pretty easy job to finish out the key this way. The result is “Changing default credentials” (it also appears in the report as “Changing default credentials is key.” Is. Key. Heh. Funny.). The final plaintext tells you where to go to claim the reward, and the rest is a terse, high-level summary of the entire report. Here it is with spaces and newlines inserted for clarity.
CONGRATS
FIRST TO CRACK GETS REWARD
GO TO WWW VERIZONBUSINESS COM SLASH DBIRHUNT TO CLAIM
FOR EVERYONE ELSE HIGH LVL STATS FOR FIN SVCS AND RETAIL FOLLOW
PLS SHARE
FIN SVCS
SOURCES EXTERNAL NINETEEN INTERNAL NINE PARTNER TWO
THREATS MALWARE ELEVEN HACKING FIFTEEN DECEIT FOUR MISUSE SIX PHYSICAL TWO ERROR ONE
ERROR SIG CONTRIBUTOR IN FIFTEEN
TOP THREE HACK TYPES SQL INJECTION SEVEN MISCONFIG ACLS SEVEN DEFAULT CREDS TWO
TOP HACK VECTOR IS WEB APP TEN
TOP ASSET IS ONLINE DATA TWENTY SIX AND ALL RECORDS
TOP THREE DATA TYPES AUTH CRED ELEVEN PII TEN PYMNT CARD EIGHT
PYMNT CARD WAS NINETY EIGHT PCT OF RECORDS
TOP UU IS UNKNOWN CONNECTIONS SEVEN
DISCOVERY TAKES WEEKS TO MONTHS
RETAIL SOURCES
EXTERNAL TWENTY THREE INTERNAL ONE PARTNER EIGHT
THREATS MALWARE TEN HACKING TWENTY ONE DECEIT TWO MISUSE TWO PHYSICAL ZERO ERROR ZERO
ERROR SIG CONTRIBUTOR IN SIXTEEN
TOP TWO HACK TYPES SQL INJECTION SEVEN STOLEN CREDS SEVEN
TOP HACK VECTOR IS REM ACCMGT EIGHT
TOP ASSET IS POS ELEVEN AND OVER HALF OF RECORDS
TOP TWO DATA TYPES PAYCARD TWENTY THREE PII NINE
DISCOVERY TAKES MOSTLY MONTHS
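If you want to reproduce the decryption yourself, here’s a minimal sketch. The helper names are mine, and to keep it short it only runs on the first 60 letters of the ciphertext; paste in all 900 to get the full message (without the spaces, which I added by hand).

```python
def normalize_key(phrase: str) -> str:
    """Strip spaces and punctuation from a key phrase and uppercase it."""
    return "".join(ch for ch in phrase.upper() if ch.isalpha())


def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """Decrypt uppercase A-Z ciphertext with an uppercase A-Z key."""
    out = []
    for i, c in enumerate(ciphertext):
        shift = ord(key[i % len(key)]) - ord("A")
        out.append(chr((ord(c) - ord("A") - shift) % 26 + ord("A")))
    return "".join(out)


FIRST_LINE = "EVNTXIGYIMWSNEHEIEFOTXBSCWYHRQMWGUZABVYCBBFREYFBVEDKEVMFRIFN"

key = normalize_key("Changing default credentials")
print(len(key))  # 26
print(vigenere_decrypt(FIRST_LINE, key))
# CONGRATSFIRSTTOCRACKGETSREWARDGOTOWWWVERIZONBUSINESSCOMSLASH
```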
My memory was correct in one respect – this was a very simple puzzle. Even the long approach I took, once I’d figured it out, went fast. If I’d received this puzzle new, today, I’m sure I would have solved it in an evening, tops. Two years ago, I almost certainly wouldn’t have been so lucky. For one, the trick of padding out the potential key to look for repeats isn’t something that’d ever occurred to me before, that I can recall, though it’s pretty obvious in retrospect. I’ll definitely have to remember this technique for future puzzles.
Also, having “CONGRATS” as the opening word gave me a really easy crib. Without that, I honestly don’t know where I’d have started.
So though I was right that this was a simple puzzle, I was wrong in another key respect: that its simplicity would mean it wasn’t going to be any fun, especially knowing (subconsciously, at least) what I needed to do. Learning a new approach to break this cipher was fantastic fun. And proof that even the easy puzzles shouldn’t be ignored.
Thanks to the whole Verizon crew for this one. The 2010 puzzle was a different story, but that’ll wait until later. Hopefully I’ll write that up before next week’s new puzzle starts sucking up all my free time…