How to Lose $1000 in Vegas Without Even Gambling

Tue, Aug 30, 2011

- ∞ -

On July 15, Fidelis Security Solutions announced that they’d be running a crypto puzzle at Black Hat. And that the prize would be $1000. So, naturally, I was quite interested. I went to their site, downloaded the puzzle, and set to work:

^
¥Ð§µ    
¶®Æä
æ©×ä
÷ĳŒĐ
ƆķėĲ
ŦůŶū
ƂƐƔƆ
ŦƉƶǴ
ƆƅƦƬ
ǆƹɇʃ

As always, if you’d like to try to solve this yourself, then STOP now, as the rest of this post is full of spoilers. The text above is all that you need to get started, or you can click here to see the ciphertext and the hints that were revealed during the conference.

It’s immediately obvious that we’re not looking at straight ASCII. I figured it would be UTF-8 encoded, and verified that quickly. But the question then was whether the decoding work should be in UTF-8 or if, for example, I needed to convert it to UTF-16 first. I even considered that maybe I needed to look to the official Unicode name for each character, instead of the binary representation of it. Here’s the hexdump of the ciphertext, in UTF-8 (with newlines dropped for clarity):

5e 
c2 a5 c3 90 c2 a7 c2 b5 
c2 b6 c2 ae c3 86 c3 a4
c3 a6 c2 a9 c3 97 c3 a4
c3 b7 c4 b3 c5 92 c4 90
c6 86 c4 b7 c4 97 c4 b2
c5 a6 c5 af c5 b6 c5 ab
c6 82 c6 90 c6 94 c6 86
c5 a6 c6 89 c6 b6 c7 b4
c6 86 c6 85 c6 a6 c6 ac
c7 86 c6 b9 c9 87 ca 83

The little ^ character at the beginning made me think of XOR – since in many languages, that’s the operator used for that. So I need to find some binary key stream that, when XORd with the ciphertext, will give me plaintext.

I played with that for a while, then watched their little promo video again. And there, at the very end of the video, the phrase “ALL YOUR ¥Ð§µ ARE BELONG TO US” zooms past the viewer. So “¥Ð§µ” == “BASE”? Okay, that’s something else I can work with.

Another interesting thing that I noticed: in UTF-16, the first byte of each two-byte pair (each character is represented by two bytes) increases gradually over the entire ciphertext:

005E 
00A5 00D0 00A7 00B5 
00B6 00AE 00C6 00E4 
00E6 00A9 00D7 00E4 
00F7 0133 0152 0110 
0186 0137 0117 0132 
0166 016F 0176 016B 
0182 0190 0194 0186 
0166 0189 01B6 01F4 
0186 0185 01A6 01AC 
01C6 01B9 0247 0283

In the UTF-8 version, the first byte varies among c2, c3, c4, c5, etc., but is generally increasing (just not quite as clearly in the UTF-8 version). So it seems pretty likely that the first byte is irrelevant.

After a few days, pulling the puzzle out every now and then, the original puzzle page was removed and replaced with something that essentially said “You’re too early. Come back later for the puzzle.” Damn – maybe what I’ve been playing with was just a teaser, and not the real puzzle. Okay, I’ll forget about it for a while (and finish my talk slides…)

Fast forward to Black Hat, I stop by the Fidelis booth. And discover that the original puzzle is in fact the real puzzle. Arrgh! I could’ve been working on this all along!

Every few hours over the course of the conference, they sent out hints via Twitter. Predictably, the first few hints don’t help me at all, though one tweet “get to know xxd” helps. As sending the original ciphertext through xxd would just give me a straight UTF-8 dump in hex, I now know I’m supposed to use that encoding and not convert to UTF-16.

But that’s not helping me any. They tell me in person that the “BASE” bit wasn’t meant to mean anything, so that was just a red herring. They also tweet some of the plaintext (“Fidelis” is in it), but still I’m getting nowhere. I tried writing some tools that’d drag “Fidelis” across the ciphertext, looking at what the XORd keystream would have to be in order to produce that plaintext. That gets me nowhere.

A later tweet says that there are only 20 characters in the plaintext. Ooooh, that changes things. Now I’m playing with XORing two bytes in a row (like A5 ^ D0, then A7 ^ B5, etc.) but again, no luck. Another hint tells me that ¥ and µ are the same, but that doesn’t help either (one’s encoded as A5, the other as B5), and if we’re talking about two-character sequences, then now I’m really confused).

Finally, at 1:07 on Thursday, they released this hint: “‘C2A5’ =~ /.{3}(.)/”. This tells me that for every four-character hex sequence, I only need to look at the last character.

I get off the escalator, pull up the ciphertext hex on my phone, and start decoding ASCII in my head. “P…u…n…d…” Oh, crap. Finding a corner to sit in, I open the hex in a text editor and pull up an ASCII chart. What I end up with is this:

a5 90 a7 b5 
b6 ae 86 a4
a6 a9 97 a4
b7 b3 92 90
86 b7 97 b2
a6 af b6 ab
82 90 94 86
a6 89 b6 b4
86 85 a6 ac
86 b9 87 83

becomes:

which then becomes:

50 75 6e 64 69 74 ....

Pundits grok Fidelis

I immediately went to the Fidelis booth, walked up to Will (the creator of the puzzle), looked him in the eye, and simply said “Really? REALLY!?” That got a laugh. Apparently, in his words, I gave him “too much credit.” I was looking for an actual, cryptographic solution, when really the answer was staring me in the face THE ENTIRE TIME.

And if I’d taken the time to really think about it, I should have solved this in 10 minutes just by visual inspection. Remember when I said that I’d determined pretty quickly to drop the initial byte of each pair? The “c2” and “c3” and so forth? Well, part of figuring out whether it was UTF-8 or UTF-16 involved me looking at the Wikipedia pages for UTF. Which told me that when a pair begins with c2 through cf, the next byte MUST start with 8, 9, a, or b. So I knew, almost from the beginning, that not only did the first byte have no real bearing in the puzzle, but the next nybble (the first character of the 2nd byte) had no bearing either. But that fact just never registered.

In fact, hindsight has helped me to recognize not just one, but two different ways to look at the data and quickly solve the puzzle. Let’s look at the hex dump again (with column headers to make the discussion easier):

AB CD EF GH IJ KL MN OP
-----------------------
c2 a5 c3 90 c2 a7 c2 b5 
c2 b6 c2 ae c3 86 c3 a4
c3 a6 c2 a9 c3 97 c3 a4
c3 b7 c4 b3 c5 92 c4 90
c6 86 c4 b7 c4 97 c4 b2
c5 a6 c5 af c5 b6 c5 ab
c6 82 c6 90 c6 94 c6 86
c5 a6 c6 89 c6 b6 c7 b4
c6 86 c6 85 c6 a6 c6 ac
c7 86 c6 b9 c9 87 ca 83

Looking at the columns, it should have been even more obvious. I’ve already discussed dropping columns A, B, E, F, I, J, M, and N (I came to this conclusion shortly after I originally started the puzzle). But what I didn’t “grok,” but should have, was that I needed to drop columns C, G, K, and O as well. What’s left after that? Column D is either a 2, 5, 6, or 7. Column H is 0, 3, 5, 7, 9, e, or f. Column L: 2, 4, 6, or 7, and finally the last column, P, which is 0, 2, 3, 4, 5, 6, b, or c. So two columns (H and P) with truly random-looking numbers, and two columns (D and L) with 2, 4, 5, 6, or 7. In ASCII, a byte that begins with 4 or 5 is a capital letter, and 6 and 7 denote lowercase letters. Bytes beginning with 2 are punctuation – in this case, the 2 is always paired with a 0, or a space.

Another way to look at this, mathematically, is in terms of bits of entropy. Columns A, E, I, and M have 0 bits, since they never change. Columns B, F, J, and N have 4 bits, since they’re anything between 2 and a. Similarly, columns C, G, K, and O only bring 2 bits to the game (since 8, 9, a, and b are all 10xx in binary, it’s just the last 2 bits that change). And D and L have only 3 bits of entropy (2, 4, 5, 6, and 7 all fit within 0xxx in binary). Finishing it up, we have:

AB CD EF GH IJ KL MN OP
-----------------------
04 23 04 24 04 23 04 24

One can pretty readily see what might’ve been obscured before: That each line has 2 repeated pairs (04 23 04 24). Looking at the UTF-16 version, again, we see that the characters are taken from increasingly higher code pages in the Unicode alphabet…which further strengthens the supposition that columns B, F, J, and N are all meaningless (since they’re largely derived from the Unicode pages as well). So instead we have “00 23 00 24” or just “23 24”. That’s 11 bits….but we only need 8 bits for letters (or really, 7 bits for ASCII). But wait – the 2s here are also largely driven by the Unicode layout…if we drop those, now our data has “03 04” bits, or 7 bits total. Just enough to build ASCII data. And sure enough, that’s what it does.

The plaintext wasn’t even encrypted. Just hidden in noise.

I should have walked into their booth first thing Wednesday morning, handed them the solution, and walked away $1000 richer.

To say I was frustrated…well….that doesn’t begin to cover it.

So in the end….was this a good or bad puzzle? As much as it pains me to admit it, this was an excellent puzzle. It made me think about UTF-8 encoding (which many, especially us old dumb-terminal types, overlook in favor of flat ASCII). It had a red herring (the ^ making me think of XOR). It had obvious, blatant signs that should have been seen, at least by experienced cryptographers. Like most good riddles, it had a simple, obvious, easy-to-execute solution. Also like most most good riddles, I felt like a complete idiot for having missed the answer.

Thanks, Fidelis, for reminding me to keep my eye on the basics, and for driving home the first rule of cryptanalysis, as defined by the late Robert Morris: “Check for plaintext.”

(view Archived Comments from the old site)