ShmooCon 2012 Badge Puzzle

February 3, 2012

For three years running, I (or I with a co-worker) have been the first person to solve the ShmooCon Badge puzzle. (I’m also, I believe, the only outsider to have solved the 2008 badge puzzle, but that’s another post). Seems like it’s time for me to stop playing.

So I asked Heidi if I could do the puzzle this year, and she agreed. We went back and forth many times over a few weeks, and got a lot of advice and suggested changes from G. Mark Hardy (who’d written the last three puzzles). Finally, just a few days before everything had to go to the printers, we put a fork in it and decided the puzzle was “done.”

Since the theme for the con this year was, loosely, gearheads, I chose a puzzle with some mechanical/crypto components. All in all, there were seven gear-shaped badges, several images in the program, and a couple extra bits of crypto text.

As always, if you’d like to try to solve this yourself, then STOP now, as the rest of this post is full of spoilers. If you’d like a copy of just the raw data (ciphertexts and other clues revealed during the contest), click here.

The first element of the puzzle that everyone saw was, of course, the badge. In addition to the “expected” badge elements (ShmooCon, plus Speaker or Attendee or Staff), the badges featured:

  • Letters all around the edge, in the gear teeth,
  • A six-letter string at the bottom, and
  • A four-digit number at the top.

Hidden in the program was the first hint:

GET YOUR SHMOOS IN LINE! SLASH OR DOT! ONE’S HOT ONE’S NOT!

The hope was that this hint would encourage players to find the correct order for the badges, as well as the fact that they need to line up in some way. The trouble was, how do you order them? A natural guess would be to use the numbers, but are they supposed to go in numerical order or something else?

That was the point of the “slash or dot” element of the hint. If you were to add a slash to the numbers, where would you put it? Turns out, these numbers (selected by G. Mark) were actually the start dates for ShmooCon 1-7. Put them in the proper order by year, and that’s the order for the badges.

So badge data, in order, looks like this:

               Teeth (clockwise from top)
#  Bottom  Top   0 1 2 3 4 5 6 7 8 9 10 11
1  CGARIN  0204  Y I I E O E O E K T A  T
2  OEDNCE  0113  C O T R T A T U D W A  S
3  NATOKX  0323  H C O T M H C H S A U  C
4  NRONRT  0215  H A O T C N R L E U H  N
5  ESPEEE  0206  L E S K A E O K U D N  B
6  CRTCAT  0205  G A A D E R W W E S E  T
7  TEULDC  0128  A Y S H R H N Y L C S  E

That was “Stage 0.” The goal of Stage 1, then, is to read the message hidden in the six-character badge strings by stacking the badges in order and reading down the columns.

What’s interesting is that you could bypass the ShmooCon date index altogether. For example, if you ran a frequency analysis on the Stage 1 text (the six-letter strings), you’d see something that looks remarkably like normal English, with E being the most common letter, and C, N, and T tying for 2nd. It’s a small sample, but even so it definitely doesn’t look like typical substitution cipher output. This should tell the player that it’s some kind of transposition cipher: the letters are simply scrambled, not changed.

So to solve this in that way, it’d be necessary to try to re-arrange the letters until you get words. As I mentioned later, “it’s best to try the easy approaches first,” so the easiest approach here would be to assume each six-letter string needs to stick together, and it’s just a question of re-arranging them to build words. If you look at the last letter of each row, there are only 5 letters in use: C, E, N, T, and X. So immediately, one might consider the words “CENT” and “NEXT.”

There are four ways to do this: The N and X remain constant, but you have to try two different rows for E and for T.

  CGARIN CGARIN CGARIN CGARIN
  OEDNCE ESPEEE ESPEEE OEDNCE
  NATOKX NATOKX NATOKX NATOKX
  CRTCAT NRONRT CRTCAT NRONRT

In two of these, the 2nd column spells “GSAR,” but in two of them, it spells “GEAR.” The other columns don’t do much, but there are only three other rows to try to add to the bottom to build new words with, so it should fall out pretty quickly. For example, if you add ESPEEE next, then the first column becomes “CONCE” or “CONNE”, depending on what you picked for the fourth row, while the 3rd column ends with “TOP.” And so forth.

I don’t know if anyone actually tried this approach. I’m hoping some people at least considered it.

Regardless of whether you used the number index or just brute-forced the strings, the result of Stage 1 is instructions for Stage 2:

CONNECT GEARS READ TOP TURN ONE CLICK READ NEXT ETC
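For anyone who would rather script Stage 1 than stack cardboard, here is a minimal Python sketch. It assumes the badges are already sorted into ShmooCon 1-7 order, as in the table above; the names `badge_strings` and `read_columns` are just illustrative.

```python
# Six-letter strings from the bottom of each badge, in ShmooCon 1-7 order.
badge_strings = [
    "CGARIN", "OEDNCE", "NATOKX", "NRONRT", "ESPEEE", "CRTCAT", "TEULDC",
]

def read_columns(rows):
    """Stack the rows and read down each column, left to right."""
    return "".join("".join(column) for column in zip(*rows))

stage1 = read_columns(badge_strings)
print(stage1)  # CONNECTGEARSREADTOPTURNONECLICKREADNEXTETC
```

The output has no spaces, so the word breaks still have to be added by eye.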

Again, thanks to G. Mark for taking one of our rough ideas (“wouldn’t it be cool if people had to actually connect the badges together and turn them to get a message?”) and making it into something that actually works. But how do you connect the gears? There was a hint for that, in the program:

Putting all the gears, in order, in an arrangement like that yields the first “machine” for this puzzle. To read this Stage 2 message, you’d:

  • Look at the top of a gear and read the letter
  • Turn the gears one click (top gears go clockwise, bottom counter-clockwise)
  • Look at the top of the next gear
  • Repeat

At the top of the first gear is “Y.” Turn the gears one click, and the “O” that was at 1 o’clock on the 2nd gear is now at the top, so write down “O.” Turn the gears again, and the “U” that was originally at 10 o’clock on gear 3 is at the top. Keep doing this and eventually you get the message:

YOU TURNED THE GEARS
NOW REACH BACK
A CLASSIC CODE
YOU MUST ATTACK
WHO WON LAST THREE
HIS HANDLE THE KEY
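The gear train can also be simulated in a few lines of Python. This is only a sketch of the procedure described above, not the tool used to build the puzzle. It assumes the odd-numbered gears (1, 3, 5, 7) sit in the top row and turn clockwise while the even-numbered gears turn counter-clockwise, which is what the worked example implies (1 o’clock on gear 2, 10 o’clock on gear 3). `GEARS` and `read_gear_train` are made-up names.

```python
# Teeth for each gear, clockwise from the top, in badge order 1-7.
GEARS = [
    "YIIEOEOEKTAT",
    "COTRTATUDWAS",
    "HCOTMHCHSAUC",
    "HAOTCNRLEUHN",
    "LESKAEOKUDNB",
    "GAADERWWESET",
    "AYSHRHNYLCSE",
]

def read_gear_train(gears):
    """Read the top tooth of successive gears, turning one click per letter."""
    teeth = len(gears[0])
    message = []
    for click in range(len(gears) * teeth):
        gear = click % len(gears)
        if gear % 2 == 0:
            # Gears 1, 3, 5, 7 (assumed clockwise): after `click` turns, the
            # tooth that started `click` positions before the top is on top.
            position = (-click) % teeth
        else:
            # Gears 2, 4, 6 (assumed counter-clockwise): the tooth that
            # started `click` positions after the top is on top.
            position = click % teeth
        message.append(gears[gear][position])
    return "".join(message)

stage2 = read_gear_train(GEARS)
print(stage2)
```

Because 7 and 12 share no common factor, every tooth on every gear is read exactly once over the 84 clicks.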

I’d considered several different keys for Stage 3, but eventually picked my own handle. I did this partially because there’s a history of the puzzle-maker using his handle as a key or hint (about half of G. Mark’s puzzles feature GMARK as a key at some point). I also hoped that, in looking up my handle, players would find my writeups from past puzzles and see that one particular cipher appears again and again. That’s the “classic code” that’s used for Stage 3.

Plugging Stage 3 into a Vigenere decoder, then:

  • Ciphertext: JAKAL EPTYV UJXRR SEZVE KMORA PLUSG KVSCE
  • Key: DARTH NULL
  • Plaintext: GATHER VINS USE KEY TO SET THE GEARS PROFIT
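If you don’t have a Vigenere decoder handy, the operation is small enough to write yourself. A minimal Python sketch (the function name `vigenere_decrypt` is mine, and it assumes an A-Z alphabet with spaces ignored):

```python
from string import ascii_uppercase as ALPHABET

def vigenere_decrypt(ciphertext, key):
    """Subtract the repeating key from the ciphertext, letter by letter, mod 26."""
    letters = [c for c in ciphertext.upper() if c in ALPHABET]
    key_letters = [c for c in key.upper() if c in ALPHABET]
    plain = []
    for i, c in enumerate(letters):
        k = key_letters[i % len(key_letters)]
        plain.append(ALPHABET[(ord(c) - ord(k)) % 26])
    return "".join(plain)

stage3 = vigenere_decrypt("JAKAL EPTYV UJXRR SEZVE KMORA PLUSG KVSCE", "DARTH NULL")
print(stage3)  # GATHERVINSUSEKEYTOSETTHEGEARSPROFIT
```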

Which brings us to Stage 4. As I was wandering around Friday night, I watched a table full of people working on the puzzle, and they’d already figured out how this stage works, even before they’d solved a single stage. Which was vaguely encouraging to me, to know that my contraption wasn’t that obscure.

Stage 4 required three elements: A ciphertext, a cipher, and a key.

The ciphertext was hidden on little “auto repair slips” scattered through the program (the “GATHER VINS” part of the clue). Collecting all 5, and putting the VINs in order based on the number in the middle of each, gives the following final ciphertext:

ZFFLKJBV1WHNNHPIB
FCJVBJRD2APJEYOPQ
HJJTZPQM3XLJYUQFH
ZJHIYZZP4WNJBNOVD
PVVEWBLG5SPHISYEJ

The cipher is a keystream-based cipher, where the keystream is generated by a 3-gear machine printed in the program.

That machine produces a different keystream (or, more accurately, a different segment of a single very long keystream) depending on what position the gears are initially set to. That starting position is the final “KEY” mentioned in the clue. In the image in the program, the gears are initially set to “TSG.”

But what’s the actual key you need to use? I thought and thought for a while on this one… Originally I wanted to use “OCT” (for ShmooCon 8), and figured that “10” in octal would be an interesting sort of hint, until G. Mark reminded me that “10” also looks like binary. Doh. So after about 20 minutes of brainstorming, we finally came up with “CAR” (Duh!!), and then I decided on something slightly more evil.

The key for the final stage is “KEY.”

I hope that annoyed…er, amused…at least some of the players.

Anyway, once you set the gears to start at KEY, you then read the top of each gear, turn, read the top of each gear, turn, etc. Not quite the same as the first machine, but I thought reasonably obvious (and as I said, at least one team figured that part out on Friday afternoon).

To solve this final stage, you take the ciphertext:

ZFFLKJBVWHNNHPIBFCJVBJRDAPJEYOPQHJJTZPQM
XLJYUQFHZJHIYZZPWNJBNOVDPVVEWBLGSPHISYEJ

and subtract the keystream:

BRLEKOXSIUJSDYKFBRYINNMPJWCATGCQWHCTOEMZ
RHQKYISLSJOGYIWBSVIKTMRLWNKTPBQCECGXEWUR

to get the plaintext. In this case, we’re numbering the alphabet from 0 (so A is 0, and Z is 25). So Z-B is 25-1 or 24, which is Y. F-R is 5-17 or -12, which wraps around to 14 and gets you O, etc. You can also use the standard Vigenere tableau (it’s essentially the same operation, mathematically), or the “One Time Pad” tool on my favorite cipher puzzle site rumkin.com. No matter how you attack it, the final ciphertext decrypts to:

YOU HAVE DONE VERY WELL
NOW FOR THE FINAL CHALLENGE
TO WIN WHAT CAR DOES BRUCE STILL HAVE ON BLOCKS
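The last step is easy to check in code. The sketch below doesn’t model the 3-gear machine itself; it simply takes the keystream as listed above, builds the ciphertext by dropping the ordering digit from each VIN, and does the mod-26 subtraction. The variable names are illustrative.

```python
# VINs from the "auto repair slips", already sorted by the digit in the middle.
vins = [
    "ZFFLKJBV1WHNNHPIB",
    "FCJVBJRD2APJEYOPQ",
    "HJJTZPQM3XLJYUQFH",
    "ZJHIYZZP4WNJBNOVD",
    "PVVEWBLG5SPHISYEJ",
]

# Keystream read off the gears when they start at "KEY" (copied from the post).
keystream = (
    "BRLEKOXSIUJSDYKFBRYINNMPJWCATGCQWHCTOEMZ"
    "RHQKYISLSJOGYIWBSVIKTMRLWNKTPBQCECGXEWUR"
)

# Drop the ordering digit from each VIN and join what is left.
ciphertext = "".join(ch for vin in vins for ch in vin if ch.isalpha())

# Subtract keystream from ciphertext with A=0 ... Z=25, wrapping mod 26.
plaintext = "".join(
    chr((ord(c) - ord(k)) % 26 + ord("A")) for c, k in zip(ciphertext, keystream)
)
print(plaintext)
```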

I wasn’t in the building when the winning team came in with their answer, but apparently they actually walked up to Bruce and asked him. Informed that he would not be able to answer the question, the winners huddled over the con program for a while, then after additional input from Heidi, went off to “ask the Internet.” Ten minutes later, at about 12:40 on Saturday, they returned with the right answer: “Volvo.”

Congratulations to Mike Herms and Matthew Bocknek for solving the puzzle! I hope you enjoyed it.

(click here for the solution presentation from the closing ceremony.)

How to Lose $1000 in Vegas Without Even Gambling

August 30, 2011

On July 15, Fidelis Security Solutions announced that they’d be running a crypto puzzle at Black Hat. And that the prize would be $1000. So, naturally, I was quite interested. I went to their site, downloaded the puzzle, and set to work:

^
¥Ð§µ    
¶®Æä
æ©×ä
÷ijŒĐ
ƆķėIJ
ŦůŶū
ƂƐƔƆ
ŦƉƶǴ
ƆƅƦƬ
džƹɇʃ

As always, if you’d like to try to solve this yourself, then STOP now, as the rest of this post is full of spoilers. The text above is all that you need to get started, or you can click here to see the ciphertext and the hints that were revealed during the conference.

It’s immediately obvious that we’re not looking at straight ASCII. I figured it would be UTF-8 encoded, and verified that quickly. But the question then was whether the decoding work should be in UTF-8 or if, for example, I needed to convert it to UTF-16 first. I even considered that maybe I needed to look to the official Unicode name for each character, instead of the binary representation of it. Here’s the hexdump of the ciphertext, in UTF-8 (with newlines dropped for clarity):

5e 
c2 a5 c3 90 c2 a7 c2 b5 
c2 b6 c2 ae c3 86 c3 a4
c3 a6 c2 a9 c3 97 c3 a4
c3 b7 c4 b3 c5 92 c4 90
c6 86 c4 b7 c4 97 c4 b2
c5 a6 c5 af c5 b6 c5 ab
c6 82 c6 90 c6 94 c6 86
c5 a6 c6 89 c6 b6 c7 b4
c6 86 c6 85 c6 a6 c6 ac
c7 86 c6 b9 c9 87 ca 83

The little ^ character at the beginning made me think of XOR — since in many languages, that’s the operator used for that. So I need to find some binary key stream that, when XORd with the ciphertext, will give me plaintext.

I played with that for a while, then watched their little promo video again. And there, at the very end of the video, the phrase “ALL YOUR ¥Ð§µ ARE BELONG TO US” zooms past the viewer. So “¥Ð§µ” == “BASE”? Okay, that’s something else I can work with.

Another interesting thing that I noticed: in UTF-16, the first byte of each two-byte pair (each character is represented by two bytes) increases gradually over the entire ciphertext:

005E 
00A5 00D0 00A7 00B5 
00B6 00AE 00C6 00E4 
00E6 00A9 00D7 00E4 
00F7 0133 0152 0110 
0186 0137 0117 0132 
0166 016F 0176 016B 
0182 0190 0194 0186 
0166 0189 01B6 01F4 
0186 0185 01A6 01AC 
01C6 01B9 0247 0283

In the UTF-8 version, the first byte varies among c2, c3, c4, c5, etc., and is generally increasing, just not quite as clearly as in UTF-16. So it seems pretty likely that the first byte is irrelevant.

After a few days, pulling the puzzle out every now and then, the original puzzle page was removed and replaced with something that essentially said “You’re too early. Come back later for the puzzle.” Damn — maybe what I’ve been playing with was just a teaser, and not the real puzzle. Okay, I’ll forget about it for a while (and finish my talk slides…)

Fast forward to Black Hat, I stop by the Fidelis booth. And discover that the original puzzle is in fact the real puzzle. Arrgh! I could’ve been working on this all along!

Every few hours over the course of the conference, they sent out hints via Twitter. Predictably, the first few hints don’t help me at all, though one tweet “get to know xxd” helps. As sending the original ciphertext through xxd would just give me a straight UTF-8 dump in hex, I now know I’m supposed to use that encoding and not convert to UTF-16.

But that’s not helping me any. They tell me in person that the “BASE” bit wasn’t meant to mean anything, so that was just a red herring. They also tweet some of the plaintext (“Fidelis” is in it), but still I’m getting nowhere. I tried writing some tools that’d drag “Fidelis” across the ciphertext, looking at what the XORd keystream would have to be in order to produce that plaintext. That gets me nowhere.

A later tweet says that there are only 20 characters in the plaintext. Ooooh, that changes things. Now I’m playing with XORing two bytes in a row (like A5 ^ D0, then A7 ^ B5, etc.) but again, no luck. Another hint tells me that ¥ and µ are the same, but that doesn’t help either (one’s encoded as A5, the other as B5), and if we’re talking about two-character sequences, then now I’m really confused.

Finally, at 1:07 on Thursday, they released this hint: “'C2A5' =~ /.{3}(.)/”. This tells me that for every four-character hex sequence, I only need to look at the last character.

I get off the escalator, pull up the ciphertext hex on my phone, and start decoding ASCII in my head. “P…u…n…d…” Oh, crap. Finding a corner to sit in, I open the hex in a text editor and pull up an ASCII chart. What I end up with is this:

a5 90 a7 b5 
b6 ae 86 a4
a6 a9 97 a4
b7 b3 92 90
86 b7 97 b2
a6 af b6 ab
82 90 94 86
a6 89 b6 b4
86 85 a6 ac
86 b9 87 83

becomes:

5 0 7 5 
6 e 6 4
6 9 7 4
7 3 2 0
6 7 7 2
6 f 6 b
2 0 4 6
6 9 6 4
6 5 6 c
6 9 7 3

which then becomes:

50 75 6e 64 69 74 ....

or

Pundits grok Fidelis
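The whole decode fits in a few lines of Python. This is a sketch that works from the UTF-8 hex dump shown earlier (minus the leading 5e for the ^ character) and applies the regex from the hint to each four-character hex group. `HEX_DUMP` and the other names are my own.

```python
import re

# UTF-8 bytes of the ciphertext, without the leading "^" (5e).
HEX_DUMP = """
c2 a5 c3 90 c2 a7 c2 b5
c2 b6 c2 ae c3 86 c3 a4
c3 a6 c2 a9 c3 97 c3 a4
c3 b7 c4 b3 c5 92 c4 90
c6 86 c4 b7 c4 97 c4 b2
c5 a6 c5 af c5 b6 c5 ab
c6 82 c6 90 c6 94 c6 86
c5 a6 c6 89 c6 b6 c7 b4
c6 86 c6 85 c6 a6 c6 ac
c7 86 c6 b9 c9 87 ca 83
"""

# Sanity check: the first line of the dump is the UTF-8 encoding of ¥Ð§µ.
assert "\u00a5\u00d0\u00a7\u00b5".encode("utf-8").hex() == "c2a5c390c2a7c2b5"

hex_string = "".join(HEX_DUMP.split())

# The hint: for every four hex characters, keep only the last one.
nibbles = re.findall(r".{3}(.)", hex_string)

# Each kept character is one nibble; two of them make one ASCII byte.
plaintext = bytes.fromhex("".join(nibbles)).decode("ascii")
print(plaintext)  # Pundits grok Fidelis
```

So each Unicode character in the ciphertext carries exactly four useful bits, and two characters make one letter, which is why 40 characters hold a 20-character plaintext.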

I immediately went to the Fidelis booth, walked up to Will (the creator of the puzzle), looked him in the eye, and simply said “Really? REALLY!?” That got a laugh. Apparently, in his words, I gave him “too much credit.” I was looking for an actual, cryptographic solution, when really the answer was staring me in the face THE ENTIRE TIME.

And if I’d taken the time to really think about it, I should have solved this in 10 minutes just by visual inspection. Remember when I said that I’d determined pretty quickly to drop the initial byte of each pair? The “c2” and “c3” and so forth? Well, part of figuring out whether it was UTF-8 or UTF-16 involved me looking at the Wikipedia pages for UTF. Which told me that when a pair begins with c2 through cf, the next byte MUST start with 8, 9, a, or b. So I knew, almost from the beginning, that not only did the first byte have no real bearing in the puzzle, but the next nybble (the first character of the 2nd byte) had no bearing either. But that fact just never registered.

In fact, hindsight has helped me to recognize not just one, but two different ways to look at the data and quickly solve the puzzle. Let’s look at the hex dump again (with column headers to make the discussion easier):


AB CD EF GH IJ KL MN OP
-----------------------
c2 a5 c3 90 c2 a7 c2 b5 
c2 b6 c2 ae c3 86 c3 a4
c3 a6 c2 a9 c3 97 c3 a4
c3 b7 c4 b3 c5 92 c4 90
c6 86 c4 b7 c4 97 c4 b2
c5 a6 c5 af c5 b6 c5 ab
c6 82 c6 90 c6 94 c6 86
c5 a6 c6 89 c6 b6 c7 b4
c6 86 c6 85 c6 a6 c6 ac
c7 86 c6 b9 c9 87 ca 83

Looking at the columns, it should have been even more obvious. I’ve already discussed dropping columns A, B, E, F, I, J, M, and N (I came to this conclusion shortly after I originally started the puzzle). But what I didn’t “grok,” but should have, was that I needed to drop columns C, G, K, and O as well. What’s left after that? Column D is either a 2, 5, 6, or 7. Column H is 0, 3, 5, 7, 9, e, or f. Column L: 2, 4, 6, or 7, and finally the last column, P, which is 0, 2, 3, 4, 5, 6, b, or c. So two columns (H and P) with truly random-looking numbers, and two columns (D and L) with 2, 4, 5, 6, or 7. In ASCII, a byte that begins with 4 or 5 is a capital letter, and 6 and 7 denote lowercase letters. Bytes beginning with 2 are punctuation — in this case, the 2 is always paired with a 0, or a space.

Another way to look at this, mathematically, is in terms of bits of entropy. Columns A, E, I, and M have 0 bits, since they never change. Columns B, F, J, and N have 4 bits, since they’re anything between 2 and a. Similarly, columns C, G, K, and O only bring 2 bits to the game (since 8, 9, a, and b are all 10xx in binary, it’s just the last 2 bits that change). And D and L have only 3 bits of entropy (2, 4, 5, 6, and 7 all fit within 0xxx in binary). Finishing it up, we have:


AB CD EF GH IJ KL MN OP
-----------------------
04 23 04 24 04 23 04 24
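The column-by-column observations are easy to reproduce. The following Python sketch (hypothetical names, same hex dump as before) lists the distinct hex digits that show up in each of the sixteen columns A through P; the fewer distinct values a column has, the fewer bits it can possibly be carrying.

```python
HEX_DUMP = """
c2 a5 c3 90 c2 a7 c2 b5
c2 b6 c2 ae c3 86 c3 a4
c3 a6 c2 a9 c3 97 c3 a4
c3 b7 c4 b3 c5 92 c4 90
c6 86 c4 b7 c4 97 c4 b2
c5 a6 c5 af c5 b6 c5 ab
c6 82 c6 90 c6 94 c6 86
c5 a6 c6 89 c6 b6 c7 b4
c6 86 c6 85 c6 a6 c6 ac
c7 86 c6 b9 c9 87 ca 83
"""

# Regroup the byte tokens into rows of 8 bytes, i.e. 16 hex digits per row.
tokens = HEX_DUMP.split()
rows = ["".join(tokens[i:i + 8]) for i in range(0, len(tokens), 8)]

# For each column letter, collect the distinct hex digits seen in that column.
columns = {
    name: sorted({row[i] for row in rows})
    for i, name in enumerate("ABCDEFGHIJKLMNOP")
}

for name, values in columns.items():
    print(name, " ".join(values))
```

Columns A, E, I, and M come out as a lone “c”, C, G, K, and O never leave 8 through b, and D and L stay within 2, 4, 5, 6, and 7, matching the discussion above.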

One can pretty readily see what might’ve been obscured before: each line has 2 repeated pairs (04 23 04 24). Looking at the UTF-16 version, again, we see that the characters are taken from increasingly higher code pages in the Unicode alphabet…which further strengthens the supposition that columns B, F, J, and N are all meaningless (since they’re largely derived from the Unicode pages as well). So instead we have “00 23 00 24” or just “23 24”. That’s 11 bits….but we only need 8 bits for letters (or really, 7 bits for ASCII). But wait: the 2s here are also largely driven by the Unicode layout…if we drop those, now our data has “03 04” bits, or 7 bits total. Just enough to build ASCII data. And sure enough, that’s what it does.

The plaintext wasn’t even encrypted. Just hidden in noise.

I should have walked into their booth first thing Wednesday morning, handed them the solution, and walked away $1000 richer.

To say I was frustrated…well….that doesn’t begin to cover it.

So in the end….was this a good or bad puzzle? As much as it pains me to admit it, this was an excellent puzzle. It made me think about UTF-8 encoding (which many, especially us old dumb-terminal types, overlook in favor of flat ASCII). It had a red herring (the ^ making me think of XOR). It had obvious, blatant signs that should have been seen, at least by experienced cryptographers. Like most good riddles, it had a simple, obvious, easy-to-execute solution. Also like most good riddles, I felt like a complete idiot for having missed the answer.

Thanks, Fidelis, for reminding me to keep my eye on the basics, and for driving home the first rule of cryptanalysis, as defined by the late Robert Morris: “Check for plaintext.”

First Anniversary

August 23, 2011

A year ago today, I left the comfortable confines of an 18-year career in big-name Government contracting, and joined a very small security startup called Intrepidus Group.

It’s been an interesting year.

One major change — I’ve really stepped up my blogging. I’ve posted detailed analysis on issues ranging from the RSA breach (including a theoretical attack on their SecurID tokens) to the question of whether iPhones were tracking your location (I still say “no.”)

My research efforts have also expanded, resulting in two detailed white papers, the first describing a hack to build rainbow tables for UNIX crypt() passwords, and the second providing documentation for Apple’s iOS Mobile Device Management (MDM) protocol / API. Both those papers also included detailed code that people could use right away to further their own research.

This research also led to opportunities to speak at major information security conferences. In January, I spoke briefly to a huge crowd at ShmooCon on my rainbow table work as part of the closing panel, discussing passwords — past, present, and future. And just a few weeks ago, I headlined my own talk at Black Hat in Las Vegas, discussing the good, bad, and ugly of Apple’s iOS MDM system.

Speaking of iOS, I’ve even found a few interesting bugs, all related in one way or another to MDM. The biggest, of course, was a bit of an 0-day which I dropped during my Black Hat talk: Exploiting man-in-the-middle vulnerabilities in iOS MDM to accomplish an “Evil Maid” attack, and thus bypass secure passcodes on a locked iOS device. Full details are in the talk slides.

Speaking at two cons in a 6-month period was definitely a thrill, and I thank both ShmooCon and Black Hat for the chance to present my results to a broader audience. But speaking wasn’t the only thing I did for cons… Right when I joined Intrepidus, I learned that we’d signed up to rebuild the ShmooCon ticket sales system, and I jumped at the opportunity to tackle that challenge. We had some growing pains during the first round of sales (none of which were my fault, honest! the servers melted into a heap of slag long before my code was activated). In the end, Bruce and company fixed the server issues, and with the help of 3ric Johansen, I optimized my code significantly and the sales ended up going pretty well in the end.

Unfortunately, all that focus on ShmooCon meant that I neglected another project, Khan Fu. However, that’s beginning to spin back up again, and after supporting Black Hat, BSidesLV, and DEFCON, we’re ready to enhance and extend Khan Fu for next year’s con season.

I’ve also continued to have fun with crypto. I was first to solve the THOTCON 0x2 pre-sale puzzle, and also won the ShmooCon badge contest for the 3rd year running. I didn’t go to Toorcon or CarolinaCon, but was able to solve crypto contests for both of those at home, just for fun. Unfortunately, I also ran into my first major defeats: I was soundly beaten by Sak3bomb’s THOTCON 0x2 stego, and totally missed the incredibly obvious for Fidelis Security’s Black Hat challenge. (I blame that one on being pre-occupied by my talk. Yeah. That’s my story, and I’m sticking to it.)

All in all, though, it’s been a great year. I’ve learned a lot, I’ve done a lot, and I’ve worked with some incredibly smart and interesting people. I can’t wait to see what the next year brings!

Great Googly Moogly! I’m speaking at Black Hat!

July 28, 2011

One week from today I’ll be presenting a talk at Black Hat. Black Hat! Wow. I’m still a little amazed at this turn of events, but am trying not to dwell on it for fear of slipping into a blind panic. :)

But I think I’m ready. I submitted a nice long white paper a couple of weeks ago, and sent in my presentation yesterday. I’m comfortable with the material. I (think) I’ll be able to intelligently field questions. I’m pretty sure I won’t be a complete, blithering idiot on stage. And to settle my nerves, I’ve put in an early order for a bottle of Drambuie. Though I think I’ll save that for the obligatory post-talk celebration.

Of course, this isn’t the first time I’ve spoken at a conference — I was lucky enough to get a spot on the closing panel at ShmooCon this past January. There were four of us on the panel, so I didn’t get to speak long (only about 10 minutes). But being the closing session, most of the con was there — perhaps as many as 1000 people. I haven’t seen the video, but people tell me that I did well, so I guess there’s really no reason to be nervous here.

I still have yet to write up anything about that ShmooCon appearance, and hopefully I’ll finally do something soon. There’s been quite a bit happening in the password cracking / authentication business in the past six months, and I have a lot of interesting ideas swirling around that I really need to put down for others to comment on. Maybe I’ll write some on the flight to Vegas. You know, to keep my mind off of my talk.

It’s actually my talk that I’m writing now, to, er, talk about. Since joining Intrepidus Group, I’ve spent a good deal of time helping to assess risk and craft security guidelines for iOS devices in large enterprises. A large part of securing iStuff in the enterprise relies upon the use of Mobile Device Management technology (MDM). MDM has been around for a while, especially for some of the older, more corporately-established mobile devices (like BlackBerry or Windows Mobile). Last summer, though, Apple jumped into the arena, adding support for their devices as part of iOS 4.0.

Unfortunately, the way that MDM works for iOS hasn’t been very well described, publicly. Which makes it difficult when you’re trying to demonstrate to a customer that it will make their environment more secure.

So I set about doing everything I could to understand, at a deep, technical level, exactly how the technology worked. We were already pretty satisfied, abstractly, with the features and capabilities of Apple’s MDM, but we felt it necessary to go that extra step to truly know what it’s doing. The end result of this is that we now have a mostly-complete understanding of how the protocol works.

Which is what I’ll be talking about next week. I start with how iOS settings work, move into additional features available through the iPhone Configuration Utility, and then start talking about MDM. The talk shows in detail how MDM uses the Apple Push Notification Service, and describes the message format used to make that notification. It’ll also document the interaction between device and server, from authentication and enrollment to receiving commands and providing responses. Enough detail is provided to enable you to write your own experimental MDM server (or, you could simply use the one I’ll be releasing at the talk).

Finally, I’ll talk about some limitations and weaknesses I’ve uncovered, and their potential security ramifications. There might even be a surprise for those hardy enough to sit through the whole talk.

This is going to be quite the experience for me. If your work involves securing iOS devices, especially at the enterprise level, please drop by and give a listen. If you can’t make it, check out the Intrepidus Group website after the conference — I hope to write up some of the more interesting bits of the talk for a standalone post, and we should also have the slides, white paper, and source code available for download at some point.

See you in Vegas!


DEF CON 16 Punch Card Puzzle

July 27, 2011

Back in 2008, at DEF CON 16, G. Mark Hardy presented his second crypto challenge. I didn’t go to DC16, so I didn’t see the challenge (and even if I had, I wasn’t really tracking these at the time). But in 2010, at ShmooCon, he dusted the challenge off and handed it out again, as nobody had solved it yet. I’d managed, with a buddy, to solve the ShmooCon badge puzzle that year, and after I got home I started on the DC16 puzzle. It took me a few days, but I managed to beat it.

I’ve held off on writing this one up, because the original included a phone number, and I didn’t want to publish that without G. Mark’s approval. And though we’re in frequent contact, it wasn’t until recently that I remembered to ask him about it. At his request, I’ve modified the puzzle slightly, with a different phone number (which I’m sure you’ll recognize). The method to arrive at the solution is still the same as the original.

The puzzle was handed out in five pieces, each printed on old computer punch cards. Each card included some additional text and two lines of code. Here are the five cards (again, modified for a different endgame):

VFLASGGGGIUGAAGYBDAWHOEVHUUVLLHGJYOLGFGP
GHALGGGOAAGGJPLLHZIHBFMHWIHSRYOIFPMIFVTF

XBMGRMBULEMPBMSRGMEBYRGMGRGHFMAGNMRLRZOM
GXMJRMLNBMEMUAZEGNVSOSFCUMXDSLDPFFUMXDVY

BVQZWOOBPPUSAZJEAUBTMATDFAJTTAUIFDSAQPVI
PFTIBOPWAUFOFHFAAJBUGBQBBCNXLQJMBUJVQDGN

QRJRWGDNMCZQTGYRZGFWRLRJRUFRSYWWKARAGMLS
RRGSKGMWYZKGSREOAVSXAQRZWHDKEQICCVMVUSAQ

KCPNCEJPKPPAFFFZZKDKEPEPZZFXRCOKLAVDYDKO
XTXEJHKKPPEKECMSKKWAMCLAADOJDADZKSNXIJJQ

As always, if you’d like to try to solve this yourself, then STOP now, as the rest of this post is full of spoilers. The text above is all that you need to get started.

One of the first things I did was to try the simple attacks: ROT-13, for example. After those gained me nothing, I wrote a simple python script to output letter frequencies for each card. The results looked something like this:

A :    7     2     9     5     6   
B :    2     5     9     0     0   
C :    0     1     1     3     5   
D :    1     3     3     2     6   
E :    1     4     1     2     6   
F :    6     4     6     2     4   
G :   15     8     2     7     0   
H :    8     1     1     1     1   
I :    5     0     3     1     1   
J :    2     1     5     2     5   
K :    0     0     0     4    12   
L :    7     4     1     2     2   
M :    2    15     2     4     2   
N :    0     3     2     1     2   
O :    4     2     4     1     3   
P :    3     2     5     0     8   
Q :    0     0     5     5     1   
R :    1     7     0    12     1   
S :    2     4     2     6     2   
T :    1     0     5     1     1   
U :    3     4     6     2     0   
V :    4     2     3     3     1   
W :    2     0     2     6     1   
X :    0     4     1     1     4   
Y :    3     2     0     3     1   
Z :    1     2     2     4     5
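I no longer have the original script, but a frequency counter like this takes only a few lines. A rough reconstruction in Python, using `collections.Counter` (each card’s two lines are joined into one 80-letter string; `CARDS` and `counts` are illustrative names):

```python
from collections import Counter
from string import ascii_uppercase

# The five punch cards, two lines each, joined into one string per card.
CARDS = [
    "VFLASGGGGIUGAAGYBDAWHOEVHUUVLLHGJYOLGFGP"
    "GHALGGGOAAGGJPLLHZIHBFMHWIHSRYOIFPMIFVTF",
    "XBMGRMBULEMPBMSRGMEBYRGMGRGHFMAGNMRLRZOM"
    "GXMJRMLNBMEMUAZEGNVSOSFCUMXDSLDPFFUMXDVY",
    "BVQZWOOBPPUSAZJEAUBTMATDFAJTTAUIFDSAQPVI"
    "PFTIBOPWAUFOFHFAAJBUGBQBBCNXLQJMBUJVQDGN",
    "QRJRWGDNMCZQTGYRZGFWRLRJRUFRSYWWKARAGMLS"
    "RRGSKGMWYZKGSREOAVSXAQRZWHDKEQICCVMVUSAQ",
    "KCPNCEJPKPPAFFFZZKDKEPEPZZFXRCOKLAVDYDKO"
    "XTXEJHKKPPEKECMSKKWAMCLAADOJDADZKSNXIJJQ",
]

counts = [Counter(card) for card in CARDS]

# One row per letter, one column per card (missing letters count as 0).
for letter in ascii_uppercase:
    row = "".join(f"{c[letter]:6d}" for c in counts)
    print(f"{letter} :{row}")
```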

So the five cards have distinctly different frequency distributions, but none of them are really flat. The first card had more Gs than any other letter, the second, slightly more Ms than Gs, etc. Pretty quickly I’d noticed a pattern: GMARK. I later saw this as a recurring theme in his puzzles, but this was the first time I’d seen it, and so I was kind of stoked. First, I tried shifting the letters back such that the most common letter was E, but that didn’t seem to look right. Remembering that he often uses Z for a space, I then shifted them back to Zs (G -> Z, M -> Z, etc.), and now my texts looked like this:

OYETLZZZZBNZTTZRUWTPAHXOANNOEEAZCRHEZYZI
ZATEZZZHTTZZCIEEASBAUYFAPBALKRHBYIFBYOMY

KOZTEZOHYRZCOZFETZROLETZTETUSZNTAZEYEMBZ
TKZWEZYAOZRZHNMRTAIFBFSPHZKQFYQCSSHZKQIL

AUPYVNNAOOTRZYIDZTASLZSCEZISSZTHECRZPOUH
OESHANOVZTENEGEZZIATFAPAABMWKPILATIUPCFM

YZRZEOLVUKHYBOGZHONEZTZRZCNZAGEESIZIOUTA
ZZOASOUEGHSOAZMWIDAFIYZHEPLSMYQKKDUDCAIY

ZRECRTYEZEEPUUUOOZSZTETEOOUMGRDZAPKSNSZD
MIMTYWZZEETZTRBHZZLPBRAPPSDYSPSOZHCMXYYF

But this still didn’t give me a cleartext. Some kind of wild guess made me think that I was dealing with a columnar transposition, which I’d never tried to break before. So I resolved to do this one, and to do it “by hand” (without resorting to brute-force computer programs). I tried some simple rearrangements of each card’s text, but got nowhere…

Then I realized that I might be able to do an attack “in depth”: Since I had 5 different ciphertexts, if they were all encoded with the same key, then I could use bits of one to help solve another. I lined all the text up in five columns, and started trying to rearrange the rows such that words formed. For example, if I found a Q in the first column, I’d then look for another row with a U in the first column, and put them together. I did that for all the Qs I could find, then looked in the other columns to see if other obvious digraphs were being formed.

This way, I figured, I might start with “QUI” in one column, and notice “HIS” in another. Then I’d just have to put a row with “T” above “HIS” and I’d have another word built. Repeat, and repeat, and eventually I’d solve all of them.

Except that this wasn’t how the puzzle worked. :(

As I realized that I was getting nowhere, I noticed that there were two rows which read “Z Y O U Z.” And for the first time, I saw the word “YOU” in the middle of two Zs. And realized that I was being an idiot.

I eliminated some spaces, to make it easier to read, and found the plaintext. [I was working vertically, but to save space I'll rotate it here, in two blocks. The first block is the 1st half of each card's shifted text, placed one on top of the next, the 2nd block is the same for the 2nd half of each card].

OYETLZZZZBNZTTZRUWTPAHXOANNOEEAZCRHEZYZI
KOZTEZOHYRZCOZFETZROLETZTETUSZNTAZEYEMBZ
AUPYVNNAOOTRZYIDZTASLZSCEZISSZTHECRZPOUH
YZRZEOLVUKHYBOGZHONEZTZRZCNZAGEESIZIOUTA
ZRECRTYEZEEPUUUOOZSZTETEOOUMGRDZAPKSNSZD

ZATEZZZHTTZZCIEEASBAUYFAPBALKRHBYIFBYOMY
TKZWEZYAOZRZHNMRTAIFBFSPHZKQFYQCSSHZKQIL
OESHANOVZTENEGEZZIATFAPAABMWKPILATIUPCFM
ZZOASOUEGHSOAZMWIDAFIYZHEPLSMYQKKDUDCAIY
MIMTYWZZEETZTRBHZZLPBRAPPSDYSPSOZHCMXYYF

Reading down each column in the 1st block, then continuing in the 2nd, we get:

OKAYZYOUZREZPRETTYZCLEVERZZNOTZONLYZHAVEZYOUZBROKE 
NZTHEZCRYPTOZBUTZYOUZFIGUREDZOUTZHOWZTOZTRANSPOSEZ 
ALLZTHEZTEXTSZTOZCREATEZONEZCONTINUOUSZMESSAGEZZGR 
ANTEDZTHEZCAESARZCIPHERZKEYZISZEPONYMOUSZBUTZIZHAD 
ZTOZMAKEZITZSOMEWHATZEASYZZNOWZYOUZHAVEZTOZGETZTHE 
ZRESTZZNOZCHEATINGZREMEMBERZWHATZIZSAIDZ

Or, cleaned up:

OKAY YOU RE PRETTY CLEVER  

NOT ONLY HAVE YOU BROKEN THE CRYPTO BUT YOU FIGURED OUT HOW 
TO TRANSPOSE ALL THE TEXTS TO CREATE ONE CONTINUOUS MESSAGE  

GRANTED THE CAESAR CIPHER KEY IS EPONYMOUS BUT I HAD TO MAKE 
IT SOMEWHAT EASY  

NOW YOU HAVE TO GET THE REST  

NO CHEATING REMEMBER WHAT I SAID
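Reading down the columns is easy to reproduce. The sketch below uses only the first block of shifted text (five rows of 40 letters, one per card) and interleaves it column by column; `read_columns` is just an illustrative helper name.

```python
# First half of each card's shifted text, one row per card (copied from above).
block1 = [
    "OYETLZZZZBNZTTZRUWTPAHXOANNOEEAZCRHEZYZI",
    "KOZTEZOHYRZCOZFETZROLETZTETUSZNTAZEYEMBZ",
    "AUPYVNNAOOTRZYIDZTASLZSCEZISSZTHECRZPOUH",
    "YZRZEOLVUKHYBOGZHONEZTZRZCNZAGEESIZIOUTA",
    "ZRECRTYEZEEPUUUOOZSZTETEOOUMGRDZAPKSNSZD",
]

def read_columns(rows):
    """Take one letter from each row in turn, column by column."""
    return "".join("".join(column) for column in zip(*rows))

message = read_columns(block1)
print(message[:50])               # first 10 columns
print(message.replace("Z", " "))  # Z stands in for a space
```

The second block works the same way, up to the point where the text stops making sense.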

Woohoo! Of course, that’s not all. There’s still a block of text at the end that’s not right:

BAUYFAPBALKRHBYIFBYOMY
IFBFSPHZKQFYQCSSHZKQIL
ATFAPAABMWKPILATIUPCFM
AFIYZHEPLSMYQKKDUDCAIY
LPBRAPPSDYSPSOZHCMXYYF

So there’s more to decode. Fortunately, G. Mark gave us a big hint when he said “NO CHEATING.” That’s his clue, made clear in his Tales from the Crypto talk, that this stage requires the Playfair cipher. But what key? Well, for his Mardi Gras puzzle, he used the title of his talk, so what talk did he give at DEF CON 16? “A Hacker Looks Past Fifty.”

Plugging this into a friendly online Playfair decoder reveals the final cleartext:

TEXTTHEPHRASEFIFTYISNI
FTYTOSEVENTIMESSEVENFO
URTHREETIMESFOURTWOEIG
HTFIVEZERONINEANDTHEFI
RSTPERSONTOSOLVEWINSIT

Or, cleaned up:

TEXT THE PHRASE FIFTY IS NIFTY TO 
SEVEN TIMES SEVEN FOUR THREE TIMES FOUR TWO EIGHT FIVE ZERO NINE 
AND THE FIRST PERSON TO SOLVE WINS IT
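For anyone who would rather not rely on an online decoder, here is a minimal Playfair decryption sketch. It assumes the common textbook conventions (a 5x5 square filled with the key letters first, then the rest of the alphabet, with J folded into I); I can't say whether the online tool builds its square exactly this way, but these rules do reproduce the first row of the cleartext above. The function names are illustrative.

```python
def playfair_square(key):
    """Build the 25-letter square (row-major), key letters first, J merged into I."""
    square = []
    for ch in (key.upper() + "ABCDEFGHIKLMNOPQRSTUVWXYZ").replace("J", "I"):
        if ch.isalpha() and ch not in square:
            square.append(ch)
    return square

def playfair_decrypt(ciphertext, key):
    square = playfair_square(key)
    pos = {ch: divmod(i, 5) for i, ch in enumerate(square)}
    plain = []
    for i in range(0, len(ciphertext) - 1, 2):
        r1, c1 = pos[ciphertext[i]]
        r2, c2 = pos[ciphertext[i + 1]]
        if r1 == r2:        # same row: each letter moves one step left
            c1, c2 = (c1 - 1) % 5, (c2 - 1) % 5
        elif c1 == c2:      # same column: each letter moves one step up
            r1, r2 = (r1 - 1) % 5, (r2 - 1) % 5
        else:               # rectangle: swap the two columns
            c1, c2 = c2, c1
        plain.append(square[r1 * 5 + c1] + square[r2 * 5 + c2])
    return "".join(plain)

row1 = playfair_decrypt("BAUYFAPBALKRHBYIFBYOMY", "A HACKER LOOKS PAST FIFTY")
print(row1)
```

Note that each 22-letter row is decrypted on its own here, left to right, rather than down the columns as in the previous stage.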

Still not quite finished. So now we’ve got to do some math and number manipulation. At first, I thought it was several different multiplication operations, something like:

7 * 7, 4, 3 * 4, 2, 8, 5, 0, 9 == 49 4 12 2 8 5 0 9 or 494-122-8509

I texted the phrase to that number, but got no response. After a while, I sent an email directly to G. Mark, who confirmed that I’d broken the cipher, but did the math wrong.

It wasn’t a bunch of separate operations, but a single operation, like this:

7 * 743 * 428509

Which yields the following (obviously faked for this blog entry) phone number:

222 867 5309
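The single-operation reading can be checked mechanically: treat each run of digit words as one number and multiply across every TIMES. A small sketch (the `DIGITS` table and `evaluate` helper are just for illustration):

```python
DIGITS = {
    "ZERO": "0", "ONE": "1", "TWO": "2", "THREE": "3", "FOUR": "4",
    "FIVE": "5", "SIX": "6", "SEVEN": "7", "EIGHT": "8", "NINE": "9",
}

def evaluate(phrase):
    """Join consecutive digit words into numbers, then multiply them."""
    product = 1
    for factor in phrase.split(" TIMES "):
        product *= int("".join(DIGITS[word] for word in factor.split()))
    return product

phrase = "SEVEN TIMES SEVEN FOUR THREE TIMES FOUR TWO EIGHT FIVE ZERO NINE"
number = evaluate(phrase)          # 7 * 743 * 428509
digits = str(number)
phone = f"{digits[:3]} {digits[3:6]} {digits[6:]}"
print(phone)
```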

This was a fun puzzle! I took some wrong turns, tried some new techniques, had some good luck, and made some stupid mistakes. A little of everything. Of course, tweaking the puzzle so I could (finally) publish the writeup was fun, too, especially factoring numbers to get them to fit into the ciphertext space available. Interesting bit of trivia: Turns out that 8675309 is a prime number. :)

CarolinaCon Flag Puzzle

May 8, 2011 1 comment

About two weeks ago, G. Mark Hardy asked if I was planning to attend CarolinaCon at the end of April. He had a puzzle set to go and was even thinking of using me as a clue. I replied that I wouldn’t be at the con, but would love to see the puzzle. So he sent me a copy.

Here is what he sent me, which was printed on the conference badge:

Unfortunately, I was already busy with another puzzle — THOTCON — and was eyeing a third (the Verizon DBIR). Plus, the Easter weekend was fast approaching. So I didn’t really have the time to hit it full force. But I did eventually solve the puzzle.

As always, if you’d like to try to solve this yourself, then STOP now, as the rest of this post is full of spoilers. The image above is all that you need to get started.

The cipher text, then, is just this:

OOAI YELL MBOP QXTY EBPL JJHQ KIPW FWAL VPHW OHYC ELJU WQCV CAIL AIJJ
RHNK UCNP JIGY XYJD WNAU LJCY GAIL VSNB WMTH GCLX XPTJ CWQI WRHA
BLCA EQMN XRKM VVQS PJXE OWHE SVGP HTTH EKSA VQKH YCTB MVRV XWNQ
QGPL RACG RLRF EFMW ITFP KHFS TPTZ UUBX XFVB SRSI WHCD JHZB VVUM
AYDY LKBF FEOA NTYF LZWP YWMY MMLG DMFL VIGU WGNA MQBP

Beyond that, there wasn’t much to go on. During the con, G. Mark tweeted a couple of clues trying to focus people on the flag — and to lead them to Google searches on Confederate cryptography. He also tried to help people recognize the kind of cipher it likely was, and those it was not.

Of course I didn’t need any of those hints. Having written an extensive post about a Civil War message, I not only knew what kind of cipher the Confederacy used, I also knew the three keys they used most frequently.

Not wanting to make it too easy on myself, I chose to try a crib first. I guessed the message might start with CONGRATULATIONS, and after three letters, I knew what the key was. But for illustration, here’s a way that one could have tested a crib using an online tool (I already discussed a more manual approach in the Civil War post).

A site I frequently use for crypto tools (and suggested by G. Mark in one of his hints at the con) is Rumkin Cipher tools. Using the Vigenère tool, enter the ciphertext and select “decrypt.” Then, instead of the key, enter the start-of-message crib. In this case, I tried “CONGRAT” (to account for the possibility it was abbreviated). Doing this gives something like this for the start of the plaintext:

key: CONGRAT
MANC HESJ YOIY QERK RVYL QHTD ERPD DINF EPOU AUSL
ESHG JKLV JYUY URJQ PTAE DCUN VVAH XFHP JHJU SHOL

So the first 7 letters represent the key that would spell CONGRAT in the plaintext. Change the key to MANCHES and now we see this:

key: MANCHES
CONG RATZ MOMI MFHY RZIH RXHD IBLE TWNJ OLPK OUWV
ATXU JOVR KOIY YBFR FHAI NYVD JVER TGXD JLTQ TXCL
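The same crib test works offline. With a standard Vigenère, "decrypting" with the guessed plaintext as the key hands back the key letters, because both operations are the same subtraction mod 26. A minimal sketch (this stands in for the Rumkin tool; the function name is illustrative):

```python
def vigenere_decrypt(ciphertext, key):
    """Standard Vigenere: plaintext = ciphertext - key (mod 26), letters only."""
    letters = [c for c in ciphertext.upper() if c.isalpha()]
    return "".join(
        chr((ord(c) - ord(key[i % len(key)])) % 26 + 65)
        for i, c in enumerate(letters)
    )

start = "OOAI YELL"                                  # first two groups of the ciphertext
key_guess = vigenere_decrypt(start, "CONGRAT")[:7]   # crib in the key slot
plain = vigenere_decrypt(start, key_guess)[:7]       # and back again
print(key_guess, plain)
```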

Now, if this were the whole key, then we’d see words pop out later in the output. There’s “D IBLE” in the first line, but nothing anywhere else. So start adding As to the end of the key, and eventually we find:

key: MANCHESAAAAAAAA
CONG RATL MBOP QXTM EONE FRHQ KIPW FWOL INAS WHYC
ELJU WECI ATET AIJJ RHNK ICAN CEOY XYJD WNAI LWAR

It looks like “OLINAS” on the first line, which must be “CAROLINAS,” so figure out which letters in the key correspond to the WFW just in front of it, and change them to CAR.

key: MANCHESAAAAACAR
CONG RATL MBOP OXCM EONE FRHQ KIPU FFOL INAS WHYC
ELHU FECI ATET AIJJ RFNT ICAN CEOY XYJD UNJI LWAR

The three characters in question are now UFF, so that’s the next key fragment. Replace CAR with UFF and look for another place to stretch the key out:

key: MANCHESAAAAAUFF
CONG RATL MBOP WSOM EONE FRHQ KIPC AROL INAS WHYC
ELPP RECI ATET AIJJ RNIF ICAN CEOY XYJD CIVI LWAR
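The pad-with-As trick is really just computing key letters from a guessed word at a known position. The sketch below does that directly, skipping the intermediate "change the key, read the letters back" step. It assumes a 15-letter key period, and the offsets 27, 54, and 7 are my own hand counts of where CAR(OLINAS), SIG(NIFICANCE), and the S I after CONGRAT fall in the ciphertext; the helper names are illustrative.

```python
def key_fragment(ciphertext, start, guess, period):
    """Return {key_index: key_letter} implied by guess sitting at ciphertext[start:]."""
    fragment = {}
    for offset, p in enumerate(guess):
        c = ciphertext[start + offset]
        fragment[(start + offset) % period] = chr((ord(c) - ord(p)) % 26 + 65)
    return fragment

def patch_key(key, fragment):
    """Overwrite the key letters at the indexes given in fragment."""
    letters = list(key)
    for index, letter in fragment.items():
        letters[index] = letter
    return "".join(letters)

# First 15 groups of the ciphertext, spaces removed.
ct = ("OOAI YELL MBOP QXTY EBPL JJHQ KIPW FWAL VPHW OHYC "
      "ELJU WQCV CAIL AIJJ RHNK").replace(" ", "")

key = "MANCHESAAAAAAAA"
key = patch_key(key, key_fragment(ct, 27, "CAR", 15))   # CAR(OLINAS)
key = patch_key(key, key_fragment(ct, 54, "SIG", 15))   # SIG(NIFICANCE)
partial = key
key = patch_key(key, key_fragment(ct, 7, "SI", 15))     # CONGRAT(S) (I)
print(partial, key)
```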

We’re definitely on the right track, as line 2 now includes “CIVIL WAR.” In the second line is “IF ICAN CE,” which is probably SIGNIFICANCE. Do the same trick: replace the end of the AAA with SIG, see the corresponding plaintext letters change to RBL, and change the letters in the key from SIG to RBL, and now we see:

key: MANCHESAARBLUFF
CONG RATL MKNE WSOM EONE FRHQ THEC AROL INAS WHYL
DAPP RECI ATET AISI GNIF ICAN CEOY XHIS CIVI LWAR

Let’s reformat to maybe make it easier to find the missing words:

CONGRAT LM KNEW SOMEONE FRHQ THE CAROLINAS WHYLD
APPRECIATE TAI SIGNIFICANCE OYXHIS CIVI LWAR

We still have two letters left to guess in the key, and there’s a two-letter bit in the first line that looks like it should be “SI.” Insert SI into the key, retrieve “TE” from the plaintext, put those in place of SI, and bingo:

key: MANCHESTER BLUFF

CONGRATS I KNEW SOMEONE FROM THE CAROLINAS WOULD
APPRECIATE THE SIGNIFICANCE OF THIS CIVIL WAR
CIPHER CH RHHH TAET FWPS BLWD RFHN ZEYI LMVM MXFH
JVDQ IFFL KFGT YQBD HGRA ASZW EPZN TXHB ZTKR FDJZ
PVVG MOCT PENN LBVV XZAK YHSQ MLBG QDAM DAQP SEQB
SPJZ SGOH QQIM NWWU TRXO ETUV IHYS JSSX FSVX BSGB
RMSJ OEOB SPMP SLWD

And bingo! We’re — wait, what? Dammit.

At this point, I was stumped for a while. For one: do I use the “decrypted” output of the first stage? One other G. Mark puzzle worked that way, so it seemed reasonable. Plus, that would make the second stage dependent upon solving the first. Or, should I just find the original ciphertext that corresponds to what didn’t decrypt and use that?

In the end, I tried both avenues with a variety of approaches. I tried the other two commonly-used Confederate keys, ruled out Playfair and simple Caesar shifts, and just tried lots of different keys. I also tried dragging a crib back and forth. This is essentially the same as what I described above, but I try the word (“THE” is what I tried) against every position in the ciphertext, and hope that I’ll see an obvious 3-letter sequence pop out. None of these met with any success.

I was sure this was a Vigenère, based on the historical connection, so I kept plugging away. In addition to crib dragging, I tried various other tests to help guess a key size, and even started noodling with some new techniques of my own devising. But no luck. (Though I did learn a lot more about Civil War cryptography in the process.)

After a few days of not getting far, I regrouped and tried simplifying (per G. Mark’s inevitable admonition that I’m making it too complicated). Looking at the remaining text, I decided to try an “offset” key. Basically, I took COMPLETE VICTORY and just started rolling letters off the beginning and onto the end. When I hit TORYCOMPLETEVIC I found success.

UNFORTUNATELY BAD CRYPTO MAY HAVE LED TO THE DEFEAT OF LEE IN THE WAR OF
NORTHERN AGGRESSION BUT YOU CAN MAKE UP FOR IT
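One plausible explanation for why the rolled key works: the first message is 86 letters long by my count, and 86 is not a multiple of 15, so the leftover ciphertext starts partway through the key cycle. Rotating COMPLETE VICTORY left by 86 mod 15 = 11 positions gives exactly the key that worked. A sketch under that assumption (the helper names and the 86-letter offset are mine):

```python
def vigenere_decrypt(ciphertext, key):
    """Standard Vigenere: plaintext = ciphertext - key (mod 26)."""
    return "".join(
        chr((ord(c) - ord(key[i % len(key)])) % 26 + 65)
        for i, c in enumerate(ciphertext)
    )

def rotate(key, n):
    """Roll n letters off the front of the key and onto the end."""
    n %= len(key)
    return key[n:] + key[:n]

first_message_length = 86                    # letters in the first cleartext, by my count
rotated = rotate("COMPLETEVICTORY", first_message_length)

leftover = "NBWMTHGCLX"                      # ciphertext letters 86 through 95
opening = vigenere_decrypt(leftover, rotated)
print(rotated, opening)
```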

But even that didn’t get everything. There’s still a block of cipher text at the end. Of course, now I know what to do. I simply put the entire original cipher text into the online applet and use each of the three Confederate keys in sequence. The first decoded the first block, when replaced with the second it decoded a chunk in the middle, and when I replaced it with COME RETRIBUTION the last message was decrypted:

TO CLAIM THE PRIZE FOR SOLVING THIS YOU MUST TELL G MARK THIS WHOLE TEXT
BY THE END OF THE CON
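Here is the same three-pass idea as a sketch: decrypt the entire ciphertext once per key, then keep only the stretch that each key makes readable. The boundaries 86 and 182 are my own counts of the letters in the first two cleartexts, not something given by the puzzle, so treat them as assumptions.

```python
CIPHERTEXT = "".join("""
OOAI YELL MBOP QXTY EBPL JJHQ KIPW FWAL VPHW OHYC ELJU WQCV CAIL AIJJ
RHNK UCNP JIGY XYJD WNAU LJCY GAIL VSNB WMTH GCLX XPTJ CWQI WRHA
BLCA EQMN XRKM VVQS PJXE OWHE SVGP HTTH EKSA VQKH YCTB MVRV XWNQ
QGPL RACG RLRF EFMW ITFP KHFS TPTZ UUBX XFVB SRSI WHCD JHZB VVUM
AYDY LKBF FEOA NTYF LZWP YWMY MMLG DMFL VIGU WGNA MQBP
""".split())

KEYS = ["MANCHESTERBLUFF", "COMPLETEVICTORY", "COMERETRIBUTION"]
BOUNDS = [0, 86, 182, len(CIPHERTEXT)]   # assumed segment boundaries

def vigenere_decrypt(ciphertext, key):
    """Standard Vigenere: plaintext = ciphertext - key (mod 26)."""
    return "".join(
        chr((ord(c) - ord(key[i % len(key)])) % 26 + 65)
        for i, c in enumerate(ciphertext)
    )

segments = []
for key, start, end in zip(KEYS, BOUNDS, BOUNDS[1:]):
    # Decrypt everything from position 0 so the key stays aligned,
    # then keep only the slice this key is responsible for.
    segments.append(vigenere_decrypt(CIPHERTEXT, key)[start:end])

for key, text in zip(KEYS, segments):
    print(key, text)
```

Because every key is applied from the very start of the ciphertext, no rotation is needed; the "offset" key only appears when you cut the leftover text out and start over at position 0.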

In the end, a very simple, almost trivial, solution. Especially since all the keys were available in the Wikipedia article on Vigenère. But mashing all three texts together the way he did totally ruined my attempts at traditional cryptanalysis. If I’d known there were three parts to the puzzle, I might’ve figured out the trick earlier. Maybe. Now I’m just trying to figure out if there’s an easy way to “discover” such partitions in the cipher text or if you just have to guess or stumble upon them.

But this was all before the con even happened. Once it started, I periodically checked Twitter to see if anyone was working the puzzle, and if so, whether they were making any progress. Early on, I saw a couple of people post links to the image, or to a pastebin copy of just the text, but not much beyond that. One person did suggest “POTOMAC RIVER,” probably as a possible key, as the battle flag originally came from the Confederate Army of the Potomac.

Finally, late on Sunday, I started to see a few people make progress. Then about 3:45, a tweet from Korotos to G. Mark said, simply, “Solved.” So congratulations to Korotos! :)

Knowing the secret, being “on the inside,” was an interesting change for me. It was a different challenge having to keep my mouth shut….and I’m glad I did. Both because to say anything would’ve been wrong (it’s not my game, after all!), but also because the few times I did think about what to say, I realized hours later that I would have given away too much. There’s an art to giving hints that are Just Good Enough…

So speaking of hints, what ever happened to the bit about using me as a hint? About midday Sunday, G. Mark tweeted this:

Hint: on CTF network was file named “.notthis”; contents were: a8979e8b df88908a 939bdfbb 9e8d8b97 dfb18a93 93df9b90 c0ff

The file name was a hint as to how to decode the hint: logically invert (or NOT) all the bits. Or, XOR with 0xFF, which is functionally the same. Doing this reveals the hint he’d warned me he might use:

What would Darth Null do?
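Decoding the hint takes only a few lines. This sketch flips every bit by XORing each byte with 0xFF and drops the trailing NUL that the final `ff` turns into:

```python
hint = "a8979e8b df88908a 939bdfbb 9e8d8b97 dfb18a93 93df9b90 c0ff"

raw = bytes.fromhex(hint.replace(" ", ""))
flipped = bytes(b ^ 0xFF for b in raw)            # logical NOT of every bit
text = flipped.rstrip(b"\x00").decode("ascii")    # last byte (ff) becomes NUL
print(text)
```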

I don’t know if anyone ever decoded the hint. I do know that nobody viewed my Civil War blogpost during the entire con, so if anyone did decode it, they didn’t take the next step. Of course, the first key was right there in my blog…and even without the hint, a Google search for “G. Mark confederate crypto puzzle” lists my blog as the first hit, proving that sometimes, the direct attack actually is the best choice.

Analysis of iOS Location Data from Multiple Devices

April 25, 2011 1 comment

This “Your iPhone Is Tracking Your Every Move!!” craziness just won’t go away. I’ve been kind of disappointed by the lack of very detailed analysis of the data that’s actually being collected, so I spent some time collecting information of my own.

I have access to four iOS devices running 4.0 or better: my personal iPhone 3GS, a family iPad with 3G subscription, a company-owned iPad (whose 3G has never been activated), and, newly arrived, an iPad 2 that belongs to a client. So I spent some time this weekend trying to better understand what the Core Location daemons are doing.

First, please forgive me if I’m retreading already explored ground. Turns out that a few other people did the same thing this weekend, and so maybe I’m late to the party. I don’t want to be a “Me, too!” poster, but I also think there’s a little that I’ve found that I haven’t seen mentioned yet. Plus, I should mention the work of Alex Levinson, who looked at this in detail a year ago and has been a solid voice of reason from the beginning.

Anyway, first I’ll talk about some of what I observed, then I’ll see if I can’t draw a few (hopefully valid) inferences. Some of the data were taken from the devices just as they were last week. Saturday, though, we went out to lunch and I took my phone, company iPad, and personal iPad all with me. During that trip, I kept the personal iPad locked the entire time, and I used the company iPad on the road (with Google Maps open the whole way). I used my phone briefly to make a call, and checked Twitter a couple times while at the restaurant, and also for a while in a parking lot as my wife went into the grocery store.

First, the database.

I can see 5 tables within the consolidated.db that seem to be pertinent: CellLocation, CellLocationLocal, CellLocationHarvest, WifiLocation, and WifiLocationHarvest. All of these include details about speed, accuracy, elevation, and other such items that I’m not really concerned with (and many of which don’t seem to be used, at any rate). All also include a timestamp, latitude, and longitude, as well as some way of uniquely identifying the point it represents. In the case of a Wi-Fi access point, this is the MAC address, and in the case of a cell tower, it’s a tuple of four data items. Each entry in these tables appears to be unique: no single cell tower or Wi-Fi access point appears more than once. Point 1: The devices are not tracking my every movement.
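The uniqueness check is easy to reproduce with Python's built-in sqlite3 module. The sketch below is hedged in several ways: the column names (MAC, Timestamp, Latitude, Longitude) are assumptions about the WifiLocation schema, the timestamps are assumed to count seconds from January 1, 2001, and the demo runs against a tiny in-memory stand-in table with made-up rows rather than a real consolidated.db.

```python
import sqlite3
from datetime import datetime, timedelta

REFERENCE_DATE = datetime(2001, 1, 1)   # assumed epoch for the Timestamp column

def repeated_macs(conn):
    """Return (MAC, count) for any access point stored more than once."""
    return conn.execute(
        "SELECT MAC, COUNT(*) FROM WifiLocation "
        "GROUP BY MAC HAVING COUNT(*) > 1"
    ).fetchall()

def to_datetime(timestamp):
    """Convert a stored timestamp to a datetime, under the epoch assumption."""
    return REFERENCE_DATE + timedelta(seconds=timestamp)

# Stand-in table with hypothetical rows; point sqlite3.connect() at a copy of
# consolidated.db instead to run the same query on real data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE WifiLocation (MAC TEXT, Timestamp REAL, Latitude REAL, Longitude REAL)"
)
conn.executemany(
    "INSERT INTO WifiLocation VALUES (?, ?, ?, ?)",
    [
        ("0:11:22:33:44:55", 325000000.0, 38.90, -77.03),
        ("0:11:22:33:44:66", 325000002.0, 38.91, -77.04),
    ],
)
print(repeated_macs(conn))   # an empty list means every access point appears once
```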

Now, my phone.

I see several access points noted all around my house. The accuracy isn’t phenomenal, as it puts my access point on my deck, and a neighbor’s in the middle of my kitchen. In fact, there are 11 different access points displayed either in my house, my yard, or just into my neighbors’ yards. Point 2: The Wi-Fi data points are not precisely located.

Also, the timestamps are varied. Four of the 11 around my house show a date/time from a couple days before I dumped the database (and another 4 are stamped two seconds later). But the other three are from early March, late February, and mid January. Point 3: The Wi-Fi data does not represent the last time I visited a location.

Finally, huge swaths are blanketed with data about Wi-Fi access points. Neighborhoods I’ve not driven through in months, if not years (or ever). These points share similar timestamps as the data within my neighborhood. Point 4: Data is present in the database for locations I’ve not visited.

The cell tower data is very similar. It shows towers located in areas I’ve not recently visited, with locations not corresponding to actual towers (in many cases, not even close — several were shown in residential communities where I’ve never seen a tower). The timestamps are similarly varied, with some I randomly clicked on going back to October 2010. Point 5: Cell tower data is treated the same as Wi-Fi access point data.

I did not see any new data points appear during the drive to the restaurant, or while we ate. However, a batch of data, both Cell and Wi-Fi, was timestamped while we sat outside the grocery store. The cell data, in particular, was scattered over a very wide area, at least several miles on a side. Point 6: Data appears for a wide area simultaneously, and is not necessarily tied to length of time sitting still.

Finally, I observed new data in the WifiLocationHarvest table. A total of 11 Wi-Fi access points were simultaneously recorded while I waited in the parking lot. The precision on this was pretty good — only about 50 feet from where I was sitting. Points 7 and 8: Actual recording of new data is not predictable, and is highly accurate.

Wi-Fi points near Greenbriar shopping center. Expanded red points from WifiHarvest.

I was also able to look at some past data on the phone. I took a one-day trip to Dallas at the end of March, and found large collections of data centered on the location I’d visited, the area where I ate lunch, and three locations on the highway leading from the airport. Those locations, I believe, roughly correspond with times when I’d refreshed Google Map directions. Point 9: You may be able to force a data fetch by refreshing the maps application.

Next, iPads.

My family iPad, which I’d woken up before we left and promptly locked again, did not record any new data the entire time. Point 10: When locked, the device might not record anything at all.

The company iPad was in use the whole way to the restaurant. It has no record of any cell towers, which isn’t terribly surprising, since it does not have an active 3G data plan (though it does have the 3G hardware). Point 11: No data plan, no cell info.

Obviously, since there was no data plan, it couldn’t collect any new data along the way. However, as we left the grocery store, I unlocked the device, refreshed the map location, and locked it again. Once we’d returned home, the iPad fetched 394 Wi-Fi points, in an area about 1/2 mile by 1/2 mile square, roughly corresponding to the place we were when I refreshed the map. All these data points were timestamped when they were fetched (that is, when the iPad had access to the Wi-Fi at home), not when I was actually on the road. Point 12: The device may cache your last request and fetch related data the next time a network is available.

All three iPads showed a curious distribution of points around my office. The customer’s iPad, which has only been to the customer facility and my office, displayed points in a very short and wide rectangle centered on my office. My family iPad, which has only been a few places since I loaded 4.0 on it, showed virtually the same distribution around the office and a similar distribution, but not as wide, around my house. Not all of these points had the same timestamp, but over time, it definitely started filling out that shape. Point 13: When fetching data, the device appears to collect points over a nearly-fixed vertical range (about 30 arcseconds of Latitude) and a variable horizontal range.
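To put a number on the shape of a batch, one can measure its latitude and longitude spans in arcseconds. A small sketch with made-up coordinates (the function name and sample points are hypothetical); for scale, one arcminute of latitude is roughly one nautical mile, so 30 arcseconds is a bit over 900 meters north to south.

```python
def spans_arcseconds(points):
    """Return (latitude span, longitude span) of (lat, lon) points, in arcseconds."""
    lats = [lat for lat, lon in points]
    lons = [lon for lat, lon in points]
    return (max(lats) - min(lats)) * 3600, (max(lons) - min(lons)) * 3600

METERS_PER_ARCSECOND_LAT = 1852 / 60    # one arcminute of latitude is about 1852 m

# Hypothetical batch: a short, wide rectangle like the one described above.
batch = [
    (38.9000, -77.0500),
    (38.9000 + 30 / 3600, -77.0200),
    (38.9040, -77.0350),
]
lat_span, lon_span = spans_arcseconds(batch)
print(round(lat_span, 1), round(lon_span, 1))
print(round(lat_span * METERS_PER_ARCSECOND_LAT), "meters north to south")
```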

Finally, my wife had taken the family iPad on a short trip last weekend. The iPad showed a square burst of Wi-Fi data points about where she pulled over to check a map, and another wide rectangle around the hotel she stayed in. It also showed data in the CellLocationLocal table. That table showed her track along the interstate, and appeared to be an actual positional track. Interestingly, the CellLocation table did not have tower locations for virtually anywhere along that track. On my phone, I had two points from my Dallas trip, and a half-dozen points from a taxi ride into Manhattan a week prior. Point 14: The CellLocationLocal table may record actual trip data, but it appears to be very limited.

One further point of (potential) interest: The timestamps on the data were, if you’ll pardon the pun, all over the map. Many data sets had timestamps only a few seconds or minutes apart. But when I stripped out data sets that were within five minutes of another set of points, the average time between updates was about 14 hours. Note that there’s very little statistical rigor to this, but I thought it was interesting. Point 15: When the device spends an extended time at one place, data appears to be fetched about twice a day.
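The five-minute filtering can be sketched as follows. This is one reasonable reading of the procedure, not necessarily the exact steps used: timestamps (in seconds) are sorted, any timestamp within 300 seconds of the previous one is folded into the same batch, and the gaps between batch starts are averaged. The sample values are invented.

```python
def batch_starts(timestamps, window=300):
    """Keep the first timestamp of each run of fetches spaced <= window seconds apart."""
    starts, previous = [], None
    for t in sorted(timestamps):
        if previous is None or t - previous > window:
            starts.append(t)
        previous = t
    return starts

def mean_gap_hours(timestamps, window=300):
    """Average time between batches, in hours (None if fewer than two batches)."""
    starts = batch_starts(timestamps, window)
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return sum(gaps) / len(gaps) / 3600 if gaps else None

# Hypothetical timestamps: three batches, 14 hours apart.
sample = [0, 2, 100, 50400, 50402, 100800]
print(batch_starts(sample), mean_gap_hours(sample))
```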

Summary of Observations

So, to sum up, here are my observations thus far:

  • Point 1: The devices are not tracking my every movement.
  • Point 2: The Wi-Fi data points are not precisely located.
  • Point 3: The Wi-Fi data does not represent the last time I visited a location.
  • Point 4: Data is present in the database for locations I’ve not visited.
  • Point 5: Cell tower data is treated the same as Wi-Fi access point data.
  • Point 6: Data appears for a wide area simultaneously, and is not necessarily tied to length of time sitting still.
  • Points 7 and 8: Actual recording of new data is not predictable, and is highly accurate.
  • Point 9: You may be able to force a data fetch by refreshing the maps application.
  • Point 10: When locked, the device might not record anything at all.
  • Point 11: No data plan, no cell info.
  • Point 12: The device may cache your last request and fetch related data the next time a network is available.
  • Point 13: When fetching data, the device appears to collect points over a nearly-fixed vertical range (about 30 arcseconds of Latitude) and a variable horizontal range.
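  • Point 14: The CellLocationLocal table may record actual trip data, but it appears to be very limited.
  • Point 15: When the device spends an extended time at one place, data appears to be fetched about twice a day.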

What does all this tell us? I think we can infer at least a few things, which are consistent with what others have been saying, and with Apple’s statements last year.

  • The data in WifiLocation and CellLocation are not your device’s actual location at any given point in time, but instead are the location of others’ Wi-Fi access points and cell towers.
  • The locations of these points are estimated by Apple based on data harvested by iOS devices and provided to Apple on a periodic basis.
  • Individual devices periodically record the Wi-Fi points and cell towers visible to them, record a precise location, and send that data to Apple. (I have not yet observed this happen, but it makes sense, and Apple’s already said as much).
  • Periodically, the device will poll Apple’s servers for location information nearby. This seems to happen when the device has been at rest for some time, or when the location information is refreshed in the map application (it may be reasonable to expect that other applications querying the Core Location service may also trigger a refresh). There may be some logic in terms of what data gets fetched, perhaps to avoid downloading duplicate information. I haven’t been able to dig into that yet.
  • The timestamps for the fetched data appear to be the time the data was fetched. One may be able to look in the middle of a set of identically-stamped data to infer where the user was when that data was fetched. However, the data don’t appear to be fetched every time you’re in any given location, even if you’re there for an extended time (like, say, lunch).

So what’s my conclusion? I’m still not sure about the CellLocationLocal table, which perhaps might be for recording locations for future data fetches. But the rest of the data all seem very consistent with what Apple’s told us: they’re used to aid in geolocating the device. Why are so many points stored? So that it won’t have to pull data down again in the future. It’s a big, personalized cache, made to make my personal use of geolocated features faster and more accurate.

[Note -- if you're interested in the python script I used to load the data into Google Earth, I'm posting it on the Intrepidus Group blog. It should be attached to this post from last week about my first review of the data.]

Categories: iOS, Security