Great Googly Moogly! I’m speaking at Black Hat!
One week from today I’ll be presenting a talk at Black Hat. Black Hat! Wow. I’m still a little amazed at this turn of events, but am trying not to dwell on it for fear of slipping into a blind panic.
But I think I’m ready. I submitted a nice long white paper a couple of weeks ago, and sent in my presentation yesterday. I’m comfortable with the material. I (think) I’ll be able to intelligently field questions. I’m pretty sure I won’t be a complete, blithering idiot on stage. And to settle my nerves, I’ve put in an early order for a bottle of Drambuie. Though I think I’ll save that for the obligatory post-talk celebration.
Of course, this isn’t the first time I’ve spoken at a conference — I was lucky enough to get a spot on the closing panel at ShmooCon this past January. There were four of us on the panel, so I didn’t get to speak long (only about 10 minutes). But being the closing session, most of the con was there — perhaps as many as 1000 people. I haven’t seen the video, but people tell me that I did well, so I guess there’s really no reason to be nervous here.
I still have yet to write up anything about that ShmooCon appearance, and hopefully I’ll finally do something soon. There’s been quite a bit happening in the password cracking / authentication business in the past six months, and I have a lot of interesting ideas swirling around that I really need to put down for others to comment on. Maybe I’ll write some on the flight to Vegas. You know, to keep my mind off of my talk.
It’s actually my talk that I’m writing now, to, er, talk about. Since joining Intrepidus Group, I’ve spent a good deal of time helping to assess risk and craft security guidelines for iOS devices in large enterprises. A large part of securing iStuff in the enterprise relies upon the use of Mobile Device Management technology (MDM). MDM has been around for a while, especially for some of the older, more corporately-established mobile devices (like BlackBerry or Windows Mobile). Last summer, though, Apple jumped into the arena, adding support for their devices as part of iOS 4.0.
Unfortunately, the way that MDM works for iOS hasn’t been very well described publicly, which makes it difficult when you’re trying to demonstrate to a customer that it will make their environment more secure.
So I set about doing everything I could to understand, at a deep, technical level, exactly how the technology worked. We were already pretty satisfied, abstractly, with the features and capabilities of Apple’s MDM, but we felt it necessary to go that extra step to truly know what it’s doing. The end result of this is that we now have a mostly-complete understanding of how the protocol works.
Which is what I’ll be talking about next week. I start with how iOS settings work, move into additional features available through the iPhone Configuration Utility, and then start talking about MDM. The talk shows in detail how MDM uses the Apple Push Notification Service, and describes the message format used to make that notification. It’ll also document the interaction between device and server, from authentication and enrollment to receiving commands and providing responses. Enough detail is provided to enable you to write your own experimental MDM server (or, you could simply use the one I’ll be releasing at the talk).
Finally, I’ll talk about some limitations and weaknesses I’ve uncovered, and their potential security ramifications. There might even be a surprise for those hardy enough to sit through the whole talk.
This is going to be quite the experience for me. If your work involves securing iOS devices, especially at the enterprise level, please drop by and give a listen. If you can’t make it, check out the Intrepidus Group website after the conference — I hope to write up some of the more interesting bits of the talk for a standalone post, and we should also have the slides, white paper, and source code available for download at some point.
See you in Vegas!
DEF CON 16 Punch Card Puzzle
Back in 2008, at DEF CON 16, G. Mark Hardy presented his second crypto challenge. I didn’t go to DC16, so I didn’t see the challenge (and even if I had, I wasn’t really tracking these at the time). But in 2010, at ShmooCon, he dusted the challenge off and handed it out again, as nobody had solved it yet. I’d managed, with a buddy, to solve the ShmooCon badge puzzle that year, and after I got home I started on the DC16 puzzle. It took me a few days, but I managed to beat it.
I’ve held off on writing this one up, because the original included a phone number, and I didn’t want to publish that without G. Mark’s approval. And though we’re in frequent contact, it wasn’t until recently that I remembered to ask him about it. At his request, I’ve modified the puzzle slightly, with a different phone number (which I’m sure you’ll recognize). The method to arrive at the solution is still the same as the original.
The puzzle was handed out in five pieces, each printed on old computer punch cards. Each card included some additional text and two lines of code. Here are the five cards (again, modified for a different endgame):
VFLASGGGGIUGAAGYBDAWHOEVHUUVLLHGJYOLGFGP
GHALGGGOAAGGJPLLHZIHBFMHWIHSRYOIFPMIFVTF
XBMGRMBULEMPBMSRGMEBYRGMGRGHFMAGNMRLRZOM
GXMJRMLNBMEMUAZEGNVSOSFCUMXDSLDPFFUMXDVY
BVQZWOOBPPUSAZJEAUBTMATDFAJTTAUIFDSAQPVI
PFTIBOPWAUFOFHFAAJBUGBQBBCNXLQJMBUJVQDGN
QRJRWGDNMCZQTGYRZGFWRLRJRUFRSYWWKARAGMLS
RRGSKGMWYZKGSREOAVSXAQRZWHDKEQICCVMVUSAQ
KCPNCEJPKPPAFFFZZKDKEPEPZZFXRCOKLAVDYDKO
XTXEJHKKPPEKECMSKKWAMCLAADOJDADZKSNXIJJQ
As always, if you’d like to try to solve this yourself, then STOP now, as the rest of this post is full of spoilers. The text above is all that you need to get started.
One of the first things I did was to try the simple attacks: ROT-13, for example. After those gained me nothing, I wrote a simple python script to output letter frequencies for each card. The results looked something like this:
A : 7 2 9 5 6
B : 2 5 9 0 0
C : 0 1 1 3 5
D : 1 3 3 2 6
E : 1 4 1 2 6
F : 6 4 6 2 4
G : 15 8 2 7 0
H : 8 1 1 1 1
I : 5 0 3 1 1
J : 2 1 5 2 5
K : 0 0 0 4 12
L : 7 4 1 2 2
M : 2 15 2 4 2
N : 0 3 2 1 2
O : 4 2 4 1 3
P : 3 2 5 0 8
Q : 0 0 5 5 1
R : 1 7 0 12 1
S : 2 4 2 6 2
T : 1 0 5 1 1
U : 3 4 6 2 0
V : 4 2 3 3 1
W : 2 0 2 6 1
X : 0 4 1 1 4
Y : 3 2 0 3 1
Z : 1 2 2 4 5
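The original script isn't shown, but a minimal sketch of that kind of per-card letter count might look like the following. The names `CARDS`, `card_frequencies`, and `print_table` are made up for illustration; the ciphertext is copied from the five cards above, two lines per card.

```python
from collections import Counter
from string import ascii_uppercase

# The five cards, two lines of ciphertext each (copied from the post).
CARDS = [
    "VFLASGGGGIUGAAGYBDAWHOEVHUUVLLHGJYOLGFGP"
    "GHALGGGOAAGGJPLLHZIHBFMHWIHSRYOIFPMIFVTF",
    "XBMGRMBULEMPBMSRGMEBYRGMGRGHFMAGNMRLRZOM"
    "GXMJRMLNBMEMUAZEGNVSOSFCUMXDSLDPFFUMXDVY",
    "BVQZWOOBPPUSAZJEAUBTMATDFAJTTAUIFDSAQPVI"
    "PFTIBOPWAUFOFHFAAJBUGBQBBCNXLQJMBUJVQDGN",
    "QRJRWGDNMCZQTGYRZGFWRLRJRUFRSYWWKARAGMLS"
    "RRGSKGMWYZKGSREOAVSXAQRZWHDKEQICCVMVUSAQ",
    "KCPNCEJPKPPAFFFZZKDKEPEPZZFXRCOKLAVDYDKO"
    "XTXEJHKKPPEKECMSKKWAMCLAADOJDADZKSNXIJJQ",
]


def card_frequencies(cards):
    """Return one Counter of letter counts per card."""
    return [Counter(card) for card in cards]


def print_table(counts):
    """Print one row per letter, one column per card."""
    for letter in ascii_uppercase:
        row = " ".join(str(c[letter]) for c in counts)
        print(f"{letter} : {row}")


if __name__ == "__main__":
    print_table(card_frequencies(CARDS))
```

Counting each card separately is what makes the per-card peaks visible; pooling all five cards into one count would smear them together.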
So the five cards have distinctly different frequency distributions, but none of them are really flat. The first card had more Gs than any other letter, the second, slightly more Ms than Gs, etc. Pretty quickly I’d noticed a pattern: GMARK. I later saw this as a recurring theme in his puzzles, but this was the first time I’d seen it, and so I was kind of stoked. First, I tried shifting the letters back such that the most common letter was E, but that didn’t seem to look right. Remembering that he often uses Z for a space, I then shifted them back to Zs (G -> Z, M -> Z, etc.), and now my texts looked like this:
OYETLZZZZBNZTTZRUWTPAHXOANNOEEAZCRHEZYZI
ZATEZZZHTTZZCIEEASBAUYFAPBALKRHBYIFBYOMY
KOZTEZOHYRZCOZFETZROLETZTETUSZNTAZEYEMBZ
TKZWEZYAOZRZHNMRTAIFBFSPHZKQFYQCSSHZKQIL
AUPYVNNAOOTRZYIDZTASLZSCEZISSZTHECRZPOUH
OESHANOVZTENEGEZZIATFAPAABMWKPILATIUPCFM
YZRZEOLVUKHYBOGZHONEZTZRZCNZAGEESIZIOUTA
ZZOASOUEGHSOAZMWIDAFIYZHEPLSMYQKKDUDCAIY
ZRECRTYEZEEPUUUOOZSZTETEOOUMGRDZAPKSNSZD
MIMTYWZZEETZTRBHZZLPBRAPPSDYSPSOZHCMXYYF
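The shift itself is a plain Caesar rotation chosen so that each card's assumed most-common letter lands on Z. A small sketch, with `shift_to_space` as a hypothetical helper name:

```python
def shift_to_space(text, common):
    """Caesar-shift uppercase text so that the letter `common` becomes Z."""
    offset = (ord("Z") - ord(common)) % 26
    return "".join(chr((ord(c) - 65 + offset) % 26 + 65) for c in text)


if __name__ == "__main__":
    # First line of card 1 (most common letter G) and of card 2 (M).
    print(shift_to_space("VFLASGGGGIUGAAGYBDAWHOEVHUUVLLHGJYOLGFGP", "G"))
    print(shift_to_space("XBMGRMBULEMPBMSRGMEBYRGMGRGHFMAGNMRLRZOM", "M"))
```

Running each card through this with the letters G, M, A, R, K in turn should reproduce the ten shifted lines above.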
But this still didn’t give me a cleartext. Some kind of wild guess made me think that I was dealing with a columnar transposition, which I’d never tried to break before. So I resolved to do this one, and to do it “by hand” (without resorting to brute-force computer programs). I tried some simple rearrangements of each card’s text, but got nowhere…
Then I realized, that I might be able to do an attack “in depth”: Since I had 5 different ciphertexts, if they were all encoded with the same key, then I could use bits of one to help solve another. I lined all the text up in five columns, and started trying to rearrange the rows such that words formed. For example, if I found a Q in the first column, I’d then look for another row with a U in the first column, and put them together. I did that for all the Qs I could find, then looked in the other columns to see if other obvious digraphs were being formed.
This way, I figured, I might start with “QUI” in one column, and notice “HIS” in another. Then I’d just have to put a row with “T” above “HIS” and I’d have another word built. Repeat, and repeat, and eventually I’d solve all of them.
Except that this wasn’t how the puzzle worked.
As I realized that I was getting nowhere, I noticed that there were two rows which read “Z Y O U Z.” And for the first time, I saw the word “YOU” in the middle of two Zs. And realized that I was being an idiot.
I eliminated some spaces, to make it easier to read, and found the plaintext. [I was working vertically, but to save space I'll rotate it here, in two blocks. The first block is the 1st half of each card's shifted text, placed one on top of the next, the 2nd block is the same for the 2nd half of each card].
OYETLZZZZBNZTTZRUWTPAHXOANNOEEAZCRHEZYZI
KOZTEZOHYRZCOZFETZROLETZTETUSZNTAZEYEMBZ
AUPYVNNAOOTRZYIDZTASLZSCEZISSZTHECRZPOUH
YZRZEOLVUKHYBOGZHONEZTZRZCNZAGEESIZIOUTA
ZRECRTYEZEEPUUUOOZSZTETEOOUMGRDZAPKSNSZD
ZATEZZZHTTZZCIEEASBAUYFAPBALKRHBYIFBYOMY
TKZWEZYAOZRZHNMRTAIFBFSPHZKQFYQCSSHZKQIL
OESHANOVZTENEGEZZIATFAPAABMWKPILATIUPCFM
ZZOASOUEGHSOAZMWIDAFIYZHEPLSMYQKKDUDCAIY
MIMTYWZZEETZTRBHZZLPBRAPPSDYSPSOZHCMXYYF
Reading down each column in the 1st block, then continuing in the 2nd, we get:
OKAYZYOUZREZPRETTYZCLEVERZZNOTZONLYZHAVEZYOUZBROKE
NZTHEZCRYPTOZBUTZYOUZFIGUREDZOUTZHOWZTOZTRANSPOSEZ
ALLZTHEZTEXTSZTOZCREATEZONEZCONTINUOUSZMESSAGEZZGR
ANTEDZTHEZCAESARZCIPHERZKEYZISZEPONYMOUSZBUTZIZHAD
ZTOZMAKEZITZSOMEWHATZEASYZZNOWZYOUZHAVEZTOZGETZTHE
ZRESTZZNOZCHEATINGZREMEMBERZWHATZIZSAIDZ
Or, cleaned up:
OKAY YOU RE PRETTY CLEVER
NOT ONLY HAVE YOU BROKEN THE CRYPTO BUT YOU FIGURED OUT HOW
TO TRANSPOSE ALL THE TEXTS TO CREATE ONE CONTINUOUS MESSAGE
GRANTED THE CAESAR CIPHER KEY IS EPONYMOUS BUT I HAD TO MAKE
IT SOMEWHAT EASY
NOW YOU HAVE TO GET THE REST
NO CHEATING REMEMBER WHAT I SAID
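The column-wise reading is easy to reproduce in a few lines. This sketch covers only the first block (the first line of each shifted card); `FIRST_HALVES` and `read_columns` are illustrative names, not anything from the original puzzle.

```python
# First line of each card after the Caesar shift, stacked top to bottom.
FIRST_HALVES = [
    "OYETLZZZZBNZTTZRUWTPAHXOANNOEEAZCRHEZYZI",
    "KOZTEZOHYRZCOZFETZROLETZTETUSZNTAZEYEMBZ",
    "AUPYVNNAOOTRZYIDZTASLZSCEZISSZTHECRZPOUH",
    "YZRZEOLVUKHYBOGZHONEZTZRZCNZAGEESIZIOUTA",
    "ZRECRTYEZEEPUUUOOZSZTETEOOUMGRDZAPKSNSZD",
]


def read_columns(rows):
    """Read a block of equal-length rows column by column, top to bottom."""
    return "".join("".join(column) for column in zip(*rows))


if __name__ == "__main__":
    message = read_columns(FIRST_HALVES)
    print(message)
    # Z stands in for a space.
    print(message.replace("Z", " "))
```

The second block works the same way with the second line of each card, appended after the first block's output.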
Woohoo! Of course, that’s not all. There’s still a block of text at the end that’s not right:
BAUYFAPBALKRHBYIFBYOMY
IFBFSPHZKQFYQCSSHZKQIL
ATFAPAABMWKPILATIUPCFM
AFIYZHEPLSMYQKKDUDCAIY
LPBRAPPSDYSPSOZHCMXYYF
So there’s more to decode. Fortunately, G. Mark gave us a big hint when he said “NO CHEATING.” That’s his clue, made clear in his Tales from the Crypto talk, that this stage requires the Playfair cipher. But what key? Well, for his Mardi Gras puzzle, he used the title of his talk, so what talk did he give at DEF CON 16? “A Hacker Looks Past Fifty.”
Plugging this into a friendly online Playfair decoder reveals the final cleartext:
TEXTTHEPHRASEFIFTYISNI
FTYTOSEVENTIMESSEVENFO
URTHREETIMESFOURTWOEIG
HTFIVEZERONINEANDTHEFI
RSTPERSONTOSOLVEWINSIT
Or, cleaned up:
TEXT THE PHRASE FIFTY IS NIFTY TO
SEVEN TIMES SEVEN FOUR THREE TIMES FOUR TWO EIGHT FIVE ZERO NINE
AND THE FIRST PERSON TO SOLVE WINS IT
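For anyone who would rather not rely on an online decoder, here is a minimal Playfair decryption sketch. It assumes the usual conventions (5x5 square filled row by row from the key, I and J sharing a cell), which is an assumption about how the puzzle was built rather than something stated in it; the function names are illustrative.

```python
def build_square(key):
    """Build a 25-letter Playfair square (row-major), merging J into I."""
    square = []
    for ch in key.upper() + "ABCDEFGHIKLMNOPQRSTUVWXYZ":
        if ch == "J":
            ch = "I"
        if ch.isalpha() and ch not in square:
            square.append(ch)
    return square


def playfair_decrypt(ciphertext, key):
    """Decrypt Playfair ciphertext (letters only, even length)."""
    square = build_square(key)
    pos = {ch: divmod(i, 5) for i, ch in enumerate(square)}
    letters = ["I" if c == "J" else c for c in ciphertext.upper() if c.isalpha()]
    out = []
    for a, b in zip(letters[0::2], letters[1::2]):
        (ra, ca), (rb, cb) = pos[a], pos[b]
        if ra == rb:        # same row: take the letter to the left
            out += [square[ra * 5 + (ca - 1) % 5], square[rb * 5 + (cb - 1) % 5]]
        elif ca == cb:      # same column: take the letter above
            out += [square[((ra - 1) % 5) * 5 + ca], square[((rb - 1) % 5) * 5 + cb]]
        else:               # rectangle: swap columns
            out += [square[ra * 5 + cb], square[rb * 5 + ca]]
    return "".join(out)


if __name__ == "__main__":
    print(playfair_decrypt("BAUYFAPBALKRHBYIFBYOMY", "A Hacker Looks Past Fifty"))
```

Each of the five leftover lines has an even number of letters, so they can be decrypted one at a time or joined together first.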
Still not quite finished. So now we’ve got to do some math and number manipulation. At first, I thought it was several different multiplication operations, something like:
7 * 7, 4, 3 * 4, 2, 8, 5, 0, 9 == 49 4 12 2 8 5 0 9 or 494-122-8509
I texted the phrase to that number, but got no response. After a while, I sent an email directly to G. Mark, who confirmed that I’d broken the cipher, but did the math wrong.
It wasn’t a bunch of separate operations, but a single operation, like this:
7 * 743 * 428509
Which yields the following (obviously faked for this blog entry) phone number:
222 867 5309
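The two readings of the arithmetic are quick to check. A short sketch (variable names are just for illustration):

```python
# First (wrong) reading: several small operations, with the results strung together.
pieces = [7 * 7, 4, 3 * 4, 2, 8, 5, 0, 9]
wrong_guess = "".join(str(p) for p in pieces)

# Second reading: one product of three numbers.
number = 7 * 743 * 428509
digits = str(number)
formatted = f"{digits[:3]} {digits[3:6]} {digits[6:]}"

if __name__ == "__main__":
    print(wrong_guess)
    print(formatted)
```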
This was a fun puzzle! I took some wrong turns, tried some new techniques, had some good luck, and made some stupid mistakes. A little of everything. Of course, tweaking the puzzle so I could (finally) publish the writeup was fun, too, especially factoring numbers to get them to fit into the ciphertext space available. Interesting bit of trivia: Turns out that 8675309 is a prime number.
CarolinaCon Flag Puzzle
About two weeks ago, G. Mark Hardy asked if I was planning to attend CarolinaCon at the end of April. He had a puzzle set to go and was even thinking of using me as a clue. I replied that I wouldn’t be at the con, but would love to see the puzzle. So he sent me a copy.
Here is what he sent me, which was printed on the conference badge:
Unfortunately, I was already busy with another puzzle — THOTCON — and was eyeing a third (the Verizon DBIR). Plus, the Easter weekend was fast approaching. So I didn’t really have the time to hit it full force. But I did eventually solve the puzzle.
As always, if you’d like to try to solve this yourself, then STOP now, as the rest of this post is full of spoilers. The image above is all that you need to get started.
The cipher text, then, is just this:
OOAI YELL MBOP QXTY EBPL JJHQ KIPW FWAL VPHW OHYC ELJU WQCV CAIL AIJJ
RHNK UCNP JIGY XYJD WNAU LJCY GAIL VSNB WMTH GCLX XPTJ CWQI WRHA
BLCA EQMN XRKM VVQS PJXE OWHE SVGP HTTH EKSA VQKH YCTB MVRV XWNQ
QGPL RACG RLRF EFMW ITFP KHFS TPTZ UUBX XFVB SRSI WHCD JHZB VVUM
AYDY LKBF FEOA NTYF LZWP YWMY MMLG DMFL VIGU WGNA MQBP
Beyond that, there wasn’t much to go on. During the con, G. Mark tweeted a couple of clues trying to focus people on the flag — and to lead them to Google searches on Confederate cryptography. He also tried to help people recognize the kind of cipher it likely was, and those it was not.
Of course I didn’t need any of those hints. Having written an extensive post about a Civil War message, I not only knew what kind of cipher the Confederacy used, I also knew the three keys they used most frequently.
Not wanting to make it too easy on myself, I chose to try a crib first. I guessed the message might start with CONGRATULATIONS, and after three letters, I knew what the key was. But for illustration, here’s a way that one could have tested a crib using an online tool (I already discussed a more manual approach in the Civil War post).
A site I frequently use for crypto tools (and suggested by G. Mark in one of his hints at the con) is Rumkin Cipher tools. Using the Vigenère tool, enter the ciphertext and select “decrypt.” Then, instead of the key, enter the start-of-message crib. In this case, I tried “CONGRAT” (to account for the possibility it was abbreviated). Doing this gives something like this for the start of the plaintext:
key: CONGRAT
MANC HESJ YOIY QERK RVYL QHTD ERPD DINF EPOU AUSL
ESHG JKLV JYUY URJQ PTAE DCUN VVAH XFHP JHJU SHOL
So the first 7 letters represent the key that would spell CONGRAT in the plaintext. Change the key to MANCHES and now we see this:
key: MANCHES
CONG RATZ MOMI MFHY RZIH RXHD IBLE TWNJ OLPK OUWV
ATXU JOVR KOIY YBFR FHAI NYVD JVER TGXD JLTQ TXCL
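The reason the crib trick works is that Vigenère decryption is just letter-by-letter subtraction, so subtracting the guessed plaintext from the ciphertext leaves the key. A minimal sketch, with `vigenere_decrypt` as an illustrative name (this is not the Rumkin tool's code):

```python
def vigenere_decrypt(ciphertext, key):
    """Subtract the repeating key from the ciphertext, letter by letter (A=0)."""
    letters = [c for c in ciphertext.upper() if c.isalpha()]
    key = [k for k in key.upper() if k.isalpha()]
    out = []
    for i, c in enumerate(letters):
        k = key[i % len(key)]
        out.append(chr((ord(c) - ord(k)) % 26 + 65))
    return "".join(out)


if __name__ == "__main__":
    start = "OOAI YELL MBOP QXT"
    # Using the crib as the "key" exposes the real key's first letters.
    print(vigenere_decrypt(start, "CONGRAT"))
    # Using the recovered key gives back the plaintext.
    print(vigenere_decrypt(start, "MANCHESTERBLUFF"))
```

Only the first seven letters of the crib run are meaningful; after that the crib repeats and the output turns back into noise, as in the tool output above.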
Now, if this were the whole key, then we’d see words pop out later in the output. There’s “D IBLE” in the first line, but nothing anywhere else. So start adding As to the end of the key, and eventually we find:
key: MANCHESAAAAAAAA
CONG RATL MBOP QXTM EONE FRHQ KIPW FWOL INAS WHYC
ELJU WECI ATET AIJJ RHNK ICAN CEOY XYJD WNAI LWAR
It looks like “OLINAS” on the first line, which must be “CAROLINAS,” so figure out which letters in the key correspond to the WFW just in front of it, and change them to CAR.
key: MANCHESAAAAACAR
CONG RATL MBOP OXCM EONE FRHQ KIPU FFOL INAS WHYC
ELHU FECI ATET AIJJ RFNT ICAN CEOY XYJD UNJI LWAR
The three characters in question are now UFF, so that’s the next key fragment. Replace CAR with UFF and look for another place to stretch the key out:
key: MANCHESAAAAAUFF
CONG RATL MBOP WSOM EONE FRHQ KIPC AROL INAS WHYC
ELPP RECI ATET AIJJ RNIF ICAN CEOY XYJD CIVI LWAR
We’re definitely on the right track, as line 2 now includes “CIVIL WAR.” In the second line is “IF ICAN CE,” which is probably SIGNIFICANCE. Do the same trick: replace the end of the AAA with SIG, see the corresponding plaintext letters change to RBL, and change the letters in the key from SIG to RBL, and now we see:
key: MANCHESAARBLUFF
CONG RATL MKNE WSOM EONE FRHQ THEC AROL INAS WHYL
DAPP RECI ATET AISI GNIF ICAN CEOY XHIS CIVI LWAR
Let’s reformat to maybe make it easier to find the missing words:
CONGRAT LM KNEW SOMEONE FJHQ THE CAROLINAS OHYLD
APPRECIATE LAI SIGNIFICANCE GYXHIS CIVI LWAR
We still have two letters left to guess in the key, and there’s a two-letter bit in the first line that looks like it should be “SI.” Insert SI into the key, retrieve “TE” from the plaintext, put those in place of SI, and bingo:
key: MANCHESTER BLUFF
CONGRATS I KNEW SOMEONE FROM THE CAROLINAS WOULD
APPRECIATE THE SIGNIFICANCE OF THIS CIVIL WAR
CIPHER CH RHHH TAET FWPS BLWD RFHN ZEYI LMVM MXFH
JVDQ IFFL KFGT YQBD HGRA ASZW EPZN TXHB ZTKR FDJZ
PVVG MOCT PENN LBVV XZAK YHSQ MLBG QDAM DAQP SEQB
SPJZ SGOH QQIM NWWU TRXO ETUV IHYS JSSX FSVX BSGB
RMSJ OEOB SPMP SLWD
And bingo! We’re — wait, what? Dammit.
At this point, I was stumped for a while. For one: do I use the “decrypted” output of the first stage? One other G. Mark puzzle worked that way, so it seemed reasonable. Plus, that would make the second stage dependent upon solving the first. Or, should I just find the original ciphertext that corresponds to what didn’t decrypt and use that?
In the end, I tried both avenues with a variety of approaches. I tried the other two commonly-used Confederate keys, ruled out Playfair and simple Caesar shifts, and just tried lots of different keys. I also tried dragging a crib back and forth. This is essentially the same as what I described above, but I try the word (“THE” is what I tried) against every position in the ciphertext, and hope that I’ll see an obvious 3-letter sequence pop out. None of these met with any success.
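Crib dragging against a Vigenère is mechanical enough to sketch. The helper below (`crib_drag` is a made-up name) subtracts the crib from the ciphertext at every offset and reports the key fragment each position would imply. It is demonstrated on the start of the first-stage ciphertext, where the answer is already known.

```python
def crib_drag(ciphertext, crib):
    """Yield (offset, implied key fragment) for the crib at every position."""
    letters = [c for c in ciphertext.upper() if c.isalpha()]
    crib = crib.upper()
    for offset in range(len(letters) - len(crib) + 1):
        window = letters[offset:offset + len(crib)]
        fragment = "".join(
            chr((ord(c) - ord(p)) % 26 + 65) for c, p in zip(window, crib)
        )
        yield offset, fragment


if __name__ == "__main__":
    stage_one = "OOAI YELL MBOP QXTY EBPL JJHQ KIPW"
    for offset, fragment in crib_drag(stage_one, "THE"):
        print(offset, fragment)
```

At offset 24, where the plaintext really does read THE, the fragment comes out as RBL, a piece of MANCHESTER BLUFF. With a three-letter crib and an unknown key, though, a fragment like that is easy to miss among the noise, which is part of why this approach went nowhere here.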
I was sure this was a Vigenère, based on the historical connection, so I kept plugging away. In addition to crib dragging, I tried various other tests to help guess a key size, and even started noodling with some new techniques of my own devising. But no luck. (Though I did learn a lot more about Civil War cryptography in the process.)
After a few days of not getting far, I regrouped and tried simplifying (per G. Mark’s inevitable admonition that I’m making it too complicated). Looking at the remaining text, I decided to try an “offset” key. Basically, I took COMPLETE VICTORY and just started rolling letters off the beginning and onto the end. When I hit TORYCOMPLETEVIC I found success.
UNFORTUNATELY BAD CRYPTO MAY HAVE LED TO THE DEFEAT OF LEE IN THE WAR OF
NORTHERN AGGRESSION BUT YOU CAN MAKE UP FOR IT
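Rolling the key by hand gets tedious, so here is a rough sketch that tries every rotation. It assumes the undecrypted remainder of the original ciphertext begins at the NB in the group VSNB (where the first message runs out, by a letter count), and only the first 22 letters of it are used. The names `REMAINDER`, `rotations`, and `try_rotations` are illustrative.

```python
# Assumed start of the still-encrypted text: "..NB WMTH GCLX XPTJ CWQI WRHA"
REMAINDER = "NBWMTHGCLXXPTJCWQIWRHA"


def vigenere_decrypt(ciphertext, key):
    """Subtract the repeating key from the ciphertext, letter by letter."""
    return "".join(
        chr((ord(c) - ord(key[i % len(key)])) % 26 + 65)
        for i, c in enumerate(ciphertext)
    )


def rotations(key):
    """All cyclic rotations of the key, starting with the key itself."""
    return [key[i:] + key[:i] for i in range(len(key))]


def try_rotations(ciphertext, key):
    """Return (rotated key, candidate plaintext) pairs to eyeball."""
    return [(r, vigenere_decrypt(ciphertext, r)) for r in rotations(key)]


if __name__ == "__main__":
    for rotated, plaintext in try_rotations(REMAINDER, "COMPLETEVICTORY"):
        print(rotated, plaintext)
```

The winning rotation starts 11 letters into the key. That is consistent with the second message beginning 86 letters into the ciphertext (86 mod 15 = 11), which is why feeding the whole original ciphertext through the unrotated key also works.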
But even that didn’t get everything. There’s still a block of cipher text at the end. Of course, now I know what to do. I simply put the entire original cipher text into the online applet and use each of the three Confederate keys in sequence. The first decoded the first block, when replaced with the second it decoded a chunk in the middle, and when I replaced it with COME RETRIBUTION the last message was decrypted:
TO CLAIM THE PRIZE FOR SOLVING THIS YOU MUST TELL G MARK THIS WHOLE TEXT
BY THE END OF THE CON
In the end, a very simple, almost trivial, solution. Especially since all the keys were available in the Wikipedia article on Vigenère. But mashing all three texts together the way he did totally ruined my attempts at traditional cryptanalysis. If I’d known there were three parts to the puzzle, I might’ve figured out the trick earlier. Maybe. Now I’m just trying to figure out if there’s an easy way to “discover” such partitions in the cipher text or if you just have to guess or stumble upon them.
But this was all before the con even happened. Once it started, I periodically checked Twitter to see if anyone was working the puzzle, and if so, whether they were making any progress. Early on, I saw a couple of people post links to the image, or to a pastebin copy of just the text, but not much beyond that. One person did suggest “POTOMAC RIVER,” probably as a possible key, as the battle flag originally came from the Confederate Army of the Potomac.
Finally, late on Sunday, I started to see a few people make progress. Then about 3:45, a tweet from Korotos to G. Mark said, simply, “Solved.” So congratulations to Korotos!
Knowing the secret, being “on the inside,” was an interesting change for me. It was a different challenge having to keep my mouth shut….and I’m glad I did. Both because to say anything would’ve been wrong (it’s not my game, after all!), but also because the few times I did think about what to say, I realized hours later that I would have given away too much. There’s an art to giving hints that are Just Good Enough…
So speaking of hints, what ever happened to the bit about using me as a hint? About midday Sunday, G. Mark tweeted this:
Hint: on CTF network was file named “.notthis”; contents were: a8979e8b df88908a 939bdfbb 9e8d8b97 dfb18a93 93df9b90 c0ff
The file name was a hint as to how to decode the hint: logically invert (or NOT) all the bits. Or, XOR with 0xFF, which is functionally the same. Doing this reveals the hint he’d warned me he might use:
What would Darth Null do?
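Decoding the hint takes only a few lines. A minimal sketch, using the hex string exactly as tweeted:

```python
hint = "a8979e8b df88908a 939bdfbb 9e8d8b97 dfb18a93 93df9b90 c0ff"

# Parse the hex, then flip every bit (XOR with 0xFF is a bitwise NOT on a byte).
raw = bytes.fromhex(hint.replace(" ", ""))
decoded = bytes(b ^ 0xFF for b in raw)

# The final ff byte inverts to a NUL terminator; drop it before printing.
text = decoded.rstrip(b"\x00").decode("ascii")

if __name__ == "__main__":
    print(text)
```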
I don’t know if anyone ever decoded the hint. I do know that nobody viewed my Civil War blogpost during the entire con, so if anyone did decode it, they didn’t take the next step. Of course, the first key was right there in my blog…and even without the hint, a Google search for “G. Mark confederate crypto puzzle” lists my blog as the first hint — proving that sometimes, the direct attack actually is the best choice.
Analysis of iOS Location Data from Multiple Devices
This “Your iPhone Is Tracking Your Every Move!!” craziness just won’t go away. I’ve been kind of disappointed by the lack of very detailed analysis of the data that’s actually being collected, so I spent some time collecting information of my own.
I have access to four iOS devices running 4.0 or better: my personal iPhone 3GS, a family iPad with 3G subscription, a company-owned iPad (whose 3G has never been activated), and a just-arrived iPad 2 that belongs to a client. So I spent some time this weekend trying to better understand what the Core Location daemons are doing.
First, please forgive me if I’m retreading already explored ground. Turns out that a few other people did the same thing this weekend, and so maybe I’m late to the party. I don’t want to be a “Me, too!” poster, but I also think there’s a little that I’ve found that I haven’t seen mentioned yet. Plus, I should mention the work of Alex Levinson, who looked at this in detail a year ago and has been a solid voice of reason from the beginning.
Anyway, first I’ll talk about some of what I observed, then I’ll see if I can’t draw a few (hopefully valid) inferences. Some of the data were taken from the devices just as they were last week. Saturday, though, we went out to lunch and I took my phone, company iPad, and personal iPad all with me. During that trip, I kept the personal iPad locked the entire time, and I used the company iPad on the road (with Google Maps open the whole way). I used my phone briefly to make a call, and checked Twitter a couple times while at the restaurant, and also for a while in a parking lot as my wife went into the grocery store.
First, the database.
I can see 5 tables within the consolidated.db that seem to be pertinent: CellLocation, CellLocationLocal, CellLocationHarvest, WifiLocation, and WifiLocationHarvest. All of these include details about speed, accuracy, elevation, and other such items that I’m not really concerned with (and many of which don’t seem to be used, at any rate). All also include a timestamp, latitude, and longitude, as well as some way of uniquely identifying the point it represents. In the case of a Wi-Fi access point, this is the MAC address, and in the case of a cell tower, it’s a tuple of four data items. Each entry in these tables appears to be unique: no single cell tower or Wi-Fi access point appears more than once. Point 1: The devices are not tracking my every movement.
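The uniqueness check is simple to run with Python's built-in sqlite3 module. The sketch below uses a tiny in-memory stand-in rather than a real consolidated.db: the table name comes from the list above, but the column names (MAC, Timestamp, Latitude, Longitude), the 2001-01-01 timestamp epoch, and the sample rows are all assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timedelta

# Assumption: timestamps count seconds from 2001-01-01.
EPOCH_2001 = datetime(2001, 1, 1)


def to_datetime(timestamp):
    """Convert a database timestamp to a datetime under the assumed epoch."""
    return EPOCH_2001 + timedelta(seconds=timestamp)


def duplicate_keys(conn, table, key_column):
    """Return (key, count) rows for any key that appears more than once."""
    sql = (
        f"SELECT {key_column}, COUNT(*) FROM {table} "
        f"GROUP BY {key_column} HAVING COUNT(*) > 1"
    )
    return conn.execute(sql).fetchall()


def make_demo_db():
    """Build a stand-in database with made-up rows and assumed column names."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE WifiLocation "
        "(MAC TEXT, Timestamp REAL, Latitude REAL, Longitude REAL)"
    )
    conn.executemany(
        "INSERT INTO WifiLocation VALUES (?, ?, ?, ?)",
        [
            ("00:11:22:33:44:55", 325000000.0, 38.90, -77.03),
            ("66:77:88:99:aa:bb", 325000002.0, 38.91, -77.04),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    # An empty list means every access point appears exactly once.
    print(duplicate_keys(conn, "WifiLocation", "MAC"))
```

Against a real database, you would swap `make_demo_db()` for `sqlite3.connect()` on your own copy of the file and adjust the column names to match whatever the schema actually contains.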
Now, my phone.
I see several access points noted all around my house. The accuracy isn’t phenomenal, as it puts my access point on my deck, and a neighbor’s in the middle of my kitchen. In fact, there are 11 different access points displayed either in my house, my yard, or just into my neighbors’ yards. Point 2: The Wi-Fi data points are not precisely located.
Also, the timestamps are varied. Four of the 11 around my house show a date/time from a couple days before I dumped the database (and another 4 are stamped two seconds later). But the other three are from early March, late February, and mid January. Point 3: The Wi-Fi data does not represent the last time I visited a location.
Finally, huge swaths are blanketed with data about Wi-Fi access points. Neighborhoods I’ve not driven through in months, if not years (or ever). These points share similar timestamps as the data within my neighborhood. Point 4: Data is present in the database for locations I’ve not visited.
The cell tower data is very similar. It shows towers located in areas I’ve not recently visited, with locations not corresponding to actual towers (in many cases, not even close — several were shown in residential communities where I’ve never seen a tower). The timestamps are similarly varied, with some I randomly clicked on going back to October 2010. Point 5: Cell tower data is treated the same as Wi-Fi access point data.
I did not see any new data points appear during the drive to the restaurant, or while we ate. However, a batch of data, both Cell and Wi-Fi, was timestamped while we sat outside the grocery store. The cell data, in particular, was scattered over a very wide area, at least several miles on a side. Point 6: Data appears for a wide area simultaneously, and is not necessarily tied to length of time sitting still.
Finally, I observed new data in the WifiLocationHarvest table. A total of 11 Wi-Fi access points were simultaneously recorded while I waited in the parking lot. The precision on this was pretty good — only about 50 feet from where I was sitting. Points 7 and 8: Actual recording of new data is not predictable, and is highly accurate.
I was also able to look at some past data on the phone. I took a one-day trip to Dallas at the end of March, and found large collections of data centered on the location I’d visited, the area I ate lunch, and three locations on the highway leading from the airport. Those locations roughly, I believe, correspond with times when I’d refreshed Google Map directions. Point 9: You may be able to force a data fetch by refreshing the maps application.
Next, iPads.
My family iPad, which I’d woken up before we left and promptly locked again, did not record any new data the entire time. Point 10: When locked, the device might not record anything at all.
The company iPad was in use the whole way to the restaurant. It has no record of any cell towers, which isn’t terribly surprising, since it does not have an active 3G data plan (though it does have the 3G hardware). Point 11: No data plan, no cell info.
Obviously, since there was no data plan, it couldn’t collect any new data along the way. However, as we left the grocery store, I unlocked the device, refreshed the map location, and locked it again. Once we’d returned home, the iPad fetched 394 Wi-Fi points, in an area about a 1/2 mile by 1/2 mile square, roughly corresponding to the place we were when I refreshed the map. All these data points were timestamped when they were fetched (that is, when the iPad had access to the Wi-Fi at home), not when I was actually on the road. Point 12: The device may cache your last request and fetch related data the next time a network is available.
All three iPads showed a curious distribution of points around my office. The customer’s iPad, which has only been to the customer facility and my office, displayed points in a very short and wide rectangle centered on my office. My family iPad, which has only been a few places since I loaded 4.0 on it, showed virtually the same distribution around the office and a similar distribution, but not as wide, around my house. Not all of these points had the same timestamp, but over time, it definitely started filling out that shape. Point 13: When fetching data, the device appears to collect points over a nearly-fixed vertical range (about 30 arcseconds of Latitude) and a variable horizontal range.
Finally, my wife had taken the family iPad on a short trip last weekend. The iPad showed a square burst of Wi-Fi data points about where she pulled over to check a map, and another wide rectangle around the hotel she stayed in. It also showed data in the CellLocationLocal table. That table showed her track along the interstate, and appeared to be an actual positional track. Interestingly, the CellLocation table did not have tower locations for virtually anywhere along that track. On my phone, I had two points from my Dallas trip, and a half-dozen points from a taxi ride into Manhattan a week prior. Point 14: The CellLocationLocal table may record actual trip data, but it appears to be very limited.
One further point of (potential) interest: The timestamps on the data were, if you’ll pardon the pun, all over the map. Many data sets had timestamps only a few seconds or minutes apart. But when I stripped out data sets that were within five minutes of another set of points, the average time between updates was about 14 hours. Note that there’s very little statistical rigor to this, but I thought it was interesting. Point 15: When the device spends an extended time at one place, data appears to be fetched about twice a day.
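The "strip out anything within five minutes" step can be sketched as follows. This is one reading of that description (a timestamp is dropped if it falls within the window of the timestamp just before it), not necessarily the exact method used, and the sample timestamps are invented.

```python
def collapse_bursts(timestamps, window=300.0):
    """Keep only the first timestamp of each burst.

    A timestamp is treated as part of the current burst if it is within
    `window` seconds of the timestamp immediately before it.
    """
    kept, previous = [], None
    for t in sorted(timestamps):
        if previous is None or t - previous > window:
            kept.append(t)
        previous = t
    return kept


def mean_gap_hours(timestamps, window=300.0):
    """Average time between bursts, in hours (None if fewer than two bursts)."""
    kept = collapse_bursts(timestamps, window)
    if len(kept) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(kept, kept[1:])]
    return sum(gaps) / len(gaps) / 3600.0


if __name__ == "__main__":
    # Invented example: three bursts, 14 hours apart.
    sample = [0, 2, 120, 50400, 50402, 100800]
    print(collapse_bursts(sample))
    print(mean_gap_hours(sample))
```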
Summary of Observations
So, to sum up, here are my observations thus far:
- Point 1: The devices are not tracking my every movement.
- Point 2: The Wi-Fi data points are not precisely located.
- Point 3: The Wi-Fi data does not represent the last time I visited a location.
- Point 4: Data is present in the database for locations I’ve not visited.
- Point 5: Cell tower data is treated the same as Wi-Fi access point data.
- Point 6: Data appears for a wide area simultaneously, and is not necessarily tied to length of time sitting still.
- Points 7 and 8: Actual recording of new data is not predictable, and is highly accurate.
- Point 9: You may be able to force a data fetch by refreshing the maps application.
- Point 10: When locked, the device might not record anything at all.
- Point 11: No data plan, no cell info.
- Point 12: The device may cache your last request and fetch related data the next time a network is available.
- Point 13: When fetching data, the device appears to collect points over a nearly-fixed vertical range (about 30 arcseconds of Latitude) and a variable horizontal range.
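- Point 14: The CellLocationLocal table may record actual trip data, but it appears to be very limited.
- Point 15: When the device spends an extended time at one place, data appears to be fetched about twice a day.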
What does all this tell us? I think we can infer at least a few things, which are consistent with what others have been saying, and with Apple’s statements last year.
- The data in WifiLocation and CellLocation are not your device’s actual location at any given point in time, but instead are the location of others’ Wi-Fi access points and cell towers.
- The locations of these points are estimated by Apple based on data harvested by iOS devices and provided to Apple on a periodic basis.
- Individual devices periodically record the Wi-Fi points and cell towers visible to them, record a precise location, and send that data to Apple. (I have not yet observed this happen, but it makes sense, and Apple’s already said as much).
- Periodically, the device will poll Apple’s servers for location information nearby. This seems to happen when the device has been at rest for some time, or when the location information is refreshed in the map application (it may be reasonable to expect that other applications querying the Core Location service may also trigger a refresh). There may be some logic in terms of what data gets fetched, perhaps to avoid downloading duplicate information. I haven’t been able to dig into that yet.
- The timestamps for the fetched data appear to be the time the data was fetched. One may be able to look in the middle of a set of identically-stamped data to infer where the user was when that data was fetched. However, the data don’t appear to be fetched every time you’re in any given location, even if you’re there for an extended time (like, say, lunch).
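The "middle of a set of identically-stamped data" idea from the last point can be sketched roughly as below. This is only an illustration: the rows are made up, and the (timestamp, latitude, longitude) tuple layout is an assumption for the example, not the actual table schema.

```python
from collections import defaultdict
from statistics import mean

def batch_centers(rows):
    """Group (timestamp, lat, lon) rows by timestamp and average each batch."""
    batches = defaultdict(list)
    for timestamp, lat, lon in rows:
        batches[timestamp].append((lat, lon))
    # The mean point of a batch is a rough guess at where the fetch happened.
    return {
        ts: (mean(lat for lat, _ in pts), mean(lon for _, lon in pts))
        for ts, pts in batches.items()
    }

# Made-up sample rows (not real data): three points share one timestamp.
rows = [
    (1000, 38.0, -77.0),
    (1000, 38.5, -77.5),
    (1000, 39.0, -78.0),
    (2000, 40.0, -75.0),
]
centers = batch_centers(rows)
print(centers)
```

A plain average is the simplest possible choice here; since the fetched area seems to have an irregular shape, the true position could be some way off from the mean.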
So what’s my conclusion? I’m still not sure about the CellLocationLocal table, which perhaps might be for recording locations for future data fetches. But the rest of the data all seem very consistent with what Apple’s told us: they’re used to aid in geolocating the device. Why are so many points stored? So that it won’t have to pull data down again in the future. It’s a big, personalized cache, made to make my personal use of geolocated features faster and more accurate.
[Note -- if you're interested in the Python script I used to load the data into Google Earth, I'm posting it on the Intrepidus Group blog. It should be attached to this post from last week about my first review of the data.]
The 2009 Verizon Data Breach Investigation Report
In 2009, the Verizon Business Risk Team released their first public Data Breach Investigations Report. I saw it reasonably soon after release, and noticed a whole bunch of binary numbers in the background on the cover. “Cool,” I thought, but I didn’t bother trying to decode it. A week or so later, I learned that there’d been a contest, and I missed out.
In 2010, I was ready, and tried to solve the puzzle, but failed. That story comes later.
But now, on the eve of the release of the 2011 DBIR, I’m finally documenting the method needed to solve these puzzles. Here’s a quick, fresh look at the 2009 puzzle.
As always, if you’d like to try to solve this yourself, then STOP now, as the rest of this post is full of spoilers. If you’d like a copy of just the raw data (in this case, two ciphertexts), click here.
So, I vaguely remembered how this worked. And also that it was a very simple puzzle. Let’s see how quickly I can solve it, without digging too deep into my memory for what needed to get done. First, I pulled down the original PDF. And there, all over the background of the cover, is a whole bunch of binary numbers. Highlight, copy, and paste out into a file.
First, how do we break up the numbers? 8-bits? 7-bits? I removed the line breaks, counted, and divided by 8, but didn’t get a whole number of bytes. Found some text in the middle, removed that. Now count again. Ah, better. Looks like it’s 900 8-bit characters.
Next up — a simple script to decode the binary. Doesn’t take more than a few minutes, and now I’ve got a big block of ciphertext.
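A decoder along those lines might look like the sketch below. It is not the original script; the function name and the three-byte sample are made up for illustration.

```python
def decode_binary(raw, width=8):
    """Turn a pasted blob of 0s and 1s into text, `width` bits per character."""
    # Keep only the binary digits; this drops spaces and line breaks.
    bits = "".join(ch for ch in raw if ch in "01")
    if len(bits) % width:
        raise ValueError(f"{len(bits)} bits is not a multiple of {width}")
    return "".join(
        chr(int(bits[i:i + width], 2)) for i in range(0, len(bits), width)
    )

# Illustrative sample: the 8-bit ASCII codes for the letters E, V, N.
sample = "01000101 01010110\n01001110"
print(decode_binary(sample))
```

The length check mirrors the counting step above: if stray text is still mixed into the paste, the bit count usually won't divide evenly, though stray text that itself contains 0s or 1s would slip through this filter.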
EVNTXIGYIMWSNEHEIEFOTXBSCWYHRQMWGUZABVYCBBFREYFBVEDKEVMFRIFN
GFNRBFGVKSFPNBUFZJGCEEEWAKHPXEBTZJCZOWGTBSQGTMIAYDPYDRIRYETK
CJRPYHEPWKUOAEKNVTVZHSMZNTTIVIKMMRYSNUIAKBRKQMSTYCGCCRLRRIIR
EFGYTJUBUXHEYSGLEYRVHIYXDEYZCJKVTOSOIXJEHOXEVMWJBNZMTKWZEFOF
CNBWNCUWMYFIUVBKWNPWTYOEYQTIRRYRCMNVFVLRSBNTPWPAOCZPEKHLFCEE
RRVWVUYBVJPUVPOAYMIKQQNSWZGHZKDGYLAEGWPKESGCYZFVJDMEPQKSSLNV
SVPUVVRVYERHDTUTYYMQGEVWRMQSZFNPNRJIGGWAJNNJLKOEQHNETRPUQYDF
ZWCZKVJEXLMCKCSIFTCTSUTLDRRMIKQTNINPGRPQQXPTZDPAIOTCEUAZFEWD
QLLPZRHXLXQGSLRJTBLZRIRVISNZIWLMVYADVOHFEVNAKKGORRXSYGXPUMVG
BOMRJLCREFCMRQVXTMIYMJJVHXNBTSZMTJEFKFGKURFLNHXPKCWLEXMIYLGY
NNRWAKSEWTHPKGZKKXGAZELLUTAYCIEKWISHUNDKEKWARGBYZFGKEPKQGZZS
RIMFLGKARTURAINSNGEEUMEXRVEELZXTISUWVZKOYLTPBHZWEOQWNXNPXPKS
SXJHPANCVFPRYADRLROEWEBQEWHZRGATZDGUCEKLFYHZJNNZIJRGNZRVBOCA
UYEZGKPSJXJIASMVFTDWFXBIDHQZEYKDRTDRIOPPKJRPISSKMCZJFZTBVBJU
GEYANJIGJTDCPTZDEOGUTLZPEKHTNIHTGGUMVGBOMRJLCREFSWFZOCROHEAU
Okay, what kind of encryption did they use? A quick test of ROT-13 and such doesn’t get me anywhere. It’s awfully long, so I really don’t want to try a substitution cipher if I can avoid it. Then I remember that there was a clue somewhere in the report. Skimming through, I found a footnote on page 48:
yr puvsser vaqrpuvssenoyr
Let’s run that through ROT-13, and sure enough, we get a hint:
le chiffre indechiffrable
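For anyone who'd rather not do ROT-13 by hand, Python's standard library has it as a built-in text transform. A minimal sketch:

```python
import codecs

# "rot_13" is a text-to-text codec that ships with Python's standard library.
footnote = "yr puvsser vaqrpuvssenoyr"
hint = codecs.decode(footnote, "rot_13")
print(hint)
```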
Aha! That’s French. And one of the most commonly found ciphers, it seems, for hacker crypto challenges is named for a Frenchman. And I also know, because of how often I’ve run up against this cipher, that it was long known as “le chiffre indéchiffrable” (or the indecipherable cipher). So which cipher it is has been decided: it’s a Vigenère.
But how long is the key? I found an online applet that would do a Kasiski analysis, which looks for repeated trigraphs in the cipher and measures the distance between them. If you can find a common factor amongst a bunch of repeated trigraph distances, that could very well be the key length. I found 10 repeated trigraphs, but their distances are all over the map, and I can’t see anything that’s a clear common factor.
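The applet isn't needed for this part. The trigraph search described above can be sketched in a few lines; the function names and the toy input are made up for illustration, and this is not the applet's actual code.

```python
from collections import Counter, defaultdict

def repeated_ngram_distances(ciphertext, n=3):
    """Map each repeated n-gram to the distances between its occurrences."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - n + 1):
        positions[ciphertext[i:i + n]].append(i)
    return {
        gram: [b - a for a, b in zip(pos, pos[1:])]
        for gram, pos in positions.items()
        if len(pos) > 1
    }

def factor_counts(distances, max_len=30):
    """Count how many distances each candidate key length divides evenly."""
    counts = Counter()
    for d in distances:
        for k in range(2, max_len + 1):
            if d % k == 0:
                counts[k] += 1
    return counts

# Toy input where "ABC" repeats every 9 characters.
toy = "ABCDEFGHIABCJKLMNOABC"
repeats = repeated_ngram_distances(toy)
all_distances = [d for ds in repeats.values() for d in ds]
counts = factor_counts(all_distances, max_len=10)
print(repeats, counts.most_common(3))
```

A candidate length that divides many of the distances is worth trying first. Some repeats are pure coincidence, which is one reason the distances can look like they are all over the map.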
Next up, the index of coincidence, which is a way of looking at the ciphertext in varying keylengths to see which one seems to have “slices” that are the most internally consistent. That’s a simplification. Truth is, I don’t understand it much beyond a zen-like vagueness, so I’m not going to try to explain it here.
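For something slightly more concrete than zen-like vagueness, here is a rough sketch of the usual calculation. The index of coincidence of a text is the chance that two letters picked at random from it are the same. For each candidate key length, the ciphertext is cut into slices (every k-th letter) and the slice scores are averaged. English text tends to score around 0.067 and uniformly random letters around 0.038, so the right key length should score noticeably higher. The function names are made up, and this is not necessarily what the applet does internally.

```python
from collections import Counter

def index_of_coincidence(text):
    """Probability that two letters drawn from `text` (without replacement) match."""
    n = len(text)
    if n < 2:
        return 0.0
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def average_ic(ciphertext, key_length):
    """Average the IC over the slices that would share one key letter."""
    slices = [ciphertext[i::key_length] for i in range(key_length)]
    return sum(index_of_coincidence(s) for s in slices) / key_length

# Toy input: with a slice width of 2, every slice is a single repeated letter.
toy = "ABABABAB"
for k in (1, 2):
    print(k, round(average_ic(toy, k), 3))
```

With only a few dozen letters per slice the scores are noisy, so a result that is "far from certain" is about what one would expect on a 900-character ciphertext with a long key.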
Anyway, the IC applet makes 9 characters look like a good potential key length, though it’s far from certain. But at least one of the Kasiski distances was 72, which is a multiple of 9, and so this is as good a place to start as any.
Next up, I stick the ciphertext into a nice interactive Vigenère applet, set it to a 9-character key, and start sliding the alphabets around to see if anything pops out. Not having anywhere better to start, I make a guess that the plaintext starts with “CONGRATULATIONS.” As I adjust the various alphabets to make this happen, the key starts to appear. C-H-A-N-G-I-N-E. Hm. So close. Let’s change it to CHANGING…and now it’s looking a lot more real. Here’s the beginning of the plaintext:
CONGRATSGFWFHWUYGXFBNPOMAPYULIZQENZNVNLWZUFEYQSVTXDXYNZZPBFA
AXALZYGIEKSJLUUSTBTWCXEJUCUJVXBGTBPTMPGGVKDARFINSVCSBKIESWGE
ACRCSZRJUDUBUWXHTMVMBKZTLMTVPAXGKKYFHMVUIURXKEFNWVGPWJYLPBIE
YXTSRCUOOPUYWLGYYQEPFBYKXWLTACKINGFIGQJRBGKYTFWWVFMGRDWMYXBZ
Except that this is the only plaintext I see. If it were a 9-character key, then a key of “CHANGINGA” would at least give me 8 characters of real text, repeated down the length of the output, with a junk character in between each. At this point, I could think back, remember that the key was actually present in the text, find the two instances of “changing” in the report and have the puzzle solved in less than 30 minutes total. But that’d be cheating. So let’s try something new.
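The crib step (guessing how the plaintext starts and sliding alphabets until it appears) boils down to subtracting the guessed plaintext from the ciphertext, letter by letter. A minimal sketch, with a made-up function name:

```python
def key_from_crib(ciphertext, crib):
    """Recover Vigenère key letters: key = ciphertext - plaintext (mod 26)."""
    a = ord("A")
    return "".join(
        chr((ord(c) - ord(p)) % 26 + a) for c, p in zip(ciphertext, crib)
    )

# First letters of the ciphertext against the two guesses described above.
first_guess = key_from_crib("EVNTXIGYIMWSNEH", "CONGRATULATIONS")
second_guess = key_from_crib("EVNTXIGY", "CONGRATS")
print(first_guess[:8], second_guess)
```

The first eight key letters from the CONGRATULATIONS guess come out one letter away from a real word, which is the same "so close" moment as in the applet; the shorter CONGRATS crib produces the clean word.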
It’s looking pretty likely that the key starts with “CHANGING.” But I don’t know how many characters come next. I didn’t see a repeat at 8 or 9 characters, so let’s add another A, and another, and another, until I see things repeat. Once I get to 26 characters it happens. Now I’ve got plaintext that starts like this:
CONGRATSIMWSNEHEIEFOTXBSCW
WARDGOTOZABVYCBBFREYFBVEDK
COMSLASHGFNRBFGVKSFPNBUFZJ
EVERYONEHPXEBTZJCZOWGTBSQG
RFINSVCSDRIRYETKCJRPYHEPWK
SHAREFINVZHSMZNTTIVIKMMRYS
LNINETEEQMSTYCGCCRLRRIIREF
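The padding trick above can be sketched as follows. A key letter of A shifts by zero, so padded positions simply show the ciphertext unchanged, and any real plaintext lines up in columns once the row width matches the true key length. The helper names are made up, and only the first 52 ciphertext letters are used to keep the example short.

```python
def vigenere_decrypt(ciphertext, key):
    """Subtract the repeating key from the ciphertext, letter by letter."""
    a = ord("A")
    return "".join(
        chr((ord(c) - ord(key[i % len(key)])) % 26 + a)
        for i, c in enumerate(ciphertext)
    )

def rows(text, width):
    """Break text into rows of `width` characters for eyeballing columns."""
    return [text[i:i + width] for i in range(0, len(text), width)]

# First 52 letters of the ciphertext (two rows at a width of 26).
ciphertext_start = "EVNTXIGYIMWSNEHEIEFOTXBSCWYHRQMWGUZABVYCBBFREYFBVEDK"

# Known key prefix, padded with A (a zero shift) out to the trial length.
padded = rows(vigenere_decrypt(ciphertext_start, "CHANGING".ljust(26, "A")), 26)
extended = rows(vigenere_decrypt(ciphertext_start, "CHANGINGDEF".ljust(26, "A")), 26)
for row in padded + extended:
    print(row)
```

Trying each trial length from 9 upward and watching for readable text in the first eight columns of every row is the same search done by hand in the applet.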
So now, let’s start changing the letters after CHANGING and see what happens. A is no good, neither is B, nor C, but D — that seems to extend the cleartext words properly. In fact, the ZZZ after GOTO are probably supposed to be WWW. To make that happen, my key now starts with “CHANGING DEF”, which gives me this:
CONGRATSFIRSNEHEIEFOTXBSCW
WARDGOTOWWWVYCBBFREYFBVEDK
COMSLASHDBIRBFGVKSFPNBUFZJ
EVERYONEELSEBTZJCZOWGTBSQG
RFINSVCSANDRYETKCJRPYHEPWK
SHAREFINSVCSMZNTTIVIKMMRYS
LNINETEENINTYCGCCRLRRIIREF
From here, it’s a pretty easy job to finish out the key this way. The result is “Changing default credentials.” (it also appears in the report as “Changing default credentials is key.” Is. Key. Heh. Funny.) The final plaintext tells where to write with your solution, and the rest is a terse, high-level summary of the entire report. Here it is with spaces and newlines entered for clarity.
CONGRATS
FIRST TO CRACK GETS REWARD
GO TO WWW VERIZONBUSINESS COM SLASH DBIRHUNT TO CLAIM
FOR EVERYONE ELSE HIGH LVL STATS FOR FIN SVCS AND RETAIL FOLLOW PLS SHARE
FIN SVCS
SOURCES EXTERNAL NINETEEN INTERNAL NINE PARTNER TWO
THREATS MALWARE ELEVEN HACKING FIFTEEN DECEIT FOUR MISUSE SIX PHYSICAL TWO ERROR ONE
ERROR SIG CONTRIBUTOR IN FIFTEEN
TOP THREE HACK TYPES SQL INJECTION SEVEN MISCONFIG ACLS SEVEN DEFAULT CREDS TWO
TOP HACK VECTOR IS WEB APP TEN
TOP ASSET IS ONLINE DATA TWENTY SIX AND
ALL RECORDS TOP THREE DATA TYPES AUTH CRED ELEVEN PII TEN PYMNT CARD EIGHT
PYMNT CARD WAS NINETY EIGHT PCT OF RECORDS
TOP UU IS UNKNOWN CONNECTIONS SEVEN
DISCOVERY TAKES WEEKS TO MONTHS
RETAIL
SOURCES EXTERNAL TWENTY THREE INTERNAL ONE PARTNER EIGHT
THREATS MALWARE TEN HACKING TWENTY ONE DECEIT TWO MISUSE TWO PHYSICAL ZERO ERROR ZERO
ERROR SIG CONTRIBUTOR IN SIXTEEN
TOP TWO HACK TYPES SQL INJECTION SEVEN STOLEN CREDS SEVEN
TOP HACK VECTOR IS REM ACCMGT EIGHT
TOP ASSET IS POS ELEVEN AND
OVER HALF OF RECORDS TOP TWO DATA TYPES PAYCARD TWENTY THREE PII NINE
DISCOVERY TAKES MOSTLY MONTHS
My memory was correct in one respect — this was a very simple puzzle. Even the long approach I took, once I’d figured it out, went fast. If I’d received this puzzle new, today, I’m sure I would have solved it in an evening, tops. Two years ago, I almost certainly wouldn’t have been so lucky. For one, the trick of padding out the potential key to look for repeats isn’t something that’d ever occurred to me before, that I can recall, though it’s pretty obvious in retrospect. I’ll definitely have to remember this technique for future puzzles.
Also, having “CONGRATS” as the opening word gave me a really easy crib. Without that, I honestly don’t know where I’d have started.
So though I was right that this was a simple puzzle, I was wrong in another key respect: that its simplicity would mean it wasn’t going to be any fun, especially (subconsciously at least) knowing what I needed to do. Learning a new approach to break this cipher was fantastic fun. And proof that even the easy puzzles shouldn’t be ignored.
Thanks to the whole Verizon crew for this one. The 2010 puzzle was a different story, but that’ll wait until later. Hopefully I’ll write that up before next week’s new puzzle starts sucking up all my free time…
Crazy idea for multi-user iPads
While lying on the couch last Friday, trying to decompress after a busy day and expecting an even more hectic weekend, I had a crazy idea for how Apple might implement multiple user accounts on iOS devices like the iPad.
File System Overlays.
Applications in iOS are all restricted to their own sandbox — that is, they can only access files and data within their own application bundle, and nothing else. So right off the bat, data’s pretty well segregated.
Now, imagine that there’s an easy way for the operating system to distinguish between the application itself and its data. Like if all apps stored data in, say, Documents and Private Documents and Caches and other similarly-named folders. Anything that’s user-specific would be pretty easy to identify and peel away from the rest of the app.
Here’s where the hare-brained idea comes in: Across the entire filesystem, take all such folders, and move them off of the main disk, and into a second filesystem that’s mounted as an overlay on the actual disk.
This is sort of weird. It probably needs a picture.
The base iOS filesystem has system files (the operating system itself plus built-in apps and such), and has separate applications installed by the user. Let’s assume that each app stores user-specific data in a standardized place, like “Documents.”
The device simply puts all the Documents folders into a separate filesystem, then depending on which user has been activated, takes that filesystem and merges it with the base filesystem, overlaying the folders back into their proper locations. So to the device, to the apps, it’s as if nothing has changed. Data’s where you expect it to be.
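As a toy model of the lookup behavior (and nothing more), the overlay can be pictured as a layered dictionary: the active user's layer is checked first, then the shared base. Everything here is hypothetical, including the paths, the user names and the `mount_for` helper; a real implementation would live in the kernel's filesystem layer, not in Python.

```python
from collections import ChainMap

# Hypothetical base image: system files and app binaries shared by every user.
base = {
    "/Applications/Notes.app/Notes": "<app binary>",
    "/System/Library/CoreServices/SpringBoard.app": "<system app>",
}

# Hypothetical per-user layers holding only the user-specific folders.
overlays = {
    "alice": {"/Applications/Notes.app/Documents/notes.txt": "alice's notes"},
    "bob": {"/Applications/Notes.app/Documents/notes.txt": "bob's notes"},
}

def mount_for(user):
    """Return a merged view: the user's overlay is consulted first, then the base."""
    return ChainMap(overlays[user], base)

view = mount_for("alice")
print(view["/Applications/Notes.app/Documents/notes.txt"])
print(view["/Applications/Notes.app/Notes"])

# Writes through a ChainMap land in the first layer, i.e. the user's overlay.
view["/Applications/Notes.app/Documents/new.txt"] = "draft"
```

Switching users amounts to swapping the first layer; the base never changes, which is the property that would let apps stay unaware of the whole arrangement.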
You could merge preferences in a similar way. iOS already supports multiple configuration profiles, and dynamically merges them into a single active settings profile. So you could have perhaps one “master” account, that can make unalterable settings for the entire device, then create different users, each of which could add their own preferences to what’s already been defined.
Imagine going back to the main home screen, and doing a five-finger pinch to “zoom out” of the iPad and into a new screen with different users listed. Tap on a different user (and enter a passcode, if that user has one set), and the OS removes your overlay and installs the other user’s overlay. Then it’s a whole new iPad!
And the best part about this is it’s all handled at the operating system level. No changes to the applications are necessary (obviously, they need to be following at least some kind of predictable approach for storing data, though there might be some sneaky ways for the OS to figure that out on the fly as well). Of course, if users wanted to share data with other users on the same device (think music or videos), then applications would need to add support for that.
iOS already supports some pretty fancy magic at the filesystem level, with the built-in data protections present in iOS 4. (In fact, it was while musing on those protections that this idea occurred to me). So I don’t see this as being too far off in terms of difficulty to implement. Provided they can get the right filesystem support into the kernel, which I’m sure wouldn’t be too difficult.
Any comments? Is this totally whacked out, or is there some potential here? Also, think about taking this to the desktop…it could definitely add a lot more security to data at rest where multiple users (or the same user, with multiple roles) are sharing a system…
Simple Bypass of Safari Restrictions on iOS
Okay, so in iOS you can disable things. To protect the user, the device, the organization, from misuse, etc. One of the things you can do is disable Safari, so the end user can’t surf to anything bad. (I’m being a little snarky — there are some good cases where you’d want to prevent end-user web surfing: Gambling sites. Porn. Chat rooms. Competitors’ tip sites. Stuff like that). It’s very easy, and appears to be very complete.
But yesterday I was testing something out and found an easy way around the restrictions. You can install what’s called a Web Clip to the iOS device (iPhone, iPad, etc.). That clip is basically a single web page, taken from whatever URL you configure when you create the clip. This clip goes on the main application screen of the device, just like a “real” application would, and allows quick and easy access to, well, just about anything. You could have a clip that shows a security dashboard. Or a weather report. Or a list of important emergency contacts. Really, just about anything you could put into a web page.
The trick is that the device disables any links within that clip. So though you could display, for example, the front page of CNN, you couldn’t navigate to any of the links on that site. Or so I thought.
Turns out that some simple JavaScript methods aren’t properly trapped by the display engine. I found this while testing a web clip that pointed to the Google home page. As I entered terms into the search field, it instantly showed me similar searches in a drop-down list. It’s a pretty cool feature that I’ve grown to like. But what I didn’t expect was that I could tap on any of those suggested searches (or on the “I’m Feeling Lucky” button), and that the clip would load the desired search results. I was able to navigate beyond the source of the web clip. For most pages, that was the end; I couldn’t navigate further. But some more “modern” web applications, like Gmail, worked just fine, as if I weren’t in a restricted browser at all.
So I dug a little deeper, and figured out how to replicate the behavior. I’m not sure if I’m using exactly the same method that Google did (their JavaScript code is notoriously obfuscated, and I’m definitely not a JavaScript expert). I’m sure there are other ways to accomplish this. But what I ended up creating was a simple document that uses window.location=url to replace the contents of the window with the contents of the supplied URL variable. Pretty simple stuff.
I looked around (via Google, naturally) for any other writeups of this vulnerability, but couldn’t find any. So I wrote it up and posted it here, on the Intrepidus Group website.
If you’ve seen this before, or have any additional details or thoughts, or especially, suggestions for a workaround, please let me know. I can’t believe I’m the only person to have noticed this.


