I’d heard about the alleged FBI/Apple UDID leak shortly after arriving at work last Tuesday morning, and immediately downloaded and began reviewing the data. Less than an hour later, I’d surmised that comparing apps across multiple devices might help narrow down the source.

Several hours later, at 3:00, I saw a tweet from @Jack_Daniel suggesting that people checking their UDIDs in online forms only enter partial numbers. And that made me wonder: “How many digits is the minimum people need to enter in order to be guaranteed a unique result?” Sort to the rescue:

cat data | cut -c 2-7 | sort | uniq -c | more

This gave me a bunch of repeats. That’s not too surprising, as I’m only looking at 6 digits. Next up was 8 digits, and still I saw hundreds of repeats. Then I changed tactics and simply counted the number of unique UDIDs…and I came up with a number significantly different from the 1,000,001 that were released: 985,117. So there are almost 15,000 duplicates. Looking further, I saw that many of these duplicates have different device tokens, prompting a tweet, about 3:15:

Interesting. Just noticed there are UDID duplicates in that data dump, 
with multiple APNS tokens. Different app providers, or multiple regs?
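
The two counts discussed above (how many records versus how many distinct UDIDs, and how short a prefix still identifies a device uniquely) can be sketched as a pair of shell functions. This is only a sketch under stated assumptions: it assumes each line of the dump looks like 'UDID','token','name',... so that the UDID is the text between the first pair of single quotes, and that UDIDs are at most 40 characters long. The function names are made up for illustration, and any answer applies only to the devices in this particular dump.

```shell
# Assumption: each line looks like 'UDID','token','name',...
# so field 2, when splitting on single quotes, is the UDID.

# count_udids FILE: print "<records> <distinct UDIDs>".
count_udids() {
    records=$(wc -l < "$1" | tr -d ' ')
    distinct=$(cut -d "'" -f 2 "$1" | sort -u | wc -l | tr -d ' ')
    echo "$records $distinct"
}

# min_unique_prefix FILE: print the shortest prefix length at which
# every distinct UDID in FILE has a distinct prefix.
min_unique_prefix() {
    udids=$(mktemp) || return 1
    # De-duplicate first: repeats of the same device are not collisions.
    cut -d "'" -f 2 "$1" | sort -u > "$udids"
    total=$(wc -l < "$udids" | tr -d ' ')
    n=1
    while [ "$n" -le 40 ]; do
        prefixes=$(cut -c "1-$n" "$udids" | sort -u | wc -l | tr -d ' ')
        if [ "$prefixes" -eq "$total" ]; then
            rm -f "$udids"
            echo "$n"
            return 0
        fi
        n=$((n + 1))
    done
    rm -f "$udids"
    return 1
}
```

Running `count_udids data` and `min_unique_prefix data` would then report the record and distinct-UDID counts and the shortest safe prefix length for that file.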

About 45 minutes later, on my way home, @danimal suggested: “@DarthNull multiple apps? Seems like maybe a game or ad company.” I immediately thought, damn, that must be it. At 4:23 pm, I replied “Yes! makes sense.”

And two minutes after that, I found what seemed to be the source of the breach.

I had decided to look more closely at the most frequently repeated device IDs, on the theory that perhaps they would belong to a developer. They’d naturally test multiple apps for their company, each of which should have a different device token. So first, more shell magic:

cat data | cut -c 2-7 | sort | uniq -c | sort -n -r | head

Wow, some are repeated 10 or even 11 times!

11 4daa64abd
10 d1f575954
10 aa5c7aedb
 8 12e6ec97e
 7 f661c1396
 7 4225e2a59
 6 91a83b0e3
 6 480074431

I searched for the first one, and found 11 different entries for a “Gary Miller.” Nothing much there. The next one, though, had some interesting device names:

'Bluetoad Support'
'Bluetoad Support'
'BT iPad WiFi'
'BT iPad WiFi'
'CSR iPad'
'Customer Service iPad'
'Developer iPad'
'Developer iPad'
'Hutch Hicken’s iPad'
'Hutch Hicken’s iPad'

Six different names, four repeated twice (implying at least a pair of apps and several users). Then I looked at the next entry, with 10 repeats: it’s variously named Robert, Red, and HP Pavilion. Meh. The entry with 8 repeats: GoldPad. But the entry with 7 repeats really grabbed my attention:

'Bluetoad iPad'
'Bluetoad iPad'
'Client iPad BT'
'Client iPad BT'
'CSR/Marketing iPad'
'CSR/Marketing iPad'
'Jessica Aslanian’s iPad'

Support? Customer service? Developer? Marketing? A quick Google search revealed that, yes, BlueToad does develop iOS apps. In fact, they build magazine apps for many different publishers, and a quick trip through the iTunes store showed me that these applications use Push Notifications.
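
The rank-then-look-up steps above (find the most repeated UDIDs, then list the device names recorded for each) can be sketched as follows. The function names are hypothetical, and the sketch assumes the same layout as before: single-quoted, comma-separated fields in the order UDID, token, device name. A device name that itself contains a straight single quote would be cut short by this simple parsing.

```shell
# Assumption: each line looks like 'UDID','token','name',...
# Splitting on single quotes gives: $2 = UDID, $4 = token, $6 = name.

# top_udids FILE [N]: the N most repeated UDIDs (default 10),
# each preceded by its record count.
top_udids() {
    cut -d "'" -f 2 "$1" | sort | uniq -c | sort -n -r | head -n "${2:-10}"
}

# names_for_udid FILE UDID: every device name recorded for UDID,
# each preceded by how many records carry it.
names_for_udid() {
    awk -F "'" -v id="$2" '$2 == id { print $6 }' "$1" | sort | uniq -c
}
```

For example, `top_udids data 8` would produce a ranking like the one shown earlier (with full UDIDs rather than shortened ones), and `names_for_udid data <UDID>` would list the names for one device.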

As this was the kids' first day of school, we went out for a nice dinner to celebrate. While there, I thought more about what I’d found, and decided to roll the dice: I sent an email to BlueToad, using the email address on their website. I didn’t say much, just that there’d been a breach involving UDIDs and push tokens, and that I’d found some interesting data suggesting they might be involved. After returning home, I spent another four hours digging for more.

By the time I went to bed, I had identified nineteen different devices, each tied to BlueToad in some way. One, appearing four times, is twice named “Hutch” (their CIO), and twice named “Paul’s gift to Brad” (Paul being the first name of the CEO, and Brad being their Chief Creative Officer). I found iPhones and iPads belonging to their CEO, CIO, CCO, a customer service rep, the Director of Digital Services, the lead System Admin, and a Senior Developer.

This felt really significant. But as I started writing up my notes, doubt crept in. What are some other explanations? Perhaps everyone at the company uses a common suite of applications. Like the same timesheet app, for example. Then of course they’d all appear in the data. But even so, I couldn’t shake the feeling that I was onto something.

I spent much of the next day writing a detailed analysis of the situation for our blog. Then, about 4:30, I drafted a follow-up message to BlueToad about what I’d found, how I’d found it, and what I thought it meant. I also mentioned that, though I was reluctant to publicly name them without more solid data, it seemed likely that others would also find their name in the dump.

Since I now had several more employees’ names, I spent some time looking for email addresses, to (hopefully) increase the chance of a response. While searching, I stumbled on a partial password dump for the company! And it was dated March 14, the same week that the hackers claimed they’d hacked into the FBI computer. Suddenly, I felt a lot more confident again, and I mentioned this connection in the email.

Shortly after 8:00 that evening, I heard from Hutch Hicken, their CIO. He thanked me for what I’d done, and for my discretion in contacting them first rather than simply going public. He told me that they were assessing the situation, but didn’t yet know anything for certain. He didn’t think the March leak (which they’d already been aware of) was related, but said that the rest of my findings were concerning. He told me they planned to “do this right,” and he promised to keep me in the loop (as much as was feasible for a non-employee).

Most of the next day (Thursday), I didn’t really hear much. Then about 2:30 on Friday, Hutch called me again. Almost immediately, he told me that we could talk, but only if I agreed to embargo the story until noon on Monday. My response was “Well, the fact that you’re asking me this tells me that I’ll want to say yes,” so naturally I agreed.

He told me that they were confident the leak came from them, and he filled me in on some of the technical details (I’ll leave those details to others, to make sure I don’t make any mistakes). They were almost certain of their involvement, and were continuing to handle the situation.

Then he hit me with a big surprise: Kerry Sanders, a correspondent for NBC Nightly News in Miami, wanted to interview me. On camera. He was in the next room, the phone got passed to him, and the next thing I knew we were arranging an interview for that night. He didn’t arrive at my house until 11:00 (his plane was delayed), and we spent 45 minutes talking about what I found, how I found it, the privacy implications of the breach, and other related topics.

By the time he left at midnight, I was exhausted. As I write this, I still don’t know how much of the interview he’s going to use, or even if it’s going to make it onto the air Monday night. Either way, it was certainly a surreal way to conclude what started out largely as another puzzle hunt.

I’m still not completely clear on all the technical details. Was BlueToad really the source of the breach? How did the data get to the FBI (if it really did at all)? Or is it possible this is just a secondary breach, not even related to the UDID leak, and it was just a coincidence that I noticed? Finally, why haven’t I noticed any of their applications in the (very few) lists of apps I’ve received?

Hopefully, I’ll learn the answers to many of these questions in the coming days. Either way, I’m glad to have been able to help, and offer my thanks to BlueToad for their cooperation, and their quick response.

UPDATE: Here’s the link for the NBC Nightly News post.

UPDATE: Timo Hetzel kindly corrected a misunderstanding that I had regarding how device tokens are created. I had believed that each application on a device had a different token, but that was in error – it’s a single device token for all apps on a device. So whenever I saw many tokens for a given device, that represented (in most cases) multiple refreshes of the same device. However, apps using the development “sandbox” Push Server receive a different token from the one issued to apps using the main production server, so seeing multiple devices from BlueToad, each with exactly two tokens for a given device name, further implicates the use of those devices by developers.
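
The "exactly two tokens for a given device name" pattern described in this update could be checked with something like the sketch below. The function name is hypothetical, and it again assumes single-quoted, comma-separated fields in the order UDID, token, device name. Two distinct tokens for one device name is only consistent with sandbox-plus-production use; a token refresh could produce the same count.

```shell
# Assumption: each line looks like 'UDID','token','name',...
# Splitting on single quotes gives: $2 = UDID, $4 = token, $6 = name.

# tokens_per_device FILE: for each (UDID, device name) pair, print the
# number of distinct tokens, then the UDID and name (tab-separated).
tokens_per_device() {
    awk -F "'" '{ print $2 "\t" $6 "\t" $4 }' "$1" |
        LC_ALL=C sort -u |   # one line per distinct (UDID, name, token)
        cut -f 1,2 |         # drop the token, keeping the pair
        uniq -c              # lines per pair = distinct tokens
}
```

Piping the result through `awk '$1 == 2'` would keep only the pairs with exactly two distinct tokens.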
