Analysis of iOS Location Data from Multiple Devices
This “Your iPhone Is Tracking Your Every Move!!” craziness just won’t go away. I’ve been kind of disappointed by the lack of very detailed analysis of the data that’s actually being collected, so I spent some time collecting information of my own.
I have access to four iOS devices running 4.0 or better: my personal iPhone 3GS, a family iPad with 3G subscription, a company-owned iPad (whose 3G has never been activated), and just arrived an iPad 2 that belongs to a client. So I spent some time this weekend trying to better understand what the Core Location daemons are doing.
First, please forgive me if I’m retreading already explored ground. Turns out that a few other people did the same thing this weekend, and so maybe I’m late to the party. I don’t want to be a “Me, too!” poster, but I also think there’s a little that I’ve found that I haven’t seen mentioned yet. Plus, I should mention the work of Alex Levinson, who looked at this in detail a year ago and has been a solid voice of reason from the beginning.
Anyway, first I’ll talk about some what I observed, then I’ll see if I can’t draw a few (hopefully valid) inferences. Some of the data were taken from the devices just as they were last week. Saturday, though, we went out to lunch and I took my phone, company iPad, and personal iPad all with me. During that trip, I kept the personal iPad locked the entire time, and I used the company iPad on the road (with Google Maps open the whole way). I used my phone briefly to make a call, and checked twitter a couple times while at the restaurant, and also for a while in a parking lot as my wife went into the grocery store.
First, the database.
I can see 5 tables within the consolidated.db that seem to be pertinent: CellLocation, CellLocationLocal, CellLocationHarvest, WifiLocation, and WifiLocationHarvest. All of these include details about speed, accurracy, elevation, and other such items that I’m not really concerned with (and many of which don’t seem to be used, at any rate). All also include a timestamp, latitude, and longitude, as well as some way of uniquely identifying the point it represents. In the case of a Wi-Fi access point, this is the MAC address, and in the case of a cell tower, it’s a tuple of four data items. Each entry in these tables appears to be unique – that is, no single cell tower or Wi-Fi access point appears more than once. Point 1: The devices are not tracking my every movement.
Now, my phone.
I see several access points noted all around my house. The accuracy isn’t phenomenal, as it puts my access point on my deck, and a neighbor’s in the middle of my kitchen. In fact, there are 11 different access points displayed either in my house, my yard, or just into my neighbors' yards. Point 2: The Wi-Fi data points are not precisely located.
Also, the timestamps are varied. Four of the 11 around my house show a date/time from a couple days before I dumped the database (and another 4 are stamped two seconds later). But the other three are from early March, late February, and mid January. Point 3: The Wi-Fi data does not represent the last time I visited a location.
Finally, huge swaths are blanketed with data about Wi-Fi access points. Neighborhoods I’ve not driven through in months, if not years (or ever). These points share similar timestamps as the data within my neighborhood. Point 4: Data is present in the database for locations I’ve not visited.
The cell tower data is very similar. It shows towers located in areas I’ve not recently visited, with locations not corresponding to actual towers (in many cases, not even close – several were shown in residential communities where I’ve never seen a tower). The timestamps are similarly varied, with some I randomly clicked on going back to October 2010. Point 5: Cell tower data is treated the same as Wi-Fi access point data.
I did not see any new data points appear during the drive to the restaurant, or while we ate. However, a batch of data, both Cell and Wi-Fi, was timestamped while we sat outside the grocery store. The cell data, in particular, was scattered over a very wide area, at least several miles on a side. Point 6: Data appears for a wide area simultaneously, and is not necessarily tied to length of time sitting still.
Finally, I observed new data in the WifiLocationHarvest table. A total of 11 Wi-Fi access points were simultaneously recorded while I waited in the parking lot. The precision on this was pretty good – only about 50 feet from where I was sitting. Points 7 and 8: Actual recording of new data is not predictable, and is highly accurate.
I was also able to look at some past data on the phone. I took a one-day trip to Dallas at the end of March, and found large collections of data centered on the location I’d visited, the area I ate lunch, and three locations on the highway leading from the airport. Those locations roughly, I believe, correspond with times when I’d refreshed Google Map directions. Point 9: You may be able to force a data fetch by refreshing the maps application.
Next, iPads.
My family iPad, which I’d woken up before we left and promptly locked again, did not record any new data the entire time. Point 10: When locked, the device might not record anything at all.
The company iPad was in use the whole way to the restaurant. It has no record of any cell towers, which isn’t terribly surprising, since it does not have an active 3G data plan (though it does have the 3G hardware). Point 11: No data plan, no cell info.
Obviously, since there was no data plan, it couldn’t collect any new data along the way. However, as we left the grocery store, I unlocked the device, refreshed the map location, and locked it again. Once we’d returned home, the iPad fetched 394 Wi-Fi points, in an area about a 1/2 mile by 1/2 mile square, roughly corresponding to the place we were when I refreshed the map. All these data points were timestamped when they were fetched – that is, when the iPad had access to the Wi-Fi at home – not when I was actually on the road. Point 12: The device may cache your last request and fetch related data the next time a network is availble.
All three iPads showed a curious distribution of points around my office. The customers’s iPad, which has only been to the customer facility and my office, displayed points in a very short and wide rectangle centered on my office. My family iPad, which has only been a few placed since I loaded 4.0 on it, showed virtually the same distribution around the office and a similar distribution, but not as wide, around my house. Not all of these points had the same timestamp, but over time, it definitely started filling out that shape. Point 13: When fetching data, the device appears to collect points over a nearly-fixed vertical range (about 30 arcseconds of Latitude) and a variable horizontal range.
Finally, my wife had taken the family iPad on a short trip last weekend. The iPad showed a square burst of Wi-Fi data points about where she pulled over to check a map, and another wide rectangle around the hotel she stayed in. It also showed data in the CellLocationLocal table. That table showed her track along the interstate, and appeared to be an actual positional track. Interestingly, the CellLocation table did not have tower locations for virtually anywhere along that track. On my phone, I had two points from my Dallas trip, and a half-dozen points from a taxi ride into Manhattan a week prior. Point 14: The CellLocationLocal table may record actual trip data, but it appears to be very limited.
One further point of (potential) interest: The timestamps on the data were, if you’ll pardon the pun, all over the map. Many data sets had timestamps only a few seconds or minutes apart. But when I stripped out data sets that were within five minutes of another set of points, the average time between updates was about 14 hours. Note that there’s very little stastical rigor to this, but I thought it was interesting. Point 15: When the device spends an extended time at one place, data appears to be fetched about twice a day.
Summary of Observations
So, to sum up, here are my observations thus far:
- Point 1: The devices are not tracking my every movement.
- Point 2: The Wi-Fi data points are not precisely located.
- Point 3: The Wi-Fi data does not represent the last time I visited a location.
- Point 4: Data is present in the database for locations I’ve not visited.
- Point 5: Cell tower data is treated the same as Wi-Fi access point data.
- Point 6: Data appears for a wide area simultaneously, and is not necessarily tied to length of time sitting still.
- Points 7 and 8: Actual recording of new data is not predictable, and is highly accurate.
- Point 9: You may be able to force a data fetch by refreshing the maps application.
- Point 10: When locked, the device might not record anything at all.
- Point 11: No data plan, no cell info.
- Point 12: The device may cache your last request and fetch related data the next time a network is available.
- Point 13: When fetching data, the device appears to collect points over a nearly-fixed vertical range (about 30 arcseconds of Latitude) and a variable horizontal range.
- Point 14: The CellLocationLocal table may record actual trip data, but it appears to be very limited.
What does all this tell us? I think we can infer at least a few things, which are consistent with what others have been saying, and with Apple’s statements last year.
- The data in WifiLocation and CellLocation are not your device’s actual location at any given point in time, but instead are the location of others' Wi-Fi access points and cell towers.
- The location of these points are estimated by Apple based on data harvested by iOS devices and provided to Apple on a periodic basis.
- Individual devices periodically record the Wi-Fi points and cell towers visible to them, record a precise location, and send that data to Apple. (I have not yet observed this happen, but it makes sense, and Apple’s already said as much).
- Periodically, the device will poll Apple’s servers for location information nearby. This seems to happen when the device has been at rest for some time, or when the location information is refreshed in the map application (it may be reasonable to expect that other applications querying the Core Location service may also trigger a refresh). There may be some logic in terms of what data gets fetched, perhaps to avoid downloading duplicate information. I haven’t been able to dig into that yet.
- The timestamp for the fetched data appear to be the time the data was fetched. One may be able to look in the middle of a set of identically-stamped data to infer where the user was when that data was fetched. However, the data don’t appear to be fetched every time you’re in any given location, even if you’re there for an extended time (like, say, lunch).
So what’s my conclusion? I’m still not sure about the CellLocationLocal table, which perhaps might be for recording locations for future data fetches. But the rest of the data all seem very consistent with what Apple’s told us: they’re used to aid in geolocating the device. Why are so many points stored? So that it won’t have to pull data down again in the future. It’s a big, personalized cache, made to make my personal use of geolocated features faster and more accurate.
[Note – if you’re interested in the python script I used to load the data into Google Earth, I’m posting it on the Intrepidus Group blog. It should be attached to this post from last week about my first review of the data.]