Passive Data Disclosures: A first look at Passive Data Kit

In a prior blog post and in numerous in-person conversations, I’ve spoken about my goals for Passive Data Kit, the open source framework Audacious Software is producing to serve as the foundation for a new generation of sensor- and service-aware applications.

One of the key differentiators between Passive Data Kit and similar frameworks – like Purple Robot and AWARE – is its explicit design goal of making passive data easily understood not only by developers, but by end users as well. Our current computing environments include many apps that collect data about their users, but those users are none the wiser about what data are being silently gathered and how those data are being used and disseminated by developers.

Passive Data Kit addresses this missing element by introducing a feature called the Passive Data Disclosure. In short, the Passive Data Disclosure gives consumers transparency in the form of a virtual table of what’s going on in a particular app. It’s like a “nutrition label” for data like your location, movement, app usage, and online service usage.

The primary problem that Passive Data Disclosures solve is bridging the information gap between users and software developers gathering information automatically without the user's direct input or involvement. This information gap becomes an issue when the user's mental model of how and what data are being used diverges from the actual implementation of an app or other computing system. This divergence is almost always invisible, but when it becomes visible, the character of the system's developer comes into question as the user tries to reconcile the privacy trade-off that they're making in exchange for use of the system.

These headlines from the last several weeks illustrate what happens when this divergence is made visible:

• Surveillance: Google collects meta data (phone calls, SMS) from Android phones

• Is Facebook using your location data to suggest 'people you may know'?

• Firm pays $950,000 penalty for using Wi-Fi signals to secretly track phone users

• RunKeeper acknowledges location data leak to ad service, pushes updates

Read the comments to these stories to experience users' reactions to these incidents.

The primary issue in each of these cases is that user devices were being used to collect and apply passive data without the user's knowledge or consent. The second issue is that the user has no mechanism to inspect the information being gathered and make an independent judgment of their own privacy exposure. Finally, in all of these cases, other than choosing to not use the app (or some of its major features), the user has no simple way to exercise agency and selectively filter or throttle the information headed to third-party servers.

The underlying platforms have not been helpful on these fronts, either. In versions of Android before Marshmallow (Android 6.0), the user had to make an all-or-nothing choice when reviewing the requested permissions while installing an app. If an app requested microphone access and the user did not consent, the app would not be installed. The Android 6.0 update addressed this issue by implementing the same permission scheme that Apple had been using in iOS -- permissions are requested after the app is installed, when a relevant feature is first needed. In this "on demand" scheme, users have the opportunity to accept or reject individual permissions to access their current location, microphone, and so forth.

While doing away with the all-or-nothing permission requests improved the passive data situation for users, that is the total extent of the support offered by the underlying platform vendors. Once a permission is granted, the app is free to exploit it as much as the developer chooses, and typically in a completely opaque fashion.

The responsible developer seeking to do the right thing finds that implementing the necessary infrastructure can be time-intensive and a distraction from other tasks that will make her product more successful (in the short run, at least). For the less scrupulous developer who doesn't want his users knowing how their data are being used, the current situation is extremely convenient – he can simply make up an easy excuse to get permission to capture and mine information that he can then turn around and sell to third parties (example).

The Passive Data Disclosure is designed to counteract these trends and return to users some level of transparency and agency over their personal data. As part of its standard package, Passive Data Kit includes a full Passive Data Disclosure interface that may be easily embedded in apps using the framework's APIs. This is not going to stop developers intent on silently gathering your data – that's something that only Apple and Google can address – but it does serve as a model that responsible developers can implement to achieve more transparency in order to build understanding and trust with their audiences.

Implementation Requirements

To implement the Passive Data Disclosure in their own apps, developers must do several things:

1. Use Passive Data Kit as the API for accessing sensors, online services, and other data sources. Support for Passive Data Disclosure is baked into the framework's foundation, and simply using the API will gather and package the data for use in a Passive Data Disclosure interface.

2. In addition to configuring the parameters for data collection, the developer must provide another layer of metadata describing which data are essential for a fully functional app and which data are desirable, but negotiable with the user. (An example using location data is presented below.)

3. Compose explanations and justifications for gathering and using the passive data. These are basic HTML documents, allowing a good deal of freedom in their presentation, but each should explain what data are being gathered, why the data are necessary, and how the data are transmitted and used when no longer on the device.

4. Provide a link to the Passive Data Disclosure interface within the app.

If a developer can supply these four components, the Passive Data Kit framework does the rest of the hard work of constructing and presenting users the Passive Data Disclosure.
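To make the second and third requirements more concrete, here is a minimal, language-neutral sketch of what that extra layer of metadata might look like. It is written in Python for brevity and is not Passive Data Kit's actual API: the `Requirement` enum, the `GeneratorDisclosure` class, its fields, and the `build_disclosure` helper are all hypothetical names chosen for illustration, as are the two example data sources.

```python
from dataclasses import dataclass, field
from enum import Enum


class Requirement(Enum):
    """Hypothetical labels for the essential/negotiable distinction."""
    ESSENTIAL = "essential"    # the app cannot function fully without this data
    NEGOTIABLE = "negotiable"  # desirable, but the user may limit or disable it


@dataclass
class GeneratorDisclosure:
    """Hypothetical record pairing one data source with its disclosure metadata."""
    name: str
    requirement: Requirement
    description_html: str  # the explanation and justification shown to the user
    user_options: list = field(default_factory=list)  # choices offered, if any

    def user_can_modify(self) -> bool:
        # Only negotiable sources that offer at least one option expose controls.
        return self.requirement is Requirement.NEGOTIABLE and bool(self.user_options)


def build_disclosure(generators):
    """Return one summary row per data source, as a disclosure screen might list them."""
    return [
        {
            "name": g.name,
            "required": g.requirement is Requirement.ESSENTIAL,
            "configurable": g.user_can_modify(),
        }
        for g in generators
    ]


# Illustrative data sources: one negotiable, one essential.
location = GeneratorDisclosure(
    name="Location",
    requirement=Requirement.NEGOTIABLE,
    description_html="<p>Used to find nearby places.</p>",
    user_options=["best", "randomized", "user_provided", "disabled"],
)
app_events = GeneratorDisclosure(
    name="Application Events",
    requirement=Requirement.ESSENTIAL,
    description_html="<p>Used to measure how the app is used.</p>",
)

rows = build_disclosure([location, app_events])
for row in rows:
    print(row)
```

The point of the sketch is the shape of the information, not the names: each data source carries a human-readable justification plus a flag saying whether the user gets any say in how it is collected.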

To illustrate how this comes together in practice, I'll describe how Audacious Software uses Passive Data Kit to achieve user transparency and control in one of its apps.

Passive Data Disclosure in Fresh Comics

Fresh Comics is a native mobile app that helps comic book fans keep up with the latest comic book releases, nearby shops and events, and upcoming conventions. The latest version (4.0) is the pilot deployment of Passive Data Kit and uses those APIs to collect and log passive data describing the user.

Fresh Comics relies on two types of passive data collected from the mobile device. Location sensing provides the application the user's current location and that information is used to identify nearby comic shops, events, and conventions. The app collects full-fidelity location data from the device, but lower quality data may be used in place of device readings without affecting the overall quality of the app's experience. This is an example of a negotiable data type.

The second passive data collected are app-usage events that the app generates when the user takes an action within the app. Examples include events such as "opened shop details", "reviewed releases for this week", and "shared an issue with a friend". These events can be used to recreate interaction sequences on the server that are useful for identifying which parts of the system are popular and frequently used, as well as aborted interactions that may highlight parts of the app that are not sufficiently intuitive or simply broken.

As part of its sponsored placement features, Fresh Comics makes direct use of two events related to the placements. A sponsor_appeared event is generated when a placement becomes visible (equivalent to an impression in digital marketing) and a tapped_sponsor event is generated when a user taps a placement and is taken to the sponsored destination (equivalent to a click in digital marketing). Since sponsors financially support Fresh Comics, and these two event types are crucial for determining the performance of their sponsored placements, this kind of passive data is an example of a non-negotiable data type in this particular situation.
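As a rough illustration of why these two events matter, the sketch below tallies impressions and clicks from a small event log and derives a click-through rate. Only the event names sponsor_appeared and tapped_sponsor come from the app; the dictionary layout of each event, the sample log, and the `placement_performance` function are assumptions made for this example, and the actual server-side analysis may look quite different.

```python
from collections import Counter


def placement_performance(events):
    """Summarize sponsored placement performance from a list of logged events.

    Each event is assumed (for this sketch) to be a dict with a "name" key.
    """
    counts = Counter(event["name"] for event in events)
    impressions = counts["sponsor_appeared"]  # placement became visible
    clicks = counts["tapped_sponsor"]         # user tapped the placement
    # Avoid dividing by zero when no placement was ever shown.
    rate = clicks / impressions if impressions else 0.0
    return {"impressions": impressions, "clicks": clicks, "click_through_rate": rate}


# Hypothetical log: four impressions, one tap, and one unrelated event.
sample_log = [
    {"name": "sponsor_appeared"},
    {"name": "sponsor_appeared"},
    {"name": "opened shop details"},
    {"name": "sponsor_appeared"},
    {"name": "tapped_sponsor"},
    {"name": "sponsor_appeared"},
]

summary = placement_performance(sample_log)
print(summary)
```

Without both event types, neither number in the summary can be computed, which is why this data source is treated as required rather than negotiable.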

To access their Passive Data Disclosure, users visit the app's settings:

I grouped the Passive Data Disclosure with the link to the system settings that allows the user to revoke permissions on a system level. Tapping the disclosure item opens up Passive Data Kit's disclosure interface:

The screen is split in half. The top half provides a historical visualization of the data gathered by a particular "generator" (the PDK component that collects data), and the user can inspect each generator's data history by selecting it from the list in the bottom half of the interface:

Passive Data Disclosure: Application Events

Tapping the gear to the right of the generator's name opens the disclosure for that data source:

Passive Data Disclosure: Location Details

All generators include the Data Collection Description, which displays the HTML page in the upper portion of the interface describing how the app is using this particular type of data. In the example above, the Data Collection Description mentions that the user can modify the data collection by tapping the Location Accuracy option in the bottom list. This replaces the Data Collection Description with another interface:

Passive Data Disclosure: Location Default

Passive Data Disclosure: Location Locally Randomized

Passive Data Disclosure: Location User Provided

Passive Data Disclosure: Location Disabled

In this example, the user is presented with a variety of options:

Best Accuracy: This consists of data directly from location hardware.

Locally Randomized: This option allows the user to specify a random element to add to the data generated by the location hardware to mask the user's exact location. This option isn't relevant in this particular case since the location data never leaves the device, but it will be relevant in applications that log the user's location to an outside destination.

User Provided: This option allows the user to specify a place name (such as a city or address) and uses the device's built-in geocoding. For example, the value "New York City, NY" would return a coordinate like "40.7128,-74.0059" which would be the value returned by the generator when the app asks for the current location.

Disabled: This prevents the Passive Data Kit framework from reporting any location data to the host app.

This type of interface illustrates how the user can effectively negotiate with the developer to find a suitable middle ground that balances the user's control of their data with the application's need for data to provide particular features or services.
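The four options above can be sketched as a single function that decides what coordinate the host app receives. This is an illustrative approximation rather than Passive Data Kit's implementation: the mode strings, the `reported_location` and `randomize` functions, and the `max_offset_m` parameter are hypothetical, the random offset uses a simple flat-earth approximation that is only reasonable for small distances away from the poles, and the user-provided coordinate is assumed to have been geocoded already.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def randomize(lat, lon, max_offset_m, rng=random):
    """Shift a coordinate by a random distance (up to max_offset_m) in a random direction."""
    distance = rng.uniform(0.0, max_offset_m)
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    # Convert the metric offset to degrees (small-distance approximation).
    d_lat = (distance * math.cos(bearing)) / EARTH_RADIUS_M
    d_lon = (distance * math.sin(bearing)) / (EARTH_RADIUS_M * math.cos(math.radians(lat)))
    return lat + math.degrees(d_lat), lon + math.degrees(d_lon)


def reported_location(mode, hardware_fix, user_fix=None, max_offset_m=1000.0, rng=random):
    """Return the (lat, lon) the host app should see, or None if reporting is disabled."""
    if mode == "best":
        return hardware_fix                 # data directly from location hardware
    if mode == "randomized":
        return randomize(hardware_fix[0], hardware_fix[1], max_offset_m, rng)
    if mode == "user_provided":
        return user_fix                     # already geocoded from a place name
    if mode == "disabled":
        return None                         # nothing is reported to the host app
    raise ValueError(f"unknown location mode: {mode}")


hardware = (41.8781, -87.6298)              # hypothetical reading from the device
new_york = (40.7128, -74.0059)              # the geocoded example value from the text

print(reported_location("best", hardware))
print(reported_location("user_provided", hardware, user_fix=new_york))
print(reported_location("disabled", hardware))

# A seeded generator keeps the randomized result reproducible for testing.
masked = reported_location("randomized", hardware, rng=random.Random(0))
```

Because the substitution happens before the app ever sees the value, the rest of the app can keep asking for "the current location" without knowing which option the user picked.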

In the case of the Application Events data, the interface only consists of the Data Collection Description. These data are required for the ongoing operation of the Fresh Comics enterprise, and I (as the app developer) disclose enough information to justify its collection, but I do not allow the user to disable that data collection:

Passive Data Disclosure: App Events Required

If I wanted to give the user control over this kind of data, I could instruct Passive Data Kit to allow some control:

Passive Data Disclosure: App Events Optional

Other than the integration with Fresh Comics' settings screen, all of the items pictured above are standard parts of Passive Data Kit that are available to other developers under the framework's Apache open source license. Implementing a Passive Data Disclosure interface involves fewer than twenty lines of source code.

Conclusion

Fresh Comics is serving as the pilot deployment of Passive Data Kit and the Passive Data Disclosure features will be available within the next week or two with the 4.0.2 release. Passive Data Kit is also being used in several Audacious Software client projects and will be released to the public later this summer. (I'm running about a month behind the timeline posted on passivedatakit.org.)

If you are an app developer interested in keeping up with the latest developments, please join the Passive Data Kit (low volume) mailing list at passivedatakit.org or e-mail me at chris@audacious-software.com. I'm looking for motivated partners with data collection needs to help me shape this framework.