Ancile - Use-Based Privacy for applications

This project implements the following paper:

Eugene Bagdasaryan, Griffin Berlstein, Jason Waterman, Eleanor Birrell, Nate Foster, Fred B. Schneider, and Deborah Estrin. 2019. Ancile: Enhancing Privacy for Ubiquitous Computing with Use-Based Privacy. WPES.

Ask questions through GitHub Issues, our Slack, or by email (eugene@cs.cornell.edu).

Introduction

Widespread deployment of Intelligent Infrastructure and the Internet of Things (IoT) creates large quantities of passively generated data. This has ushered in the era of data-rich applications, such as location-based services, while posing new privacy threats. This project explores the challenges that arise in applying use-based privacy to such data. We have developed Ancile, a platform that enforces use-based privacy for applications wishing to access users' personal data. We find that Ancile constitutes a functional, performant platform for deploying privacy-enhancing ubiquitous computing applications.

System design

Ancile supports the development of privacy-aware applications. It acts as a trusted computing environment, ensuring users' data is used only in ways that comply with a well-defined, use-based privacy policy. Ancile enforces this policy as a middleware layer, sitting between personal data sources (e.g., a mail server or a location data server) and third-party applications that wish to use such data in a privacy-compliant manner. We currently support Python and work with any OAuth service.

Our system allows applications to submit an arbitrary Python program that requests data from Ancile-registered data sources. Upon receiving this program, Ancile fetches the policy and access tokens associated with the user and the data source. Ancile then attempts to execute the application's program in a restricted environment, enforcing the policies. If the program completes without policy violations, its result is returned to the application.
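The request flow described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not Ancile's implementation: `POLICIES`, `TOKENS`, `run_program`, and `fetch_from_source` are hypothetical names, the policy is simplified to an ordered list of command names, and bare `exec` with emptied builtins stands in for the restricted environment (it is not a secure sandbox).

```python
# Illustrative sketch only; every name here is hypothetical, not Ancile's API.
POLICIES = {("app1", "user@abcd.com"): ["fetch_data", "return_to_app"]}
TOKENS = {"user@abcd.com": "example-oauth-token"}


class PolicyViolation(Exception):
    """Raised when the program calls a command the policy does not expect."""


def fetch_from_source(token):
    # Stand-in for an OAuth-authorised call to an external data source.
    return {"location": "room 101"}


def run_program(app, user, program):
    remaining = list(POLICIES[(app, user)])  # simplified policy: required order
    token = TOKENS[user]                     # access token for the data source
    result = {}

    def step(name):
        # Consume the next expected command, or abort the whole run.
        if not remaining or remaining[0] != name:
            raise PolicyViolation(f"{name} is not permitted at this point")
        remaining.pop(0)

    def fetch_data():
        step("fetch_data")
        return fetch_from_source(token)

    def return_to_app(data):
        step("return_to_app")
        result["data"] = data

    # Expose only the two permitted functions. Emptying __builtins__ is for
    # illustration and does NOT make exec a safe sandbox.
    env = {"__builtins__": {}, "fetch_data": fetch_data,
           "return_to_app": return_to_app}
    try:
        exec(program, env)
    except PolicyViolation as err:
        return {"status": "error", "error": str(err)}
    return {"status": "ok", "data": result.get("data")}
```

Calling `run_program("app1", "user@abcd.com", "dpp = fetch_data()\nreturn_to_app(dpp)")` yields an `ok` status with the fetched data, while a program that calls `return_to_app` before `fetch_data` yields an `error` status and no data.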

Use-based privacy (Birrell et al.) focuses on preventing harmful uses (NYTimes) rather than restricting access to data. The application gets to use all necessary data for non-harmful purposes. Each datapoint in Ancile has a policy that specifies what uses are permitted. Furthermore, the framework is reactive: performing a transformation on the data changes the policy attached to it.

Use Case

Consider a company's data, that is, data collected by the company's internal services such as email and location tracking. Novel third-party applications propose new services on top of it, such as workplace optimization, person/room finders, and depression/suicide prevention. These services require access to sensitive data, but the access they are given is usually broader than they need. For example, a service that provides information on nearby available rooms does not need constant access to user location data. Unrestricted release of raw data can lead to malicious uses in which the user's location is accessed after hours or outside of the office. Ancile can address this problem by defining a policy on the user's location data that shares it only at specific hours or at a specific location.

Sample workflow

We define three roles:

Admin -- responsible for configuring Ancile, approving applications, and maintaining user policies
Application -- needs the user's sensitive data
User -- possesses sensitive information available through OAuth endpoints

Once Ancile is installed we assume the following sample workflow:

Admin configures Ancile and connects OAuth-enabled data sources
User registers on Ancile and performs OAuth authentication with the required data sources
Application developer registers on Ancile
User picks a policy associated with the application and the connected data source
Application sends a Python program that requests the user's data
Ancile executes the program under the associated policy and, if successful, returns the data to the application; otherwise it returns an error

Policy language

Policies define an automaton whose state changes as operations are applied to the data. For example, applying a transformation that fuzzes the location can enable a larger set of further operations on that data.

Our policy is defined as a regular expression over an alphabet of operations (Python commands) using the following operations:

Sequence -- commandA . commandB declares that the program has to call commandB only after calling commandA.
Union -- commandA + commandB either of the two commands can be invoked.
Intersection -- commandA & commandB both commands need to match.
Iteration -- commandA* command can be repeated multiple times.
Negation -- !commandA can be any command except commandA.

We use the Brzozowski derivatives approach, which lets us advance the regular expression each time a command is called. It defines two key operations: a D-step, which applies whenever a command is invoked, and an E-step, which applies only when the application wants to get data back from Ancile.
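The following is a minimal sketch of how a policy could be advanced with Brzozowski derivatives. It is not Ancile's implementation: the class names (`Cmd`, `NotCmd`, `Seq`, `Alt`, `Both`, `Star`) and the helper `permits` are hypothetical, negation is modeled as "any single command except the named one" following the description above, and the E-step is modeled as the standard nullability check (does the remaining policy accept the empty sequence of commands?).

```python
from dataclasses import dataclass


class Policy:
    """Base class for policy expressions (hypothetical representation)."""


@dataclass(frozen=True)
class Empty(Policy):      # accepts nothing: the policy has been violated
    pass


@dataclass(frozen=True)
class Done(Policy):       # accepts only the empty sequence of commands
    pass


@dataclass(frozen=True)
class Cmd(Policy):        # a single command
    name: str


@dataclass(frozen=True)
class NotCmd(Policy):     # !commandA: any single command except this one
    name: str


@dataclass(frozen=True)
class Seq(Policy):        # first . second
    first: Policy
    second: Policy


@dataclass(frozen=True)
class Alt(Policy):        # left + right (union)
    left: Policy
    right: Policy


@dataclass(frozen=True)
class Both(Policy):       # left & right (intersection)
    left: Policy
    right: Policy


@dataclass(frozen=True)
class Star(Policy):       # inner*
    inner: Policy


def e_step(p):
    """E-step: True if the data may be released with no further commands."""
    if isinstance(p, (Done, Star)):
        return True
    if isinstance(p, (Empty, Cmd, NotCmd)):
        return False
    if isinstance(p, Seq):
        return e_step(p.first) and e_step(p.second)
    if isinstance(p, Alt):
        return e_step(p.left) or e_step(p.right)
    if isinstance(p, Both):
        return e_step(p.left) and e_step(p.right)
    raise TypeError(f"unknown policy node: {p!r}")


def d_step(p, command):
    """D-step: the policy that remains after `command` has been invoked."""
    if isinstance(p, (Empty, Done)):
        return Empty()
    if isinstance(p, Cmd):
        return Done() if p.name == command else Empty()
    if isinstance(p, NotCmd):
        return Done() if p.name != command else Empty()
    if isinstance(p, Seq):
        left = Seq(d_step(p.first, command), p.second)
        # If `first` may be skipped, the command may also start `second`.
        return Alt(left, d_step(p.second, command)) if e_step(p.first) else left
    if isinstance(p, Alt):
        return Alt(d_step(p.left, command), d_step(p.right, command))
    if isinstance(p, Both):
        return Both(d_step(p.left, command), d_step(p.right, command))
    if isinstance(p, Star):
        return Seq(d_step(p.inner, command), Star(p.inner))
    raise TypeError(f"unknown policy node: {p!r}")


def permits(policy, commands):
    """Apply a D-step per command, then a final E-step."""
    for command in commands:
        policy = d_step(policy, command)
    return e_step(policy)
```

For the policy `Seq(Cmd("transform"), Cmd("return_to_app"))`, `permits` accepts the sequence `["transform", "return_to_app"]` and rejects `["return_to_app"]` alone. The sketch does not simplify intermediate expressions, so they grow with each step; a practical implementation would normalize them.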

Data Policy Pair

In Ancile, data travels with its policy in a special container: DataPolicyPair. This object is protected using the RestrictedPython framework. To obtain data from the user, the developer submits the following program:

dpp = fetch_data(user=user('user@abcd.com'))

This puts the fetched data into the object dpp. The developer can only execute functions that are allowed by the policy framework. For example, if the policy specifies transform.return_to_app for some commands transform and return_to_app, then the following program will work:

dpp1 = transform(data=dpp)
return_to_app(dpp1)

return_to_app is a special command that may run only at the end of the policy; if it succeeds, Ancile returns the data to the application.

Ancile Lib

Ancile supports custom functions as well as normal third-party libraries to be controlled by the policies. All custom functions have to be defined under ancile/lib/.

We use three different types of functions:

Fetch functions: annotated by @ExternalDecorator(), these functions can get the OAuth token for the user and perform external calls
Transformation functions: annotated by @TransformDecorator(), these functions take a DataPolicyPair object and return a transformed DataPolicyPair object
Return functions: annotated by @UseDecorator(), these functions take a DataPolicyPair object and return it if successful
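The three function types can be illustrated with a small, self-contained sketch. It does not reproduce Ancile's decorators: `external_decorator`, `transform_decorator`, `use_decorator`, the stand-in `DataPolicyPair`, and the example functions are all hypothetical, and the policy is simplified to a tuple of command names that must be called in order.

```python
import functools
from dataclasses import dataclass
from typing import Any, Tuple


class PolicyError(Exception):
    """Raised when a command is not the next one the policy expects."""


@dataclass(frozen=True)
class DataPolicyPair:
    data: Any
    policy: Tuple[str, ...]  # simplification: commands still required, in order


def _advance(policy, command):
    # Consume `command` from the front of the policy or fail.
    if not policy or policy[0] != command:
        raise PolicyError(f"'{command}' is not permitted at this point")
    return policy[1:]


def external_decorator(func):
    """Fetch function: wrap the raw result together with the remaining policy."""
    @functools.wraps(func)
    def wrapper(*, policy, **kwargs):
        rest = _advance(tuple(policy), func.__name__)
        return DataPolicyPair(func(**kwargs), rest)
    return wrapper


def transform_decorator(func):
    """Transformation function: DataPolicyPair in, DataPolicyPair out."""
    @functools.wraps(func)
    def wrapper(*, data, **kwargs):
        rest = _advance(data.policy, func.__name__)
        return DataPolicyPair(func(data.data, **kwargs), rest)
    return wrapper


def use_decorator(func):
    """Return function: release raw data only if the policy is then complete."""
    @functools.wraps(func)
    def wrapper(*, data, **kwargs):
        rest = _advance(data.policy, func.__name__)
        if rest:
            raise PolicyError("policy requires further commands before release")
        return func(data.data, **kwargs)
    return wrapper


@external_decorator
def fetch_location(user):
    return {"x": 10.2, "y": 4.9}  # stand-in for an OAuth-authorised request


@transform_decorator
def round_location(raw, digits=0):
    return {key: round(value, digits) for key, value in raw.items()}


@use_decorator
def return_to_app(raw):
    return raw
```

With the policy `("fetch_location", "round_location", "return_to_app")`, calling the three functions in that order releases the rounded coordinates, whereas passing the freshly fetched pair straight to `return_to_app` raises `PolicyError`.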

Beyond these functions, we also support conditional and collection operations, which we will introduce later.

Installation

Here are the installation instructions.

Development Environment

We have a development environment running at https://dev.ancile.smalldata.io, so please feel free to explore it. A few test accounts are set up for exploration: user/user_password and app/app_password.

Login with app credentials
Choose app view in the top-right corner
Click on Console in the left bar
Pick the first app
Specify user as user and press Enter
My user has the following policy: fetch_location.fuzz_location.return

Put the following program and click Run:

dpp1 = indoor_location.fetch_location(user=user('user'))
dpp2 = indoor_location.fuzz_location(data=dpp1['location'], 
                                    mean=0, std=0.2)
return_to_app(data=dpp2['sta_location_x'])

You will get my distorted location.

Contributors

Eugene Bagdasaryan (eugene@cs.cornell.edu)
Griffin Berlstein
Mohamad Safadieh
Corin Rose
Jason Waterman
Eleanor Birrell
Nate Foster
Fred B. Schneider
Deborah Estrin
