Privacy-Preserving Statistical Analysis

This use case addresses the privacy-preserving computation of data analytics, focusing on the computation of statistics over large usage data.

Objective

The use case aims to let Wallix clients run analytics on their own clients’ data while guaranteeing the privacy of those data.

Target

Wallix itself will be the first beneficiary, since an in-house project would benefit from the technology. Subsequently, Wallix intends to offer the technology as a service to DataPeps clients.

Methodology

We focus on the computation of statistics over large volumes of usage data. The statistical functions covered include the number, sum, mean and standard deviation of counted events.

Previous work

Currently, the preliminary framework has been completed, including updates to the server code extracted from WALLIX's DataPeps product and code from the Awless open source tool for AWS management.

At this stage, the preliminary pairing-based inner-product DMCFE protocol has been successfully integrated into this framework, giving an initial design that yields average counts of accesses to AWS endpoints. Initial performance results are within acceptable bounds.
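To illustrate what an inner-product functionality reveals, the plaintext sketch below computes the value that a decryption key for a given weight vector would yield: the inner product of the per-client counts with the weights. No cryptography is performed here, and the function name and the example counts are assumptions made for illustration rather than part of the framework.

```python
from fractions import Fraction


def inner_product(counts, weights):
    """Plaintext analogue of the value an inner-product scheme would reveal."""
    if len(counts) != len(weights):
        raise ValueError("counts and weights must have the same length")
    return sum(c * w for c, w in zip(counts, weights))


# Hypothetical access counts for one AWS endpoint, one entry per client.
counts = [12, 7, 30, 5]

# An all-ones weight vector yields the total; dividing by the number of
# clients afterwards gives the average count.
total = inner_product(counts, [1] * len(counts))
average = Fraction(total, len(counts))
```

In the encrypted setting only the inner product would be learned, never the individual entries of `counts`.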

More recently, the DSum protocol has been added to extend the range of values that can be handled.
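The general idea of a privacy-preserving sum whose result is read off directly can be sketched with pairwise additive masks. This is not the DSum protocol itself: it is a generic masking illustration, and the modulus, the function names and the use of fresh random values in place of secrets shared between pairs of parties are all assumptions.

```python
import secrets

MODULUS = 2 ** 64  # assumed value space for this sketch


def pairwise_masks(n, modulus=MODULUS):
    """Return one mask per party; together the masks sum to zero mod `modulus`."""
    masks = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # Stands in for a secret known only to parties i and j.
            shared = secrets.randbelow(modulus)
            masks[i] = (masks[i] + shared) % modulus
            masks[j] = (masks[j] - shared) % modulus
    return masks


def masked_sum(values, modulus=MODULUS):
    """Each party publishes value + mask; the masks cancel in the total."""
    masks = pairwise_masks(len(values), modulus)
    published = [(v + m) % modulus for v, m in zip(values, masks)]
    return sum(published) % modulus


values = [10 ** 12, 3, 42]      # hypothetical per-client counts
result = masked_sum(values)
```

In this sketch the sum is recovered by a plain modular addition, whatever its size, at the price of one shared secret per pair of parties.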

Finally, a primitive sampling mechanism has been devised and implemented, which required a slight revision of the design described in previous FENTEC documents.

Current work

In order to implement the selection phase, the cryptographic initialization is now deferred until after the statistics gathering phase has completed. This does not affect the cryptography in any way but does require a new redistribution phase (sample selection). Sampling is currently random, but there is provision for ranking clients by locally computed estimates of significance.
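The two selection modes described above can be sketched as follows. This is a minimal illustration, not the framework's code: the function name, the client identifiers and the significance scores are hypothetical.

```python
import random


def select_sample(client_ids, k, significance=None, rng=None):
    """Pick k clients: top-k by significance if estimates are given, else at random."""
    if not 0 <= k <= len(client_ids):
        raise ValueError("k must be between 0 and the number of clients")
    if significance is None:
        rng = rng or random.Random()
        return rng.sample(list(client_ids), k)   # uniform, without replacement
    ranked = sorted(client_ids, key=lambda cid: significance[cid], reverse=True)
    return ranked[:k]


clients = ["c1", "c2", "c3", "c4", "c5"]

# Current behaviour: random selection.
random_sample = select_sample(clients, 2, rng=random.Random(0))

# Possible extension: rank by locally computed significance estimates.
estimates = {"c1": 0.1, "c2": 0.9, "c3": 0.4, "c4": 0.7, "c5": 0.2}
ranked_sample = select_sample(clients, 2, significance=estimates)
```

Only the clients returned by the selection step would then take part in the deferred cryptographic initialization.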

By sending an encryption of the squares of the counts alongside the encrypted counts, WALLIX is able to provide a naive computation of the variance of the counts. This computation is essentially free, since the squares travel in the same communications as the counts and so do not add any communications overhead.
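Once the sum of the counts and the sum of their squares are available, the mean, variance and standard deviation follow from the textbook identity Var(x) = E[x^2] - E[x]^2. The sketch below works on plaintext aggregates only; the function name and the sample data are assumptions.

```python
import math


def stats_from_aggregates(n, total, total_squares):
    """Derive mean, population variance and standard deviation from two sums."""
    mean = total / n
    variance = total_squares / n - mean ** 2   # E[x^2] - E[x]^2
    variance = max(variance, 0.0)              # guard against rounding below zero
    return mean, variance, math.sqrt(variance)


# Hypothetical per-client counts; in the use case only the two aggregates
# below would be revealed, not the individual values.
counts = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(counts)
total = sum(counts)
total_squares = sum(c * c for c in counts)

mean, variance, std_dev = stats_from_aggregates(n, total, total_squares)
```

This formula is simple but can lose precision when the variance is small relative to the mean, which is one reason to regard it as a naive computation.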

Finally, a simple demonstration framework has been devised which allows a live demonstration of the technique. This involves running Awless by proxy on a simple server and is linked to a predefined data analytics poll.

Next steps

The next tasks include completing the test code and carrying out a more detailed performance analysis.

The main target for the second year was developing a demonstration of the polling technique for the next project review in November. This is now available and may be used in conducting the performance tests.

Challenges

The current effort is focused on preparing for the testing and performance evaluation that will be required later. The test code is about half-finished: the server test code is complete and the client test code is under way. A similar effort will be required for the performance analysis.

The new DSum protocol has now been implemented and integrated into the framework. It is efficient for data of any size, unlike the original DMCFE protocol, which requires a discrete logarithm computation to recover the result. Unfortunately, the DSum protocol is much heavier in communications, so as part of the performance evaluation we will need to quantify the conditions under which each protocol is appropriate.
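The cost of the discrete logarithm step can be illustrated with a baby-step giant-step search over a bounded range, a standard technique for this kind of recovery. The sketch below works in a small multiplicative group modulo a prime rather than in the pairing groups used by the protocol; the function name, the group parameters and the bound are assumptions chosen for illustration.

```python
import math


def bounded_discrete_log(g, h, p, bound):
    """Return x in [0, bound) with pow(g, x, p) == h, or None if there is none."""
    m = math.isqrt(bound) + 1          # m * m > bound, so i*m + j covers the range
    baby = {}
    value = 1
    for j in range(m):                 # baby steps: store g^j -> j
        baby.setdefault(value, j)
        value = value * g % p
    step = pow(g, -m, p)               # g^(-m) mod p (needs Python 3.8+)
    gamma = h % p
    for i in range(m):                 # giant steps: test h * g^(-i*m)
        if gamma in baby:
            x = i * m + baby[gamma]
            return x if x < bound else None
        gamma = gamma * step % p
    return None


p = 2 ** 31 - 1                        # a Mersenne prime, used as a toy modulus
g = 7                                  # a primitive root modulo p
secret = 123_456                       # hypothetical aggregated result
h = pow(g, secret, p)                  # the form in which the result is obtained

recovered = bounded_discrete_log(g, h, p, bound=1_000_000)
```

Time and memory both grow with the square root of the bound, so recovery is practical only when the result is known to be small; this is the limitation that a sum computed directly avoids.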