
About


Fact Sheet

Name: AI-Based Privacy-Preserving Big Data Sharing for Market Research (Anonymous Big Data)
Project Number: 867514
Call: ICT of the Future, 6th Call 2017
Duration: 26 months, from 01.10.2019 to 30.11.2021
Total Funding: 671 433 €

Project Overview

The ANITA (ANonymous bIg daTA) project is a research project funded by the Austrian Research Promotion Agency (ICT of the Future, 6th Call 2017).

The goal of this research project is, for the first time, to train deep generative model architectures on sequential personal data while providing differential privacy guarantees, in order to systematically validate the feasibility of using synthetic, privacy-preserving sequential data for third-party market research. ANITA aims to prepare the ground for developing general-purpose anonymization solutions that also work for high-dimensional data.

In ANITA we are going to:

  • collect and analyze use cases for privacy-sensitive sequential data;

  • conduct a literature review of generative deep neural network architectures and of privacy guarantees for deep learning;

  • design and create a virtual data lab for systematically investigating the conditions under which a variety of deep generative models are able to derive synthetic replicas that capture structure and correlations while protecting individual-level privacy;

  • implement and test the selected model architectures; and

  • report the results of the simulation study and of the empirical use case validations.

The consortium partners are the Institute for Service Marketing at the Vienna University of Economics and Business, Mostly AI Solutions MP GmbH, George Labs GmbH and Statistics Austria.

Project Plan

WP1 – Project Management

The general objective of this work package is to support scientific and technical progress as effectively as possible, so that the focus stays on the substantive contribution to the project and on the efficient and timely implementation of the work package tasks.

WP2 – Requirements Analysis

The goal of WP2 is to systematically collect use cases for sharing privacy-sensitive sequential data with third parties and to capture requirements with respect to accuracy and privacy.

The use cases will be documented in terms of

  • number of subjects;

  • number and characteristics of shared attributes;

  • frequency / latency for data sharing;

  • accuracy & privacy requirements;

  • technical requirements; and

  • expected (business) impact if the data are anonymized.

WP3 – Generative Deep Neural Network Architectures

WP3’s goal is to gain an overview of generative deep neural network architectures that are deemed capable of preserving individual-level as well as population-level information within sequential personal data, and that could thus be used for data anonymization.

In addition to the literature review on generative deep neural network architectures, a literature review will be conducted on differential privacy guarantees for the training of deep learning models.
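To illustrate what a differential privacy mechanism for model training can look like, the sketch below follows the general idea of differentially private SGD: clip each per-example gradient to a maximum norm, sum the clipped gradients, add Gaussian noise scaled to that bound, and average. The project description does not state which mechanism ANITA uses, so this is an assumption chosen for illustration only; the function name `dp_sgd_step`, the parameters `clip_norm` and `noise_multiplier`, and the toy gradients are hypothetical.

```python
import math
import random


def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """Return a noisy average gradient (illustrative sketch, not the project's code).

    per_example_grads: list of gradients, one list of floats per training example.
    clip_norm: maximum L2 norm allowed for each example's gradient.
    noise_multiplier: noise standard deviation divided by clip_norm.
    """
    batch_size = len(per_example_grads)
    n_params = len(per_example_grads[0])
    summed = [0.0] * n_params
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale down only gradients whose L2 norm exceeds clip_norm.
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        for i, g in enumerate(grad):
            summed[i] += g * scale
    # Gaussian noise calibrated to the clipping bound, added to the sum.
    sigma = noise_multiplier * clip_norm
    return [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]


# Toy gradients with L2 norms 5.0 and 0.5; only the first one gets clipped.
toy_grads = [[3.0, 4.0], [0.3, 0.4]]
noiseless = dp_sgd_step(toy_grads, clip_norm=1.0, noise_multiplier=0.0,
                        rng=random.Random(0))
noisy = dp_sgd_step(toy_grads, clip_norm=1.0, noise_multiplier=1.1,
                    rng=random.Random(0))
```

Clipping bounds how much any single subject can move the update, and the noise masks what remains. Turning the noise level into a formal privacy budget requires a separate accounting step that this sketch omits.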

WP4 – Virtual Data Lab

WP4’s goal is to set up and run a virtual data lab environment that can be used for experimentation by generating artificial datasets for validating accuracy and privacy.

The virtual data lab environment will include:

  • a flexible data factory for generating a variety of artificial sequential datasets;

  • a GPU cloud compute setup;

  • validation tests, measures and visual reports for assessing retained information; and

  • validation tests and measures for assessing privacy guarantees.

The design of the data lab will be closely aligned with the results of the use case and requirement analysis. Any developed source code relating to the virtual data lab will be open-sourced.

Once the WP5 models have been developed, the data lab will be continuously used to:

  • generate artificial datasets given model assumptions and parameters;

  • train models with given hyperparameters to fit the data;

  • use the models to generate synthetic datasets;

  • assess the retained information within the synthetic datasets;

  • assess any disclosed individual-level information within the synthetic datasets; and

  • record runtime and used compute resources.
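The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's data lab: a first-order Markov chain fitted by counting stands in for the deep generative models, the accuracy measure (largest gap between transition probabilities) and the disclosure proxy (share of synthetic sequences that exactly match an original one) are simple placeholders, and all names (`sample_sequences`, `fit_transitions`, `run_experiment`, `TRUE_TRANSITIONS`) are hypothetical.

```python
import random
import time
from collections import Counter

STATES = ["A", "B", "C"]


def sample_sequences(transitions, n_subjects, length, rng):
    """Draw one state sequence per subject from a first-order Markov chain."""
    sequences = []
    for _ in range(n_subjects):
        seq = [rng.choice(STATES)]
        while len(seq) < length:
            probs = [transitions[seq[-1]][s] for s in STATES]
            seq.append(rng.choices(STATES, weights=probs)[0])
        sequences.append(tuple(seq))
    return sequences


def fit_transitions(sequences):
    """'Train' the stand-in model: estimate transition probabilities by counting."""
    counts = {s: Counter() for s in STATES}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    model = {}
    for s in STATES:
        total = sum(counts[s].values())
        # Fall back to a uniform row if a state was never observed.
        model[s] = {t: (counts[s][t] / total if total else 1 / len(STATES))
                    for t in STATES}
    return model


def max_transition_gap(model_a, model_b):
    """Retained information: largest absolute gap between transition probabilities."""
    return max(abs(model_a[s][t] - model_b[s][t]) for s in STATES for t in STATES)


def exact_match_share(original, synthetic):
    """Naive disclosure proxy: share of synthetic sequences copied from the original."""
    originals = set(original)
    return sum(seq in originals for seq in synthetic) / len(synthetic)


def run_experiment(true_transitions, n_subjects, length, seed):
    rng = random.Random(seed)
    start = time.perf_counter()
    # 1. Generate an artificial dataset from known assumptions.
    original = sample_sequences(true_transitions, n_subjects, length, rng)
    # 2. Train the model on the artificial data.
    model = fit_transitions(original)
    # 3. Use the model to generate a synthetic dataset.
    synthetic = sample_sequences(model, n_subjects, length, rng)
    # 4.-6. Assess retained information, disclosure, and runtime.
    return {
        "accuracy_gap": max_transition_gap(model, fit_transitions(synthetic)),
        "exact_match_share": exact_match_share(original, synthetic),
        "runtime_seconds": time.perf_counter() - start,
    }


TRUE_TRANSITIONS = {
    "A": {"A": 0.7, "B": 0.2, "C": 0.1},
    "B": {"A": 0.1, "B": 0.6, "C": 0.3},
    "C": {"A": 0.3, "B": 0.3, "C": 0.4},
}
report = run_experiment(TRUE_TRANSITIONS, n_subjects=200, length=20, seed=42)
```

Because the artificial data come from known parameters, accuracy can be judged against a ground truth. The exact-match share is only a crude indicator of copied records and does not amount to a privacy guarantee.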

WP5 – Model Development

WP5’s goal is to provide reference implementations of existing generative deep neural network architectures, including privacy-preserving techniques, and to refine existing architectures to meet the captured requirements for synthetic data.

Based on the outcome of WP3, a list of already published deep neural network architectures will be implemented on top of an established deep learning framework. Where published reference results are available, the correctness of the implementation will be checked against them. All implementations will be coupled with various types of privacy-preserving mechanisms in order to limit the amount of information that is retained about individual subjects. The software will be designed to be compatible with the virtual data lab and, in particular, to allow easy tuning of various parameters, so that different choices of hyperparameters and network settings can be iterated on quickly. Further, existing model architectures could be refined in order to meet the requirements established by WP2.

WP6 – Empirical Validation

WP6’s goal is to validate the feasibility of using synthetic data in lieu of actual data for market research purposes with actual empirical use cases provided by the consortium partners.

George Labs and Statistics Austria will each provide an actual use case that will be used for validating the feasibility of the approach with actual data. The appropriate model architectures will be selected based on the findings of the simulation study, and these models will then be trained on the actual data. Subsequently, these models will be used for generating synthetic, privacy-preserving datasets that will be assessed by the data providers.

WP7 – Exploitation & Dissemination

WP7’s goal is to assist in the discussion and documentation of the developed technology as well as to disseminate the findings of this project to the relevant academic and industry audiences.

To help disseminate the findings, the consortium partners will initiate and participate in a critical discourse on the project findings with all relevant academic and industry audiences. This includes continuous updating of working papers, the documentation and distribution of findings via a project webpage, and participation in relevant (academic and industry) conferences and workshops.