PhD Defence: A.N.K. Zaman

Date and Time

Location

Rozanski Hall Room 106

Details

Title: Privacy Preserving Data Sanitization and Publishing

Abstract:

Recent trends show a drastic increase in the large data repositories held by corporations, governments, and healthcare organizations. According to Bernard Marr in Forbes Tech (2015), the data created in 2014/15 alone was twice that created in the entire prior history of the human race. Data sharing has been found to be beneficial in areas such as healthcare services. However, there is a significant risk of compromising sensitive information, for example through de-anonymization. Privacy Preserving Data Publishing (PPDP) allows sanitized data to be shared while protecting individuals against identity disclosure. Removing explicit identifiers/personally identifiable information (PII) from a data set and making the data set compliant with the Health Insurance Portability and Accountability Act (HIPAA) does not guarantee the privacy of data donors. Data sanitization may be achieved in different ways (e.g., k-anonymization, l-diversity, and δ-presence); however, the differential privacy paradigm provides the strongest privacy guarantee for sanitized data publishing. This research proposes two privacy preserving algorithms that satisfy the ε-differential privacy requirement and adopt the non-interactive privacy model for sanitizing and publishing data. Along with differential privacy, we applied generalization and suppression of attributes to impose privacy and to prevent re-identification of the records of a data set.
The key contributions of this thesis are: 1) the proposed algorithm adopts the non-interactive model for data publishing; as a result, data miners have full access to the published data set for further processing, which promotes safe data sharing; 2) the algorithm can sanitize microdata and/or HIPAA-compliant data sets for publishing; 3) the published data is independent of an adversary's background knowledge; 4) the algorithm is independent of the choice of quasi-identifiers (QIDs); and finally, 5) it protects the published data set from the risk of re-identification. Data sanitized and published with the proposed algorithm is shown to have higher usability, measured by classification accuracy, than data produced by existing approaches, and to carry a significantly lower risk of re-identification.
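To make the terms in the abstract concrete, the following is a minimal sketch of non-interactive publishing that combines suppression, generalization, and ε-differential privacy. It is not the algorithm proposed in the thesis. It uses the standard Laplace mechanism on a histogram, and the record layout (name, age, diagnosis), the helper names, the ten-year age ranges, and the value of epsilon are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def generalize_age(age, width=10):
    """Generalization: replace an exact age with a coarse range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sanitize_counts(records, groups, epsilon, seed=None):
    """Publish a noisy histogram over a fixed, data-independent set of groups.

    records: iterable of (name, age, diagnosis) tuples (hypothetical layout).
    groups:  every (age_range, diagnosis) cell of the domain, including cells
             that are empty in the data, so the output keys reveal nothing.
    """
    rng = random.Random(seed)
    # Suppression: the explicit identifier (name) is dropped entirely.
    true_counts = Counter(
        (generalize_age(age), diagnosis) for _name, age, diagnosis in records
    )
    # Adding or removing one record changes exactly one count by 1, so the
    # L1 sensitivity is 1 and Laplace noise of scale 1/epsilon suffices.
    scale = 1.0 / epsilon
    noisy = {}
    for group in groups:
        value = true_counts.get(group, 0) + laplace_noise(scale, rng)
        # Rounding and clamping are post-processing; they do not weaken the guarantee.
        noisy[group] = max(0, round(value))
    return noisy


records = [
    ("Alice", 34, "flu"),
    ("Bob", 37, "flu"),
    ("Carol", 52, "asthma"),
]
groups = [
    (age_range, diagnosis)
    for age_range in ("30-39", "40-49", "50-59")
    for diagnosis in ("flu", "asthma")
]
published = sanitize_counts(records, groups, epsilon=1.0, seed=7)
```

Because the noisy table is computed once and released as a whole, recipients can analyse it freely without further queries against the raw data, which is the sense in which the model is non-interactive. A smaller epsilon gives stronger privacy at the cost of noisier counts.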

Chair:  Dr. Joseph Sawada
Advisor: Dr. Charlie Obimbo
Advisory Committee Member: Dr. David Chiu 
Non-Advisory Committee Member: Dr. David Calvert
External Examiner: Dr. Samira Sadaoui (University of Regina)
