
Ming Ding | Information Security and Privacy Group | Data61, CSIRO

What is re-identification? – Definition

•Re-identification is the process of determining that certain data in a dataset belongs to a particular individual

➢Removing an individual’s personally identifiable information (PII) is not sufficient (recall the “Mustang” example)

➢Background information acts as quasi-identifiers, and the data publisher cannot control what background information an attacker holds

➢87% of the US population is uniquely identifiable by gender, 5-digit ZIP code, and date of birth

➢To re-identify an individual in a dataset, the individual must actually be in that dataset

L. Sweeney. Uniqueness of Simple Demographics in the U.S. Population. LIDAP-WP4, Carnegie Mellon University, 2000.

•A toy example: The Adult dataset (30k rows, 15 attributes) - https://archive.ics.uci.edu/ml/datasets/adult

age  workclass  education     marital        occupation       relationship   race                gender  native-country  …
42   State-gov  Bachelors     Never-married  Exec-managerial  Not-in-family  White               Female  United-States   …
58   Private    Some-college  Divorced       Craft-repair     Unmarried      Asian-Pac-Islander  Male    Philippines     …
…    …          …             …              …                …              …                   …       …               …

Note: values are fake
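A minimal sketch of how quasi-identifiers single out records, using a few made-up rows in the style of the Adult dataset. The records, the function names `unique_fraction` and `candidates`, and the choice of quasi-identifiers are all assumptions for illustration and are not taken from the slides or from the real dataset.

```python
from collections import Counter

# Hypothetical toy records in the style of the Adult dataset (values are made up).
RECORDS = [
    {"age": 42, "gender": "Female", "native_country": "United-States", "occupation": "Exec-managerial"},
    {"age": 42, "gender": "Female", "native_country": "United-States", "occupation": "Sales"},
    {"age": 58, "gender": "Male", "native_country": "Philippines", "occupation": "Craft-repair"},
    {"age": 35, "gender": "Male", "native_country": "United-States", "occupation": "Tech-support"},
]


def unique_fraction(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination occurs exactly once."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


def candidates(records, background):
    """Records consistent with an attacker's background knowledge (attribute -> value)."""
    return [r for r in records if all(r[a] == v for a, v in background.items())]


if __name__ == "__main__":
    # Share of rows that are unique on (age, gender, native_country).
    print(unique_fraction(RECORDS, ["age", "gender", "native_country"]))
    # An attacker who knows the target is a 58-year-old male narrows the
    # dataset down to the matching rows; a single match is a re-identification.
    print(candidates(RECORDS, {"age": 58, "gender": "Male"}))
```

If `candidates` returns an empty list, the target is not in the dataset and cannot be re-identified from it. If it returns several rows, the attacker needs more background information to tell them apart.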