Ming Ding | Information Security and Privacy Group | Data61, CSIRO
What is re-identification? – Definition
•Re-identification is the process of determining that certain data in a dataset belongs to a particular individual
➢Removing an individual’s personally identifiable information (PII) is not sufficient (recall the “Mustang” example)
➢Background information acts as quasi-identifiers and cannot be controlled
➢87% of the US population is uniquely identifiable using gender, 5-digit ZIP code, and date of birth
➢To re-identify an individual in a dataset, the individual needs to be in that dataset
•A toy example: The Adult dataset (30k rows, 15 attributes) - https://archive.ics.uci.edu/ml/datasets/adult
L. Sweeney. Uniqueness of Simple Demographics in the U.S. Population. LIDAP-WP4, Carnegie Mellon University, 2000.
age  workclass  education     marital        occupation       relationship   race                gender  native-country  …
42   State-gov  Bachelors     Never-married  Exec-managerial  Not-in-family  White               Female  United-States   …
58   Private    Some-college  Divorced       Craft-repair     Unmarried      Asian-Pac-Islander  Male    Philippines     …
…    …          …             …              …                …              …                   …       …               …
Note: values are fake
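
The two ideas on this slide, uniqueness on quasi-identifiers and linking background knowledge to a record, can be illustrated with a minimal sketch. It does not load the real Adult dataset. `RECORDS` is a hypothetical four-row table with made-up values, and the helper names `unique_fraction` and `link` are assumptions introduced only for this illustration.

```python
from collections import Counter

# Hypothetical toy records shaped like the table above; all values are made up.
RECORDS = [
    {"age": 42, "gender": "Female", "native_country": "United-States", "occupation": "Exec-managerial"},
    {"age": 58, "gender": "Male", "native_country": "Philippines", "occupation": "Craft-repair"},
    {"age": 42, "gender": "Female", "native_country": "United-States", "occupation": "Sales"},
    {"age": 35, "gender": "Male", "native_country": "United-States", "occupation": "Tech-support"},
]


def unique_fraction(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination occurs exactly once."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    if not keys:
        return 0.0
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


def link(records, background):
    """Return the records that match every attribute the attacker already knows."""
    return [r for r in records if all(r.get(k) == v for k, v in background.items())]


# Assumed choice of quasi-identifiers for this toy table.
QI = ["age", "gender", "native_country"]
fraction = unique_fraction(RECORDS, QI)

# Background knowledge about one person: no name or ID, only quasi-identifiers.
matches = link(RECORDS, {"age": 58, "gender": "Male", "native_country": "Philippines"})

# A person who is not in the dataset cannot be re-identified from it.
absent = link(RECORDS, {"age": 99})
```

In this toy table, two of the four rows share the same (age, gender, native_country) combination, so only the other two are unique on those attributes. When the background knowledge matches exactly one row, the remaining attributes of that row (here, the occupation) are exposed, even though no direct identifier is stored.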