
Unraveling the Complexities of Data Stripping: Insights from a Data Ethicist on Anonymizing Information

Removing personal identifiers from data sits at the intersection of technical data-security measures and legal and ethical privacy responsibilities.


In the digital age, ensuring data protection and privacy has become a paramount concern for organisations handling personal, sensitive, or confidential data. To comply with regulations such as the General Data Protection Regulation (GDPR) and Quebec's Law 25, a combination of technical, organisational, and legal strategies is essential for minimising identification risk and maintaining strong data protection.

Use a Combination of Proven De-identification Techniques

A robust data de-identification process involves the application of various techniques tailored to the specific needs of the data. These techniques include:

  1. Data masking: Replacing sensitive data (names, addresses, IDs) with fictitious or masked values that preserve format but remove real identifiers, thereby reducing direct exposure of Personally Identifiable Information (PII).
  2. Pseudonymization: Replacing direct identifiers with pseudonyms or codes, keeping the key separately protected. This allows data processing without direct identification but is reversible and still regulated under GDPR.
  3. Anonymization: Irreversibly removing or obscuring identifiers so that individuals cannot be re-identified; anonymized data falls outside GDPR’s scope but must be robust to prevent re-identification from quasi-identifiers or auxiliary data.
  4. Data aggregation: Grouping individual records into larger cohorts to obscure identities while retaining analytical value.
  5. Differential privacy: Adding mathematically calibrated noise or randomization to datasets or query results to mask individual contributions without significantly degrading analytical utility.
  6. Synthetic data generation: Creating realistic, artificial datasets mimicking statistical patterns without containing real personal data.
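A few of these techniques can be sketched in Python. This is a minimal illustration, not production code: the secret key, function names, and parameters are placeholders, and a real deployment would use a vetted privacy library and a properly managed key store.

```python
import hashlib
import hmac
import math
import random

# Placeholder key; in practice this lives in a separate, access-controlled key store.
SECRET_KEY = b"stored-separately-in-a-key-vault"

def pseudonymize(identifier: str) -> str:
    """Pseudonymization: replace a direct identifier with a keyed pseudonym (HMAC-SHA256).

    Deterministic, so records can still be linked, but reversible only by
    whoever holds the separately protected key/mapping."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

def mask_email(email: str) -> str:
    """Data masking: hide the local part of an email while preserving its format."""
    user, _, domain = email.partition("@")
    return user[:1] + "***@" + domain

def laplace_noise(value: float, sensitivity: float, epsilon: float) -> float:
    """Differential privacy: add Laplace noise with scale sensitivity/epsilon."""
    scale = sensitivity / epsilon
    u = random.random() - 0.5  # uniform on (-0.5, 0.5)
    return value - scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

print(pseudonymize("alice@example.com"))  # stable pseudonym, usable for linkage
print(mask_email("alice@example.com"))    # a***@example.com
print(laplace_noise(1234.0, sensitivity=1.0, epsilon=0.5))
```

Note the trade-offs the code makes visible: the pseudonym is reversible via the key (so still personal data under GDPR), the mask preserves format but destroys the identifier, and the noise scale grows as epsilon (the privacy budget) shrinks.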

Implement Privacy-by-Design Principles and Data Minimization

Integrating privacy measures into system architecture from the start is crucial to ensure minimal collection and exposure of personal data. Adhering to the GDPR principle of data minimisation, reinforced by least-privilege access controls that limit data access to what is necessary for specific purposes, is also essential.

Maintain Strict Organisational and Technical Controls

Securely storing keys or mapping tables for pseudonymization separately and restricting access is vital. Using strong encryption in storage and transmission to prevent leaks or breaches is another crucial step. Regular privacy audits and risk assessments should be conducted to evaluate the effectiveness of de-identification measures.
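The separation of pseudonym mapping tables from the de-identified dataset can be sketched as follows. This is a hypothetical in-process illustration (class and role names are invented); a real system would use a hardware security module or managed key vault with audited access.

```python
# Hypothetical sketch: the pseudonym mapping table is held in a separate,
# access-controlled store and is never shipped alongside the dataset itself.
class PseudonymVault:
    def __init__(self, authorized_roles):
        self._mapping = {}  # pseudonym -> original identifier
        self._authorized = set(authorized_roles)

    def register(self, pseudonym: str, original: str) -> None:
        self._mapping[pseudonym] = original

    def re_identify(self, pseudonym: str, role: str) -> str:
        # Enforce least privilege: only designated roles may reverse pseudonyms.
        if role not in self._authorized:
            raise PermissionError(f"role {role!r} may not re-identify data")
        return self._mapping[pseudonym]

vault = PseudonymVault(authorized_roles={"privacy-officer"})
vault.register("p-001", "alice@example.com")
print(vault.re_identify("p-001", role="privacy-officer"))
```

Analysts working with the pseudonymized dataset never touch the vault; only an authorized role can reverse a pseudonym, and even that access should be logged for audit.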

Under GDPR, pseudonymized data remains personal data and requires compliance, including protecting additional information allowing re-identification. Quebec’s Law 25 enforces strong privacy rules, including breach alerts and data access rights, so organisations must ensure compliance through robust data protection and transparency mechanisms.

Practical Data Sharing Guidelines

  • Share only de-identified or anonymized datasets when feasible.
  • Use pseudonymized data when re-identification may be necessary internally but restrict access.
  • Clearly document de-identification methods and assess re-identification risk continuously, especially when combining datasets.
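The documentation guideline above can be made concrete with a small record structure attached to each shared dataset. The field names and the 0.09 review threshold (roughly the "cell size of 11" benchmark sometimes used in health data) are illustrative assumptions, not prescribed by any regulation.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical record documenting how a dataset was de-identified and the
# assessed residual re-identification risk; the 0.09 threshold is an assumption.
@dataclass
class DeidentificationRecord:
    dataset: str
    techniques: list[str]
    quasi_identifiers: list[str]
    residual_risk: float  # assessed re-identification probability, 0..1
    assessed_on: date = field(default_factory=date.today)

    def requires_review(self, threshold: float = 0.09) -> bool:
        """Flag datasets whose assessed residual risk exceeds the threshold."""
        return self.residual_risk > threshold

record = DeidentificationRecord(
    dataset="patient_visits_2024",
    techniques=["pseudonymization", "aggregation"],
    quasi_identifiers=["postal_code", "birth_year"],
    residual_risk=0.02,
)
print(record.requires_review())  # False at the default threshold
```

Keeping such records per dataset makes the continuous risk assessment auditable, and the quasi-identifier list is exactly what must be re-checked whenever datasets are combined.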

In conclusion, the best practices blend robust technical methods—masking, pseudonymization, anonymization, differential privacy—with organisational controls that enforce privacy-by-design, data minimization, and compliance with GDPR and Quebec’s Law 25 provisions for breach notification, data access, and secure processing.

The UK Information Commissioner's Office (ICO) provides detailed guidance on pseudonymisation, recommending that organisations consider their goals, the risks involved, the technique chosen, and who performs the pseudonymisation, and that they document their decisions and risk assessments. Ontario's Information and Privacy Commissioner (IPC) publication "De-identification Guidelines for Structured Data" offers a step-by-step way to calculate data and context risk, with the overall re-identification risk given by: Overall risk = Data risk × Context risk.
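The risk product above can be sketched as a simple function. This is an illustrative simplification: the actual guidelines score data risk and context risk through structured assessments, not single numbers.

```python
# Sketch of the guidelines' risk model: data risk is the probability of
# re-identification from the records themselves; context risk reflects the
# release environment (public release near 1.0, tightly controlled much lower).
def overall_risk(data_risk: float, context_risk: float) -> float:
    for r in (data_risk, context_risk):
        if not 0.0 <= r <= 1.0:
            raise ValueError("risks are probabilities in [0, 1]")
    return data_risk * context_risk

print(overall_risk(0.05, 1.0))  # a public release keeps the full data risk
print(overall_risk(0.05, 0.2))  # a controlled release reduces it fivefold
```

The multiplicative form captures why the same dataset can be acceptable to share under a restrictive data-sharing agreement yet too risky to publish openly.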

It is worth noting that one judicial opinion has suggested that pseudonymized data shared with a third party may, from the recipient's perspective, be effectively anonymized if the recipient has no means of re-identification, but this depends on the details and context.

[1] Office of the Australian Information Commissioner. (2021). De-identification Decision Making Framework. Retrieved from https://www.oaic.gov.au/privacy/privacy-resources/de-identification-decision-making-framework

[2] Information Commissioner's Office. (2019). Pseudonymisation guidance. Retrieved from https://ico.org.uk/media/for-organisations/documents/2402786/pseudonymisation-guidance.pdf

[3] European Parliament and Council of the European Union. (2016). Regulation (EU) 2016/679 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Retrieved from https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32016R0679

[4] Quebec. (2021). An Act to modernize legislative provisions as regards the protection of personal information (Law 25). Retrieved from https://www.canlii.org/en/qc/laws/statutes/q-2018-c-31

[5] European Data Protection Supervisor. (2017). Guidelines on the right to data protection in the big data context. Retrieved from https://edps.europa.eu/data-protection/our-work/publications/guidelines-right-data-protection-big-data-context_en

