Statistical disclosure control

256 papers

80 followers

About this topic

Statistical disclosure control is a set of methods and techniques used to protect the confidentiality of individual data in statistical outputs while maintaining the utility of the data. It aims to prevent the identification of individuals or sensitive information in published statistics through various anonymization and perturbation strategies.

Key research themes

1. How can information leakage be quantified and mitigated when adversaries have imperfect knowledge of joint data distributions?

This research area focuses on refining information leakage metrics to better capture privacy risks when adversaries do not possess complete statistical information about the data and mechanisms. Traditional metrics assume full knowledge of data distributions, an assumption that often fails in practical scenarios. Addressing this gap is crucial for designing privacy-utility trade-offs and optimal disclosure mechanisms under realistic adversarial uncertainty.

Variations and Extensions of Information Leakage Metrics with Applications to Privacy Problems with Imperfect Statistical Information

by Shahnewaz Karim Sakib

2023

Key finding: Introduced novel information-theoretic leakage metrics that account for adversaries lacking full knowledge of joint statistics between private and disclosed data. Experimental results demonstrated that these metrics better... Read more

Risk-Based Privacy-Aware Information Disclosure

by Nadia Metoui

2016

Key finding: Proposed a risk-aware access control framework that evaluates disclosure risk dynamically and employs adaptive anonymization to mitigate risk in real-time. This approach extends classical binary access control by integrating... Read more

A New Approach to Utility-Based Privacy Preserving in Data Publishing

by Murat Aydos

2021, 2017 IEEE International Conference on Computer and Information Technology (CIT)

Key finding: Demonstrated a method combining k-anonymity and l-diversity to balance privacy and utility by classifying equivalence classes into utility-preserving and outlier groups and reducing outlier classes. Experiments showed... Read more

2. What are the trade-offs between differential privacy guarantees and data utility in statistical disclosure control for official statistics and census data?

This theme examines the challenges and methodologies in implementing differential privacy (DP) and similar noise-injection mechanisms in official statistical releases. It focuses on balancing rigorous privacy protections against the utility of statistical outputs, especially in the context of sensitive, high-dimensional population and employer-employee datasets. Issues such as noise distribution choice, bounded vs. unbounded noise, and output complexity effects on privacy-utility trade-offs are investigated.
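
As a minimal sketch of the noise-injection idea described above, the following adds Laplace noise to a single count. The function names, the sensitivity of 1 (one person changes a count by at most 1), and the sampling shortcut are illustrative assumptions, not the method of any paper listed here. Note that the noise is unbounded: any output value is possible, which is the property the bounded-versus-unbounded discussion concerns.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    Uses the fact that the difference of two independent exponential
    variables with rate 1/scale follows a Laplace(0, scale) law.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return the count plus Laplace noise with scale sensitivity/epsilon.

    Smaller epsilon means stronger privacy and a noisier output.
    The sensitivity default of 1.0 is an assumption for simple counts.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    for eps in (0.1, 1.0, 10.0):
        print(eps, round(noisy_count(100, eps, rng), 2))
```

Running the demo with several values of epsilon illustrates the trade-off in the theme: a small epsilon perturbs the count heavily, while a large epsilon leaves it close to the true value.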

Differential privacy and noisy confidentiality concepts for European population statistics

by Fabian Bach

2022

Key finding: Provided a comprehensive analysis distinguishing differential privacy as a risk measure from noisy output mechanisms that enforce privacy. Showed that unbounded noise distributions (e.g., Laplace) required by strict DP may... Read more

Utility Cost of Formal Privacy for Releasing National Employer-Employee Statistics

by John M Abowd

2022, Proceedings of the 2017 ACM International Conference on Management of Data

Key finding: Developed new algorithms with provable privacy guarantees tailored to linked employer-employee data, using the Pufferfish privacy framework aligned with legal requirements. Empirical evaluation on US Census production data... Read more

Gradual Release of Sensitive Data under Differential Privacy

by George Pappas

2021, Journal of Privacy and Confidentiality

Key finding: Introduced an accuracy-optimal mechanism for relaxing privacy levels over time without loss of accuracy when releasing differentially private data in multiple releases. Demonstrated that correlated noise addition can achieve... Read more

The 2020 Census Disclosure Avoidance System TopDown Algorithm

by John M Abowd

2024, Special Issue 2: Differential Privacy for the 2020 U.S. Census

Key finding: Presented the TopDown Algorithm (TDA), a large-scale implementation of zero-Concentrated Differential Privacy in the 2020 US Census. The TDA applied differentially private noise to hierarchical tabulations while incorporating... Read more

3. How can synthetic data and related statistical disclosure control methods preserve data utility for machine learning and statistical inference while ensuring privacy?

This theme investigates techniques for generating privacy-preserving synthetic datasets and their impact on downstream analytical tasks, including machine learning classification and inference on covariance structures. It covers evaluation of synthetic data generators, the role of anonymization (e.g., microaggregation enhanced by linear discriminant analysis), and statistical procedures adapted for synthetic datasets, balancing confidentiality protection with preserving empirical data utility.

Reliability of Supervised Machine Learning Using Synthetic Data in Health Care: Model to Preserve Privacy for Data Sharing (Preprint)

by Michaela Black

2023

Key finding: Conducted empirical evaluation of supervised machine learning models trained on synthetic health datasets generated using classification and regression trees, parametric, and Bayesian network methods. Found minimal... Read more

Preserving empirical data utility in k-anonymous microaggregation via linear discriminant analysis

by ANA HOYOS

2022, Engineering Applications of Artificial Intelligence

Key finding: Proposed an anonymization method integrating linear discriminant analysis to rotate and scale data towards classification thresholds before k-anonymous microaggregation. This approach preserves machine learning accuracy... Read more

Multivariate Normal Inference based on Singly Imputed Synthetic Data under Plug-in Sampling

by Ricardo Moura

2022, Sankhya B

Key finding: Derived finite-sample valid statistical tests for generalized variance, sphericity, independence, and regression coefficients based solely on singly imputed synthetic datasets generated via plug-in sampling under a... Read more

All papers in Statistical disclosure control

Privacy in Statistical Databases: k-Anonymity Through Microaggregation

by Agusti Solanas

2026

The amount of computer-stored information is growing faster with each passing day. This growth and the way in which the stored data are accessed through a variety of channels have raised the alarm about the protection of the individual... more

A variable-MDAV-based partitioning strategy to continuous multivariate microaggregation with genetic algorithms

by Agusti Solanas

2026

Microaggregation is a Statistical Disclosure Control (SDC) technique that aims at protecting the privacy of individual respondents before their data are released. Optimally microaggregating multivariate data sets is known to be an... more

Multivariate Microaggregation Based Genetic Algorithms

by Agusti Solanas

2026

Microaggregation is a clustering problem with cardinality constraints that originated in the area of statistical disclosure control for microdata. This article presents a method for multivariate microaggregation based on genetic... more

Micro-aggregation-based heuristics for p-sensitive k-anonymity

by Agusti Solanas

2026, Proceedings of the 2008 international workshop on Privacy and anonymity in information society

Micro-data protection is a hot topic in the field of Statistical Disclosure Control (SDC), that has gained special interest after the disclosure of 658000 queries by the AOL search engine in August 2006. Many algorithms, methods and... more

A variable-MDAV-based partitioning strategy to continuous multivariate microaggregation with genetic algorithms

by Agusti Solanas

2026, The 2010 International Joint Conference on Neural Networks (IJCNN)

Privacy in Statistical Databases: k-Anonymity Through Microaggregation

by Agusti Solanas

2026, 2006 IEEE International Conference on Granular Computing

A polynomial-time approximation to optimal multivariate microaggregation

by Agusti Solanas

2026, Computers & Mathematics with Applications

Microaggregation is a family of methods for statistical disclosure control (SDC) of microdata (records on individuals and/or companies), that is, for masking microdata so that they can be released without disclosing private information on... more
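
To make the masking idea concrete, here is a deliberately simplified univariate sketch: sort the values, cut them into consecutive groups of at least k records, and replace each value by its group mean. This is not the polynomial-time approximation from the paper above, nor MDAV or any other published heuristic; the function name and grouping rule are assumptions chosen for illustration.

```python
def microaggregate(values, k):
    """Mask a list of numbers by fixed-size univariate microaggregation.

    Values are sorted, split into consecutive groups of size k (the last
    group absorbs any remainder, so it holds between k and 2k-1 records),
    and each value is replaced by the mean of its group. The result is
    returned in the original record order.
    """
    n = len(values)
    if k < 1:
        raise ValueError("k must be at least 1")
    if n < k:
        raise ValueError("need at least k records")

    # Indices of the records, ordered by their value.
    order = sorted(range(n), key=lambda i: values[i])
    masked = [0.0] * n
    n_groups = n // k

    for g in range(n_groups):
        start = g * k
        # The final group takes every remaining record.
        end = n if g == n_groups - 1 else start + k
        members = order[start:end]
        mean = sum(values[i] for i in members) / len(members)
        for i in members:
            masked[i] = mean
    return masked


if __name__ == "__main__":
    incomes = [12, 1, 10, 3, 2, 11, 13]
    print(microaggregate(incomes, k=3))
```

Because every released value is shared by at least k records, no single record can be singled out by this attribute alone; the price is the information lost when values are replaced by group means.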

Watermarking for multilevel access to statistical databases

by Francesc Sebé

2025, Proceedings International Conference on Information Technology: Coding and Computing

Increased corporate, government and academic demand has prompted official statistics to release individual respondent data (microdata) in addition to the traditional tabular data. Microdata must be masked by a statistical disclosure... more

IPUMS-International: an overview

by Miriam King PhD

2025

Unlike aggregated census tabulations, census microdata provide information about individual persons and households. This makes it possible for researchers to design analyses tailored to their particular research questions. Other microdata... more

A new shiny GUI for sdcMicro

by Thijs Benschop

2025

The application of many anonymization methods is complex and requires knowledge of the methods and access to suitable tools for implementation. For users comfortable with using R, the package sdcMicro [1] provides a tool for the... more

A 2^d-tree-based blocking method for microaggregating very large data sets

by Agusti Solanas

2025, First International Conference on Availability, Reliability and Security, Proceedings

Blocking is a well-known technique used to partition a set of records into several subsets of manageable size. The standard approach to blocking is to split the records according to the values of one or several attributes (called blocking... more

A Post-processing Method to Lessen k-Anonymity Dissimilarities

by Agusti Solanas

2025, 2008 Third International Conference on Availability, Reliability and Security

Protecting personal data is essential to guarantee the rule of law. Due to the new Information and Communication Technologies (ICTs), unprecedented amounts of personal data can be stored and analysed. Thus, if the proper measures are... more

An Anonymity Model Achievable Via Microaggregation

by Agusti Solanas

2025, Lecture Notes in Computer Science

k-Anonymity is a privacy model requiring that all combinations of key attributes in a database be repeated at least for k records. It has been shown that k-anonymity alone does not always ensure privacy. A number of sophistications of... more
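
The definition above can be checked directly: count how often each combination of key attributes occurs and verify that every count is at least k. The sketch below assumes records are dictionaries; the attribute names and sample data are hypothetical and not taken from the paper.

```python
from collections import Counter


def is_k_anonymous(records, key_attributes, k):
    """Return True if every combination of key-attribute values that
    appears in `records` is shared by at least k records."""
    combos = Counter(
        tuple(record[attr] for attr in key_attributes) for record in records
    )
    return all(count >= k for count in combos.values())


if __name__ == "__main__":
    # Hypothetical microdata: 'age_band' and 'zip3' are key attributes,
    # 'diagnosis' is the confidential attribute.
    table = [
        {"age_band": "30-39", "zip3": "080", "diagnosis": "flu"},
        {"age_band": "30-39", "zip3": "080", "diagnosis": "asthma"},
        {"age_band": "40-49", "zip3": "081", "diagnosis": "flu"},
        {"age_band": "40-49", "zip3": "081", "diagnosis": "flu"},
    ]
    print(is_k_anonymous(table, ["age_band", "zip3"], k=2))
    print(is_k_anonymous(table, ["age_band", "zip3"], k=3))
```

In the sample table, the second group shares a single diagnosis, so the table can pass the check for k = 2 while still revealing that value for everyone in the group. This is one way k-anonymity alone can fall short, as the abstract notes.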

Multivariate Microaggregation Based Genetic Algorithms

by Agusti Solanas

2025, 2006 3rd International IEEE Conference Intelligent Systems

Classifying data from protected statistical datasets

by Stan Matwin

2025, Computers & Security

Statistical Disclosure Control (SDC) has been an active research area in recent years. The goal is to transform an original dataset X into a protected one X', such that X' does not reveal any relation between confidential and... more

Microdata Protection Method Through Microaggregation: A Median-Based Approach

by ENAMUL KABIR

2025, Information Security Journal: A Global Perspective

Microaggregation for Statistical Disclosure Control (SDC) is a family of methods to protect microdata from individual identification. SDC seeks to protect microdata in such a way that can be published and mined without providing any... more

Scalable k-anonymous Microaggregation: Exploiting the Tradeoff between Computational Complexity and Information Loss

by Rüdiger Reischuk

2025

k-anonymous microaggregation is a standard technique to improve privacy of individuals whose personal data is used in microdata databases. Unlike semantic privacy requirements like differential privacy, k-anonymity allows the unrestricted... more

Geographically intelligent disclosure control for flexible aggregation of census data

by David Martin

2025, International Journal of Geographical Information Science

This paper describes a geographically intelligent approach to disclosure control for protecting flexibly aggregated census data. Increased analytical power has stimulated user demand for more detailed information for smaller geographical... more

Computer for Programmies for Generating and Processing a Document Data Stream Containing Structured Fields

by Dennis Dicker

2025

44 45 46 (58) Field of Classification Search ................ 358/1.13, 358/1.18, 1.15, 1.16; 709/231, 234 See application file for complete search history. (56) References Cited U.S. PATENT DOCUMENTS 4,649,513 A * 3/1987 Martin et al.... more

COMPREHENSIVE ANALYSIS OF STATISTICAL DISCLOSURE CONTROL

by Dr Omondi James Okeda

2025, Dr Omondi James Okeda

This document provides a comprehensive critical literature analysis of Statistical Disclosure Control (SDC), highlighting its methodologies, applications, and implications for data protection. The significance of SDC lies in its critical... more

Towards a general record linkage framework for statistical disclosure control

by Mark Elliot

2024, Proceedings of the 1st International Workshop on AI for Privacy and Security

The assessment of statistical disclosure risk often requires the linking of data. There are effective means of linking data for simple scenarios; but it is not clear how best to approach linkage for more complex scenarios. We examine... more

Applying disclosure control to temporal data

by Mark Elliot

2024

An important aspect of disclosure control is the isolation and control of individual-level records that have a high probability of being identified (as their contents, or variables, are unusual). Consider, for example, a sixteen-year-old... more

Key Variable Mapping System II

by Mark Elliot

2024

The Key Variable Mapping System (KVMS) is an approach for identifying matching possibilities across datasets within a data environment. It is a formalised approach for identifying key variables. An overview of KVMS is provided in Elliot... more

Barriers to data access and matching in Europe

by Gábor Békés

2024

A Brief Survey on Different Privacy Preserving Techniques

by Khushboo Saxena

2024

Data mining is used to extract valuable information from large amounts of data, but this can be harmful in some cases, so some kind of protection is required for sensitive information. Privacy-preserving mining has therefore emerged with the goal to... more

Fig. 1: Data collection protocol taxonomy, based on two data collection methods. Basic requirements for the data collection protocol: first, it must be scalable, because a data warehouse server may deal with thousands of data providers, as in an online survey system. Second, it should impose a low cost on data providers, to increase their participation. Lastly, the protocol must be robust: it must produce relatively accurate data mining results while protecting data providers' privacy, even if some data providers behave inconsistently. For example, if data providers in an online survey system deviate from the protocol or submit meaningless data, the protocol must limit the influence of such erroneous behavior and ensure that the global data mining results remain sufficiently accurate.

A survey on statistical disclosure control and micro-aggregation techniques for secure statistical databases

by Ebaa Fayyoumi

2024, Software: Practice and Experience

This paper surveys the fields of Statistical Disclosure Control (SDC) and Micro-Aggregation Techniques (MATs), which are both areas fundamental to the science of secure Statistical DataBases (SDBs). The paper is written from the... more

Design and Development of Key Representation Auditing Scheme for Secure Online and Dynamic Statistical Databases

by Asim Abdelaziz Abdallah

2024

who was abundantly helpful and offered invaluable assistance, support and guidance. Deepest gratitude is also due to

Obtaining Information while Preserving Privacy: A Markov Perturbation Method for Tabular Data

by George Duncan

2024

data user is assessed on two dimensions ing information Ways exist however to resolve the horizontal axis is the level of knowledge about this value paradox in an important context Sta-the legitimate object of empirical inquiry the verti... more

CASTLE: Enhancing the Utility of Inequality Query Auditing Without Denial Threats

by taeho jung

2024, IEEE Transactions on Information Forensics and Security

Privacy-Preserving Edge-Cloud Architecture for IoT Healthcare Systems

by Payal Goyal

2024

With the surging demand for Internet of Things (IoT) healthcare applications, a myriad of data privacy concerns come to light. Cloud computing inherits the risks of exposing data to re-identification vulnerabilities. A secure solution is... more

Experiments with controlled rounding for statistical disclosure control in tabular data with linear constraints

by juan carlos Salazar

2024

We thank Alberto Caprara who implemented the separation procedures for Gomory cuts and for {0, 1/2}-cuts.

The ARGUS Software in CENEX

by Anco Hundepool

2024, Lecture Notes in Computer Science

In this paper we will give an overview of the CENEX project and concentrate on the current state of affairs with respect to the ARGUS-software twins. The CENEX (Centre of Excellence) is a new initiative by Eurostat. The main idea behind... more

Measuring Rule Retention in Anonymized Data - When One Measure Is Not Enough

by Md Zahidul Islam

2024, Trans. Data Priv.

In this paper, we explore how anonymizing data to preserve privacy affects the utility of the classification rules discoverable in the data. In order for an analysis of anonymized data to provide useful results, the data should have as... more

Privacy preserving data mining: A noise addition framework using a novel clustering technique

by Md Zahidul Islam

2024, Knowledge-Based Systems

During the whole process of data mining (from data collection to knowledge discovery) various sensitive data get exposed to several parties including data collectors, cleaners, preprocessors, miners and decision makers. The exposure of... more

Calibrated Hot Deck Imputation for Numerical Data Under Edit Restrictions

by Natalie Shlomo

2024, Journal of survey statistics and methodology

We develop a non-parametric imputation method for item non-response based on the well-known hot-deck approach. The proposed imputation method is developed for imputing numerical data that ensure that all record-level edit rules are... more

Assessment of Statistical Disclosure Control Methods for the 2001 UK Census

by Natalie Shlomo

2024

We define the disclosure risk scenarios that led to the statistical disclosure control (SDC) methods for the 2001 UK Census. We examine the SDC methods that were implemented based on a disclosure risk-data utility framework and assess... more

Statistical Disclosure Limitation: New Directions and Challenges

by Natalie Shlomo

2024, Journal of Privacy and Confidentiality

An overview of traditional types of data dissemination at statistical agencies is provided including definitions of disclosure risks, the quantification of disclosure risk and data utility and common statistical disclosure limitation... more

Disclosure risk and data utility in flexible table generators

by Natalie Shlomo

2024

Statistical agencies are considering making more use of the internet to disseminate census tabular outputs through flexible table generation servers that allow users to define and generate their own tables. The key questions when... more

Statistical Disclosure Control Methods for Census Frequency Tables

by Natalie Shlomo

2024, International Statistical Review

This paper provides a review of common statistical disclosure control (SDC) methods implemented at Statistical Agencies for standard tabular outputs containing whole population counts from a Census (either enumerated or based on a... more

Contingency Tables: Analisis Data Kategorik

by Mohammad Ghani, Ph.D.

2024

21st Century Statistical Disclosure Limitation: Motivations and Challenges

by John M Abowd

2024, arXiv (Cornell University)

A software package for the application of probabilistic anonymisation to sensitive individual-level data: a proof of principle with an example from the ALSPAC birth cohort study

by Demetris Avraam

2024, Longitudinal and life course studies

Data Mining Methods for Linking Data: Coming From Several Sources

by Prakhar Kulshrestha

2024, International Journal of Modern Trends in Engineering and Research

Statistical offices are faced with the problem of multiple-database data mining at least for two reasons. On one side, there is a trend to avoid direct collection of data from respondents and use instead administrative data sources to... more

Geographic segment disclosures under IFRS 8: Changes in materiality and fineness by European, Australian and New Zealand blue chip companies

by Donna Street

2024, Research in Accounting Regulation

This study examines how the adoption of International Financial Reporting Standard (IFRS) 8, Operating Segments, changed the entity-wide geographic segment reporting by European, Australian and New Zealand blue chip companies. The focus... more

Frameworks, principles and accreditation in modern data management

by Felix Ritchie

2024

The Five Safes framework is increasingly widely used for data governance. Since its conception in 2003, it has influenced data management in many ways, particularly in the public sector. As it has become established, both the advantages... more

Can data owners and data users think alike? Designing incentives to shape the provision of access to data

by Felix Ritchie

2024, IASSIST Conference

Addressing the human factor in data access: incentive compatibility, legitimacy and cost-effectiveness in public data resources

by Felix Ritchie

2024

Traditional models of incentivising people suggest that positive incentives are more effective than negative ones. We argue that in data access the opposite can be true, as the assumptions made at the design stage can fundamentally change... more
