DELIMIT
Deliberative workshops with public members: Establishing trust in the use of synthetic data
Background
Synthetic data (often referred to as ‘mock’ or ‘dummy’ data) is a term used to describe a new, artificially generated copy of a dataset. Synthetic data aims to replicate the structure and some of the patterns of the original ‘real’ dataset, while minimising the risk of identifying any specific individual. This type of data can be used to explore the potential utility of a ‘real’ dataset, and it provides an opportunity for training or developing code.
Some types of synthetic data pose more risk to confidentiality than others, depending on how closely they match the original dataset. “Low-fidelity” synthetic data is the most abstracted and carries the lowest risk of re-identification.
Data providers, including the NHS and other UK Government departments, already make some low-fidelity datasets available to researchers. Despite recent work to expand the use of synthetic data and to encourage both researchers and data providers to use these datasets, there has been no widespread consultation with the UK public.
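To make the idea of low-fidelity data concrete, here is a minimal sketch of one way such data could be produced: from metadata alone (column names, types and allowed values), without touching any real records. The schema, column names and the function `make_low_fidelity_rows` are hypothetical and for illustration only. They do not describe how any particular data provider generates its synthetic datasets.

```python
import random

# Hypothetical metadata describing only the structure of a dataset:
# column names, types and allowed values. No real records are used.
SCHEMA = {
    "age_band": {"type": "category", "values": ["18-29", "30-49", "50-69", "70+"]},
    "nation": {
        "type": "category",
        "values": ["England", "Scotland", "Wales", "Northern Ireland"],
    },
    "gp_visits": {"type": "int", "min": 0, "max": 12},
}


def make_low_fidelity_rows(schema, n_rows, seed=0):
    """Generate rows that match the schema's structure but nothing else.

    Every value is drawn uniformly at random, so the output keeps column
    names, types and value ranges, while patterns and relationships in
    any real data are deliberately not reproduced.
    """
    rng = random.Random(seed)  # fixed seed so the output is repeatable
    rows = []
    for _ in range(n_rows):
        row = {}
        for name, spec in schema.items():
            if spec["type"] == "category":
                row[name] = rng.choice(spec["values"])
            elif spec["type"] == "int":
                row[name] = rng.randint(spec["min"], spec["max"])
            else:
                raise ValueError(f"unsupported column type: {spec['type']}")
        rows.append(row)
    return rows


synthetic_rows = make_low_fidelity_rows(SCHEMA, n_rows=5)
for row in synthetic_rows:
    print(row)
```

Data like this is enough to test that analysis code runs against the right columns and types, but it cannot support real findings. Higher-fidelity approaches would also need to reproduce patterns from the original data, which is where the confidentiality risk increases.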
Public consultation
In 2024, we undertook a large public consultation with a diverse range of public members (n=39) from across the four nations of the United Kingdom, working with a community engagement agency (Egality) to recruit our cohort. Across two sets of four workshops, we explored public attitudes towards the use of synthetic data for research, including the perceived benefits and risks of synthetic data, the acceptability of scaling up its use, and the language and techniques for communicating about synthetic data with the public. Workshop content and dissemination strategies were guided by an expert steering group and two public collaborators.
Intended study outputs
Our outputs will include a set of recommendations for researchers and data providers, co-developed with our public workshop members. The recommendations will be relevant to departments releasing, or planning to release, synthetic data and to the researchers who access it. We will also produce our own accessible outputs (e.g. an infographic, an information booklet and a policy briefing) on the topic of synthetic data, based on our project findings and recommendations. By sharing our findings with a range of interested stakeholders, we aim to ensure that future synthetic data policy reflects the concerns and expectations of the UK public.
Information
Chief Investigator(s) | |
---|---|
Funder(s) | Administrative Data Research (ADR) UK |
Key facts
Start date | 1 Mar 2024 |
---|---|
End date | 31 Mar 2025 |
Grant value | £120,893 |
Status | |