Deliberative workshops with public members: Establishing trust in the use of synthetic data

Background

Synthetic data (often referred to as ‘mock’ or ‘dummy’ data) is a term used to describe a new copy of a dataset. Synthetic data aims to replicate the structure and some of the patterns of the original ‘real’ dataset, while minimising the risk of identifying any specific individual. This type of data can be used to explore the potential utility of a ‘real’ dataset and provides an opportunity for training or developing code.

Some types of synthetic data pose more risk to confidentiality than others, depending on how closely they match the original dataset. “Low-fidelity” synthetic data is the most abstracted from the original and carries the lowest risk of re-identification.

Data providers, including the NHS and other UK Government departments, already make some low-fidelity datasets available to researchers. Despite recent work to expand the use of synthetic data and to encourage both researchers and data providers to use these datasets, there has been no widespread consultation with the UK public.

Public consultation

In 2024, we undertook a large public consultation across the four nations of the United Kingdom. Working with a community engagement agency (Egality), we recruited a diverse group of public members (n=39). Across two sets of four workshops, we explored public attitudes towards the use of synthetic data for research, including: the perceived benefits and risks of synthetic data; the acceptability of scaling up its use; and the language and techniques for communicating about synthetic data with the public. Workshop content and dissemination strategies were guided by an expert steering group and two public collaborators.

Intended study outputs

Our outputs will include a set of recommendations for researchers and data providers, co-developed with our public workshop members. The recommendations will be relevant to departments releasing, or planning to release, synthetic data and to the researchers who access it. We will produce our own accessible outputs (e.g. an infographic, an information booklet and a policy briefing) on the topic of synthetic data, based on our project findings and recommendations. By sharing our findings with a range of interested stakeholders, we aim to ensure that future synthetic data policy reflects the concerns and expectations of the UK public.

Information

Chief Investigator(s)	Dr Fiona Lugg-Widger, Robert Trubey
Funder(s)	Administrative Data Research (ADR) UK

Key facts

Start date	1 Mar 2024
End date	31 Mar 2025
Grant value	£120,893
Status	Analysis and reporting

General enquiries

Robert Trubey
trubeyrj@cardiff.ac.uk
+44 (0)29 2068 7548
