Exploring the legal boundaries of Synthetic Data

Published 28 Oct 2021

In recent years, big data has radically changed the way we live, do business and conduct (scientific) research. This is reflected in a significant rise in demand for large amounts of (personal) data. Strict privacy legislation has made this demand more pressing, since it is widely regarded as the main obstacle to (free) data sharing. Synthetic data aims to solve this problem by using AI to generate new datasets that cannot be traced back to individuals, yet replicate the statistical correlations of real-world data. But how anonymous are these data? And can the safeguards of the General Data Protection Regulation (GDPR) truly be circumvented by using this method? In this write-up, we consider the legal aspects of synthetic data and whether they truly are the blessing they appear to be for the future of data sharing and, more importantly, our privacy.
     
What is synthetic data?
 
Anonymisation versus pseudonymisation
- Singling out: the possibility to distinguish and identify certain individuals within the dataset.
- Linkability: the ability to link two or more data points concerning the same data subject, within one dataset or across different datasets.
- Inference: the possibility to deduce, with significant probability, the value of an attribute of a data subject from the values of other attributes within the dataset.
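
To make the criteria above more concrete, here is a minimal Python sketch of how singling out and inference might be approximated on a small table. It is an illustration under stated assumptions, not a legal test: the function names, the choice of quasi-identifiers and the toy records are all hypothetical, and a low score here does not by itself establish anonymity under the GDPR. Linkability is left out because it requires a second dataset to compare against.

```python
from collections import Counter, defaultdict


def group_key(record, quasi_identifiers):
    """Combination of quasi-identifier values that describes one record."""
    return tuple(record[q] for q in quasi_identifiers)


def singled_out(records, quasi_identifiers):
    """Records whose quasi-identifier combination occurs exactly once."""
    counts = Counter(group_key(r, quasi_identifiers) for r in records)
    return [r for r in records if counts[group_key(r, quasi_identifiers)] == 1]


def inference_confidence(records, quasi_identifiers, sensitive):
    """Per group, the share of the most common sensitive value.

    A share close to 1.0 means the sensitive attribute can be guessed
    with high probability from the quasi-identifiers alone.
    """
    groups = defaultdict(Counter)
    for r in records:
        groups[group_key(r, quasi_identifiers)][r[sensitive]] += 1
    return {k: max(c.values()) / sum(c.values()) for k, c in groups.items()}


# Hypothetical toy data, invented purely for illustration.
records = [
    {"age_band": "30-39", "postcode": "1011", "diagnosis": "asthma"},
    {"age_band": "30-39", "postcode": "1011", "diagnosis": "diabetes"},
    {"age_band": "70-79", "postcode": "9999", "diagnosis": "asthma"},
]
quasi = ["age_band", "postcode"]

unique_records = singled_out(records, quasi)
confidence = inference_confidence(records, quasi, "diagnosis")
print(len(unique_records))             # 1
print(confidence[("30-39", "1011")])   # 0.5
```

In this toy table the third record is the only one in its age band and postcode, so it can be singled out, and its diagnosis can be inferred with certainty from those two attributes.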
 
 
Not completely exempt from the GDPR
 
Things to keep in mind
- Evaluate whether synthetic data offer a suitable solution for the specific needs of your business or organization. Their main advantage lies in their ability to preserve the statistical properties of the original dataset, which makes them particularly useful in certain use cases, such as machine learning and (scientific) research.
- Assess whether the synthetic datasets deviate enough from the original datasets and adjust the settings of the AI systems accordingly. In doing so, pay attention to the criteria laid down by the Article 29 Working Party (WP29): singling out, linkability and inference.
- Give thought to your obligations under the GDPR. List your lawful basis for processing, as well as the purpose of the processing.
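
One possible way to approach the second point, checking whether synthetic records deviate enough from the originals, is to measure how close each synthetic row lies to its nearest original row. The sketch below assumes purely numeric records; the function names, the sample values and the threshold are hypothetical. What counts as "enough" deviation is a judgment call that this heuristic cannot settle on its own.

```python
import math


def distance_to_closest_record(synthetic, original):
    """For each synthetic row, the Euclidean distance to its nearest original row."""
    return [min(math.dist(s, o) for o in original) for s in synthetic]


def share_of_near_copies(synthetic, original, threshold):
    """Fraction of synthetic rows lying within `threshold` of an original row."""
    distances = distance_to_closest_record(synthetic, original)
    return sum(d <= threshold for d in distances) / len(distances)


# Hypothetical (age, income) rows, invented for illustration.
# In practice the columns should be scaled first, so that one
# attribute (here: income) does not dominate the distance.
original = [(34.0, 52000.0), (58.0, 61000.0), (41.0, 48000.0)]
synthetic = [(34.0, 52000.0), (45.0, 50500.0)]

distances = distance_to_closest_record(synthetic, original)
near_copy_share = share_of_near_copies(synthetic, original, threshold=1.0)
print(distances[0])      # 0.0 (an exact copy of an original record)
print(near_copy_share)   # 0.5
```

A synthetic row at distance zero reproduces a real record exactly, which would be a signal to adjust the generator's settings before sharing the dataset.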