Finding existing data

Before you start your project, it is a good idea to find out whether the data you need, or a similar dataset, already exists. This could be existing research data, but also data or statistics originally collected for other purposes. Once you have found suitable data, the next step is to evaluate whether it is sufficient to conduct or complement your research. This page briefly explains how to find existing datasets and what to check before reusing them.

The advantages of using existing data

Using already collected data can be useful for several reasons:

  • Reusing existing data can save the time and money that would otherwise go into collecting new data, for example for study design, participant recruitment, data collection, personnel, and facilities.

  • Existing datasets are often larger than what you could collect yourself in a limited time frame. This can give you greater statistical power and may allow you to answer more, or more complicated, research questions.

  • Existing datasets can be more diverse in terms of time period, topic, and geographic region. This can let you answer research questions across different contexts and over a longer period of time.

  • Existing data can help you to validate and replicate findings from earlier studies.

Where to find existing data?

Attention points for reusing data

Once you have found the data you were looking for, consider the following points:

  • Check if there are costs associated with gaining access to the dataset. 

  • Check if the license or terms of use allow the type of reuse you intend. If there is no license or terms of use, find out whether reuse is allowed at all, because the data may still be protected by intellectual property rights.

  • Check if the dataset contains personal data. If so, you need to comply with privacy regulations.

  • Check if there are specific requirements you must meet to reuse the data. For example:

    • Do you need to sign an agreement?

    • Do you need to implement additional security measures?

    • Do you need to cite the data in a specific way? See also from the European Union. 

  • Check the data's formats and documentation. Can you open the files, and do you know how to interpret the value labels and the codes used for missing data?

  • Think about how you can make your use of these data transparent. How can others replicate your research, or at least arrive at the same starting point?
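
One possible way to make reuse transparent is to store a small provenance record next to each dataset you download. The sketch below is only an illustration, not a prescribed method: the function names, the record fields, and the file name in the usage note are all hypothetical, and you should adapt them to what your data provider or institution requires. It records where the file came from, under which license, how to cite it, when you accessed it, and a checksum so that others can verify they start from the same file.

```python
import hashlib
import json
from datetime import date
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_provenance_record(
    path: Path,
    source: str,
    license_name: str,
    citation: str,
    accessed: date,
) -> dict:
    """Collect the facts others need to obtain and verify the same file.

    The field names are an assumption made for this example, not a standard.
    """
    return {
        "file": path.name,
        "size_bytes": path.stat().st_size,
        "sha256": sha256_of_file(path),
        "source": source,
        "license": license_name,
        "citation": citation,
        "accessed": accessed.isoformat(),
    }


def write_provenance_record(record: dict, target: Path) -> None:
    """Save the record as human-readable JSON next to the data."""
    target.write_text(json.dumps(record, indent=2), encoding="utf-8")
```

To use it, call build_provenance_record with the path of the downloaded file (for example a hypothetical survey.csv) and the source, license, and citation given by the provider, then pass the result to write_provenance_record. Keeping the resulting JSON file with your analysis code lets a reader compare checksums and confirm that they are working from the same version of the data.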