How FAIR is the software landscape of Utrecht University?
Nowadays, it is quite common to develop and use code and software for research. Graduate Keven Quach wanted to know how FAIR the code and software developed by researchers at Utrecht University is. So he dived into GitHub, an online platform where you can develop, manage and publish code and software. There, Keven found some interesting facts.
Keven’s study consists of three phases: “In the first phase of my research, I collected the GitHub profiles of our researchers. As there is no central database with GitHub profiles of researchers from Utrecht University (UU), we had to collect the users from various sources. I searched GitHub by Utrecht University and collected the information. We also searched for GitHub profiles in the data of . Then we searched ‘’. When you limit that to Utrecht University you can find the papers from scientists in Utrecht. The last source we used was the employees page on the university website.”
Analysing with SWORDS
Keven Quach’s master’s thesis is called ‘Mapping Research Software Landscapes through Exploratory Studies of GitHub Data’. He performed his research as part of the Open Science Programme. Keven was already working with Professor Anna-Lena Lamprecht and Jonathan de Bruin as a research assistant to develop prior to his thesis.
SWORDS stands for ‘Scan and revieW of Open Research Data and Software’. SWORDS is a powerful tool to gain insight into the open source activities of a university or research institute. The thesis provided the SWORDS framework with additional variables. Although the analysis and data collection were done for UU researchers only, the purpose of this research is to serve as a template for other researchers to scan and review repositories for their university or organisation as well.
Donkey work
As a second step, Keven collected all code and software repositories. He merged all the information he had about the researchers and their GitHub profiles and then started his real donkey work. He went through all the repositories manually to check whether the published software was research software or software made for someone’s hobby. By checking all the software, he made sure his research was based on an overview of research code and software. “We found 1500 repositories in total. I manually labelled all these repositories. Doing so is extremely tedious, I can tell you now from experience,” says Keven, laughing.
34% of the research code and software doesn’t have any license information. If someone else wants to be able to work with this research code and software, you need a license that permits reuse.
Who is Keven Quach?

Keven Quach (1996) was born and raised in Germany. His parents came as refugees from Vietnam to our eastern neighbours in the seventies. After high school he attended the University of Bamberg in Bavaria, where Keven did a bachelor’s degree programme in business informatics. He wanted to do his master’s abroad and chose business informatics at Utrecht University, the programme he found the most interesting. Since his graduation in November last year, Keven has been working as a software engineer at Bosch in Friedrichshafen, back in Germany.
One hundred repositories per day
Keven thought he could do one hundred repositories per day, but he was being too optimistic. He needed about five to six weeks to go through all the repositories by hand. “Research software is often work in progress,” Keven continues. “We needed some way to find out if the repository that we have in our dataset is really a research code or a software repository. Identifying if something is research output or not was a challenge. And it’s not that simple to do this automatically. Sometimes I needed to contact the researchers to ask them if it was research software or not.” While labelling these repositories, Keven also looked into the extra data. That way he really got to know the dataset quite well.
In the third phase of his research, Keven looked at different variables, such as:
- Does the software have a license?
- Is version control used correctly?
- Is citation information available?
That way he could analyse how FAIR the software and code are. These are some of the results:
“In the analyses of the research software we added a FAIRness score. We then added the score of each repository as one value and then we averaged that by use for each faculty. We also did this by distinguishing different types of research software.”
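The scoring Keven describes could be sketched roughly as follows. This is a minimal illustration and not the SWORDS implementation: the three checks are simply the example variables listed above, while the equal weighting, the field names and the sample repositories are assumptions made up for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical variable names; the thesis looks at more variables than these.
CHECKS = ("has_license", "uses_version_control", "has_citation_info")


def fairness_score(repo):
    """Count how many checks a repository passes (equal weights are assumed)."""
    return sum(1 for check in CHECKS if repo.get(check, False))


def average_score_per_faculty(repos):
    """Average the per-repository scores within each faculty."""
    scores = defaultdict(list)
    for repo in repos:
        scores[repo["faculty"]].append(fairness_score(repo))
    return {faculty: mean(values) for faculty, values in scores.items()}


# Made-up sample data, for illustration only.
sample_repos = [
    {"faculty": "Science", "has_license": True,
     "uses_version_control": True, "has_citation_info": False},
    {"faculty": "Science", "has_license": False,
     "uses_version_control": True, "has_citation_info": False},
    {"faculty": "Social Sciences", "has_license": True,
     "uses_version_control": True, "has_citation_info": True},
]

averages = average_score_per_faculty(sample_repos)
```

Because each repository is reduced to a single number first, the same averaging step could be reused to group by type of research software instead of by faculty.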
Remarkable findings
After collecting all this information, Keven started analysing the data. He looked at all kinds of aspects of the publications, such as quality, FAIRness, and popularity of the research software. Keven showed, for example, to which faculty each publication belonged. “It was remarkable that I did not find any repositories from the Faculty of Law, Economics and Governance. And the Faculty of Veterinary Medicine had fewer than ten published repositories.”
There are two likely reasons for this. “The first one is that our search is very biased towards the other faculties in the way we collect users. For example, if we go back to the previous search strategies, ‘PapersWithCode.com’ is quite heavily biased towards machine learning. Therefore, the Science Faculty will use this kind of website, and most likely no publisher from Veterinary Medicine will. And, of course, not 100% of the code and software is on GitHub. So there will probably be an unknown, unreported amount of research software that exists, but that we do not know of due to the way we have captured this. The other explanation is that some faculties simply do not use that much research software.”
So those two faculties were excluded from further analysis. The Faculty of Medicine was also not included in Keven’s research, since these researchers work at UMCU and cannot be found on the employee pages of Utrecht University. The faculty with the two largest GitHub accounts in terms of repositories is Humanities, namely the and the Institute for Language Sciences Labs. “In the first Lab they already have 80 repositories and in the latter even more, 140.”
License for reuse
Keven found that 66% of the research code and software had an open license. “That means that 34% doesn’t have an open license. If you don’t use a license and you publish something, no one can use the code legally. It’s protected by default. So, unless you give it an open license, no one can use your work. You need a license to permit reuse. One of my recommendations is to inform researchers to add a license to their research software. It’s something that’s relatively simple to do.”
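As a rough illustration of how such a share could be computed from repository metadata, consider the sketch below. It assumes that each repository record has a `license` field holding an SPDX identifier or `None`. The allow-list of open licenses and the sample records are invented for the example and are not taken from the thesis.

```python
# Assumed allow-list of open licenses (SPDX identifiers); not from the thesis.
OPEN_LICENSES = {"MIT", "Apache-2.0", "GPL-3.0-only", "BSD-3-Clause"}


def open_license_share(repos):
    """Return the percentage of repositories that carry an open license."""
    if not repos:
        return 0.0
    open_count = sum(1 for repo in repos if repo.get("license") in OPEN_LICENSES)
    return 100 * open_count / len(repos)


# Made-up sample data, for illustration only.
sample = [
    {"name": "repo-a", "license": "MIT"},
    {"name": "repo-b", "license": None},
    {"name": "repo-c", "license": "Apache-2.0"},
    {"name": "repo-d", "license": None},
]

share = open_license_share(sample)
```

In this sketch a repository with no license information counts as not open, which matches the point that unlicensed code is protected by default.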
Find more information about licensing and publishing your data and software:
- Publishing code and software - Research Data Management Support - Utrecht University (uu.nl)
- FAQ FAIR Data and Software - Open Science - Utrecht University (uu.nl)
Programming languages
Keven also made an overview of the programming languages that are used by Utrecht researchers. He wanted to know if the languages used were free and open or commercial and closed (e.g., Python versus MATLAB). Python and R are the most commonly used programming languages. These are open source languages, which can be widely reused. Python is most frequently used within the Science Faculty, while the Faculty of Social Sciences is the largest user of R.
How FAIR is the software developed by Utrecht researchers?
“Researchers at Utrecht University work relatively FAIR when it comes to research software,” Keven says. “We see that the Support Departments (e.g. University Library and ITS) and the Faculty of Social Sciences perform the best. So that’s why I said ‘relatively’, because we can only see that in relation to the other faculties we examined. To publish more FAIR, colleagues from the Support Departments and the Faculty of Social Sciences can show others what they do right or how they can work more FAIR. The ultimate goal of FAIR is to facilitate the reuse of data, code and software, and that’s still an ongoing process at the university.”
What we can do with these findings
According to Jonathan de Bruin, based on the results of Keven's master thesis, the following actions are to be taken:
- We can provide proactive support in faculties where little output can be found.
- We can tackle structural problems with quality in an integrated way.
- We can create awareness within the organisation.
- We can provide researchers with more or better information than we do now.
- We aim to repeat the study after a year and thus monitor the impact.
What is FAIR research IT?
Utrecht University offers IT tools, services and infrastructure to support scientists and research supporters in their day-to-day work. Within the IT department, research supporters offer existing tools and services as well as ones that are still to be developed. FAIR research IT means we want to offer our researchers structural and large-scale solutions, with an emphasis on repeatability, so that solutions and knowledge previously figured out or developed can be easily reused by others. These new tools, services and infrastructure comply with the FAIR (Findable, Accessible, Interoperable, Reusable) principles.