Utrecht researchers join forces with rapper Willie Wartaal to compete in AI song contest
“Using AI to generate music is a really interesting challenge”
With the Eurovision song contest cancelled due to the Corona pandemic, it is perhaps lucky that a digital counterpart was already being organised by public broadcaster VPRO, NPO Innovation and NPO 3FM. In the contest, 13 teams from Europe and Australia have submitted songs generated by artificial intelligence (AI). Utrecht computer scientists Anja Volk and Iris Ren are participating in the Dutch team called ‘Can AI Kick It’, together with rapper Willie Wartaal, scientists from the University of Amsterdam and VPRO Medialab.
 
 Intersection of music and computer science
Anja Volk and Iris Ren are working at the intersection of music and computer science. The researchers train deep learning models to generate music in a certain style. “Using AI to generate music is a really interesting challenge that teaches us a lot about creativity,” explains Volk. “But how to properly evaluate the results is still a big problem. That’s why we are so excited about this contest.”
VPRO followed the Dutch team during the process of writing and recording the song, producing a five-part web series. The web series, which premieres today, will delve into various aspects of AI to investigate what ingredients are needed to compose the ultimate hit, exploring ways to enable a computer to independently generate a catchy melody and write the lyrics for a song.
Building blocks
 
 So how does a computer compose a song, exactly? “We’re not yet at the point where we have a machine that can just generate everything from scratch,” explains Anja Volk. “So we started with several building blocks for the song: deep learning models to generate melodies, bass lines and lyrics, a kit with AI-generated drum kicks, and a synthesised voice of Willie Wartaal.”
Together with a group of Master’s students, the Utrecht researchers worked mostly on the melodies. “We were given a data set consisting of 250 Eurovision songs. That seems like a lot, but for an AI, that’s not nearly enough. So we added thousands of Dutch folk songs to the training set as well.”
Catchiness analyser
Iris Ren adds: “Selecting the ingredients for the final song from all the different building blocks was a complex process. There are so many relationships between sections of the song.” To make a pre-selection of the results, a ‘catchiness analyser’ selected the best five or ten results for each building block. Then, the team got together to listen to these and choose. “As the performer, Willie Wartaal was the first critic of the AI-generated parts. He suggested starting by selecting the melody and bass line. We worked together as a team to put it all together.”
“For me, that was the most interesting part,” says Volk. “Sitting around the table, listening to the AI-generated parts, and hearing Willie Wartaal’s reaction.” Ren concurs: “He had a very clear vision of what could work together. And we encountered some unexpected things. For example, he expected regular blocks, like samples, consisting of two or four bars each, to play with. Many of our results were interesting, but missed that structure.” Volk adds: “We usually start by constructing a melody, but as it turned out, the first thing he needed was a good bass line.”
Audience vote
 
Just like in the real Eurovision song contest, the scores will consist of two parts: an audience vote and a score given by an AI panel. The panel will partially base their verdict on process documents submitted by the teams, which show how the songs were constructed: which parts came from the AI and what was the human input. The audience can listen to the songs and cast their vote before 10 May. The winner will be announced on 12 May.
“In the end, winning is not the most important thing for us,” says Volk. “Working like this, with a human artist in the loop, was already very valuable. We learned a lot from working with Willie Wartaal. And apart from that, we mainly hope it sparks a lot of discussion about AI and music.”
