Utrecht University developed performance review: “Structural evaluation of AI is needed”

Robot reading or studying. Photo: Andrea de Santis, via Unsplash

A project team led by Utrecht University examined how companies can monitor their AI applications. “Artificial intelligence is getting an increasingly important role in organisations, yet there is no structural monitoring of how AI performs its tasks,” project leader Iris Muis reveals. As a result, risks of profiling and discrimination, for example, are growing. “Our solution is a periodic ‘performance review’, just as is the case with human employees.”

Evaluating functioning of AI

“A ‘job interview’ for AI is already common”, Muis says. “There are many tests available to determine whether a particular AI would fit within a company.” But once an AI system is implemented, it is not monitored or evaluated, her research team found. This gap turned out to exist in academic literature as well as in practice. “While the performance of AI systems should be evaluated periodically to check whether they are doing and continue to do what is intended.”

Muis and her team subsequently developed a performance review for AI. “With this set-up, we provide tools for market players and supervisors to evaluate the functioning of AI,” Muis explains.

Questionnaire for artificial intelligence

The review follows a similar structure to the assessment of human employees. It consists of four sections with questions attached, such as:

  1. Tasks. What kind of tasks does the AI have? Have these tasks changed over time? Has the AI itself changed, e.g. due to changes in the code?
  2. Performance. How does the AI perform? Has the AI made any mistakes? Is the performance in line with expectations?
  3. Organisation. Who is responsible for the functioning of the AI? Are those responsibilities clear? Has there been a performance review with the AI before? If so, how were any issues of concern followed up?
  4. Development. What opportunities are there to improve the AI, both in performance and usability? Have other AI technologies or methods become available that might work better than the current one?

Team of researchers and supervisors

“Although the research project is now concluded, we remain committed to setting up supervision of AI to minimise risks from its use,” Muis concludes.

The partnership was led by Utrecht University’s , involving Iris Muis, Elise Renkema, Mirko Schaefer, Julia Straatman, Arthur Vankan and Daan van der Weijden. They collaborated with supervisors from the and the . The project was funded by these institutions and Utrecht University’s AI Labs.