Data Preparation and Curation I: The task that every data scientist needs to do, but none wants.
Data has been called the gold of the 21st century. But before any analytics can draw insights from it, data has to be prepared, understood, and curated to maintain its value. This is a challenging, labor-intensive, and time-consuming task: real-world data is typically highly heterogeneous, inconsistent, incomplete, and irregular, and to be brought into a usable form it has to be cleaned, reshaped, refined, and integrated. The New York Times has reported that such tasks may take up to 80% of a data scientist's time.
The goal of the Data Intensive Systems group is to build and provide tools, techniques, and methodologies that support data scientists in performing their data management tasks in ways that are more time-efficient, more effective, less laborious, and less error-prone. In this talk we will present some of these challenges and the solutions we have developed for discovering the information of interest and building high-quality datasets. This is the first in a series of three talks. The two that follow will cover data cleaning and knowledge extraction.
About the speaker
Yannis Velegrakis is a professor of Computer Science at Utrecht University, where he holds the chair of Very Large Data Management and leads the Data Intensive Systems group. His research expertise includes Data Preparation and Curation, Data Quality, Big Data Management, Knowledge Discovery, Graph Management, Highly Heterogeneous Information Integration, and Data Exploration. He holds a PhD from the University of Toronto and MSc and BSc degrees from the University of Crete, all in Computer Science. In the past, he has been a researcher at the AT&T Research Labs and has spent time on research work at a number of places, including the IBM Almaden Research Center, the Huawei European Research Center in Munich, the Center of Advanced Studies of the IBM Toronto Lab, the University of California, Santa Cruz, and the University of Paris-Saclay. He has also been an associate professor at the University of Trento and coordinator of the Data Science Technical Major of its EIT Digital MSc Program. He has served on the program committees of many international conferences on data management and on journal boards. Among these roles, he was general chair of VLDB 2013 and PC chair of EDBT 2020. Last but not least, he is an ambassador for Utrecht Applied Data Science.
- Location: Online webinar