PhD defence: Say the Same but Differently: Computational Approaches to Stylistic Variation and Paraphrasing
PLEASE NOTE: If a candidate gives a layman's talk, the livestream will start fifteen minutes earlier.
“Ik ben een Utrechter” and “Ik ben een Utrechtenaar” are two Dutch sentences. A tool like Google Translate might render both as “I am an Utrecht resident”. However, the choice of word matters. “Utrechter” is the more common modern term, while “Utrechtenaar” is the historical standard term for a resident of Utrecht. In the 1730s, during a wave of prosecutions of gay men that began in Utrecht, “Utrechtenaar” became closely associated with homosexuality. That history still lingers. Today, when someone calls themselves an “Utrechtenaar” rather than an “Utrechter”, we may learn more about them: for example, that they are more likely to be part of the local queer community. Language technology like Google Translate, however, can lose this nuance.
This dissertation explores how language technology can better handle variation in language. I show that both people and language models struggle to recognize different ways of saying the same thing in the context of conversations. I also find that while the internal representations of language models capture content, they often do not capture differences in linguistic style. To address this, I created a new model that recognizes this kind of variation and is already being used by researchers and practitioners. Finally, I demonstrate that language variation is relevant at every stage of language model design, including basic building blocks such as tokenizers.
Overall, my work encourages the field of natural language processing to consider language variation more rigorously in the development of language technology.
- PhD candidate: A.M. Wegman
- Dissertation: Say the Same but Differently: Computational Approaches to Stylistic Variation and Paraphrasing
- PhD supervisor(s): prof. dr. C.J. van Deemter
- Co-supervisor(s): dr. D.P. Nguyen