Automatically Dating Classical Chinese Texts: Preliminary Study on Biji and Buddhist Texts
- Main contributor
In recent years, a growing body of literature has applied computational methods to the study of language change. These studies report good performance in automatically identifying when a text was written (Popescu and Strapparava, 2015), tracing semantic change (Schlechtweg et al., 2020), and even discovering rules underlying language change (Hamilton et al., 2016). However, such studies have been questioned for taking their results at face value (Hengchen et al., 2021), and how well the models perform across different languages and genres remains unclear. For Classical Chinese, there is a clear lack of open-access diachronic data, and lexical change across genres has seldom been addressed computationally with large datasets. In this study, we approach the question of how language changes across time and across genres through classification tasks. Two types of text are included: Chinese Biji and Buddhist texts. We first examine how well language models (such as n-gram, word2vec, and transformer models) can predict the date of composition of historical texts. We then ask what can be learned from the models' predictions. We analyze the results obtained and discuss future directions.
- Related Item
Presented at the IDAH Spring Symposium, Maxwell Hall, Indiana University Bloomington, April 22, 2022.
This item is accessible by: the public.