The Wenzhou Spoken Corpus

Zhang, Eric; Butler, Terry; Newman, John; Lin, Jingxia

doi:10.7939/R3ZG3H


Communities and Collections

Linguistics, Department of / Research Publications (Linguistics)

Abstract
The creation of the Wenzhou Spoken Corpus, an online searchable corpus of a modern Chinese dialect, presents a number of challenges that are of interest to the corpus linguistic community. We review issues involved with collection of spoken data, its transcription and markup, as well as the functionality of the search tools. The transcription makes use of Chinese characters as well as IPA symbols for Wenzhou colloquial forms not conventionally represented by characters. XML was adopted as the standard for the basic format of files, with file searches expressed in XPath form. The search tools provide the usual options of restricting searches by age, gender, etc., and yield concordances and tables of collocates. Though the collection of data for the corpus was ‘opportunistic’ in some ways, and so not ideally balanced or representative, it is nevertheless proving to be a valuable tool for corpus-based research on Wenzhou.
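The abstract does not publish the corpus's XML schema or the code behind its search tools. Purely as an illustration of the workflow it describes, the sketch below assumes a hypothetical markup in which each `text` lists its speakers with `gender` and `age` attributes and each utterance (`u`) is pre-segmented into word (`w`) elements. All element and attribute names, and the placeholder tokens (ordinary Chinese words, not transcribed Wenzhou data), are invented. It uses the XPath subset in Python's `xml.etree.ElementTree` to restrict by speaker, then builds a keyword-in-context concordance and a simple window-based collocate count.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# HYPOTHETICAL markup: element/attribute names and tokens are placeholders,
# not the Wenzhou Spoken Corpus's actual schema or data.
SAMPLE = """<corpus>
  <text id="t1">
    <speaker id="s1" gender="F" age="34"/>
    <speaker id="s2" gender="M" age="61"/>
    <u who="s1"><w>我</w><w>今日</w><w>去</w><w>温州</w></u>
    <u who="s2"><w>你</w><w>去</w><w>温州</w><w>做</w><w>生意</w></u>
  </text>
</corpus>"""


def utterances(root, gender=None, min_age=None, max_age=None):
    """Yield the token list of each utterance whose speaker passes the filters."""
    for text in root.iterfind("text"):
        # ElementTree's XPath subset supports [@attr='value'] predicates,
        # but not numeric comparisons, so age is checked in Python.
        path = "speaker" if gender is None else f"speaker[@gender='{gender}']"
        ids = {
            s.get("id")
            for s in text.iterfind(path)
            if (min_age is None or int(s.get("age")) >= min_age)
            and (max_age is None or int(s.get("age")) <= max_age)
        }
        for u in text.iterfind("u"):
            if u.get("who") in ids:
                yield [w.text for w in u.iterfind("w")]


def concordance(utts, node, span=2):
    """Return KWIC lines as (left context, node, right context) tuples."""
    lines = []
    for toks in utts:
        for i, tok in enumerate(toks):
            if tok == node:
                lines.append((toks[max(0, i - span):i], tok, toks[i + 1:i + 1 + span]))
    return lines


def collocates(utts, node, span=2):
    """Count every token occurring within +/- span positions of node."""
    counts = Counter()
    for toks in utts:
        for i, tok in enumerate(toks):
            if tok == node:
                counts.update(toks[max(0, i - span):i] + toks[i + 1:i + 1 + span])
    return counts


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    utts = list(utterances(root))  # materialize: the generator is used twice
    for left, kw, right in concordance(utts, "去"):
        print(" ".join(left), f"[{kw}]", " ".join(right))
    print(collocates(utts, "温州").most_common(1))
```

Because ElementTree implements only part of XPath 1.0, the age range is filtered outside the path expression; a full XPath engine (for example, lxml's `.xpath()` method) could express the whole speaker restriction as a single path.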
Date created

2007
Type of Item

Article (Published)
DOI

https://doi.org/10.7939/R3ZG3H

Language
- English
Citation for previous publication
- Newman, J. et al. (2007). The Wenzhou Spoken Corpus. Corpora, 2(1), 97-109.