Permanent link (DOI): https://doi.org/10.7939/R3TQ5F
Community: Graduate Studies and Research, Faculty of
Collection: Theses and Dissertations
A tightness continuum measure of Chinese semantic units, and its application to information retrieval (Open Access)
- Degree grantor: University of Alberta
- Supervisor and department:
  - Goebel, Randy (Computing Science)
  - Ringlstetter, Christoph (Center of Language and Information Processing, University of Munich)
- Examining committee member and department:
  - Zhao, Dangzhi (School of Library and Information Science)
  - Kondrak, Greg (Computing Science)
- Department: Department of Computing Science
- Degree: Master of Science
Chinese differs sharply from alphabetic languages such as English in that there are no delimiters between Chinese words. Consequently, word segmentation is an important step in most Chinese natural language processing (NLP) tasks.
We propose a tightness continuum for Chinese semantic units, constructed from statistical information. Based on this continuum, sequences can be segmented dynamically, and the resulting segmentation information can be exploited in a number of information retrieval (IR) tasks.
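The abstract does not state which statistic underlies the tightness continuum. Purely as an illustration, the sketch below assumes pointwise mutual information (PMI) between adjacent characters as the tightness score, which may well differ from the thesis's actual measure. The names `train_counts`, `tightness`, `segment`, and the `threshold` parameter are hypothetical. The point it shows is the "dynamic" aspect: the same scores yield coarser or finer segmentations depending on where the cut-off is placed.

```python
import math
from collections import Counter


def train_counts(corpus):
    """Count single characters and adjacent character bigrams in a list of strings."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        unigrams.update(sent)
        bigrams.update(sent[i:i + 2] for i in range(len(sent) - 1))
    return unigrams, bigrams


def tightness(a, b, unigrams, bigrams):
    """Assumed tightness score: PMI of the adjacent pair (a, b), in bits."""
    joint = bigrams[a + b]
    if joint == 0:
        return float("-inf")  # never seen together: as loose as possible
    p_ab = joint / sum(bigrams.values())
    p_a = unigrams[a] / sum(unigrams.values())
    p_b = unigrams[b] / sum(unigrams.values())
    return math.log2(p_ab / (p_a * p_b))


def segment(text, unigrams, bigrams, threshold=0.0):
    """Cut between two characters whenever their tightness falls below threshold."""
    if not text:
        return []
    words, current = [], text[0]
    for a, b in zip(text, text[1:]):
        if tightness(a, b, unigrams, bigrams) < threshold:
            words.append(current)
            current = b
        else:
            current += b
    words.append(current)
    return words


# Toy corpus (hypothetical data, for illustration only).
unigrams, bigrams = train_counts(["北京大学", "北京", "大学"])
print(segment("北京大学", unigrams, bigrams, threshold=2.0))  # ['北京', '大学']
print(segment("北京大学", unigrams, bigrams, threshold=0.0))  # ['北京大学']
```

With this toy corpus, 北京 and 大学 score about 2.68 bits while 京大 scores about 1.68, so a threshold between those values splits the sequence into two words, and a lower threshold keeps it whole.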
To show that the tightness continuum is useful for NLP tasks, we propose two methods for exploiting it within IR systems. The first method refines the output of a general-purpose Chinese word segmenter. The second method embeds the tightness value into IR scoring functions. Experimental results show that the tightness measure is reasonable and improves the performance of IR systems.
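The abstract does not give the scoring function, so the following is only a hypothetical sketch of what "embedding the tightness value into a score" could look like, not the thesis's formula. For each query unit, it interpolates between matching the unit as a contiguous phrase and matching its characters independently, weighting the two by a logistic transform of the unit's tightness. The names `score`, `char_match`, and the interpolation scheme itself are assumptions.

```python
import math


def char_match(doc, unit):
    """Average count in doc of the individual characters of unit."""
    return sum(doc.count(c) for c in unit) / len(unit)


def score(doc, query_units):
    """Score doc against a list of (unit, tightness) pairs.

    Hypothetical scheme: a tight unit is matched mostly as a whole phrase,
    a loose unit mostly as independent characters.
    """
    total = 0.0
    for unit, t in query_units:
        w = 1.0 / (1.0 + math.exp(-t))  # logistic squash of tightness into (0, 1)
        phrase = math.log1p(doc.count(unit))  # str.count: non-overlapping matches
        chars = math.log1p(char_match(doc, unit))
        total += w * phrase + (1.0 - w) * chars
    return total


doc_a = "我在北京大学读书"  # contains the unit as a contiguous phrase
doc_b = "北方的大学和京城"  # contains the same characters, scattered
tight = [("北京大学", 3.0)]
loose = [("北京大学", -3.0)]
print(score(doc_a, tight) > score(doc_b, tight))  # True
```

When the unit is tight, the document containing the contiguous phrase dominates; when it is loose, scattered occurrences of its characters earn almost as much credit.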
- Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.
File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 1004486 bytes
Last modified: 2015-10-12 16:09:01 (UTC-06:00)
Filename: Xu_Ying_Spring 2010.pdf
Original checksum: 455ecd6013ef27d9fdc552da0efa047f
Page count: 68