Papers by Thomas R Emerson
This paper presents the results from the ACL-SIGHAN-sponsored First International Chinese Word Segmentation Bakeoff held in 2003 and reported in conjunction with the Second SIGHAN Workshop on Chinese Language Processing, Sapporo, Japan. We give the motivation for having an international segmentation contest (given that there have been two within-China contests to date) and we report on the results of this first international contest, analyze these results, and make some recommendations for the future.
The world of Chinese computing consists of a plethora of character sets and character encodings, containing tens of thousands of characters. Manipulating and converting between these is a complex undertaking, far more difficult than it first appears. Many people have heard of Simplified and Traditional Chinese, but are confused by their differences and similarities. Are they used to write separate languages? Are they mutually intelligible? Are simplified characters a recent phenomenon unique to the People’s Republic of China?