A Study on Implementation of Southern-Min Taiwanese Tone Sandhi System


Published on November 24, 2008

Author: ungian

Source: slideshare.net


Iuⁿ Un-gian [email_address]
Lau Kiat-gak [email_address]
Li Sheng-an d93005@csie.ntu.edu.tw
Kao Cheng-yan cykao@csie.ntu.edu.tw

Dept. of Computer Sci. and Info. Eng., National Taiwan Univ., Taiwan

PACLIC 19, December 1-3, 2005

Paper Outline-1

In the past two hundred years or so, a sizable corpus of Taiwanese text in Latin script has been accumulated. However, due to the political and historical situation of Taiwan, few people can read these materials at present. It is regrettable that the utilization of these plentiful materials is very low.

This paper addresses problems raised by the Taiwanese tone sandhi system by describing a set of computational rules to approximate this system, as well as the results obtained from our implementation.

Paper Outline-2

Using Latinized Taiwanese text as the source, we take the sentence as the unit, translate every word into Chinese via a Taiwanese-Chinese dictionary, and obtain POS information for the Chinese words from the data compiled by the CKIP group of Academia Sinica. Using the POS data and tone sandhi rules that we formulated on linguistic grounds, we then tag each syllable with its post-sandhi tone marker.

Paper Outline-3

Finally we implemented a Taiwanese tone sandhi processing system which takes a Latinized sentence as input and outputs the tone markers.

We obtained accuracy rates of 97.56% and 88.90% on the training and testing data, respectively. We analyze the sources of error for the purpose of future improvement.

Keywords: written Taiwanese, tone sandhi system, Taiwanese latinization

Tone Sandhi at Word Level -1

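Normal sandhi: most cases follow these rules (base tone -> sandhi tone):

1 -> 7

2 -> 1

3 -> 2

4 -> 2 for syllables ending in -h (8 for syllables ending in -p, -t, -k)

5 -> 7 (3)

7 -> 3

8 -> 3 for syllables ending in -h (4 for syllables ending in -p, -t, -k)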
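The table above can be read as a small lookup, sketched below. This is only an illustration, not the authors' implementation: the names `normal_sandhi`, `NORMAL_SANDHI`, `CHECKED_SANDHI`, and the `tone5_to` parameter are hypothetical, and the function assumes the caller already knows the base tone number and the final consonant of the syllable.

```python
# Unchecked tones: base tone -> sandhi tone, as listed on the slide.
# Tone 5 is handled separately because the slide lists 7 with 3 in parentheses.
NORMAL_SANDHI = {1: 7, 2: 1, 3: 2, 7: 3}

# Checked tones (4 and 8) depend on the final consonant of the syllable.
CHECKED_SANDHI = {
    4: {"h": 2, "p": 8, "t": 8, "k": 8},
    8: {"h": 3, "p": 4, "t": 4, "k": 4},
}


def normal_sandhi(tone, final="", tone5_to=7):
    """Return the post-sandhi tone number for a base tone number.

    `final` is the final consonant ("h", "p", "t" or "k") and is only
    consulted for tones 4 and 8. `tone5_to` selects which of the two
    values listed for tone 5 (7 or 3) to use; 7 is an assumed default.
    """
    if tone in CHECKED_SANDHI:
        return CHECKED_SANDHI[tone][final]
    if tone == 5:
        return tone5_to
    return NORMAL_SANDHI[tone]


if __name__ == "__main__":
    for base, final in [(1, ""), (4, "h"), (4, "k"), (8, "t")]:
        print(base, final or "-", "->", normal_sandhi(base, final))
```

Keeping the checked tones in a separate table makes the dependence on the final consonant explicit rather than hiding it in conditionals.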

Tone Sandhi at Word Level -2

Following sandhi: this pattern generally occurs on pronouns or on the suffixes of names. The tone pitch depends on that of the immediately preceding syllable and is either tone 1, 3, or 7.

Neutral sandhi: the preceding syllable keeps its base tone, and the neutral-tone syllable is read softly, as if it were tone 3 or tone 4.

Double sandhi: this pattern mostly appears in syllables ending in the glottal stop (-h) and having tone 4. The normal sandhi rules are applied twice in sequence (i.e. tone 4 -> tone 2 -> tone 1).
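Double sandhi, as described above, is just two passes through the normal sandhi table. The sketch below illustrates the tone 4 -> tone 2 -> tone 1 chain under stated assumptions: the names `GLOTTAL_SANDHI`, `UNCHECKED_SANDHI`, and `double_sandhi` are hypothetical, tone 5 is mapped to 7 only, and the second pass uses the unchecked table because the intermediate tone is no longer a checked tone.

```python
# First pass: normal sandhi for syllables ending in the glottal stop -h.
GLOTTAL_SANDHI = {4: 2, 8: 3}

# Second pass: normal sandhi for unchecked tones (tone 5 -> 7 assumed).
UNCHECKED_SANDHI = {1: 7, 2: 1, 3: 2, 5: 7, 7: 3}


def double_sandhi(tone):
    """Apply the normal sandhi rules twice to a syllable ending in -h.

    Returns the whole chain [base, after first pass, after second pass]
    so that the intermediate tone stays visible.
    """
    first = GLOTTAL_SANDHI[tone]
    second = UNCHECKED_SANDHI[first]
    return [tone, first, second]


if __name__ == "__main__":
    print(double_sandhi(4))
```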

Tone Sandhi at Word Level -3

Pre-á sandhi: syllables before á do not follow the normal sandhi rules unless they are tone 1 or tone 2.

Triplicated sandhi: the first syllable of a triplicated word does not follow the normal sandhi rules unless it is tone 2, 3, or 4.

Rising sandhi: this pattern usually occurs in loanwords from Japanese; the sandhi tone is similar to tone 5.

Tone Sandhi at Sentence Level

In brief, tonal groups are related to syntax in such a way that a sentence can be cut into a sequence of tonal groups on the basis of its syntactic structural description.

A sentence has one or more tonal groups. A boundary falls at the last syllable of the sentence, the syllable preceding ê, the last syllable of a noun phrase, and so on. The boundary syllable is pronounced with its base tone.
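Two of the boundary conditions listed above (the last syllable of the sentence and the syllable preceding ê) can be checked without any syntactic information, so they make a convenient illustration. The sketch below is a simplification under stated assumptions: the function names are hypothetical, the noun-phrase condition is left out because it needs POS or parse information, and the syllables are written without tone marks.

```python
E_PARTICLE = "\u00ea"  # the particle ê


def boundary_indices(syllables):
    """Return the indices of syllables that close a tonal group.

    Only two conditions are covered: the last syllable of the sentence
    and the syllable immediately preceding the particle ê.
    """
    if not syllables:
        return set()
    boundaries = {len(syllables) - 1}
    for i, syllable in enumerate(syllables):
        if syllable == E_PARTICLE and i > 0:
            boundaries.add(i - 1)
    return boundaries


def split_tonal_groups(syllables):
    """Cut a syllable list into tonal groups at the detected boundaries."""
    ends = boundary_indices(syllables)
    groups, current = [], []
    for i, syllable in enumerate(syllables):
        current.append(syllable)
        if i in ends:
            groups.append(current)
            current = []
    return groups


if __name__ == "__main__":
    # Fragment of the example sentence used later in the slides.
    fragment = ["chit", "tiap", "a", "ku", E_PARTICLE, "kang", "hu"]
    print(split_tonal_groups(fragment))
```

Within each group, only the final syllable would keep its base tone; the others would be candidates for the word-level sandhi rules.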

In fact, the full story is much longer.

Our method -1

Method: we use a rule-based rather than a statistical method because no public training data is available at present.

Data: we select 8 segments of Latinized Taiwanese text from 4 articles as training data; their publication dates range from the 1910s to the 1960s, and they contain 614 syllables in total. Another 8 segments of text serve as testing data; their publication dates range from the 1880s to the 1990s, and they contain 955 syllables in total.

POS: we obtain the corresponding Chinese translation for each Taiwanese word by looking it up in the Taiwanese-Chinese On-line Dictionary. We then look up the POS of the Chinese word in the CKIP database.
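The POS step described above is a chain of two lookups, which the following sketch illustrates. Everything in it is a stand-in: the two dictionaries are tiny hypothetical samples rather than the Taiwanese-Chinese On-line Dictionary or the CKIP database, and the tags "N" and "V" are placeholders, not actual CKIP tags.

```python
# Hypothetical sample of a Taiwanese -> Chinese dictionary.
TAIWANESE_TO_CHINESE = {
    "kang-hu": "工夫",
    "hái": "海",
    "ū": "有",
}

# Hypothetical sample of a Chinese word -> POS table (placeholder tags).
CHINESE_POS = {
    "工夫": "N",
    "海": "N",
    "有": "V",
}


def lookup_pos(word):
    """Return the POS of a Taiwanese word via its Chinese translation.

    Returns None when either lookup fails, so the caller can decide how
    to treat out-of-dictionary words.
    """
    chinese = TAIWANESE_TO_CHINESE.get(word)
    if chinese is None:
        return None
    return CHINESE_POS.get(chinese)


def tag_sentence(words):
    """Pair every word of a sentence with its POS (or None)."""
    return [(word, lookup_pos(word)) for word in words]


if __name__ == "__main__":
    print(tag_sentence(["kang-hu", "unknown-word"]))
```

Returning None instead of raising keeps missing dictionary entries visible, which matters because dictionary coverage is one of the problems the slides list.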

Our method -2

Rules: we formulate 20 rules on 4 different levels: the syllable, the word, the POS, and the sentence pattern (syntax).

Example:

Input: Chhin-chhiūⁿ án-ni lâi kóng, chāi lán Tâi-ôan kīn-kīn chít-tiap-á-kú ê kang-hu, ài soaⁿ chiū ū soaⁿ, ài hái chiū ū hái, beh jóah chiū ū jóah, kôaⁿ chiū ū kôaⁿ.

Chinese translation: 如此說來,在台灣只要花一點工夫,要山就有山、要海就有海;要熱就有熱、冷就有冷。 (Put this way, in Taiwan, with just a little effort, if you want mountains there are mountains, if you want the sea there is the sea; if you want heat there is heat, and if you want cold there is cold.)

Output (tone markers added): Chhin-chhiūⁿ án-ni# lâi kóng#, chāi lán Tâi-ôan# kīn-kīn chít-tiap&-á-kú# ê kang-hu#, ài soaⁿ# chiū ū soaⁿ#, ài hái# chiū ū hái#, beh jóah# chiū ū jóah#, kôaⁿ# chiū ū kôaⁿ#.
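To work with output like the example above, the markers have to be separated from the syllables again. The sketch below shows one way to do that. It assumes that a marker is a single trailing `#` or `&` attached to a syllable, as in the example, and that syllables within a word are joined by hyphens; the slide does not define the markers further, and the names `MARKERS` and `parse_tagged_word` are hypothetical.

```python
MARKERS = "#&"  # marker characters seen in the example output


def parse_tagged_word(word):
    """Split a hyphenated word into (syllable, marker) pairs.

    The marker is "" for syllables that carry no marker.
    """
    parsed = []
    for piece in word.split("-"):
        if piece and piece[-1] in MARKERS:
            parsed.append((piece[:-1], piece[-1]))
        else:
            parsed.append((piece, ""))
    return parsed


if __name__ == "__main__":
    for word in ["kang-hu#", "kong#", "lan"]:
        print(word, "->", parse_tagged_word(word))
```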

Results

Accuracy rates of sandhi marks:

Data            Syllables   Errors   Acc. Rate
Training data   614         15       97.56%
Testing data    955         105      88.90%

Problems:

Lack of POS standards for Taiwanese

Lack of a word segmentation standard for Taiwanese and of a dictionary that follows such a standard

Lack of standardization of written Taiwanese

Some tone sandhi problems cannot be solved by POS order

Future Work

Solicit assistance from linguists;

Improve word segmentation, especially the processing of morphology, quantitative words, and proper nouns;

Improve the processing of POS tags to account for ambiguity;

Improve the part-of-speech dictionary;

Improve the sandhi rules;

Find alternative ways of modeling sandhi processing.
