Syntactic Piece: Idea, Purpose and Application to Sentiment Analysis

Published on March 12, 2014

Author: jnlp

Source: slideshare.net

Kazuki TAKIGAWA and Kazuhide YAMAMOTO, Department of Electrical Engineering, Nagaoka University of Technology, JAPAN. {takigawa,yamamoto}@jnlp.org

Background
The main processing units in NLP have some problems in Japanese.
• bag-of-words
o It is difficult to determine the sense of an expression. ex.) The word “かける [kakeru]” has several meanings: “do up”, “put on”, “take out” and so on.
• word n-gram
o It often creates unnecessary elements. ex.) “が-かける [ga-kakeru]” (2-gram), “で-ある-こと [de-aru-koto]” (3-gram)
A processing unit that can keep the meaning of an expression is needed. We propose the “syntactic piece”.
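
To make the n-gram problem concrete, here is a minimal sketch that enumerates word n-grams over a pre-segmented sentence. The sentence 重要であることがわかる and its hand segmentation are assumptions chosen for illustration (a real system would use a morphological analyzer), and `word_ngrams` is a hypothetical helper name.

```python
def word_ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hand-segmented tokens (assumed, not analyzer output).
tokens = ["重要", "で", "ある", "こと", "が", "わかる"]

trigrams = word_ngrams(tokens, 3)
for gram in trigrams:
    print("-".join(gram))
```

One of the windows is で-ある-こと, which consists only of function-like words: this is the kind of unnecessary element the slide refers to.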

What’s Syntactic Piece?
• A syntactic piece is a minimum unit of syntactic structure.
• It consists of a pair of modifier and modificand, derived from the syntactic structure.
• This pair is expressed as: modifier → modificand
Example: Recently, the surrounding noise is very big. (最近まわりの騒音がとても大きい)
recently big: 最近 → 大きい
surrounding noise: まわりの → 騒音
noise is big: 騒音が → 大きい
very big: とても → 大きい
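
The definition above can be sketched in a few lines, assuming the dependency parse is already available. The `Chunk` structure, the hand-annotated parse, and the function name are illustrative assumptions, not the authors' implementation. The modifier keeps its particle and the modificand is reduced to its content word, which is how the pairs appear on the slide.

```python
from typing import NamedTuple

class Chunk(NamedTuple):
    content: str   # content word(s) of the phrase
    function: str  # trailing particle, "" if none
    head: int      # index of the modified phrase, -1 for the sentence root

def syntactic_pieces(chunks):
    """Return one (modifier, modificand) pair per dependency link."""
    return [(c.content + c.function, chunks[c.head].content)
            for c in chunks if c.head >= 0]

# Hand-annotated parse of 最近まわりの騒音がとても大きい (assumed, not parser output).
chunks = [
    Chunk("最近", "", 4),
    Chunk("まわり", "の", 2),
    Chunk("騒音", "が", 4),
    Chunk("とても", "", 4),
    Chunk("大きい", "", -1),
]

pieces = syntactic_pieces(chunks)
for modifier, modificand in pieces:
    print(f"{modifier} → {modificand}")
```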

Advantages of Syntactic Piece
• Very simple: it is easy to use, just like n-gram.
• It has syntactic structure: it contains more information than n-gram.
• Similar to phrasal idiom: it can deal with a chunk of meaning.
But the syntactic piece has some problems.

Problems of Syntactic Piece
1) A syntactic piece tends to be long because it is a pair of phrases, so using syntactic pieces yields many unique expressions.
2) The phrase pairs generated by the current method include some pairs that have no meaning.
We suggest solutions to these problems.

Method (1) - Generalization of Same Class Expressions -
We generalize “same class expressions” to decrease the number of unique expressions. “Same class expressions” are a set of expressions that have similar meanings even though their surface forms differ.
1. cake is delicious (ケーキ-が → おいしい)
2. delicious cake (おいしい → ケーキ)
The surface structures of these two expressions differ, but their meanings are very similar. We call such expressions “same class expressions”.

Method (1) - Generalization of Same Class Expressions -
We generalize same class expressions. Same class expressions are identified by two criteria.
(1) Syntactic pieces constructed from an adjective and a noun with the same content words.
noun(-particle) → adjective: 騒音-が → 大きい (noise is big)
adjective → noun: 大きい → 騒音 (big noise)
(2) Syntactic pieces constructed from a verb and a noun with the same content words.
noun(-particle) → verb: 子供-が → 楽しむ (a child rejoices)
verb → noun: 楽しむ → 子供 (rejoicing child)
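
One way to read the two criteria is as a normalization step that maps both surface orders to a shared key. The sketch below is written under that assumption. The `Piece` fields, the coarse part-of-speech labels, and the choice to ignore the particle are all illustrative, and content words are assumed to be in base form already.

```python
from typing import NamedTuple

class Piece(NamedTuple):
    mod: str       # content word of the modifier
    mod_pos: str   # coarse part of speech of the modifier
    particle: str  # particle attached to the modifier, "" if none
    head: str      # content word of the modificand
    head_pos: str  # coarse part of speech of the modificand

def class_key(piece):
    """Map a piece to a key shared by all of its same class expressions."""
    pos_pair = {piece.mod_pos, piece.head_pos}
    if pos_pair == {"noun", "adjective"} or pos_pair == {"noun", "verb"}:
        if piece.mod_pos == "noun":
            noun, predicate = piece.mod, piece.head
        else:
            noun, predicate = piece.head, piece.mod
        # Direction and particle are dropped so both orders collapse (assumption).
        return ("same-class", noun, predicate)
    # Any other combination is kept as an ordinary syntactic piece.
    return ("as-is", piece.mod + piece.particle, piece.head)

noise_is_big = Piece("騒音", "noun", "が", "大きい", "adjective")
big_noise = Piece("大きい", "adjective", "", "騒音", "noun")
very_big = Piece("とても", "adverb", "", "大きい", "adjective")

print(class_key(noise_is_big) == class_key(big_noise))
print(class_key(very_big))
```

With such a key, a dictionary entry learned from 騒音-が → 大きい also applies to 大きい → 騒音.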

Method (2) - Coping with form word -
Some phrase pairs have no meaning. Example: I can be satisfied. (満足することができる)
be satisfied: 満足する → こと (manzoku-suru koto). There is no modification relation.
I can be: こと-が → できる (koto-ga dekiru). This pair carries no meaning.

Method (2) - Coping with form word -
This problem arises because “こと [koto]” is a “form word”. A form word is a type of content word whose original meaning has faded and which is used only formally in Japanese. It is similar to relative pronouns in English such as “which”, “who” and “when”.

Method (2) - Coping with form word -
• We collected form words manually.
• We treat a phrase containing a form word as a function word attached to the preceding content word.
Example: I can be satisfied very much. (とても満足することができる)
Conventional syntactic piece:
satisfied very much: とても → 満足する
be satisfied: 満足する → こと
I can be: こと-が → できる
Coping with form word:
satisfied very much: とても → 満足する
I can be satisfied: 満足すること-が → できる
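
The merge described above can be sketched as follows. This is an illustrative reading of the slide, not the authors' code: the `Chunk` structure, the one-entry `FORM_WORDS` set, and the hand-annotated parse are assumptions. A form-word phrase is folded into the function part of the phrase that modifies it, so it shows up when that phrase acts as a modifier but not when it is a modificand.

```python
from typing import NamedTuple

class Chunk(NamedTuple):
    content: str   # content word(s) of the phrase
    function: str  # function-word part, "" if none
    head: int      # index of the modified phrase, -1 for the root

FORM_WORDS = {"こと"}  # collected manually in the slides; this set is illustrative

def absorb_form_words(chunks):
    """Fold each form-word phrase into the preceding phrase that modifies it."""
    merged, new_index = [], {}
    for i, c in enumerate(chunks):
        if c.content in FORM_WORDS and merged and chunks[i - 1].head == i:
            prev = merged.pop()
            merged.append(Chunk(prev.content,
                                prev.function + c.content + c.function,
                                c.head))
        else:
            merged.append(c)
        new_index[i] = len(merged) - 1
    # Heads still hold old indices; remap them to positions in the merged list.
    return [c._replace(head=new_index.get(c.head, -1)) for c in merged]

def syntactic_pieces(chunks):
    return [(c.content + c.function, chunks[c.head].content)
            for c in chunks if c.head >= 0]

# Hand-annotated parse of とても満足することができる (assumed).
chunks = [
    Chunk("とても", "", 1),
    Chunk("満足する", "", 2),
    Chunk("こと", "が", 3),
    Chunk("できる", "", -1),
]

conventional = syntactic_pieces(chunks)
improved = syntactic_pieces(absorb_form_words(chunks))
print(conventional)
print(improved)
```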

Application to Sentiment Analysis
• We apply the improved syntactic piece to sentiment analysis to verify its effectiveness.
• The target of sentiment analysis is a sentence, which is classified as positive, negative, or other.
1. Pairs of an evaluative expression and its semantic orientation score (SO-score) are registered in a dictionary. Here, evaluative expression = syntactic piece.
2. Each expression in the input sentence is given an SO-score from the dictionary.
3. The sentence is classified by the sum of the SO-scores.

Sentence Classification
input: noise of fan is big. (ファンの騒音が大きい。)
obtained syntactic pieces: noise of fan (ファン-の → 騒音), noise is big (騒音-が → 大きい)
dictionary (SO of syntactic piece): noise is big (騒音-が → 大きい): negative
matching: noise is big (騒音が大きい): negative
SO of input: negative
1. Syntactic pieces are obtained from the input.
2. The obtained syntactic pieces are matched against the entries of the dictionary.
3. We can treat “noise is big” as negative.
4. The SO of the input is negative.
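
The classification walk-through can be sketched as a dictionary lookup followed by a sum. The score value -1.0 and the zero threshold separating positive, negative, and other are assumptions made for illustration. The slides only state that the sentence is classified by the sum of SO-scores.

```python
def classify(pieces, so_dict):
    """Sum the SO-scores of the pieces found in the dictionary and label the sentence."""
    total = sum(so_dict.get(piece, 0.0) for piece in pieces)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "other"

# Dictionary with a single entry; the score is a made-up placeholder.
so_dict = {("騒音が", "大きい"): -1.0}

# Syntactic pieces obtained from ファンの騒音が大きい。
pieces = [("ファンの", "騒音"), ("騒音が", "大きい")]

label = classify(pieces, so_dict)
print(label)
```

The piece ファン-の → 騒音 has no dictionary entry and contributes nothing, so the sentence label comes from 騒音-が → 大きい alone.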

Reason for Applying Sentiment Analysis
• This method uses a dictionary, so if we have the SO-score of the expression “noise is big”, we can also give that SO-score to “big noise” through same class expressions.
• A dictionary should not contain an expression that has no meaning, such as “I can be” registered as “positive”. Coping with form word prevents this.

Preparation for Sentence Classification - Making of Seed Dictionary -
1. We prepare positive and negative sentences as training data.
2. Syntactic pieces are obtained from the training data, and their frequencies are counted.
3. Each syntactic piece is given an SO-score, and we treat the result as the seed dictionary. The SO-score is calculated from the probability of occurrence (Fujimura et al. [04]).
seed dictionary (frequency in positive and negative sentences):
syntactic piece      positive  negative
size is big          5         1
slow to respond      0         8
softly-colored       3         0
・・・
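
The slides do not give the SO-score formula, so the sketch below uses a simple normalized difference of the two counts as a stand-in. This is an assumption and is not claimed to be the formula of Fujimura et al. The counts are the three rows shown in the seed dictionary table.

```python
def so_score(pos_count, neg_count):
    """Stand-in SO-score in [-1, 1]: +1 if only positive, -1 if only negative."""
    total = pos_count + neg_count
    if total == 0:
        return 0.0  # piece never observed; treated as neutral (assumption)
    return (pos_count - neg_count) / total

# (positive frequency, negative frequency) from the seed dictionary table.
counts = {
    "size is big": (5, 1),
    "slow to respond": (0, 8),
    "softly-colored": (3, 0),
}

seed_dictionary = {piece: so_score(p, n) for piece, (p, n) in counts.items()}
for piece, score in seed_dictionary.items():
    print(f"{piece}: {score:+.2f}")
```

A real setup would probably also correct for the different numbers of positive and negative training sentences, which this stand-in ignores.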

Preparation for Sentence Classification - Expansion of Dictionary -
The more evaluative expressions, the better. This requires a huge amount of training data, which is costly to prepare manually. We want to obtain training data automatically, so we build an expanded dictionary.

Preparation for Sentence Classification - Expansion of Dictionary -
1. Sentences from a large scale corpus are classified as positive or negative by the seed dictionary. We treat the result as new training data.
2. Syntactic pieces are obtained from the new training data, and their frequencies are counted as in making the seed dictionary.
3. Semantic orientation scores are also calculated, and we treat the result as the expanded dictionary.
expanded dictionary (frequency in positive and negative sentences):
syntactic piece           positive  negative
continuing is difficult   0         5
good design               8         0
to be gift                5         1
・・・
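
The expansion procedure is a single bootstrapping pass, sketched below under several assumptions: sentences are already converted to lists of syntactic pieces (plain strings here for brevity), the SO-score is the same normalized-difference stand-in rather than the formula of Fujimura et al., and the expanded dictionary is built from the new training data only, since the slides do not say whether seed entries are merged in.

```python
from collections import Counter

def so_score(pos_count, neg_count):
    return (pos_count - neg_count) / (pos_count + neg_count)

def classify(pieces, so_dict):
    total = sum(so_dict.get(p, 0.0) for p in pieces)
    return "positive" if total > 0 else "negative" if total < 0 else "other"

def build_dictionary(pos_sentences, neg_sentences):
    """Count pieces in each class and turn the counts into SO-scores."""
    pos, neg = Counter(), Counter()
    for sentence in pos_sentences:
        pos.update(sentence)
    for sentence in neg_sentences:
        neg.update(sentence)
    return {p: so_score(pos[p], neg[p]) for p in pos.keys() | neg.keys()}

def expand(seed_dict, corpus):
    """Label corpus sentences with the seed dictionary, then rebuild from them."""
    new_pos = [s for s in corpus if classify(s, seed_dict) == "positive"]
    new_neg = [s for s in corpus if classify(s, seed_dict) == "negative"]
    return build_dictionary(new_pos, new_neg)

seed = {"good design": 1.0, "slow to respond": -1.0}
corpus = [
    ["good design", "to be gift"],
    ["slow to respond", "continuing is difficult"],
    ["no known piece"],  # classified as "other", so not used as training data
]

expanded = expand(seed, corpus)
print(sorted(expanded.items()))
```

Pieces that co-occur with seed entries, such as "to be gift", enter the expanded dictionary even though they were never labeled by hand.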

Experiment
• We manually prepared:
● approximately 2,000 positive sentences
● approximately 1,000 negative sentences
● approximately 210,000 sentences as a large scale corpus for expansion
• To examine the efficacy of each of our methods, we analyzed sentiment using the following methods.
(1) Using only generalization of same class expressions
(2) Using only coping with form word
(3) Combination of (1) and (2)
(4) Using conventional syntactic piece (for baseline)

Result
language processing units                            recall(%)  precision(%)
(1) only generalization of same class expressions    49.8       77.1
(2) only coping with form word                       44.6       77.3
(3) (1)+(2)                                          47.7       78.7
(4) Baseline                                         47.1       75.5
・All methods improve precision over the baseline.
・Generalization of same class expressions also improves recall.

Discussion - Generalization of Same Class Expression -
• Recall turned out higher than the baseline. We could give semantic orientation scores to more sentences, and the scale of the expanded dictionary increased. We obtained approximately 14,000 more sentences (an increase of approximately 5.7%) as new training data than with the conventional syntactic piece.

Discussion - Coping with form word -
We tried to solve the problem of extracting phrase pairs that have no meaning. As a result, we found that some sentences had become correct answers only by accident when using the conventional syntactic piece.
In the dictionary using the conventional syntactic piece:
• “think that” (なる-と → 思う [naru-to → omou]) is given a positive score, although this expression has no semantic orientation.

Discussion - Coping with form word -
Our method can handle the semantic orientation of each expression.
In the dictionary using our method:
• “think it will be a nuisance” (邪魔になる-と → 思う [jama ni naru-to → omou]) is given a negative score.
• “think it will become a present” (プレゼントになる-と → 思う [present ni naru-to → omou]) is given a positive score.

Discussion - Comparison with other language processing unit -
language processing units        recall(%)  precision(%)
Using same class expressions     49.8       77.1
word 3-gram                      75.3       78.0
word 2-gram                      78.8       79.9
Recall is lower than that of word 2-gram and word 3-gram.

Conclusion
• We suggested two methods for improving the syntactic piece.
• We applied them to sentiment analysis to verify the effectiveness of the improved syntactic piece.
• As a result, the recall and precision of the improved syntactic piece were higher than those of the conventional one.
• It is still inferior to word 2-gram and 3-gram.
• In future work we intend to improve recall.

Thank you.
