Formal Arguments, Preferences, and Natural Language Interfaces to Humans: an Empirical Evaluation

Published on September 22, 2014

Author: fcerutti

Source: slideshare.net

Description

Talk given during ECAI 2014
Formal Arguments, Preferences, and Natural Language Interfaces to Humans: an Empirical Evaluation
Federico Cerutti, Nava Tintarev, Nir Oren
ECAI 2014, Friday 22nd August 2014

Motivation
- Distributed autonomous systems are increasingly used
- Their reasoning can be formalized as argumentation
- However, if we need to explain this reasoning to people, the information must be presented more naturally
- Can we create a bridge between natural language and formal argumentation?
- What kinds of factors need to be considered?
  - Preferences between arguments?
  - Domain-specific knowledge?

Outline: Background, The Experiment, Methodology, Results, Conclusions

Background on P&S (Prakken and Sartor)
- Rule-based argumentation framework
- Allows arguments in favour of preferences among rules to be expressed
- Includes negation as failure and strong negation
- Although it predates Dung (1995), it is easy to draw a correspondence with an abstract argumentation framework (there are some points where caution is needed, but they do not arise in this work)

Crash course on P&S
- Each rule has a set of antecedents and a consequent
- Rules are either strict (they cannot contain negation-as-failure atoms) or defeasible
- An argument is a sequence of rules (rather than a recursive structure, as in ASPIC)
- The conclusions of an argument are the set containing the consequent of each rule in the argument
- Attacks:
  - on some antecedent of some rule
  - on some conclusion
- Sceptical semantics: grounded
- Credulous semantics: stable

Example
A politician and an economist discuss the potential financial outcome of the independence of a region X. The politician puts forward an argument in favour of the conclusion “If Region X becomes independent, X’s citizens will be poorer than they are now”. Another argument holding a contradicting conclusion (i.e. Region X will not be poorer) is advanced by the economist. The economist’s opinion is likely to be preferred to that of the politician, and is supported by a scientific document.
Knowledge base (S, D):
  s1: ⇒ s_AAA
  s2: ⇒ s_BBB
  s3: ⇒ s_doc
  r1: s_AAA ∧ ∼ex_AAA ⇒ poorer
  r2: s_BBB ∧ s_doc ∧ ∼ex_BBB ∧ ∼ex_doc ⇒ ¬poorer
  r3: ∼ex_expert ⇒ r1 ≺ r2
Args = {a1 = ⟨s1, r1⟩, a2 = ⟨s2, s3, r2⟩, a3 = ⟨r3⟩}
a2 Args-defeats a1, so a2 is justified
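The claim that a2 is justified can be checked on the abstract framework induced by the example. The following is a minimal sketch, assuming the rule-level structure is abstracted away and only the defeat relation stated on the slide (a2 defeats a1) is kept. The function name `grounded_extension` and the variable names are illustrative and do not come from the talk.

```python
def grounded_extension(arguments, defeats):
    """Iterate the characteristic function from the empty set to its least fixed point."""
    attackers = {a: {x for (x, y) in defeats if y == a} for a in arguments}
    extension = set()
    while True:
        # An argument is acceptable when every one of its attackers
        # is itself defeated by some member of the current extension.
        acceptable = {
            a for a in arguments
            if all(any((d, b) in defeats for d in extension) for b in attackers[a])
        }
        if acceptable == extension:
            return extension
        extension = acceptable


# Base case of the political debate example: only a2 defeats a1.
base_arguments = {"a1", "a2", "a3"}
base_defeats = {("a2", "a1")}

base_grounded = grounded_extension(base_arguments, base_defeats)
print(sorted(base_grounded))  # ['a2', 'a3']
```

Under this abstraction a2 and the preference argument a3 are sceptically accepted while a1 is not, which corresponds to position B in the questionnaire.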

The Experiment
- Each participant is presented with a text, written in natural language, followed by a questionnaire
- Between-subjects design across eight texts: each participant is shown a single (randomly selected) text
- Four domains:
  1. weather forecast
  2. political debate
  3. used car sale
  4. romantic relationship
- Two KBs: base case and extended case
- The base case always considers two arguments, a1 and a2, with contradicting conclusions, and a preference in favour of a2

The Extended Case for the Example
More recent research disputes the claim of the economist.
Knowledge base (S, D):
  s1: ⇒ s_AAA
  s2: ⇒ s_BBB
  s3: ⇒ s_doc
  s4: ⇒ s_research
  s5: s_research ⇒ ¬s_doc
  r1: s_AAA ∧ ∼ex_AAA ⇒ poorer
  r2: s_BBB ∧ s_doc ∧ ∼ex_BBB ∧ ∼ex_doc ⇒ ¬poorer
  r3: ∼ex_expert ⇒ r1 ≺ r2
Args = {a1 = ⟨s1, r1⟩, a2 = ⟨s2, s3, r2⟩, a3 = ⟨r3⟩, a4 = ⟨s4, s5⟩}
a2 Args-defeats a1; a2 Args-defeats a4; a4 Args-defeats a2
Two stable extensions: {a1, a3, a4} and {a2, a3}
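The two stable extensions can be reproduced by brute force on the abstract framework. This is a sketch under the assumption that the three defeats listed on the slide are the whole defeat relation. `stable_extensions` is an illustrative helper, not part of the authors' tooling.

```python
from itertools import combinations


def stable_extensions(arguments, defeats):
    """Enumerate conflict-free sets that defeat every argument outside themselves."""
    all_args = set(arguments)
    found = []
    for size in range(len(all_args) + 1):
        for subset in combinations(sorted(all_args), size):
            candidate = set(subset)
            conflict_free = not any(
                (x, y) in defeats for x in candidate for y in candidate
            )
            defeats_rest = all(
                any((x, y) in defeats for x in candidate)
                for y in all_args - candidate
            )
            if conflict_free and defeats_rest:
                found.append(candidate)
    return found


# Extended case: a2 defeats a1, and a2 and a4 defeat each other.
ext_arguments = {"a1", "a2", "a3", "a4"}
ext_defeats = {("a2", "a1"), ("a2", "a4"), ("a4", "a2")}

ext_stable = stable_extensions(ext_arguments, ext_defeats)
print([sorted(s) for s in ext_stable])  # [['a2', 'a3'], ['a1', 'a3', 'a4']]

# Arguments accepted in every stable extension.
ext_sceptical = set.intersection(*ext_stable)
print(sorted(ext_sceptical))  # ['a3']
```

Neither a1 nor a2 belongs to every stable extension, so neither conclusion is sceptically accepted. This is the formal counterpart of position U in the extended cases.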

Domain 1: weather forecast
The weather forecasting service of the broadcasting company AAA says that it will rain tomorrow (a1). Meanwhile, the forecast service of the broadcasting company BBB says that it will be cloudy tomorrow but that it will not rain (a2). It is also well known that the forecasting service of BBB is more accurate than the one of AAA (a3). However, yesterday the trustworthy newspaper CCC published an article which said that BBB has cut the resources for its weather forecasting service in the past months, thus making it less reliable than in the past (a4).

Domain 2: political debate
In a TV debate, the politician AAA argues that if Region X becomes independent then X’s citizens will be poorer than now (a1). Subsequently, financial expert (a3) Dr. BBB presents a document, which scientifically shows that Region X will not be worse off financially if it becomes independent (a2). After that, the moderator of the debate reminds BBB of more recent research by several important economists that disputes the claims in that document (a4).

Domain 3: buying a car
You are planning to buy a second-hand car, and you go to a dealership with BBB, a mechanic who has been recommended to you by a friend (a3). The salesperson AAA shows you a car and says that it needs very little work done to it (a1). BBB says it will require quite a lot of work, because in the past he had to fix several issues in a car of the same model (a2). While you are at the dealership, your friend calls you to tell you that he knows (beyond a shadow of a doubt) that BBB made unnecessary repairs to his car last month (a4).

Domain 4: romance
After several dates, you would like to start a serious relationship with J, but you turn to two friends of yours, AAA and BBB, for advice. You have known BBB for longer than you have known AAA (a3). AAA tells you that J is lovely and you should go ahead (a1), while BBB suggests that you should be very cautious because J might have a hidden agenda (a2). After some weeks, CCC, who is also a close friend of BBB, tells you that BBB has been into you for years; BBB is too shy to tell you about their feelings for you, but is still possessive of you (a4).

Formalisation summary
Domain          Base case   Extended case   Type of reinstatement
1, weather      1.B         1.E             preference attack
2, politics     2.B         2.E             a2 rebuttal
3, buying car   3.B         3.E             preference attack
4, romance      4.B         4.E             preference rebuttal

Methodology
Participants are asked to determine which of the following positions they think is accurate:
- A: I think that AAA’s position is correct (e.g. “X’s citizens will be poorer than now”)
- B: I think that BBB’s position is correct (e.g. “X’s citizens will not be worse off financially”)
- U: I cannot determine if either AAA’s or BBB’s position is correct (e.g. “I cannot conclude anything about Region X’s finances”)
Participants also rate each statement for relevance (to the conclusion) and agreement on a 7-point scale from Disagree to Agree

Hypotheses
- H1: In the base cases (Scenarios 1.B, 2.B, 3.B and 4.B), the majority of participants will agree with BBB’s statement (position B).
- H2: In the extended cases (Scenarios 1.E, 2.E, 3.E and 4.E), the majority of participants will agree that they cannot conclude anything from the text (position U).
- H3: The majority of participants who view a base case scenario will agree with the preference argument, and find it relevant.

Hypotheses H1 and H2
[Figure: distribution (%) of acceptability of actors’ positions (A, B, U) for base cases and extended cases]
Distribution of the final conclusion (A/B/U): base cases, χ²(2, N = 77) = 37.74, p < 0.001; extended cases, χ²(2, N = 84) = 8.0, p < 0.02
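The reported χ² values can be checked with a goodness-of-fit computation. The sketch below rests on two assumptions that the slides do not state. First, the counts per position are reconstructed from the per-domain percentages reported in the post hoc table, which imply group sizes of 20, 19, 22 and 16 for the base cases and 19, 19, 21 and 25 for the extended cases. Second, the null hypothesis is a uniform split across A, B and U. The helper name `chi_square_uniform` is illustrative.

```python
import math

# Assumption: counts reconstructed from the reported percentages,
# not taken directly from the talk.
base_counts = {"A": 4, "B": 48, "U": 25}       # N = 77
extended_counts = {"A": 24, "B": 20, "U": 40}  # N = 84


def chi_square_uniform(counts):
    """Chi-square goodness of fit against equal expected counts (three categories only)."""
    if len(counts) != 3:
        raise ValueError("the closed-form p-value below holds for 2 degrees of freedom")
    total = sum(counts.values())
    expected = total / len(counts)
    statistic = sum((obs - expected) ** 2 / expected for obs in counts.values())
    # With 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-statistic / 2)
    return statistic, p_value


base_stat, base_p = chi_square_uniform(base_counts)
ext_stat, ext_p = chi_square_uniform(extended_counts)
print(round(base_stat, 2), base_p < 0.001)  # 37.74 True
print(round(ext_stat, 2), ext_p < 0.02)     # 8.0 True
```

Under these assumptions the statistics agree with the values on the slide, which suggests (but does not prove) that the analysis was a test against a uniform distribution.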

Hypothesis H3
Participants rate how much (on a scale of 1 to 7) they agree with the following statement (agreement), and whether it is relevant in drawing their conclusion (relevance): “BBB is more trustworthy than AAA.”
- Significant difference between the base and the extended cases for agreement (Mann-Whitney U = 1778, Z = −5.0, p < 0.001) and relevance (Mann-Whitney U = 1852, Z = −4.7, p < 0.001)
- In addition, the median values for both agreement and relevance are greater for the base cases than for the extended cases

Post Hoc: Motivations
                 Base cases (%)          Extended cases (%)
Domain           A      B      U         A      B      U
1, weather       5.0    50.0   45.0      15.8   21.1   63.2
2, politics      5.3    63.2   31.6      21.1   10.5   68.4
3, buying car    0.0    68.2   31.8      23.8   23.8   52.4
4, romance       12.5   68.8   18.8      48.0   36.0   16.0
Distribution of the final conclusion (A/B/U): Fisher (N = 161) = 48.756, p < 0.001, 10000 sampled tables, Monte Carlo approach with 99% confidence interval (MC99)

Post Hoc: Distributions of Base Cases
[Figure: distributions (%) of motivations U1, U2, U3 for choosing U in scenarios 1.B and 3.B]
Motivations for agreeing with the U position in scenarios 1.B and 3.B: U1, lack of information; U2, domain-specific reasons; U3, other

Post Hoc: Distributions between Base/Extended Cases
(Same table of A/B/U percentages as in Post Hoc: Motivations.)
Is the distribution of choices (among A, B, and U) in the base case significantly different from the distribution of choices in the corresponding extended case?
- YES for the third domain (3.B and 3.E, buying a car): Fisher (N = 43) = 10.693, p < 0.001, 10000 sampled tables, MC99
- NO for the first domain (1.B and 1.E, weather forecasts): Fisher (N = 39) = 3.832, p = 0.187, 10000 sampled tables, MC99

Post Hoc: Distributions Extended Cases
(Same table of A/B/U percentages as in Post Hoc: Motivations.)
In the extended cases, domain has a significant effect on the distribution of positions: Fisher (N = 84) = 16.308, p < 0.05, 10000 sampled tables, MC99

Post Hoc: Relevance and Agreement
                            Base cases        Extended cases
            Domain          R_B†     Md_B*    R_E†     Md_E*    C.D.‡
Relevance   1, weather      110.38   6.00     82.92    4.00     46.60
            2, politics     107.45   6.00     69.45    4.00     47.19
            3, buying car   118.05   6.50     67.45    4.00     44.38
            4, romance      48.34    2.00     44.40    2.00     46.57
Agreement   1, weather      116.38   6.00     87.18    4.00     46.60
            2, politics     103.34   6.00     65.05    4.00     47.19
            3, buying car   121.93   6.50     64.33    4.00     44.38
            4, romance      44.94    2.00     44.20    2.00     46.57
Statistically significant cases when |R_x − R_y| > C.D.
† Mean rank as computed with the Kruskal-Wallis test
‡ Critical Difference, as computed in [Siegel and Castellan Jr., 1988] cited by [Field, 2009] with α = 0.05
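The critical differences in the table can be recomputed with the pairwise comparison formula for the Kruskal-Wallis test, C.D. = z · sqrt(N(N + 1)/12 · (1/n_u + 1/n_v)), where z is the upper-tail normal quantile at α/(k(k − 1)). The sketch below relies on assumptions that are not stated on the slides: the eight scenarios form the k = 8 groups, N = 161, and the group sizes are those implied by the reported percentages. The names `critical_difference`, `GROUP_SIZES` and `RELEVANCE_RANKS` are illustrative.

```python
import math
from statistics import NormalDist

N_TOTAL = 161  # 77 base-case plus 84 extended-case participants
K_GROUPS = 8   # assumption: the eight scenarios are the compared groups

# Assumption: (base, extended) group sizes inferred from the reported percentages.
GROUP_SIZES = {
    "weather": (20, 19),
    "politics": (19, 19),
    "buying car": (22, 21),
    "romance": (16, 25),
}

# Relevance mean ranks (base, extended) copied from the table.
RELEVANCE_RANKS = {
    "weather": (110.38, 82.92),
    "politics": (107.45, 69.45),
    "buying car": (118.05, 67.45),
    "romance": (48.34, 44.40),
}


def critical_difference(n_u, n_v, z=None, alpha=0.05):
    """Smallest mean-rank difference that counts as significant for two groups."""
    if z is None:
        # Upper-tail quantile for the adjusted level alpha / (k * (k - 1)).
        z = NormalDist().inv_cdf(1 - alpha / (K_GROUPS * (K_GROUPS - 1)))
    return z * math.sqrt(N_TOTAL * (N_TOTAL + 1) / 12 * (1 / n_u + 1 / n_v))


significant = []
for domain, (rank_base, rank_ext) in RELEVANCE_RANKS.items():
    # z = 3.12 is a table-style rounding of the quantile (an assumption).
    cd = critical_difference(*GROUP_SIZES[domain], z=3.12)
    if abs(rank_base - rank_ext) > cd:
        significant.append(domain)

print(significant)  # ['buying car']
```

With z rounded to 3.12 the computed critical differences agree with the table to two decimals. Leaving `z=None` uses the unrounded quantile and gives slightly larger thresholds without changing which domain is flagged.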

Post Hoc: Relevance and Agreement
            Scenario 3.B         Scenario 4.B
            R_3.B†    Md_3.B*    R_4.B†    Md_4.B*    C.D.‡
Relevance   118.05    6.50       48.34     2.00       47.79
Agreement   121.93    6.50       44.94     2.00       47.79
Statistically significant cases when |R_x − R_y| > C.D.
† Mean rank as computed with the Kruskal-Wallis test
‡ Critical Difference, as computed in [Siegel and Castellan Jr., 1988] cited by [Field, 2009] with α = 0.05

Conclusions
- Investigation into the relationship between formal systems of defeasible argumentation and arguments in natural language
- Results suggest a correspondence between the formal theory and its representation in natural language
- Preferences are generally applied “following” Prakken and Sartor: it is important to be able to represent them
- Humans evaluate preferences depending on the context
  - Collateral knowledge
  - Reversal of preference

Acknowledgement
Research was sponsored by the US Army Research Laboratory and the UK Ministry of Defence and was accomplished under Agreement Number W911NF-06-3-0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the US Army Research Laboratory, the US Government, the UK Ministry of Defence, or the UK Government. The US and UK Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
This research has been carried out within the project “Scrutable Autonomous Systems” (SAsSY), funded by the Engineering and Physical Sciences Research Council (EPSRC, UK), grant ref. EP/J012084/1.

References
[Field, 2009] Field, A. (2009). Discovering Statistics Using SPSS (Introducing Statistical Methods series). SAGE Publications Ltd.
[Siegel and Castellan Jr., 1988] Siegel, S. and Castellan Jr., N. J. (1988). Nonparametric Statistics for The Behavioral Sciences. McGraw-Hill Humanities/Social Sciences/Languages.
