CCCT University of Amsterdam Seminars 2013: Crowdsourcing Session

Technology

Published on March 17, 2014

Author: laroyo

Source: slideshare.net

Lora Aroyo Crowd Truth for Cognitive Computing Chris Welty gathering gold standard annotations for relation extraction Crowd Truth Harnessing Disagreement in Crowdsourcing

•  Open Domain Question-Answering Machine, that given
   – Rich Natural Language Questions
   – Over a Broad Domain of Knowledge
•  Won a 2-game Jeopardy match against the all-time winners
   – viewed by over 50,000,000

Cognitive Computing
EXPANDS human cognition: it makes the jobs we do easier, like a cognitive prosthesis, especially when processing massive data or data that requires human interpretation.
LEARNS as you use it: most machine errors are easy for a human to detect, and we can instrument usage of systems to better understand the system and the problem it solves.
INTERACTS naturally: we need to bring machines closer to their users. We have adapted ourselves enough to them; they should understand natural language, spoken or written, and be able to process images and videos. These simple human problems are extremely complex for machines, but they are hallmarks of a new computing era.

Watson MD
Now answering medical questions!
•  Adapt Watson to Medical QA
•  Mainly an NLP task
•  Cognitive computing systems need human-annotated data for training, testing, evaluation
•  The human annotation task is one of semantic interpretation

NLP Tasks
Example passage: Gadolinium agents are useful for patients with renal impairment, but in patients with severe renal failure requiring dialysis it presents a risk of nephrogenic systemic fibrosis.
•  Mention detection: find the spans (begin, end) of relevant medical terms (factors) in a passage.
•  Factor Typing (NER): find the type of each mention. Types shown on the slide: substance, disorder, treatment.
•  Factor (Entity) Identification: find the corresponding ids for a mentioned factor in a knowledge-base. Ids shown on the slide: C0016911, C1408325, C0035078, C1619692, C0019004.
•  Relation detection: find relations that are expressed in a passage between factors. Relations shown on the slide: cause, treats, contra-indicates.
•  Coreference: find the mentions in a sentence that refer to the same factor.

Gold Standard Assumption
•  Cognitive systems need to be told what is right & what is wrong
•  A gold standard or ground truth
•  Performance is measured on test sets vetted by human experts → never perfect, always improving against test data
•  Historically, gold standards are created assuming that for each annotated instance there is a single right answer
•  Gold standard quality is measured in inter-annotator agreement → does not account for perspectives, for reasonable alternative interpretations

but people don’t always agree…

Disagreement
Gadolinium agents are useful for patients with renal impairment, but in patients with severe renal failure requiring dialysis there is a risk of nephrogenic systemic fibrosis.
Relation shown on the first slide: cause
Relation shown on the second slide: side-effect
The human annotation task is one of semantic interpretation

Why do people disagree?
[Figure: Sentence, Relation, Worker]

Key Question How do we represent & measure disagreement in a way that it can be harnessed?

Why do people disagree?
Triangle of Reference: Sign, Referent, Observer

Position
Maybe this disagreement is a signal and not noise? Can we harness it?

Crowd Truth
Annotator disagreement is signal, not noise. It is indicative of the variation in human semantic interpretation of signs, and can indicate ambiguity, vagueness, over-generality, etc.
Image: http://www.freefoto.com/preview/01-47-44/Flock-of-Birds

Approach Principles
1. understand the range of disagreements by creating a space of possibilities with frequencies & similarities
2. tolerate, capture & exploit disagreement
3. score machine output based on where it falls in this space
4. adaptable to new annotation tasks
Image: Flickr: auroille

Crowd Watson
•  Crowdsourcing gold standard data for
   o  Training Watson in medical domain, as well as for events extraction, image annotations, video tagging and summarization
•  Crowdsourcing for Domain Adaptation
   o  How to rapidly acquire knowledge for new domains
•  Platforms
   o  CrowdFlower, Amazon Mechanical Turk
   o  Crowdsourcing Games with a Purpose, e.g. Dr. Watson, Waisda?

Relation Extraction: crowdsourcing gold standard data
•  Relations overlap in meaning
•  Sentences are vague and ambiguous
•  Experts have different interpretations

In distant supervision we take arguments that are known to be related by a target relation in a knowledge base and we find all sentences in a corpus that mention both arguments.
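The distant supervision step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the knowledge-base triple, the two-sentence corpus and the function name `distant_supervision_candidates` are hypothetical, and plain case-insensitive substring matching stands in for whatever mention detection the actual pipeline used.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (argument 1, relation, argument 2)


def distant_supervision_candidates(kb: List[Triple], corpus: List[str]):
    """Pair each KB triple with every sentence that mentions both arguments."""
    candidates = []
    for arg1, relation, arg2 in kb:
        for sentence in corpus:
            text = sentence.lower()
            # Assumption: naive substring match instead of real mention detection.
            if arg1.lower() in text and arg2.lower() in text:
                candidates.append((sentence, arg1, relation, arg2))
    return candidates


# Hypothetical toy data, not taken from the talk's knowledge base.
kb = [("hyperaemia", "symptom", "conjunctivitis")]
corpus = [
    "Redness (HYPERAEMIA) of the eyes is common to all forms of CONJUNCTIVITIS.",
    "Conjunctivitis is often treated with eye drops.",
]

matches = distant_supervision_candidates(kb, corpus)
for sentence, arg1, relation, arg2 in matches:
    print(f"{relation}({arg1}, {arg2}) <- {sentence}")
```

Candidates found this way only co-mention the two arguments, so a sentence can still be vague or ambiguous about the relation itself.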


Representation: Worker Vector
[Figure: a worker's annotations on a sentence as a vector with a 1 for each selected relation]

Representation: Sentence Vector
[Figure: the worker vectors for one sentence and the resulting sentence vector]
Sentence Vector: 0 1 1 0 0 4 3 0 0 5 1 0
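A minimal sketch of this representation, assuming the sentence vector is the element-wise sum of the binary worker vectors (the slide's grid of 15 ones and a sentence vector whose entries add up to 15 are consistent with that reading). The relation labels and worker vectors below are hypothetical; the talk uses 12 relations plus OTHER / NONE.

```python
from typing import List

# Hypothetical relation labels, shortened for the example.
RELATIONS = ["treats", "causes", "symptom", "none"]


def sentence_vector(worker_vectors: List[List[int]]) -> List[int]:
    """Element-wise sum of the binary worker vectors for one sentence."""
    return [sum(column) for column in zip(*worker_vectors)]


# Each worker vector has a 1 for every relation that worker selected.
worker_vectors = [
    [0, 1, 0, 0],
    [0, 1, 1, 0],  # a worker may select more than one relation
    [0, 0, 1, 0],
    [0, 1, 0, 0],
]

vector = sentence_vector(worker_vectors)
print(dict(zip(RELATIONS, vector)))
```

Each entry of the result counts how many workers selected that relation for the sentence.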

Disagreement for Sentence Clarity
Sentence: Feeling the way the CHEST expands (PALPATION), can identify areas of the lung that are full of fluid.
Question: Is CHEST related to PALPATION?
Relations selected: diagnose, location, associated with, is_a, other, part_of
Sentence vector: 0 0 0 2 3 0 0 0 1 0 0 4 4 1
Unclear relationship between the two arguments, reflected in the disagreement

Disagreement for Sentence Clarity
Sentence: Redness (HYPERAEMIA), irritation (chemosis) and watering (epiphora) of the eyes are symptoms common to all forms of CONJUNCTIVITIS.
Question: Is HYPERAEMIA related to CONJUNCTIVITIS?
Relations selected: symptom, cause
Sentence vector: 0 0 0 1 0 0 0 0 13 0 0 0 0 0
Clearly expressed relation between the two arguments, reflected in the agreement

Sentence-Relation Score
Measures how clearly a sentence expresses a relation
Sentence Vector: 0 1 1 0 0 4 3 0 0 5 1 0
Cosine of the Sentence Vector with the unit vector for relation R6 = .55
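The .55 on this slide can be reproduced with a short Python sketch. It assumes that R6 is the sixth position of the sentence vector (the entry with value 4); the helper names `cosine` and `unit_vector` are illustrative, not part of any Crowd Truth tooling.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def unit_vector(index: int, size: int) -> List[int]:
    """Vector with a single 1 at the position of the given relation."""
    return [1 if i == index else 0 for i in range(size)]


sentence = [0, 1, 1, 0, 0, 4, 3, 0, 0, 5, 1, 0]

# Assumption: relation R6 corresponds to index 5 (sixth entry) of the vector.
r6 = unit_vector(5, len(sentence))
score = cosine(sentence, r6)
print(round(score, 2))  # 4 / sqrt(53), which rounds to 0.55
```

Because the relation vector has a single non-zero entry, the score reduces to that entry of the sentence vector divided by the vector's length.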

Worker Disagreement
Measured per worker
Worker-sentence disagreement: AVG (Cosine) between the worker’s sentence vector and the Sentence Vector
Sentence Vector: 0 1 1 0 0 4 3 0 0 5 1 0

Crowd Truth Metrics: Relation Extraction
Three parts to understand human interpretations:
•  Sentence
   o  How good is a sentence for the relation extraction task?
•  Workers
   o  How well does a worker understand the sentence?
•  Relations
   o  Is the meaning of the relation clear?
   o  How ambiguous/confusable is it?

Crowd Truth Metrics: Based on the Triangle of Reference
Three parts to understand human interpretations:
•  Sign
   o  How good is a sign for conveying information?
•  People
   o  How well does a person understand the sign?
•  Ontology
   o  Are the distinctions of the ontology clear?
   o  How ambiguous/confusable are they?

The Dark Side of Crowdsourcing Disagreement
•  Spammers generate disagreement for the wrong reasons
•  Most spam detection requires a gold standard
•  Worker-sentence disagreement: the average of all the cosines between each worker’s sentence vector and the full sentence vector (minus that worker). Indicates how much a worker disagrees with the crowd on a sentence basis
•  Worker-worker disagreement: a pairwise confusion matrix between workers and the average agreement across the matrix for each worker. Indicates whether there are consistently like-minded workers
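The worker-sentence measure can be sketched as follows. The data layout (sentence id, then worker id, then binary vector), the worker ids and the function name `worker_sentence_score` are assumptions made for the example. The function returns the average cosine itself, so under this reading a lower value means the worker disagrees more with the rest of the crowd.

```python
import math
from typing import Dict, List

Vector = List[int]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def worker_sentence_score(annotations: Dict[str, Dict[str, Vector]], worker: str) -> float:
    """Average cosine between a worker's vector and the crowd vector minus that worker."""
    cosines = []
    for by_worker in annotations.values():
        if worker not in by_worker:
            continue  # the worker did not annotate this sentence
        others = [vec for w, vec in by_worker.items() if w != worker]
        if not others:
            continue  # nobody else to compare with
        rest_of_crowd = [sum(column) for column in zip(*others)]
        cosines.append(cosine(by_worker[worker], rest_of_crowd))
    return sum(cosines) / len(cosines) if cosines else 0.0


# Hypothetical annotations over three relations.
annotations = {
    "s1": {"w1": [1, 0, 0], "w2": [1, 0, 0], "w3": [0, 0, 1]},
    "s2": {"w1": [0, 1, 0], "w2": [0, 1, 0], "w3": [1, 0, 0]},
}

scores = {w: worker_sentence_score(annotations, w) for w in ("w1", "w2", "w3")}
for worker, value in scores.items():
    print(worker, round(value, 3))
```

In this toy data worker w3 never overlaps with the other two and gets the lowest score, which is the kind of systematic disagreement the metric is meant to surface.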

Harnessing Disagreement
•  Sentence-relation score: measured for each relation on each sentence as the cosine of the unit vector for the relation with the sentence vector
•  Sentence clarity: for each sentence, the max relation score for that sentence. If all the workers selected the same relation for a sentence, the max score is 1, indicating a clear sentence
•  Relation similarity: pairwise conditional probability that if relation Ri is annotated in a sentence, then Rj is as well. Indicates how confusable the linguistic expressions of two relations are
•  Relation ambiguity: max relation similarity for a relation. If a relation is clear, the score is low
•  Relation clarity: max sentence-relation score for a relation over all sentences. If a relation has a high clarity score, it means that it is at least possible to express the relation clearly
•  Worker quality: avg. cosine of worker vector with sentence vector for all sentences the worker annotated
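A hedged sketch of the sentence and relation metrics listed on this slide. The three sentence vectors are hypothetical, and one detail is an assumption: a relation is counted as "annotated in a sentence" when at least one worker selected it, which the slide does not spell out.

```python
import math
from typing import List

Vector = List[int]


def relation_scores(sentence: Vector) -> List[float]:
    """Cosine of the sentence vector with each relation's unit vector."""
    norm = math.sqrt(sum(x * x for x in sentence))
    return [x / norm if norm else 0.0 for x in sentence]


def sentence_clarity(sentence: Vector) -> float:
    """Max sentence-relation score for one sentence."""
    return max(relation_scores(sentence))


def relation_similarity(sentences: List[Vector], i: int, j: int) -> float:
    """Estimate of P(Rj annotated | Ri annotated) over the sentences."""
    with_i = [s for s in sentences if s[i] > 0]
    if not with_i:
        return 0.0
    return sum(1 for s in with_i if s[j] > 0) / len(with_i)


def relation_ambiguity(sentences: List[Vector], i: int) -> float:
    """Max similarity of relation i to any other relation."""
    size = len(sentences[0])
    return max(relation_similarity(sentences, i, j) for j in range(size) if j != i)


def relation_clarity(sentences: List[Vector], i: int) -> float:
    """Max sentence-relation score for relation i over all sentences."""
    return max(relation_scores(s)[i] for s in sentences)


# Hypothetical sentence vectors over three relations, 15 workers each.
sentences = [
    [15, 0, 0],  # every worker chose relation 0
    [6, 5, 4],   # workers are split
    [0, 7, 8],
]

print(sentence_clarity(sentences[0]))           # 1.0: all workers agree
print(round(sentence_clarity(sentences[1]), 2))
print(relation_similarity(sentences, 0, 1))     # 0.5
print(relation_ambiguity(sentences, 0))
print(round(relation_clarity(sentences, 2), 2))
```

Sentence clarity and relation clarity both reuse the sentence-relation score; only the axis of the maximum differs (over relations for a sentence, over sentences for a relation).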

Disagreement metrics
•  Diverging opinions cluster around the most plausible options.
•  Identify workers who systematically disagree
   1.  With the opinion of the majority (worker-sentence disagreement)
       o  Compare worker opinion with that of the majority
   2.  With the rest of their co-workers (worker-worker disagreement)
       o  Workers with the same opinion as worker W.
   3.  + Avg. number of relations / sentence

Task completion time


Spam in a channel

Conclusions
•  Crowd Truth can help us understand the diversity of interpretations
   o  with adequate representation & metrics
   o  dispense with the “one correct answer” assumption
•  Disagreement metrics can be augmented by content filters for better spam detection
   o  explanations by workers can be useful

The Crew
•  Lora Aroyo (VU)
•  Chris Welty (IBM)
•  Guillermo Soberon (VU)
•  Hui Lin (IBM)
•  Anca Dumitrache (VU)
•  Oana Inel (VU)
•  Manfred Overmeen (IBM)
•  Robert-Jan Sips (IBM)

http://crowd-watson.nl

Questions?

Accuracy pred. low quality (1)

Accuracy pred. low quality (2)

Spamming scenarios
                           Dev.                 Test
Spammers / workers         12 / 110             20 / 93
"Spammed" sentences        139 of 1302 (11%)    386 of 1291 (30%)
Spam detection accuracy    100%                 89% (10 spammers missed)
Can we do better?

Data collected
•  Annotations (Disagreement Filters)
   o  12 relations + OTH / NON
   o  Behaviour with respect to the crowd
•  Explanations (Explanation filters)
   o  Selected Words (justify the choice)
   o  Explanation (for OTHER or NONE)
   o  Individual behaviour patterns.

Relation Extraction

Explanations analysis
Four patterns in worker behaviour indicating spam:
o  No Valid Words were used for the text
o  Using the same text for all the annotations
o  Using the same text for both "Selected words" and "Explanation"
o  Bad understanding (not following) of the task instructions:
   •  Selecting "None" and "Other" in combination with other relations
   •  Including explanations when they are not required.

Spam patterns analysis
                             None / Other    Rep. Response    Rep. Text    No Valid Words
Spam Candidates              22              8                14           12
Overlap with disagreement    18%             37%              36%          42%
30 unique workers were identified ONLY by the Explanation filters as possible low quality workers.
Explanation Filters ⊄ Disagreement metrics

Results
•  Linear combination of Disagreement metrics + Explanation filters
   o  "No Valid Words" and Avg. Num Relations / sentence get a bit more weight than the rest
•  Results
   o  95% accuracy and .88 F1 score
   o  16 spammers out of 20
•  Previously, only with disagreement metrics:
   o  88% Accuracy, .66 F1 score
   o  10 spammers out of 20
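As a rough sanity check, accuracy and F1 can be recomputed from the spammer counts. The slides give neither the full confusion counts nor the weights of the linear combination, so this sketch makes two assumptions: the figures refer to the Test set (93 workers, 20 spammers) and no honest worker was flagged (zero false positives). The function name `accuracy_and_f1` is illustrative.

```python
from typing import Tuple


def accuracy_and_f1(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Accuracy and F1 for the 'spammer' class from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Assumption: Test set, 93 workers, 20 spammers, so 73 honest workers.
# Assumption: no false positives (fp = 0); the slides do not state this.
combined = accuracy_and_f1(tp=16, fp=0, fn=4, tn=73)
disagreement_only = accuracy_and_f1(tp=10, fp=0, fn=10, tn=73)

print("combined:          accuracy %.3f, F1 %.3f" % combined)
print("disagreement only: accuracy %.3f, F1 %.3f" % disagreement_only)
```

Under these assumptions the values land close to the reported ones (about .96 and .89 against 95% and .88; about .89 and .67 against 88% and .66). The small gaps suggest the actual counts, for example a false positive, differ slightly from the assumed ones.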
