Using Cohen's Kappa to Gauge Interrater Reliability


Published on March 15, 2014

Author: billhd

Source: slideshare.net

Description

This slide deck is designed to introduce graduate students in Humanities and Social Science disciplines to the kappa coefficient and its use in measuring and reporting inter-rater reliability. The most common scenario for using kappa in these fields is a project that involves nominal coding (sorting verbal or visual data into a pre-defined set of categories). The deck walks through how to calculate K by hand, demystifying how the statistic works and why it can strengthen an argument about the outcome of a research effort.

CALCULATING COHEN’S KAPPA: A MEASURE OF INTER-RATER RELIABILITY FOR QUALITATIVE RESEARCH INVOLVING NOMINAL CODING

WHAT IS COHEN’S KAPPA? COHEN’S KAPPA IS A STATISTICAL MEASURE CREATED BY JACOB COHEN IN 1960 TO BE A MORE ACCURATE MEASURE OF RELIABILITY BETWEEN TWO RATERS MAKING DECISIONS ABOUT HOW A PARTICULAR UNIT OF ANALYSIS SHOULD BE CATEGORIZED. KAPPA MEASURES NOT ONLY THE PERCENTAGE OF AGREEMENT BETWEEN TWO RATERS BUT ALSO THE DEGREE TO WHICH THAT AGREEMENT CAN BE ATTRIBUTED TO CHANCE. JACOB COHEN, "A COEFFICIENT OF AGREEMENT FOR NOMINAL SCALES," EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 20: 37–46, 1960.

THE EQUATION FOR K: K = (Pr(a) - Pr(e)) / (N - Pr(e)). THE FANCY “K” STANDS FOR KAPPA. PR(A) = OBSERVED AGREEMENT AMONG RATERS (THE NUMBER OF CASES ON WHICH BOTH RATERS AGREE). PR(E) = THE EXPECTED NUMBER OF AGREEMENTS ATTRIBUTABLE TO CHANCE. N = TOTAL NUMBER OF RATED ITEMS, ALSO CALLED “CASES.”
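
As a minimal sketch of the equation above (Python is my choice here, not the deck's, and the function and variable names are mine), the count form used in the worked example later in the deck is:

    def cohens_kappa(pr_a, pr_e, n):
        """Count form of Cohen's kappa as used in this deck.

        pr_a -- number of cases the two raters agree on
        pr_e -- expected number of chance agreements
        n    -- total number of rated cases
        """
        return (pr_a - pr_e) / (n - pr_e)

Dividing both Pr(a) and Pr(e) by N gives the more familiar proportion form, K = (p(a) - p(e)) / (1 - p(e)); the two yield the same value.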

CALCULATING K BY HAND USING A CONTINGENCY TABLE: THE COLUMNS REPRESENT RATER 1'S CODES AND THE ROWS REPRESENT RATER 2'S. THE SIZE OF THE TABLE IS DETERMINED BY HOW MANY CODING CATEGORIES YOU HAVE; THIS EXAMPLE ASSUMES THAT YOUR UNITS CAN BE SORTED INTO THREE CATEGORIES (A, B, C), HENCE A 3x3 GRID.

CALCULATING K BY HAND USING A CONTINGENCY TABLE: THE DIAGONAL REPRESENTS AGREEMENT (WHERE THE TWO RATERS BOTH MARK THE SAME THING).

                Rater 1: A             Rater 1: B             Rater 1: C
Rater 2: A      # of agreements on A   disagreement           disagreement
Rater 2: B      disagreement           # of agreements on B   disagreement
Rater 2: C      disagreement           disagreement           # of agreements on C
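
A minimal sketch of building such a table in Python, assuming made-up three-category data (A, B, C) that is not from the deck:

    from collections import Counter

    # Hypothetical codes from two raters for the same ten units (categories A, B, C).
    rater1 = list("ABACBBCABA")
    rater2 = list("ABACBACABA")

    # table[(code2, code1)] = number of units rater 2 coded code2 and rater 1 coded code1.
    table = Counter(zip(rater2, rater1))

    # Diagonal cells (both raters chose the same code) are the agreements.
    agreements = {cell: count for cell, count in table.items() if cell[0] == cell[1]}
    print(agreements)   # {('A', 'A'): 4, ('B', 'B'): 3, ('C', 'C'): 2}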

DATA: RATING BLOG COMMENTS. USING A RANDOM NUMBER TABLE, I PULLED COMMENTS FROM ENGLISH-LANGUAGE BLOGS ON BLOGGER.COM UNTIL I HAD A SAMPLE OF 10 COMMENTS. I ASKED R&W COLLEAGUES TO RATE EACH COMMENT: “PLEASE CATEGORIZE EACH USING THE FOLLOWING CHOICES: RELEVANT, SPAM, OR OTHER.” WE CAN NOW CALCULATE AGREEMENT BETWEEN ANY TWO RATERS.

DATA: RATERS 1-5

Item #    1  2  3  4  5  6  7  8  9  10
Rater 1   R  R  R  R  R  R  R  R  R  S
Rater 2   S  R  R  O  R  R  R  R  O  S
Rater 3   R  R  R  O  R  R  O  O  R  S
Rater 4   R  R  R  R  R  R  R  R  R  S
Rater 5   S  R  R  O  R  O  O  R  R  S

CALCULATING K FOR RATERS 1 & 2: ADD THE ROWS & COLUMNS. SINCE WE HAVE 10 ITEMS, THE ROW TOTALS AND THE COLUMN TOTALS SHOULD EACH ADD UP TO 10.

                Rater 1: R              Rater 1: S     Rater 1: O    Row total
Rater 2: R      6 (items 2, 3, 5-8)     0              0             6
Rater 2: S      1 (item 1)              1 (item 10)    0             2
Rater 2: O      2 (items 4 & 9)         0              0             2
Column total    9                       1              0             10

CALCULATING K: COMPUTING SIMPLE AGREEMENT. ADD THE VALUES OF THE DIAGONAL CELLS OF THE TABLE ABOVE AND DIVIDE BY THE TOTAL NUMBER OF CASES TO COMPUTE SIMPLE AGREEMENT, OR “PR(A)”: (6 + 1) / 10 = 0.7, THAT IS, 7 AGREEMENTS OUT OF 10 CASES.
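
The same diagonal sum as a Python sketch, using rater 1's and rater 2's codes from the data table above (the variable names are mine):

    from collections import Counter

    rater1 = list("RRRRRRRRRS")   # rater 1's ten codes, items 1-10
    rater2 = list("SRRORRRROS")   # rater 2's ten codes, items 1-10

    # Cross-tabulate: table[(code2, code1)] counts the cases in each cell.
    table = Counter(zip(rater2, rater1))

    # Pr(a) = sum of the diagonal cells (both raters chose the same code).
    pr_a = sum(count for (c2, c1), count in table.items() if c2 == c1)
    print(pr_a, pr_a / len(rater1))   # 7 agreements, 0.7 simple agreement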

THE EQUATION FOR K: RATERS 1 & 2. WE CAN NOW ENTER THE VALUE OF PR(A) = 7 AND SUBSTITUTE 10 AS THE VALUE OF N: K = (7 - Pr(e)) / (10 - Pr(e)). PR(E) = THE EXPECTED NUMBER OF AGREEMENTS ATTRIBUTABLE TO CHANCE. RATERS 1 & 2 AGREED ON 70% OF THE CASES, BUT HOW MUCH OF THAT AGREEMENT WAS BY CHANCE?

CALCULATING K: EXPECTED FREQUENCY OF CHANCE AGREEMENT. FOR EACH DIAGONAL CELL, WE COMPUTE THE EXPECTED FREQUENCY OF CHANCE AGREEMENT (EF): EF = (ROW TOTAL x COLUMN TOTAL) / TOTAL NUMBER OF CASES. EF FOR “RELEVANT” = (6 x 9) / 10 = 5.4; EF FOR “SPAM” = (2 x 1) / 10 = 0.2; EF FOR “OTHER” = (2 x 0) / 10 = 0.

                Rater 1: R       Rater 1: S       Rater 1: O
Rater 2: R      6 (EF = 5.4)     0                0
Rater 2: S      1                1 (EF = 0.2)     0
Rater 2: O      2                0                0 (EF = 0)

CALCULATING K: ADD ALL VALUES OF EF FROM THE TABLE ABOVE TO GET “PR(E)”: PR(E) = 5.4 + 0.2 + 0 = 5.6.
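
The expected-frequency arithmetic as a Python sketch using the marginal totals from the contingency table (again, the variable names are mine):

    from collections import Counter

    rater1 = list("RRRRRRRRRS")   # rater 1's ten codes
    rater2 = list("SRRORRRROS")   # rater 2's ten codes
    n = len(rater1)               # 10 cases

    col_totals = Counter(rater1)  # rater 1 marginals: R=9, S=1, O=0
    row_totals = Counter(rater2)  # rater 2 marginals: R=6, S=2, O=2

    # EF for each category = (row total * column total) / number of cases.
    pr_e = sum(row_totals[c] * col_totals[c] / n for c in "RSO")
    print(round(pr_e, 2))   # 5.6  (i.e. 5.4 + 0.2 + 0.0)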

THE EQUATION FOR K: RATERS 1 & 2. WE CAN NOW ENTER THE VALUE OF PR(E) AND COMPUTE KAPPA: K = (7 - 5.6) / (10 - 5.6) = .3182. THIS IS FAR BELOW THE ACCEPTABLE LEVEL OF AGREEMENT, WHICH SHOULD BE AT LEAST .70.
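
Putting the pieces together in Python with the values from the slides above (a sketch, not part of the original deck):

    # Values from the worked example for raters 1 & 2.
    pr_a = 7      # observed agreements (diagonal sum)
    pr_e = 5.6    # expected chance agreements (5.4 + 0.2 + 0.0)
    n = 10        # total cases

    kappa = (pr_a - pr_e) / (n - pr_e)
    print(round(kappa, 4))   # 0.3182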

DATA: RATERS 1 & 2 (K = .3182)

Item #    1  2  3  4  5  6  7  8  9  10
Rater 1   R  R  R  R  R  R  R  R  R  S
Rater 2   S  R  R  O  R  R  R  R  O  S

HOW CAN WE IMPROVE?
• LOOK FOR THE PATTERN IN DISAGREEMENTS: CAN SOMETHING ABOUT THE CODING SCHEME BE CLARIFIED?
• THE TOTAL # OF CASES IS LOW, WHICH COULD ALLOW A FEW STICKY CASES TO DISPROPORTIONATELY INFLUENCE AGREEMENT.

DATA: RATERS 1-5

Item #    1  2  3  4  5  6  7  8  9  10
Rater 1   R  R  R  R  R  R  R  R  R  S
Rater 2   S  R  R  O  R  R  R  R  O  S
Rater 3   R  R  R  O  R  R  O  O  R  S
Rater 4   R  R  R  R  R  R  R  R  R  S
Rater 5   S  R  R  O  R  O  O  R  R  S

CASE 1 SHOWS A PATTERN OF DISAGREEMENT BETWEEN “SPAM” & “RELEVANT,” WHILE CASE 4 SHOWS A PATTERN OF DISAGREEMENT BETWEEN “RELEVANT” & “OTHER.”

EXERCISES & QUESTIONS

Item #    1  2  3  4  5  6  7  8  9  10
Rater 1   R  R  R  R  R  R  R  R  R  S
Rater 2   S  R  R  O  R  R  R  R  O  S
Rater 3   R  R  R  O  R  R  O  O  R  S
Rater 4   R  R  R  R  R  R  R  R  R  S
Rater 5   S  R  R  O  R  O  O  R  R  S

1. COMPUTE COHEN’S K FOR RATERS 3 & 5 (SEE THE SKETCH BELOW).
2. REVISE THE CODING PROMPT TO ADDRESS PROBLEMS YOU DETECT; GIVE YOUR NEW CODING SCHEME TO TWO RATERS AND COMPUTE K TO SEE IF YOUR REVISIONS WORKED; BE PREPARED TO TALK ABOUT WHAT CHANGES YOU MADE.
3. COHEN’S KAPPA IS SAID TO BE A VERY CONSERVATIVE MEASURE OF INTER-RATER RELIABILITY. CAN YOU EXPLAIN WHY? WHAT ARE ITS LIMITATIONS AS YOU SEE THEM?
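
For exercise 1, here is a pairwise helper you could use to check a hand calculation; this sketch is mine, not the deck's, and simply re-implements the count form of the equation used above:

    from collections import Counter

    # Ratings from the data table above (R = relevant, S = spam, O = other).
    ratings = {
        1: list("RRRRRRRRRS"),
        2: list("SRRORRRROS"),
        3: list("RRRORROORS"),
        4: list("RRRRRRRRRS"),
        5: list("SRROROORRS"),
    }

    def kappa(a, b):
        """Cohen's kappa for two equal-length lists of nominal codes."""
        n = len(a)
        pr_a = sum(x == y for x, y in zip(a, b))      # observed agreements
        a_totals, b_totals = Counter(a), Counter(b)   # marginal totals
        pr_e = sum(a_totals[c] * b_totals[c] / n      # expected chance agreements
                   for c in set(a) | set(b))
        return (pr_a - pr_e) / (n - pr_e)

    print(round(kappa(ratings[1], ratings[2]), 4))   # 0.3182, matching the worked example
    # kappa(ratings[3], ratings[5]) checks your answer to exercise 1.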

DO I HAVE TO DO THIS BY HAND? NO, YOU COULD GO HERE: http://faculty.vassar.edu/lowry/kappa.html
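
Or, if you prefer a script to a calculator page, scikit-learn ships an implementation (using it here is my suggestion, not the deck's; it assumes scikit-learn is installed):

    from sklearn.metrics import cohen_kappa_score

    rater1 = list("RRRRRRRRRS")
    rater2 = list("SRRORRRROS")
    print(round(cohen_kappa_score(rater1, rater2), 4))   # 0.3182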
