
Regression1


Published on January 13, 2008

Author: Candelora

Source: authorstream.com


Linear and Logistic Regression

Where Are We Going Today?
- A linear regression example
- Data: how to obtain and manipulate it
- Cleaning the data in S-Plus/R
- Analysis issues
- Interpretation: how to present the results meaningfully
- Application: description, forecasting/prediction
- Traps for the unwary
- Logistic regression
- Conclusions

An example?
Insurance company claims satisfaction.

Background:
- Top-secret company (insurance), claims satisfaction study
- 546 persons were asked to rate aspects of service and then overall satisfaction/likelihood to recommend, on a 5-point scale
- We recommend a 10-point scale (1-10) as more natural to respondents
- A major 'storm in a teacup'

Questionnaire – explanatory variables
"Thinking firstly about the service you received from (top secret), I am going to read you some statements about this service and, as I read each statement, please give your opinion using a five-point scale where 1 is extremely dissatisfied and 5 is extremely satisfied." (Read, rotate, start at x; write in one digit per statement.)
How satisfied or dissatisfied are you with:
- everything being kept straightforward
- being kept in touch while the claim was being processed
- the general manner and attitude of the staff you dealt with
- your claim being dealt with promptly
- being treated fairly

Questionnaire – dependent variables
4a. Using the same five-point scale as previously, where 1 is extremely dissatisfied and 5 is extremely satisfied, how satisfied or dissatisfied were you with the overall service you received from (top secret)? (Write in one digit.)
4b. And, using a five-point scale where 1 is extremely unlikely and 5 is extremely likely, how likely or unlikely are you to recommend (top secret) insurance to others? (Write in one digit.)

Data
- Get DP to create an Excel file with all the data
- Make yourself familiar with Excel formats
- Clean the data, then start analysing it (a sketch of importing the exported file into R follows below)
- Use the data to describe each aspect of service:
  - the time taken to get an appointment with the loss adjustor
  - the convenience of meeting with the loss adjustor
  - the general manner and attitude of the loss adjustor you dealt with
  - being kept in touch while your claim was processed
  - the time taken for repairs to be completed
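As a minimal sketch of the import step, assuming the Excel file has been exported as a CSV called "regress_eg.csv" (the file name is hypothetical, not from the original deck), with one column per question:

## read the exported survey data and take a first look
Regress.eg <- read.csv("regress_eg.csv", header = TRUE)   # hypothetical file name
dim(Regress.eg)       # respondents x questions
str(Regress.eg)       # check each column was read as numeric
summary(Regress.eg)   # spot out-of-range codes before cleaning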
Some Code for cleaning / inspecting

### cleaning the data: the out-of-range code 6 is treated as missing
Regress.eg[,-1][Regress.eg[,-1] == 6] <- NA
sum(is.na(Regress.eg))
[1] 49

## replace missing values with the column mean - assumes the data are MCAR
mn <- apply(Regress.eg, 2, mean, na.rm = TRUE)
for (i in 2:ncol(Regress.eg)) {
  id <- is.na(Regress.eg[, i])
  Regress.eg[id, i] <- mn[i]
}

dimnames(Regress.eg)
id <- c("Satisfaction", "Straight", "touch", "manner", "prompt", "fairly", "LTR")
pairs.20x(Regress.eg[, id])   ## pairs.20x() is a scatterplot-matrix function (not in base R)

## let's look at this with a bit of jitter
Regress.eg2 <- Regress.eg +
  matrix(rnorm(nrow(Regress.eg) * ncol(Regress.eg), 0, 0.1), ncol = ncol(Regress.eg))
pairs.20x(Regress.eg2[, id])

Matrix plot (with jitter)
(Slide shows the jittered scatterplot matrix.)

More Code

## let's analyse this data
apply(Regress.eg, 2, mean)
cor(Regress.eg)

## simple linear regression of Satisfaction (column 7) on each aspect (columns 2-6) in turn
Regress.eg.coeff <- NULL
for (i in 2:6) {
  Regress.eg.coeff <- c(Regress.eg.coeff, lm(Regress.eg[, 7] ~ Regress.eg[, i])$coeff[2])
}

## multiple linear regression on all five aspects at once
Regress.eg.mlr <- lm(formula = Satisfaction ~ Straight + touch + manner + prompt + fairly,
                     data = Regress.eg, na.action = na.exclude)
Regress.eg.mlr$coeff

Output Code

> Regress.eg.mlr.coeff
    (Intercept) Straightforward   kept.in.touch manner.attitude          prompt          fairly
    -0.08951399       0.3802814       0.1624232      0.08986848       0.2199223       0.1567801

## columns: mean, correlation with Satisfaction, SLR coefficient, MLR coefficient
> cbind(apply(Regress.eg, 2, mean)[2:6], cor(Regress.eg)[2:6, 7], Regress.eg.coeff, Regress.eg.mlr.coeff[-1])
Straightforward  4.329650  0.7982008  0.8010022  0.38031150
kept.in.touch    4.394834  0.7280380  0.7185019  0.16243157
manner.attitude  4.021359  0.6524997  0.5399704  0.08982245
prompt           4.544280  0.6774585  0.8653943  0.21992244
fairly           4.417440  0.7017079  0.6902109  0.15680394

Some issues
- A 5-point scale, so the data are definitely not normal
- Note that the data are very left-skewed
- Regression/correlation assumptions may not hold, except…
- the CLT may kick in (546 observations)
- Probably not the best approach, but still useful
- Challenge: can anyone transform y (satisfaction) so it looks vaguely normal? If so, how do we interpret the results? Any other solutions?

Questions
With respect to overall satisfaction:
- What are the relationships, if any?
- Which are the most important?
- What can I tell management?
- Can I predict future scores?

Modelling is the answer… So what is modelling?

Essence of Modelling
- Relationships
- Understanding causation
- Understanding the past
- Predicting the future
- A correlation does not imply causation

A relationship
See the Excel spreadsheet.

Interpretation
Correlation / R² / the straight-line equation, for one aspect of service (variable) at a time:
- Correlation measures the strength of the straight-line relationship; it lies between -1 and 1
- 0 = no straight-line relationship (slr); NB: this may not imply no relationship, just not a straight-line one!
- -1 is a perfect negative slr, +1 a perfect positive slr
- R² = correlation squared, e.g. 0.7982² = 0.6371
- 100 × R² = % of variation explained by the slr

Interpretation...
- Correlation/R² measure the strength of the straight-line relationship, not the actual relationship
- The regression equation measures the size of the relationship:
  Satis = 0.8561 + 0.801 × (Straightforward score)
  e.g. if a respondent gives a 3, we predict Satis = 0.8561 + 0.801 × 3 = 3.3
- This can be used to predict and to set targets for KPIs (key performance indicators); see the sketch after this slide
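This one-aspect prediction can be reproduced directly in R. A minimal sketch, assuming the cleaned Regress.eg data frame from the code above (the object name slr is hypothetical); the slide reports the fitted equation Satis = 0.8561 + 0.801 × Straightforward:

## simple linear regression of overall satisfaction on one aspect,
## then a prediction for a respondent who scores that aspect a 3
slr <- lm(Satisfaction ~ Straight, data = Regress.eg)
coef(slr)                                          # roughly 0.8561 and 0.801
summary(slr)$r.squared                             # roughly 0.7982^2 = 0.64
predict(slr, newdata = data.frame(Straight = 3))   # roughly 3.3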
Multiple linear regression
- SLR except with more than one input
- Correlation is not applicable
- R² has the same interpretation, e.g. 72% versus 64% for Straightforward alone as an input
- Can predict in the same way, with more inputs:
  satis = -0.08951399
        + 0.3802814 × Straightforward
        + 0.1624232 × kept in touch
        + 0.08986848 × manner/attitude
        + 0.2199223 × prompt
        + 0.1567801 × fairly

Traps for young players
- All models are wrong; some are just more useful than others
- Don't always assume it is a straight-line relationship
- Multiple regression may not help you much more: problems of multicollinearity (MC), i.e. redundancy of variables
- Correlation does not imply causality
- Predicting away from the region you have analysed will probably be wrong!
- Anyone thought of a solution(s) yet?

Output Code
(The coefficient output shown earlier is repeated here for reference.)

More code
> summary(lm(formula = Satisfaction ~ Straightforward + kept.in.touch + manner.attitude + prompt + fairly,
             data = Regress.eg, na.action = na.exclude))

Call:
lm(formula = Satisfaction ~ Straightforward + kept.in.touch + manner.attitude + prompt + fairly,
   data = Regress.eg, na.action = na.exclude)

Residuals:
    Min        1Q   Median     3Q    Max
 -3.687  -0.08301  0.04314  0.133  1.924

Coefficients:
                   Value Std. Error t value Pr(>|t|)
(Intercept)      -0.0895     0.1369 -0.6540   0.5134
Straightforward   0.3803     0.0404  9.4127   0.0000
kept.in.touch     0.1624     0.0370  4.3937   0.0000
manner.attitude   0.0899     0.0270  3.3274   0.0009
prompt            0.2199     0.0415  5.3045   0.0000
fairly            0.1568     0.0345  4.5487   0.0000

Residual standard error: 0.5175 on 540 degrees of freedom
Multiple R-Squared: 0.7217
F-statistic: 280 on 5 and 540 degrees of freedom, the p-value is 0

So what do we conclude?
- Note that in this case all the MLR estimates are positive; that is not always so, because of multicollinearity
- Using the KISS approach, SLR is still useful, but note that there is not much difference between these values
- So 'stretch out' the differences by looking at an index: Index = SLR coefficient × correlation coefficient (a sketch of this calculation follows below)
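A minimal sketch of that index calculation, using the Regress.eg.coeff vector and the correlation matrix already computed above (the object name importance.index is hypothetical):

## importance index: per-aspect SLR slope scaled by its correlation with Satisfaction
corrs <- cor(Regress.eg)[2:6, 7]                      # correlation of each aspect with Satisfaction
importance.index <- as.numeric(Regress.eg.coeff) * as.numeric(corrs)
names(importance.index) <- names(corrs)               # aspect names from the data frame
round(sort(importance.index, decreasing = TRUE), 3)   # larger = more important driver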
Presentation of results
- Invented the Importance Index:
  - individual regressions avoid problems that can occur with multicollinearity
  - adjusted by the correlation, which allows for the level of explanation
- Produce a performance-by-importance matrix

Presentation of results
(Slide shows the performance-by-importance plot, with quadrants labelled Strengths, Maintain or divert, Secondary drivers and Concern.)

Interpretation of plot
Four quadrants:
- 'Strengths' – high performance / high importance – keep up the good work
- 'Maintain' – high performance / low importance – don't let down your guard; maintain where possible
- 'Secondary drivers' – low performance / low importance – keep an eye on these, but not too important
- 'Concern' – low performance / high importance – this should be the priority area for improvement

Logistic Regression

Logistic regression
- Suppose we wish to look at the proportion of people who give a 'top box' score for satisfaction
- Here we have a binary variable: let 0 = a score of 1-4 and 1 = 'top box', i.e. a 5
- The natural regression is now logistic, as we have a binary response
- We are now in the wonderful world of generalised linear models

Logistic regression
- With linear regression the mean structure depends linearly on the explanatory variables: mu = X^T beta
- With logistic regression we have a non-linear response: mu = exp(X^T beta) / (1 + exp(X^T beta))
- Note that this is a good way of getting around the left-skewness of the data

Let's analyse this data again
## Logistic regression code
Regress.eg.logistic <- glm(formula = 1 * (Satisfaction == 5) ~ Straight + touch + manner + prompt + fairly,
                           data = Regress.eg, na.action = na.exclude, family = binomial)

Let's analyse this data again…
## columns: SLR coefficient, MLR coefficient, logistic coefficient
> cbind(Regress.eg.coeff, Regress.eg.mlr.coeff[-1], Regress.eg.logistic$coeff[-1])
Straight  0.8010022  0.38028138  1.1928456
touch     0.7185019  0.16242318  0.6297301
manner    0.5399704  0.08986848  0.4143086
prompt    0.8653943  0.21992225  1.0494582
fairly    0.6902109  0.15678007  1.0760604

Note that 'fairly' now comes up as being more important, i.e. it is more highly associated with top-box scores (an odds-ratio reading of these coefficients is sketched after the summary output below).

More details
summary(glm(formula = 1 * (Satisfaction == 5) ~ Straight + touch + manner + prompt + fairly,
            data = Regress.eg, na.action = na.exclude, family = binomial))

Deviance Residuals:
       Min         1Q     Median         3Q       Max
 -2.252605 -0.3172882  0.4059497  0.4059497  2.825783

Coefficients:
                      Value Std. Error    t value
(Intercept)     -19.3572967  1.7395651 -11.127665
Straightforward   1.1928456  0.2674028   4.460857
touch             0.6297301  0.2404842   2.618593
manner            0.4143086  0.1567237   2.643560
prompt            1.0494582  0.2813209   3.730467
fairly            1.0760604  0.2524477   4.262509

(Dispersion Parameter for Binomial family taken to be 1)
Null Deviance: 744.555 on 545 degrees of freedom
Residual Deviance: 358.4669 on 540 degrees of freedom
Number of Fisher Scoring Iterations: 5
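Because the logistic coefficients are on the log-odds scale, one way to read them is as odds ratios, and predicted top-box probabilities can be obtained with predict(). A minimal sketch, assuming the Regress.eg.logistic fit above; the respondent profile in new.resp is purely illustrative, not from the original deck:

## odds ratio per one-point increase in each aspect
round(exp(coef(Regress.eg.logistic)[-1]), 2)

## predicted probability of a top-box (5) overall satisfaction score
new.resp <- data.frame(Straight = 5, touch = 4, manner = 4, prompt = 5, fairly = 5)
predict(Regress.eg.logistic, newdata = new.resp, type = "response")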
