Information about Applied Psych Test Design: Part G: Psychometric/technical statistical...

Published on July 8, 2009

Author: iapsych

Source: slideshare.net

The Art and Science of Applied Test Development. This is the fifth in a series of PPT modules explicating the development of psychological tests in the domain of cognitive ability using contemporary methods (e.g., theory-driven test specification; IRT-Rasch scaling; etc.). The presentations are intended to be conceptual and not statistical in nature. Feedback is appreciated.

“In God we trust... all others must show data” (unknown source)
Test authors and publishers have a standards-based responsibility to provide supporting psychometric technical information about their tests and batteries, typically in the form of a series of technical chapters in the manual or a separate technical manual.

Calculate psychometric/measurement statistics for the technical manual/chapters, with external measures. Use the Joint Test Standards as a guide.

External evidence is focused on relations between test battery variables (measures or latent constructs) and other external (outside of battery) constructs, measures, or criteria.
[Figure: theoretical domain (CHC: g, Gf, Gv, Glr, Gs, Gc, Gsm, Ga) mapped to the measurement or empirical domain]

External Stage of Test Development
Purpose: Examine the external relations among the focal construct (i.e., intelligence or cognitive abilities) and other constructs and/or subject characteristics.
Questions asked: Do the focal constructs and observed measures “fit” within a network of expected construct relations (i.e., the nomological network)?
Method and concepts:
• Group differentiation
• Structural equation modeling
• Correlation of observed measures with other measures
• Multitrait-Multimethod matrix
Characteristics of a strong test validity program:
• Focal constructs vary in theorized ways with other constructs
• Measures of the constructs differentiate existing groups that are known to differ on the constructs
• Measures of focal constructs correlate with other validated measures of the same constructs
• Theory-based hypotheses are supported, particularly when compared to rival hypotheses

External Stage of Test Development
Purpose: Examine the external relations among the focal construct (i.e., intelligence or cognitive abilities) and other constructs and/or subject characteristics.
Questions asked: Do the focal constructs and observed measures “fit” within a network of expected construct relations (i.e., the nomological network)?
Method and concepts:
• Correlation of observed measures with other measures
Characteristics of a strong test validity program:
• Measures of focal constructs correlate with other validated measures of the same constructs

Concurrent external validity example: WJ III GIA cluster correlations with full scale scores from other IQ batteries. Provide evidence at select key age groups (related to the intended age range and purpose of the battery) in normal samples.

Concurrent external validity example: WJ III Achievement (reading, math, writing) cluster correlations with measures from other (external) achievement batteries. Provide evidence at select key age groups (related to the intended age range and purpose of the battery) in normal samples.

Concurrent external validity example: Comparative predictive validity (of achievement)
Comparisons of correlations (across reading, math, written language, and total achievement domains) of the average WJ III GIA and Predicted Achievement score options and full scale scores from other (external) major intelligence batteries.

Other battery   Other battery total    WJ III        WJ III          WJ III
                (full scale) score     Pred. Ach.    GIA-Extended    GIA-Standard
DAS             .41                    --            .52             .47
WPPSI-R         .37                    --            .52             .47
WISC-III        .50                    .68           .67             .63
WAIS-III        .39                    .56           --              .56
KAIT            .53                    .56           --              .56

Provide evidence at select key age groups (related to the intended age range and purpose of the battery) in normal samples.

External Stage of Test Development
Purpose: Examine the external relations among the focal construct (i.e., intelligence or cognitive abilities) and other constructs and/or subject characteristics.
Questions asked: Do the focal constructs and observed measures “fit” within a network of expected construct relations (i.e., the nomological network)?
Method and concepts:
• Correlation of observed measures with other measures
Characteristics of a strong test validity program:
• Focal constructs vary in theorized ways with other constructs
• Measures of focal constructs correlate with other validated measures of the same constructs

• Focal constructs vary in theorized ways with other constructs
• Measures correlate with other validated measures of the same constructs (select illustrative examples: concurrent external validity correlations)

• Focal constructs vary in theorized ways with other constructs • Measures correlate with other validated measures of the same constructs (select illustrative example— exploratory factor analysis of select WJ III and WISC-III tests)

• Focal constructs vary in theorized ways with other constructs
• Measures correlate with other validated measures of the same constructs (select illustrative example: WJ III Block Rotation [Gv-Vz] correlation with WISC-III tests in a grade 3-5 sample)

WISC-III test          r with WJ III BLKROT
Information            0.27
Coding                 0.08
Similarities           0.29
Picture Arrangement    0.14
Arithmetic             0.09
Block Design           0.38
Vocabulary             0.23
Object Assembly        0.31
Comprehension          0.15
Symbol Search          0.23
Digit Symbol           0.08

Note: The absolute magnitude of the correlations is artificially low due to sample range restriction. The important observation is the relative magnitude of the correlations.
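The range-restriction note can be made concrete with a small worked example. The sketch below applies the standard correction for direct range restriction (often called Thorndike Case II) to the Block Design correlation from the table. The slides do not say whether any correction was applied to these data, and the two standard deviations used here are hypothetical values chosen only for illustration.

```python
import math


def correct_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Estimate the unrestricted correlation from a range-restricted one.

    Uses the direct range restriction formula:
        r_c = r * u / sqrt(1 + r^2 * (u^2 - 1)),  u = SD_unrestricted / SD_restricted
    """
    u = sd_unrestricted / sd_restricted
    r = r_restricted
    return (r * u) / math.sqrt(1 + r**2 * (u**2 - 1))


# Observed Block Design correlation from the table above.
r_observed = 0.38

# HYPOTHETICAL standard deviations (not from the slides): a population SD of 15
# and a restricted sample SD of 10 on the variable that was restricted.
r_corrected = correct_range_restriction(r_observed, sd_unrestricted=15.0, sd_restricted=10.0)

print(f"observed r = {r_observed:.2f}, corrected r = {r_corrected:.2f}")
```

Because the same adjustment applies to every row, the rank order of the correlations is unchanged, which is why the slide stresses relative rather than absolute magnitude.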

• Focal constructs vary in theorized ways with other constructs
• Measures correlate with other validated measures of the same constructs (select illustrative example: confirmatory factor analysis of select WJ III and WISC-III tests)
Phelps et al. (2005) WISC-III/WJ III cross-battery (joint) CFA


Joint WJ III/WAIS-III/WMS-III/KAIT CFA
Gregg/Hoy College LD/NLD (n=200) sample. Analysis by K. McGrew.
(This is NOT the complete model; only the portion that includes Gv factor information is shown.)
[Figure: CFA path diagram with latent factors g, Gf, Gc, Gv, Gq, Grw, Gsm, and Gs; factor loadings not recoverable from the transcript]

External Stage of Test Development
Purpose: Examine the external relations among the focal construct (i.e., intelligence or cognitive abilities) and other constructs and/or subject characteristics.
Questions asked: Do the focal constructs and observed measures “fit” within a network of expected construct relations (i.e., the nomological network)?
Method and concepts:
• Structural equation modeling
Characteristics of a strong test validity program:
• Theory-based hypotheses are supported, particularly when compared to rival hypotheses

Structural equation modeling external validity evidence example


Structural equation modeling external validity evidence example (ages 6-8)
[Figure: SEM path diagram with latent factors g, Gs, Memory Span, Working Memory, Gv, Gc, Glr, Gf, and Ga, and the Word Attack (WA) variable. Tests shown: Visual Matching, Decision Speed, Cross Out, Memory for Sentences, Memory for Words, Auditory Working Memory, Numbers Reversed, Block Rotation, Spatial Relations, Picture Recognition, Verbal Comprehension, Oral Comprehension, General Information, Memory for Names, Retrieval Fluency, Visual-Auditory Learning, Delayed Recall: Visual-Auditory Learning, Analysis-Synthesis, Concept Formation, Numerical Reasoning, Sound Blending, Incomplete Words, Sound Patterns. Path coefficients not recoverable from the transcript]

External Stage of Test Development
Purpose: Examine the external relations among the focal construct (i.e., intelligence or cognitive abilities) and other constructs and/or subject characteristics.
Questions asked: Do the focal constructs and observed measures “fit” within a network of expected construct relations (i.e., the nomological network)?
Method and concepts:
• Group differentiation
Characteristics of a strong test validity program:
• Measures of the constructs differentiate existing groups that are known to differ on the constructs

Group differentiation external validity evidence example: LD vs Non-LD university samples
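One simple way to quantify group differentiation between two samples is a standardized mean difference (Cohen's d). The slides do not state which statistic was reported for the LD vs Non-LD comparison, so the sketch below is only an illustration; the scores and the names `non_ld_scores` and `ld_scores` are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# HYPOTHETICAL standard scores on one cluster (not data from the slides).
non_ld_scores = [102, 98, 110, 95, 105, 100]
ld_scores = [88, 92, 85, 97, 90, 94]

d = cohens_d(non_ld_scores, ld_scores)
print(f"Cohen's d (Non-LD minus LD) = {d:.2f}")
```

A measure that differentiates the groups as theory predicts should show a clearly non-zero d in the expected direction on the relevant clusters and a smaller d on clusters where the groups are not expected to differ.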

Group differentiation external validity evidence example: Normal/Gifted/LD/MR samples

Group differentiation external validity evidence example— discriminant function analysis (Normal/Gifted/LD/MR samples)

Group differentiation external validity evidence example— discriminant function analysis classification accuracy (Normal/Gifted/LD/MR samples—grade 3-4)
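To show what a classification-accuracy figure from a discriminant function analysis represents, here is a deliberately simplified sketch: a two-group, two-variable Fisher linear discriminant with equal priors, written in plain Python. The slide's analysis involves four groups (Normal/Gifted/LD/MR) and is not reproduced here; all scores and group labels below are hypothetical.

```python
from statistics import mean


def fisher_discriminant(group_a, group_b):
    """Fit a two-group, two-variable linear discriminant.

    Returns (weights, cutoff): a case x is assigned to group A when
    weights[0]*x[0] + weights[1]*x[1] > cutoff (equal priors assumed).
    """
    def centroid(group):
        return [mean(row[0] for row in group), mean(row[1] for row in group)]

    def scatter(group, center):
        s = [[0.0, 0.0], [0.0, 0.0]]
        for row in group:
            dev = [row[0] - center[0], row[1] - center[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += dev[i] * dev[j]
        return s

    c_a, c_b = centroid(group_a), centroid(group_b)
    s_a, s_b = scatter(group_a, c_a), scatter(group_b, c_b)
    dof = len(group_a) + len(group_b) - 2
    # Pooled within-group covariance matrix.
    w = [[(s_a[i][j] + s_b[i][j]) / dof for j in range(2)] for i in range(2)]
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    inv = [[w[1][1] / det, -w[0][1] / det], [-w[1][0] / det, w[0][0] / det]]
    diff = [c_a[0] - c_b[0], c_a[1] - c_b[1]]
    weights = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
               inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    midpoint = [(c_a[0] + c_b[0]) / 2, (c_a[1] + c_b[1]) / 2]
    cutoff = weights[0] * midpoint[0] + weights[1] * midpoint[1]
    return weights, cutoff


def classify(case, weights, cutoff):
    score = weights[0] * case[0] + weights[1] * case[1]
    return "A" if score > cutoff else "B"


# HYPOTHETICAL (cluster 1, cluster 2) standard scores for two known groups.
group_a = [(100, 98), (105, 110), (95, 101), (110, 104), (102, 95), (98, 107)]
group_b = [(85, 99), (90, 92), (80, 104), (88, 96), (92, 101), (84, 90)]

weights, cutoff = fisher_discriminant(group_a, group_b)
correct = sum(classify(c, weights, cutoff) == "A" for c in group_a)
correct += sum(classify(c, weights, cutoff) == "B" for c in group_b)
accuracy = correct / (len(group_a) + len(group_b))
print(f"classification accuracy = {accuracy:.2f}")
```

Accuracy computed on the same cases used to fit the function is optimistic; a cross-validated or holdout estimate would be a more conservative figure to report.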

Group differentiation external validity evidence example (variety of “clinical disorder groups”) (continued on next slide)

Group differentiation external validity evidence example (cont.; variety of “clinical disorder groups”)

(Note: The following information is almost identical to that presented in Part F: Internal psychometric/statistical analysis.)
• Lack of rigor and quality control in any earlier stage will “rattle through the data” and rear its ugly head when performing the final statistical analysis, especially multivariate validity analyses (SEM, DF, multiple regression, EFA, CFA).
• Short cuts in prior stages will “bite you in the ____” as you attempt to perform the final statistical analysis.
• Data screening, data screening, data screening! Do it prior to performing the final statistical analysis.
  • Compute extensive descriptive statistics for all variables (e.g., histograms, scatterplots, box-whisker plots).
  • Go beyond means and SDs. Also calculate median, skew, kurtosis, n-tiles, etc.
• Deliberately planned and sophisticated “front end” data collection short-cuts (e.g., matrix sampling) introduce an extreme level of “back end” complexity to routine statistical/psychometric analysis.
• Know your limits, level of expertise, and skills. Even those with extensive test development experience often need access to trusted measurement/statistical consultants.
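The data-screening advice can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a full screening workflow: the function name `describe` and the sample scores are hypothetical, and skew and kurtosis are the simple moment-based versions computed with the population SD.

```python
from statistics import mean, median, pstdev, quantiles


def describe(scores):
    """Return descriptive statistics that go beyond the mean and SD."""
    n = len(scores)
    m = mean(scores)
    sd = pstdev(scores)  # population SD, used for the moment-based shape statistics
    z = [(x - m) / sd for x in scores]
    return {
        "n": n,
        "mean": m,
        "median": median(scores),
        "sd": sd,
        "skew": sum(v**3 for v in z) / n,
        "excess_kurtosis": sum(v**4 for v in z) / n - 3.0,
        "quartiles": quantiles(scores, n=4),
        "min": min(scores),
        "max": max(scores),
    }


# HYPOTHETICAL scores with one suspicious value (160) that screening should flag.
scores = [85, 90, 92, 95, 97, 100, 101, 103, 105, 160]
summary = describe(scores)

for name, value in summary.items():
    print(name, value)
```

A mean that sits well above the median, together with a large positive skew, is the kind of signal that should prompt a look at the raw records before any multivariate analysis is run.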

• Published statistics/psychometric information needs to be based on final publication length tests.
  • Often need to use test-length correction formulas (e.g., the Spearman-Brown prophecy formula) for test reliabilities.
  • Correlations between short and/or long norming versions of a test and other tests, when the norming version differs in length (number of items) from the publication length test, may need special adjustments/corrections.
• Back up, back up, back up! Don't let a dead hard drive or computer destroy your work and progress. Do it constantly. Build redundancy into your files and people skill sets.
• Sad fact: The majority of test users do NOT pay attention to the fancy and special psychometric/statistical analyses you report in technical chapters or manuals. Be prepared for post-publication education via other methods.
• Post-manual publication technical reports of special/sophisticated analyses are good when publication time-line pressures dictate making difficult decisions.
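A common test-length correction for reliabilities is the Spearman-Brown prophecy formula, which estimates the reliability of a test after its length is changed by a given factor. The sketch below assumes that formula is the intended kind of correction; the item counts and the norming reliability are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predict reliability after changing test length.

    length_factor = new number of items / original number of items.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# HYPOTHETICAL case: a 40-item norming version with reliability .90
# is shortened to a 30-item publication version.
norming_items = 40
publication_items = 30
norming_reliability = 0.90

estimated = spearman_brown(norming_reliability, publication_items / norming_items)
print(f"estimated publication-length reliability = {estimated:.3f}")
```

The formula assumes the added or removed items are parallel to the remaining ones, so it is an estimate to be checked against data from the final form whenever that is available.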

Most test developers are stuck in a methodological rut. There is much that can be learned about the internal and external validity of a test battery using lesser-used statistical methods.
• Multidimensional scaling (MDS), cluster analysis, CART (classification and regression tree analysis), MARS (multivariate adaptive regression splines)
• Use of curve smoothing procedures to better estimate population parameters from statistical analyses across age groups
• Multiple group CFA (planned incomplete data) reference variable validity designs and methods (Jack McArdle)

End of Part G
Additional steps in the test development process will be presented in subsequent modules as they are developed.

Intelligent Insights on Intelligence Theories and Tests
