Published on July 1, 2009
The Art and Science of Test Development—Part D Develop norm (standardization) sample plan Kevin S. McGrew, PhD. Educational Psychologist Research Director Woodcock-Muñoz Foundation The basic structure and content of this presentation is grounded extensively on the test development procedures developed by Dr. Richard Woodcock
The Art and Science of Test Development The above titled topic is presented in a series of sequential PowerPoint modules. It is strongly recommended that the modules (A-G) be viewed in sequence. Part A: Planning, development frameworks & domain/test specification blueprints Part B: Test and Item Development Part C: Use of Rasch Technology Part D: Develop norm (standardization) plan Part E: Calculate norms and derived scores Part F: Psychometric/technical and statistical analysis: Internal Part G: Psychometric/technical and statistical analysis: External The current module is designated by red bold font lettering
Develop norm (standardization) sampling plan The goal: A nationally representative sample from which to develop test norms
Implementation of sampling plan in practical test development framework Gather the norm (standardization) data and use for Rasch scaling, norm development, psychometric, and statistical analyses
Implementation of sampling plan in Conceptual Psychometric Validity Framework Gather the norm (standardization) data and use for Rasch scaling, norm development, psychometric, and statistical analyses
Develop norm (standardization) sample specifications for country/nation where test is intended to be used The goal: A nationally representative sample from which to develop test norms
Sampling plan should be based on best available source of national statistics (select examples below)
Three-stage sampling plan strategy used for WJ batteries in United States The goal: A nationally representative sample from which to develop test norms Stage 1: Sampling of communities Stage 2: Sampling of schools Stage 3: Sampling of subjects • School-age subjects – random sampling in grades • Preschool, university and adult subjects – quotas selected as per selected US Census variables
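The three stages can be sketched in a few lines of Python. This is only an illustrative sketch, not the WJ procedure: it assumes simple random selection at every stage, whereas the slides describe communities being sampled according to community characteristics and non-school subjects being filled by quota. The function name `three_stage_sample`, the nested `frame` dictionary, and all community, school, and student labels are hypothetical.

```python
import random

def three_stage_sample(frame, n_communities, n_schools, n_students, seed=0):
    """Draw communities, then schools within them, then students within schools.

    frame: dict mapping community -> dict mapping school -> list of students.
    Assumption: simple random sampling at each stage (a simplification).
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    selected = []
    # Stage 1: sample communities
    for community in rng.sample(sorted(frame), n_communities):
        # Stage 2: sample schools within each selected community
        for school in rng.sample(sorted(frame[community]), n_schools):
            # Stage 3: sample students within each selected school
            for student in rng.sample(frame[community][school], n_students):
                selected.append((community, school, student))
    return selected

# Hypothetical sampling frame: 4 communities x 3 schools x 10 students
frame = {
    f"community_{c}": {
        f"school_{c}_{s}": [f"student_{c}_{s}_{i}" for i in range(10)]
        for s in range(3)
    }
    for c in range(4)
}

sample = three_stage_sample(frame, n_communities=2, n_schools=2, n_students=3)
print(len(sample), "subjects selected")
```

In a real plan, Stage 1 would likely be stratified (for example by community SES) rather than drawn purely at random.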
Goal is a nationally representative sample upon which to base the norms for the measures (tests, clusters)
No sampling plan is perfect: “Tweak” final norm data via subject weighting A few tips/cautions • Oversample small groups and then down-weight • No amount of creative weighting can “fix” a poorly executed sampling plan • Continuous sample analysis: Frequently monitor data as they are collected to see how close the “fit” is between the sampling plan and actual subject characteristics. This is important so you can make adjustments during the data collection (before it is too late)
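One common way to express "oversample, then down-weight" is a post-stratification weight: each subject's weight is the group's share of the population divided by the group's share of the sample. The sketch below assumes that approach; the slides do not say which weighting formula was used for the WJ norms. The function name, the group labels, and the proportions are hypothetical.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_props):
    """Return one weight per subject: population share / sample share of the group."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    weights = []
    for group in sample_groups:
        sample_share = counts[group] / n
        weights.append(population_props[group] / sample_share)
    return weights

# Hypothetical data: group "B" is 10% of the population but was
# deliberately oversampled to 25% of the norm sample.
sample_groups = ["A"] * 75 + ["B"] * 25
population_props = {"A": 0.90, "B": 0.10}

weights = poststratification_weights(sample_groups, population_props)
print(weights[0], weights[-1])  # weight for an "A" subject, then a "B" subject
```

Subjects in the oversampled group receive a weight below 1 (down-weighted), and the weights sum to the sample size, so the weighted sample matches the target proportions.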
WJ III Three-stage sampling plan strategy: Stage 1 Sampling of communities – communities sampled according to 10 different community characteristics
Important question to think about: Are students who are categorized as “low SES” (at parent/family level) from a $$ community likely to be having the same community/life/family/educational experiences as “low SES” students (at parent/family level) from a $$$$$$$$ community?
Illustration of potential threat to representative sample when community SES is ignored and only family/parent SES is used in sampling plan [Figure: level of abilities of selected students, plotted by community SES classification (Cmnty A: Low, Cmnty B: Middle, Cmnty C: High) and family/parent SES classification (Low, Middle, High)]
Community SES is one of the most important, yet most frequently overlooked variables in a test standardization sampling plan
Sample research/norm demographic and data file variable coding sheet: WJ III Tips/Cautions • Spend considerable time designing this data collection form (be OCD) • Think of all future analyses when deciding what to include/code • If in doubt about a variable…include it (you typically can’t go back later to get information) • Create coding system with quality data entry procedures in mind • You can’t analyze what you haven’t gathered and coded from the beginning
Development and implementation of sampling plan is probably the most critical phase of test development • GIGO – garbage in, garbage out Retain as much control over all aspects of data collection and data entry as possible Quality of data is only as good as your pool of examiners • Recruiting, training, supervising, and retaining good examiners requires major attention and is very important • Examiners need supervision • Terminate poor examiners as soon as you can • Central office should review every single piece of information on submitted test records, especially at the beginning of an examiner's testing (you don’t want them practicing errors) • Run special Rasch “person fit” reports to flag test records that look suspicious (and see if they come from certain examiners) • Best examiners are not necessarily psychologists or doctoral students. (cont. next slide)
Spend considerable time developing the test record and variable coding sheets Run “continuous sample analysis reports” to monitor sampling plan adherence or drift – so you can make changes quickly before it is too late There is no single-purpose system, or collection of published software programs, that can handle the detailed and fluid work of data entry, editing and monitoring. Be prepared to develop and pay for custom software. Dedicated professional data-entry software is a must. SPSS, Excel, etc. won’t cut it. • Double data entry verification a must • Only have a few well trained and diligent individuals enter the data Don’t succumb to samples of convenience Oversample small groups – and then down weight Seriously consider three-stage sampling plan that controls for community SES
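A "continuous sample analysis report" can be as simple as comparing the proportions observed so far with the sampling plan targets and flagging any cell that has drifted. The sketch below is one assumed way to do this, not the custom software described above; the function name, the 5-point tolerance, the community SES labels, and the target proportions are all hypothetical.

```python
from collections import Counter

def sample_drift_report(observed_groups, target_props, tolerance=0.05):
    """Compare observed group proportions with plan targets and flag drift."""
    n = len(observed_groups)
    counts = Counter(observed_groups)
    report = {}
    for group, target in target_props.items():
        observed = counts.get(group, 0) / n if n else 0.0
        diff = observed - target
        report[group] = {
            "target": target,
            "observed": observed,
            "diff": diff,
            "flag": abs(diff) > tolerance,  # True when outside tolerance
        }
    return report

# Hypothetical targets and records collected so far (community SES level)
target_props = {"low": 0.30, "middle": 0.40, "high": 0.30}
collected = ["low"] * 20 + ["middle"] * 42 + ["high"] * 38

report = sample_drift_report(collected, target_props)
for group, row in report.items():
    status = "CHECK" if row["flag"] else "ok"
    print(f"{group:>6}: target {row['target']:.2f}, observed {row['observed']:.2f} {status}")
```

Running a report like this after each batch of records makes under-filled cells (here, low community SES) visible while there is still time to recruit for them.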
Poor sampling plan and data collection cannot hide during data analysis: Select example Rasch item difficulties (W-difficulties) for a WJ III test in the US norming sample and two neighboring (and very similar) Central European Republics [Figure: plots of item W-difficulties comparing the US norming, Central Europe Rep A, and Central Europe Rep B samples] Something is wrong with the data from Rep B. Rep A is similar to the US data; Rep A and Rep B, which are similar, are not even similar in the plots
Poor sampling plan and data collection cannot hide from good data analysis: Select example Distribution of W-abilities [Figure: W-ability distributions for the US norming, Central Europe Rep A, and Central Europe Rep B samples] Something is wrong with the data from Rep B. The Rep B data are negatively skewed
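A quick numeric companion to eyeballing the distributions is the sample skewness coefficient: values near zero indicate a roughly symmetric distribution, and negative values indicate a longer left tail. The sketch below uses the standard moment-based formula; the two lists of scores are invented toy values and are not data from the samples shown above.

```python
def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

# Hypothetical ability scores (toy values only)
symmetric_scores = [480, 490, 500, 510, 520]
left_tailed_scores = [440, 495, 500, 505, 510]

print(skewness(symmetric_scores))    # near zero
print(skewness(left_tailed_scores))  # negative: long left tail
```

A strongly negative value in one sample, when comparable samples are near zero, is a cue to examine how that sample was recruited and tested.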
Planned “incomplete” (missing) data collection • Matrix sampling (partial) • Multiple and EM-based data imputation
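Recent example: Australian WJ III standardization partial matrix sampling plan
Note: Yellow column (1) represents initial “core” battery prior to implementation of “core + matrix” battery. Mixed batteries plan.
Battery columns in the original table: 1, A, B, C, D, E, X, followed by Total n. Cell entries (C or M) are listed in the order they appear; the column position of each entry is not recoverable from the flattened table.
1. Verbal Comprehension: C C C C C C; Total n = 1346
2. Visual-Auditory Learning: C C C C C C; Total n = 1336
3. Spatial Relations: C C C C C C; Total n = 1375
4. Sound Blending: C C C C C C; Total n = 1382
5. Concept Formation: C C C C C C; Total n = 1300
6. Visual Matching: C C C C C C; Total n = 1364
7. Numbers Reversed: C C C C C C; Total n = 1325
8. Incomplete Words: C M; Total n = 627
9. Auditory Working Memory: C M; Total n = 548
11. General Information: M M; Total n = 389
12. Retrieval Fluency: M M; Total n = 388
13. Picture Recognition: M M; Total n = 390
14. Auditory Attention: M M; Total n = 388
15. Analysis-Synthesis: M M; Total n = 362
16. Decision Speed: M M; Total n = 376
17. Memory for Words: M M; Total n = 364
1. Letter-Word Identification: C C C C C C; Total n = 1323
2. Reading Fluency: C M; Total n = 588
5. Calculation: C C C C C C; Total n = 1210
6. Math Fluency: C M; Total n = 588
7. Spelling: C M; Total n = 557
8. Writing Fluency: M M; Total n = 352
9. Passage Comprehension: C C C C C C; Total n = 1203
10. Applied Problems: C C C C C C; Total n = 964
11. Writing Samples: M M; Total n = 342
Totals by battery (1, A, B, C, D, E, X): 450, 102, 117, 127, 108, 120, 372; overall Total n = 1396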
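The general idea behind a "core + matrix" plan can be sketched as follows: every subject takes the core tests, plus one rotating block of additional tests, so the remaining tests are missing by design. This is a minimal sketch under stated assumptions, not the Australian plan itself: the function name, the rotation rule, and the grouping of tests into blocks A, B, and C are hypothetical (only the test names come from the table above).

```python
from collections import Counter
from itertools import cycle

def assign_matrix_batteries(subject_ids, core_tests, matrix_blocks):
    """Give each subject the core tests plus one matrix block, rotating blocks."""
    plan = {}
    block_names = cycle(sorted(matrix_blocks))  # A, B, C, A, B, C, ...
    for subject_id in subject_ids:
        block = next(block_names)
        plan[subject_id] = {
            "block": block,
            "tests": list(core_tests) + list(matrix_blocks[block]),
        }
    return plan

# Hypothetical grouping of tests into a core set and three matrix blocks
core_tests = ["Verbal Comprehension", "Spatial Relations"]
matrix_blocks = {
    "A": ["General Information"],
    "B": ["Retrieval Fluency"],
    "C": ["Picture Recognition"],
}

plan = assign_matrix_batteries(range(1, 7), core_tests, matrix_blocks)
block_counts = Counter(entry["block"] for entry in plan.values())
print(block_counts)
```

Because the blocks are assigned by design rather than by subject characteristics, the resulting gaps are planned missing data of the kind that imputation methods are intended to handle.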
End of Part D Additional steps in test development process will be presented in subsequent modules as they are developed