larsen jsm2003


Published on October 29, 2007

Author: Arley33



Comparison of Alternative Latent Class Clusterings of Survey Data
Michael D. Larsen, University of Chicago / Iowa State University

Outline:
  Survey and variables
  Latent class models
  Comparing clusterings
  Some comparisons
  Conclusions and future plans

Survey:
  1997 Survey of Doctoral Recipients
  NSF survey every 2 years
  1 of 3 surveys in SESTAT database
  Respondents
    PhDs 1990-1996
    Physical (n=2216) and biological (n=1019) sciences, engineering (n=516)
    Work in higher educational institutions

Variables:
  Demographics: Sex, Race, Ethnicity, Age, etc.
    %F: biology (49%), physical (33%), eng. (23%)
  Several sets on career preparation
    Limitations on career path job searches
    Work activities
    Job search resources (which used?)
    Adequacy of PhD program career preparation
  Assorted other questions (e.g., postdoc?)

One set of variables example:
  Adequacy of career preparation
  Very adequate vs. Somewhat or not adeq.
  11 areas (2^11 table)
  Biology, 3 significant differences, F vs. M
    Communication (F>M) z = 2.73
    Ethics (F>M) z = 2.48
    Computer (M>F) z = -2.58

Why cluster?:
  Interest in clusters themselves
  Are there identifiable groups?
  Are clusters stable over time?
  Are the clusters related to demographic subpopulations?
  How do outcomes vary across clusters?

Latent Class Models:
  G latent classes (subpopulations)
  K categorical variables define contingency table, each person in one cell of table
  Observed pattern of responses in table is mixture of patterns from latent classes.
  Response probability on each variable (conditionally) independent within each class (probabilities differ across classes).

Latent Class Models, cont.:
  P(response pattern) = sum over classes of [ P(class) P(response pattern | class) ]
  EM algorithm (Dempster, Laird, Rubin 1977)
  Compute P(class | response pattern).
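The mixture formula and EM steps on the latent class slides can be illustrated with a minimal sketch. It assumes binary (0/1) items, as in the "very adequate vs. somewhat or not adequate" coding, and uses NumPy; the function name `fit_latent_class`, the starting values, and the fixed iteration count are illustrative choices, not the author's implementation.

```python
import numpy as np

def fit_latent_class(X, G, n_iter=200, seed=0):
    """EM for a latent class model with binary items (illustrative sketch).

    X : (n, K) array of 0/1 responses.
    G : number of latent classes.
    Returns class probabilities pi (G,), item probabilities theta (G, K),
    and P(class | response pattern) from the final E-step, shape (n, G).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, K = X.shape
    pi = np.full(G, 1.0 / G)                      # P(class), assumed equal at the start
    theta = rng.uniform(0.25, 0.75, size=(G, K))  # P(item = 1 | class), random start
    for _ in range(n_iter):
        # E-step: log[P(class) P(pattern | class)], with items
        # conditionally independent within each class.
        log_joint = (np.log(pi)
                     + X @ np.log(theta).T
                     + (1.0 - X) @ np.log(1.0 - theta).T)
        log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_joint)
        post /= post.sum(axis=1, keepdims=True)   # P(class | response pattern)
        # M-step: weighted proportions using the posterior as weights.
        pi = post.mean(axis=0)
        theta = (post.T @ X) / post.sum(axis=0)[:, None]
        theta = np.clip(theta, 1e-6, 1.0 - 1e-6)  # keep logs finite
    return pi, theta, post
```

Assigning each respondent to the class with the largest posterior probability (`post.argmax(axis=1)`) gives one clustering of the kind that can then be compared with other clusterings.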
Comparing clusterings:
  Different sets of variables will group respondents differently.
  Cross tabulations
  Adjusted Rand Index (ARI)
    Rand Index = # of pairs of respondents placed in the same cluster by both clusterings
    ARI = (Rand - Exp.)/(Max - Exp.), where the expected value assumes a hypergeometric distribution

Calibrating the ARI (or other):
  Simulation
    Generate 1000 samples from the hypergeometric distribution, which corresponds to the null of no association
    Compute ARI for the 1000 samples
    Report # of samples with ARI >= observed ARI

A comparison:
  Biology, Adequacy of Career Preparation
    Communication, ARI = 0.002, tail = 0.015
    Ethics, ARI = 0.039, tail = 0.039
    Computer, ARI = 0.002, tail = 0.021
  4 latent classes (interesting patterns)
    ARI value is lower, tail area is larger

Comments:
  1. ARI values are not large (not near 1) for tables with large n
  2. Simulated values are similar to P-values from standard tests
  3. Small ARI values can be significant in the way that small log odds (near 0) can be significant for large n
  4. Latent classes fit better than simple classifications, but ARI doesn’t increase.

More on comment 4.:
  Two classes (females, males) and CI. vs. Four latent classes (based on BCI) and CI.
  Latter fits (much) better.
  ARI not larger than largest on individual variables.

Future plans:
  1. Repeat on next waves (1999, 2001)
  2. Additional comparison methods:
    Diversity measures
    Slight modification of ARI
    Machine Learning, Stats, Discovery, 2003, Marina Meila, U of Washington
  3. Missing data (DK, RF, Missing)

References:
  Larsen, Statistics in Transition, 2003
  Larsen, submitted to “Retaining Women in Early Academic SMET Careers,” 2002, under revision
  Hubert and Arabie, 1985, J. of Classification
  NSF, EIA-0089930, ITWF

Contact Information:
  Mike Larsen, U of Chicago, Statistics
  Email for contact at Iowa State University, Statistics
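The Adjusted Rand Index and the simulation used to calibrate it (see the "Comparing clusterings" and "Calibrating the ARI" slides) can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the ARI follows the Hubert and Arabie (1985) form, and the null samples are produced by randomly permuting one set of labels, which keeps both margins of the cross tabulation fixed and so draws tables from the hypergeometric null of no association. The names `adjusted_rand_index` and `ari_tail_area` are hypothetical, and the tail is reported as a proportion rather than a count.

```python
import numpy as np

def adjusted_rand_index(a, b):
    """ARI = (Index - Expected) / (Max - Expected) for two label vectors."""
    a = np.asarray(a).ravel()
    b = np.asarray(b).ravel()
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    # Cross tabulation of the two clusterings.
    table = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(table, (ai, bi), 1)

    def pairs(x):
        return x * (x - 1) / 2.0                 # number of pairs among x items

    index = pairs(table).sum()                   # pairs together in both clusterings
    row = pairs(table.sum(axis=1)).sum()
    col = pairs(table.sum(axis=0)).sum()
    expected = row * col / pairs(len(a))         # mean under fixed margins
    maximum = 0.5 * (row + col)
    if maximum == expected:                      # both partitions trivial and identical
        return 1.0
    return (index - expected) / (maximum - expected)

def ari_tail_area(a, b, n_sim=1000, seed=0):
    """Observed ARI and the share of null samples with ARI >= observed."""
    rng = np.random.default_rng(seed)
    b = np.asarray(b).ravel()
    observed = adjusted_rand_index(a, b)
    # Permuting b breaks any association while preserving both margins.
    sims = np.array([adjusted_rand_index(a, rng.permutation(b))
                     for _ in range(n_sim)])
    return observed, float(np.mean(sims >= observed))
```

For example, `ari_tail_area(sex, latent_class)` with two hypothetical label vectors of equal length would return the pair (ARI, tail area) in the same form as the values listed on the "A comparison" slide.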
