Self-reflection based on class materials

Models vs. Experts
BUS143: Judgment and Decision Making
Ye Li

Agenda
•Why is prediction important?

•What are models?

•How do expert judges compare to models?

•Can even “improper” models do pretty well?
 –Bootstrapping
 –Random models
•What’s the problem with expert judgment?

•Do we need experts at all?

Predictions in firms (and in life)
•Job candidate selection. What else?

•Companies (and people) must make many predictions
 –Sales projections
 –Price of inputs (raw materials, labor, etc.)
 –Consumer tastes and trends
 –Political conditions (higher minimum wage? Tax?)
 –Weather conditions (effect on crops, shipping?)

A general framework for thinking about judgment: The Lens Model
[Lens-model diagram: cues X1–X4 link the environment (normative side) to the judge (descriptive side); accuracy is how well the two match.]

Example: Guessing someone’s age
[Lens-model diagram with cues X1 = presence/absence of wrinkles, X2 = hair color, X3 = height, X4 = income; accuracy of the age judgment.]

Review of Linear Regression
Problem: Predict outcome Y using predictor variables X1, …, Xn
Solution: Use the least squares method to estimate a linear regression (AKA, actuarial model):
  Y = β0 + β1X1 + β2X2 + … + βnXn
•Goodness of model fit is measured by the correlation r
 –R² measures the proportion of the variance of Y explained by the model.
 –The partial r² = the additional explanatory power of a single variable.
(A small code sketch of such a model appears after the diagram below.)

Why?
[Lens-model diagram: the same cues feed both a multiple regression (normative side) and human judgment (descriptive side); the achievement (accuracy) of each is marked “???”.]
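To make the regression review concrete, here is a minimal sketch in Python. It is not from the course materials: the cues, coefficients, and sample size are invented purely for illustration. It fits an actuarial model by least squares and reports the validity correlation r and the fit measure R².

```python
# Minimal illustration (made-up data, not from the course) of fitting an
# actuarial model Y = b0 + b1*X1 + ... + bn*Xn by least squares and
# reporting the validity correlation r and the fit measure R^2.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cues (e.g., GPA, test score, interview rating) for 200 cases.
X = rng.normal(size=(200, 3))
# Simulated outcome that depends on the first two cues plus noise.
y = 1.0 + 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=1.0, size=200)

# Least-squares fit with an intercept column.
X1 = np.column_stack([np.ones(len(X)), X])
betas, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ betas

# Goodness of fit: correlation r between predictions and outcomes, and R^2,
# the proportion of the variance of Y explained by the model.
r = np.corrcoef(y_hat, y)[0, 1]
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"betas = {np.round(betas, 2)}, r = {r:.2f}, R^2 = {r2:.2f}")
```

For an ordinary least-squares fit with an intercept, the R² printed here equals the square of the correlation r between fitted values and outcomes.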

How do linear models compare to experts? (Dawes & Corrigan 1974)
•Compared predictive validity (r) of judgments
 –Experts vs. linear model predictions
 –The model and the judge use the same data
•Example 1: 29 clinical psychologists judged whether patients are neurotic/psychotic (as diagnosed later) using 11 MMPI scores.

[Bar chart: prediction validity (r), experts vs. model. Psychosis: experts .28 (individual judges ranged from .14 to .39), model .46; partial r = .05.]

Example 2: Grades
•80 students try to predict first-year grades for graduate students using 10 predictor variables including undergrad GPA, GRE score, etc.

[Bar chart: prediction validity (r), experts vs. model. Psychosis: experts .28, model .46; Grades: experts .33 (individual judges ranged from .07 to .48), model .57; partial r = .01.]

Example 3: Graduate School Success
•Faculty admissions committee predicting success in graduate school as measured by subsequent faculty ratings.

[Bar chart: prediction validity (r), experts vs. model, for all three examples. Psychosis: experts .28, model .46; Grades: experts .33, model .57; Grad school: experts .19, model .54.]

Bootstrapping: Building a Model of the Expert

Case        Actual GPA   Expert’s Guess   Bootstrap Model-Predicted Guess
Student 1   3.0          4.0              3.40
Student 2   2.5          2.5              2.43
Student 3   2.8          1.0              1.60
Student 4   2.4          1.0              1.57
Student 5   3.4          3.0              2.88
Student 6   3.2          2.5              2.55
…           …            …                …
Student n   3.5          1.5              2.12

What if true outcome values are hard to measure?
•E.g., if nobody knows the true values or the event hasn’t happened yet
 –Fit the same linear form to the expert’s judgments instead: Judgment = β0 + β1X1 + β2X2 + … + βnXn

What happens when we compare the expert to a model of the expert?
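Before looking at the results, here is a small simulation sketch of that comparison. All numbers are invented, not the studies’ data: a hypothetical expert uses roughly sensible cue weights but applies them noisily, and the bootstrap model is simply a regression fit to the expert’s own guesses.

```python
# Simulation sketch (invented data): compare an inconsistent expert, a
# bootstrap model of that expert, and an actuarial model fit to the outcome.
import numpy as np

rng = np.random.default_rng(1)

def fit_predict(X, target):
    """Least-squares linear model of `target` on the cues in X; returns fitted values."""
    X1 = np.column_stack([np.ones(len(X)), X])
    b, *_ = np.linalg.lstsq(X1, target, rcond=None)
    return X1 @ b

n = 300
X = rng.normal(size=(n, 4))                                   # four cues per case
outcome = X @ np.array([0.5, 0.3, 0.2, 0.0]) + rng.normal(scale=1.0, size=n)

# Hypothetical expert: roughly sensible cue weights, plus a lot of judgment noise.
expert = X @ np.array([0.6, 0.1, 0.3, 0.4]) + rng.normal(scale=1.5, size=n)

bootstrap = fit_predict(X, expert)     # model of the expert's own judgments
actuarial = fit_predict(X, outcome)    # model fit directly to the true outcome

def validity(pred):
    """Correlation of a set of predictions with the true outcome."""
    return np.corrcoef(pred, outcome)[0, 1]

print(f"expert r    = {validity(expert):.2f}")
print(f"bootstrap r = {validity(bootstrap):.2f}")
print(f"actuarial r = {validity(actuarial):.2f}")
```

In runs like this, the bootstrap model’s validity typically falls between the expert’s and the actuarial model’s, the same ordering as in the results table below: averaging away the expert’s inconsistency is enough to improve on the expert.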

1. Estimate a bootstrapped model of the expert’s own decisions
2. Compare these model predictions to the expert
 –Results (validity r) for the three examples considered earlier:

Study                                                           Expert Avg   Bootstrap   Regression
MMPI-based judgment of neurosis/psychosis (N=29)                0.28         0.31        0.46
Students predicting first-year grades using 10 predictor
  variables (N=80)                                              0.33         0.50        0.57
Faculty admissions committee predicting success in graduate
  school as measured by subsequent faculty ratings              0.19         0.25        0.54

How consistent are experts?
•Different experts apply different (sometimes wrong) cues, and they do so inconsistently!

Five experienced radiologists judged disease severity based on scans:

Radiologist        A       B       C       D       E
A
B                -0.02
C                 0.37   -0.07
D                 0.24    0.02    0.20
E                -0.11    0.47   -0.01    0.46
Self-consistency  0.70    0.60    0.83    0.73    0.92

What’s the problem with experts?
•People are inconsistent, including experts
 –Lots of biases arising from heuristic usage (e.g., representativeness and availability)
 –Even worse when tired
•Hard time distinguishing between valid and invalid cues
 –More information increases confidence, but no increase in accuracy (Oskamp 1965)
•Experts often do not outperform novices (Goldberg 1965)
 –Training but not experience improves judgments (Garb 1989)
•People use configural rules that tend to be incorrect
 –E.g., 750 GMAT, 3.3 GPA vs. 680 GMAT, 3.7 GPA
 –Configural rules don’t explain much beyond simple cues
•Feedback is slow, infrequent, or unclear
 –Weather forecasting vs. economic forecasting

Objections to use of models
•Technical: Research does not prove that there are no true experts (“The research only proves that you used poor judges”)
•Psychological:
 –Availability, confirmation bias, self-serving biases
•Ethical?

Using models to select applicants?

•Use models to select applicants for schools, jobs, scholarships, etc.?
 –Humans are highly influenced by extraneous factors
   Symphony auditions using screens raised the share of women from 5% to 36%
   Jamal vs. Greg (Bertrand & Mullainathan 2004): white names got 50% more call-backs
•Interviews have surprisingly little predictive validity (McDaniel et al. 1994)
 –Structured vs. unstructured interviews
 –Even structured interviews have minimal additional validity beyond a test of general mental ability (Schmidt & Hunter 1998)

Why bother with experts at all?

•Models work because experts can (usually) pick the right predictors and code them the right way!

•Models do not include rare cues (“broken leg cues”)
•Criteria other than accurate forecasts
 –Determining the appropriateness of using the forecast (“sanity check”)

When does expert intuition excel?

•Contexts for which System 1 was specifically adapted
 –Facial recognition
 –CAPTCHAs
•When there’s no objective measure
 –Interviews where the main criterion is personal “fit” (e.g., personal trainer, assistant)
 –Emotions
 –Humor

When should we use models?
•“Common cases”: large amounts of base-rate data (and assuming a constant environment)
 –e.g., credit ratings
•Cases with excessively distracting data
 –e.g., interviews where the criterion is objective task performance
•Cases where neither models nor expert prediction works very well
 –Illusion of control

Compromise between models and experts?

•Have experts add what they can, but help them with what they cannot
 –Build a regression that includes the expert’s opinions: Y = Model + Expert’s Judgment (for rare events)
 –Anchor the expert on the model: “Here is our best estimate, tell me how you might change it.”
   Uses anchoring to correct for expert overconfidence

Ye’s Keys
24. People use cues inconsistently and don’t weigh them appropriately.

25. Models (even improper ones) are consistent, and therefore generally make better predictions than experts.

26. Experts are still needed to generate the model and evaluate the output.
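As a closing illustration of key 25, here is a toy simulation, with invented data rather than anything from the course, comparing a fitted regression, an “improper” equal-weight model, and an inconsistent human judge. The equal-weight model requires no estimation at all, but because it is applied the same way to every case it stays close to the regression and ahead of the noisy judge.

```python
# Toy simulation (invented data) for key 25: an "improper" model with equal
# unit weights on standardized cues is perfectly consistent, so it can rival
# a fitted regression and beat an inconsistent human judge.
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = rng.normal(size=(n, 4))                       # cues, oriented so higher = better
outcome = X @ np.array([0.5, 0.4, 0.3, 0.2]) + rng.normal(scale=1.2, size=n)

# Proper model: least-squares regression with estimated weights.
X1 = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(X1, outcome, rcond=None)
regression_pred = X1 @ b

# Improper model: standardize each cue and simply add them up (unit weights).
z = (X - X.mean(axis=0)) / X.std(axis=0)
unit_weight_pred = z.sum(axis=1)

# Inconsistent "expert": sensible average weights, applied erratically case by case.
noisy_w = np.array([0.5, 0.4, 0.3, 0.2]) + rng.normal(scale=0.5, size=(n, 4))
expert_pred = (X * noisy_w).sum(axis=1)

for name, pred in [("fitted regression", regression_pred),
                   ("unit weights", unit_weight_pred),
                   ("inconsistent expert", expert_pred)]:
    print(f"{name:20s} r = {np.corrcoef(pred, outcome)[0, 1]:.2f}")
```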