Literature review, Questionnaire preparation, Data collection, Data analysis in SPSS, Report writing

QUANT DATA ANALYSIS

Measurement, Inference, Description

We analyze quant data using statistics.
Statistics are not reality, truth, or “proof” of something!

Statistics are MODELS: a representation of reality.
A way to learn ROUGHLY how customers’ behavior looks/feels.
(Figure: one picture “is not” the other; it “is a MODEL of” it.)

The data from questionnaires measure responses on variables, possibly at different levels.

Railway services in NL

Service        Company      Speed     Speed (km/h)
Arriva Spurt   DB           Slow      120
Intercity      NS           Medium    150
IC Direct      NS Hispeed   Medium    200
ICE            DB           Fast      300
               NOMINAL      ORDINAL   SCALE

This flowchart helps determine level of measurement:

Is it possible to order the different values?
  No:  NOMINAL LEVEL
  Yes: Are the distances between the successive values equal?
         No:  ORDINAL LEVEL
         Yes: SCALE LEVEL

Finding the level of measurement for each variable is step 0 in analysis.

Analyze variables first one at a time, by describing the response to each:
What was the average response on this variable?
How spread out were the responses on this variable?

Days per week checking online news: average in this sample = 3.5
A rough MODEL for how often your population checks online news.

Your choice of correct descriptive statistics depends on level of measurement:

Variable is…   Central tendency (average)   Spread
Nominal        Mode                         Frequency
Ordinal        Median                       Range
Scale          Mean                         Standard deviation*
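The table above can be sketched in a few lines of Python. This is only an illustration, not part of the SPSS workflow: the function name `describe` and all the responses below are made up, and the ordinal speeds are assumed to be coded 1 = slow, 2 = medium, 3 = fast.

```python
from collections import Counter
from statistics import mean, median, mode, stdev

def describe(values, level):
    """Return (central tendency, spread) suited to the level of measurement."""
    if level == "nominal":
        # Mode and a frequency count per category
        return mode(values), dict(Counter(values))
    if level == "ordinal":
        # Median and range, given here as (lowest, highest)
        return median(values), (min(values), max(values))
    if level == "scale":
        # Mean and (sample) standard deviation
        return mean(values), stdev(values)
    raise ValueError(f"unknown level of measurement: {level}")

# Hypothetical responses, invented for illustration only
company = ["NS", "DB", "NS", "Arriva", "NS"]   # nominal
speed_rank = [1, 2, 2, 3, 2]                   # ordinal, assumed coding 1=slow, 2=medium, 3=fast
days_online_news = [3, 4, 3, 4, 3, 4]          # scale, days per week

print(describe(company, "nominal"))
print(describe(speed_rank, "ordinal"))
print(describe(days_online_news, "scale"))
```

Asking for the wrong statistic (for example, a mean of company names) is exactly what step 0 is meant to prevent, so the level is passed in explicitly rather than guessed from the data.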

*Standard deviation represents the typical difference between the mean and each response.

An average (mean) of 3.5 days per week checking online news does not imply that everyone checked 3.5 days per week. If the responses are distributed normally (“bell curve”), then about 2/3 of responses are between mean - std. deviation and mean + std. deviation.
(Figure: bell curve of days per week, from 0 to 7 (max), centered on 3.5 (average).)

Days per week checking online news:
average in this sample = 3.5
std. dev. in this sample = 0.8
If the responses are normally distributed, about 2/3 of them lie between 2.7 and 4.3 days per week.

So a larger standard deviation (on the same scale) = responses were more spread out.
(Figure labels: Lloret de Mar, Ibiza.)

Descriptive statistics describe the sample and model the population average and spread.
Based on assumptions of a probability sample and a normally distributed (“bell curve”) population.

How about a model of the effect of one variable on another?
Then we could (tentatively) answer our research questions.
(Figure labels: personality, online behavior.)

For two scale or ordinal variables, we use models called correlation.

The sign (+/-) of a correlation indicates the direction of the relationship.
(Figure: POSITIVE and NEGATIVE relationships; axis labels Extraversion and Age.)

The further a correlation is from 0, the stronger the relationship.
(Figure: example plots with correlations 0.9, -0.9, 0.1, -0.4, -0.1 and 0; axis label Age.)

Some rules of thumb for correlation strength:

Correlation   Positive     Negative
Small         0 to 0.3     -0.3 to 0
Medium        0.3 to 0.5   -0.5 to -0.3
Strong        0.5 to 1     -1 to -0.5

When researching people, “strong” correlations are rare.

For variables with 2 categories like gender (nominal), code categories as 0 and 1 to use correlation.
(Figure: Male=0, 3 tweets/week.)
Here one expects a positive correlation between gender and tweets/week. As gender increases (1 instead of 0), tweets/week also increase.
Conclusion: women tweet more.

Two categories coded as 0 and 1 work like a scale variable in correlation.
(Figure: Female=1; data labels 2 tweets/week, 3 tweets/week, 7 tweets/week.)

So, choose the model based on level of measurement:

Variable 1                             Variable 2   Appropriate model
Nominal (2 categories coded 0 and 1)   Scale        Correlation (Pearson)
Ordinal                                Scale        Correlation (Spearman)
Scale                                  Scale        Correlation (Pearson)

Furthermore, we need to know the chances that our model reflects a real effect in the population.

Suppose you have a tour operator with 10 customers, 5 male and 5 female. To decide on marketing through a new women-oriented online portal, you need to know if there is a difference in satisfaction between the men and women.

Satisfaction from a sample: 4, 8, 4, 6 (group means: 6 and 5)
Does this difference reflect a difference in the population (all 10 customers) or is it a coincidence?

Could be a real difference or could be a coincidence of the sample you took:
Possible population 1: 4, 8, 4, 6, 8, 6, 4, 5, 6, 4 (group means: 6 and 5), so the difference is real
Possible population 2: 4, 8, 4, 6, 8, 6, 4, 8, 6, 6 (group means: 6 and 6), so the sample difference is a coincidence

Sig. = the chance of finding an effect like this in the sample by coincidence, when there is no such effect in the population.
Again, sig. = chance that you found an effect in the sample… even though there is NO effect in your population.

When interpreting output, first look if sig. < 0.05: less than a 5% chance of finding such an effect in the sample if there were NO effect in the population.

FIRST look at the sig.: Is there an effect? ONLY if sig. < 0.05 conclude YES.
THEN look at correlation / means: How strong and in which direction is the effect?
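One way to get a feel for what sig. means is to compute it by brute force. The sketch below uses made-up tweet counts for 4 men (coded 0) and 4 women (coded 1). It computes the Pearson correlation, then relabels who is "female" in every possible way and counts how often a correlation at least as far from 0 shows up when the labels are meaningless. This is an exact permutation test, swapped in here for illustration. SPSS derives its sig. from a theoretical distribution instead, so its value would differ somewhat. The names `pearson_r` and `permutation_sig` are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def permutation_sig(x, y):
    """Two-sided sig. for a 0/1-coded x: share of all relabelings whose
    correlation is at least as far from 0 as the observed one."""
    r_obs = abs(pearson_r(x, y))
    n, k = len(x), sum(x)
    hits = total = 0
    for chosen in combinations(range(n), k):
        members = set(chosen)
        relabeled = [1 if i in members else 0 for i in range(n)]
        total += 1
        # Small tolerance so exact ties are not lost to rounding
        if abs(pearson_r(relabeled, y)) >= r_obs - 1e-12:
            hits += 1
    return hits / total

# Hypothetical data, invented for illustration only
gender = [0, 0, 0, 0, 1, 1, 1, 1]        # Male=0, Female=1
tweets_per_week = [2, 3, 3, 4, 3, 5, 6, 7]

r = pearson_r(gender, tweets_per_week)
sig = permutation_sig(gender, tweets_per_week)
print(f"r = {r:.3f}, sig. = {sig:.3f}")
```

With these made-up data the correlation is about 0.70 (“strong” by the rules of thumb), yet sig. is about 0.11, which is above 0.05. With only 8 respondents even a large sample correlation can be a coincidence, which is why sig. is checked first.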

Example correlation (1)

                                     Number of Facebook friends
Extraversion   Pearson correlation   .619**
               Sig. (2-tailed)       .001
               N                     763

The significance level (.001) is lower than .05. This means we conclude that extraversion is significantly related to the number of Facebook friends people had. The correlation of .619 is positive and, by the rules of thumb, strong.

Example correlation (2)

                                           # of trips taken in 2015
Calories eaten/day   Pearson correlation   .003
                     Sig. (2-tailed)       .583
                     N                     763

The significance level (.583) is higher than .05. This means we conclude that calories eaten/day is not significantly related to the number of trips people made in 2015.
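The reading order used in both examples (sig. first, then strength and direction) can be written down as a small helper. This is a minimal sketch: the function name `interpret` is hypothetical, and placing a correlation of exactly 0.3 or 0.5 in the higher category is an assumption, since the rules of thumb leave those boundaries open.

```python
def interpret(r, sig, alpha=0.05):
    """Read a correlation table: FIRST sig., THEN strength and direction."""
    if sig >= alpha:
        # No effect can be concluded, so r itself is not interpreted
        return "no significant relationship"
    size = abs(r)
    # Rules of thumb: small 0 to 0.3, medium 0.3 to 0.5, strong 0.5 to 1
    if size < 0.3:
        strength = "small"
    elif size < 0.5:
        strength = "medium"
    else:
        strength = "strong"
    direction = "positive" if r > 0 else "negative"
    return f"significant {strength} {direction} relationship"

# Values taken from the two example tables
print(interpret(0.619, 0.001))   # extraversion and Facebook friends
print(interpret(0.003, 0.583))   # calories eaten/day and trips in 2015
```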