These are writing assignments for a graduate-level applied research methods (Statistics 1) class. You must be knowledgeable in the Stata statistical software. Please find the assignment attached.
Writing Assignment 7:
Generalizing Differences to the Population
Due date: Friday, March 5, 6:00 a.m.
Purpose: This assignment reinforces your understanding of confidence intervals and hypothesis tests.
Task: Write a one-page paper using the General Social Survey or one of the American Community Survey data sets (all available in the Stata Data Sets folder) to analyze the impact of at least three independent variables on one dependent variable, both in the sample and in the population.
Choose a dependent variable that interests you.
The 2018 GSS includes questions on legalization of marijuana (grass), capital punishment (cappun), same-sex marriage (marhomo), abortion (variables that start with ab), government spending preferences (variables that start with nat), confidence in institutions (variables that start with con), and on various types of sex outside marriage (premarsx, teensex, xmarsex).
It also asked whether respondents voted for Clinton or Trump in 2016 (pres16).
The GSS also has a number of personal and behavioral questions, including gun ownership (owngun), happiness (happy, hapmar), divorce (divorce), church attendance (attend), prayer (pray), Bible beliefs (bible), and number of sex partners (partners), among many other possibilities.
Previous years of the GSS also include questions about drinking, smoking, and illegal drug use. (You could use the 1972-2018 GSS, then just use the most recent year that asks the question.)
The GSS also includes a couple of interval-level variables: years of education (educ) and score on a 10-word vocabulary test (wordsum).
The federal employees and ACS data sets include earnings measures, hours worked, and supervisory status, among other possibilities.
Think about independent variables that might influence your dependent variable.
Obvious choices in the GSS include sex, race, age, education (educ), religion (use relig), religious attendance (attend), party identification (party7), and liberalism-conservatism (polviews).
Obvious choices in the ACS and OPM data sets include race, sex, age, education, and occupation. The federal employees data set also includes work experience (yos) and receipt of veterans’ preference. The ACS data set includes hours worked per week (uhrswk); marital, veteran, and citizenship status (marst, vetstat, citizen); English ability (speaking); field of study in college (degfield); and relationship type (reltype), which allows identifying people in same-sex and different-sex couples.
Be sure you understand the coding of all your variables.
If you are using GSS data, go to the GSS Data Explorer (https://gssdataexplorer.norc.org/variables/vfilter), type in the variable name, then click on the variable name on the next screen to see the exact question wording. Include at least part of that wording, rather than just the variable label, in your paper.
Create appropriate versions of your variables.
Your dependent variable must be either a dummy or an interval-level variable.
You can code a dummy dependent variable either 0-1 or 0-100.
If you code it 0-1, use prtest and remember to multiply all proportions by 100 to turn them into percentages.
If you code it 0-100, you must use ttest.
If your dependent variable is ordinal level, recode it into a dummy variable.
If you use a dummy dependent variable, discuss differences in percentage points.
If you use an interval-level dependent variable, use ttest and discuss differences in the units of that variable (e.g., dollars, hours, words).
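The two coding options for a dummy dependent variable lead to the same substantive answer. The following is a minimal sketch on a hypothetical eight-person toy data set; the variable names depvar01, depvar100, and group are illustrative only and do not come from the GSS, ACS, or federal employees files.

```stata
* Toy data (hypothetical): depvar01 = 1 if "yes", 0 if "no"; group is a 0-1 dummy
clear
input depvar01 group
1 0
0 0
0 0
1 0
1 1
1 1
0 1
1 1
end

* 0-1 coding: prtest reports proportions; multiply them by 100 for percentages
prtest depvar01, by(group)

* 0-100 coding: ttest reports the group means directly as percentages
generate depvar100 = depvar01 * 100
ttest depvar100, by(group)
```

In this toy data, prtest reports proportions of 0.50 and 0.75, and ttest on the 0-100 version reports means of 50 and 75, so the difference is 25 percentage points either way. The two commands use slightly different test formulas, so their p-values can differ a little.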
Your independent variables must be dummy variables to use prtest or ttest.
You can recode any variable into a dummy variable, but you should justify your decision of where to divide it.
Sometimes you can justify it by your policy issue (e.g., comparing whites to minorities, or government employees to those in the private sector, or part-time to full-time employees).
Sometimes you can justify it based on patterns in the data.
If you are working with an ordinal-level variable, you might use crosstabs to see if there are obvious breaks in the data.
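One possible way to look for a break and then create the dummy is sketched below. The toy data, the dependent variable favor, the new variable attend_hi, and the cut point are all hypothetical; the sketch also assumes attend is coded 0 (never) through 8 (more than once a week), so confirm the coding in your own data with codebook before borrowing the cut point.

```stata
* Toy data (hypothetical values): attend assumed to run 0-8, favor is a 0-1 dummy
clear
input attend favor
0 1
1 1
2 1
3 1
4 0
5 1
6 0
7 0
8 0
end

* Row percentages show how the dependent variable changes across categories
tabulate attend favor, row

* Assumed cut point: 0-3 = less than monthly, 4-8 = monthly or more
recode attend (0/3 = 0 "Less than monthly") (4/8 = 1 "Monthly or more"), generate(attend_hi)

* Check the recode: each original value should map to exactly one new value
tabulate attend attend_hi, missing
```

Using generate() leaves the original variable untouched, which makes the final check against the original values possible.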
If you are inspired to run a multivariate analysis (this is not required), you can make your second-level variable ordinal rather than dummy.
The command would be:
bysort indvar2: prtest depvar, by(indvar1)
If your dependent variable is interval level, use ttest instead.
Let’s say you wanted to know whether women were more likely than men with the same party identification to vote for Clinton.
First, recode your dependent variable into a 0-1 depvar.
I did:
codebook pres16
gen Clinton = pres16
recode Clinton (1=1) (0=.) (2=0) (3/max=.)
Note that I made third-party candidates missing; I could have made them 0’s.
Then check the recode:
tab pres16 Clinton, missing
Next, determine whether women were more likely than men to vote for Clinton:
prtest Clinton, by(sex)
Next, determine whether women were more likely than men with the same party identification to vote for Clinton:
bysort party7: prtest Clinton, by(sex)
This might be more detail than I need (it gives me 7 gender comparisons). If so, I could recode party7 into three values (Democrat, independent, Republican).
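Collapsing party7 into three groups could look like the sketch below. It runs on hypothetical toy data and assumes party7 is coded 1 (strong Democrat) through 7 (strong Republican) with 4 as independent; that coding is an assumption, so run codebook party7 on the real data and adjust the ranges. The name party3 is illustrative.

```stata
* Toy data (hypothetical): party7 assumed 1-7, sex 1 = male / 2 = female, Clinton 0-1
clear
input party7 sex Clinton
1 1 1
1 2 1
2 1 1
2 2 1
3 1 0
3 2 1
4 1 0
4 1 1
4 2 1
4 2 0
5 1 0
5 2 1
6 1 0
6 2 0
7 1 0
7 2 0
end

* Collapse seven categories into three (assumed ranges; verify with codebook)
recode party7 (1/3 = 1 "Democrat") (4 = 2 "Independent") (5/7 = 3 "Republican"), generate(party3)

* Check the recode against the original variable
tabulate party7 party3, missing

* Three gender comparisons instead of seven
bysort party3: prtest Clinton, by(sex)
```

Fewer groups also means more respondents in each comparison, which matters for the width of the confidence intervals.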
Write approximately one page (double-spaced, 12-point font, in Word, plus tables) describing your findings.
In the first paragraph, briefly state what you are going to do (e.g., I am examining how these three independent variables affect this dependent variable).
You might want to briefly explain why this topic is worth investigating.
You might want to briefly present your expectations about differences across groups (e.g., I hypothesize that men will be more likely than women to …, because …).
Present the mean or percentage for the data set as a whole and use a 95% confidence interval to generalize to the population. (Think about what that population is – all Americans, all federal employees, etc.?)
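One way to get the overall estimate and its 95% confidence interval is the mean command, sketched here on a hypothetical ten-person toy data set. Clinton100 is an illustrative 0-100 version of a dummy dependent variable; the same command works for an interval-level variable.

```stata
* Toy data (hypothetical): Clinton100 = 100 if voted for Clinton, 0 otherwise
clear
input Clinton100
100
100
100
100
100
100
0
0
0
0
end

* Sample percentage with its standard error and 95% confidence interval
mean Clinton100
```

Here the sample percentage is 60. With only 10 observations the interval around it is very wide; in a full survey sample it would be much narrower.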
Describe how that mean or percentage varies across the values of your independent variables.
Discuss differences in the population as well as the sample.
If a difference is not statistically significant, it is probably enough to say that and not present the 95% confidence interval, unless you want to emphasize how small the difference must be in the population.
Present at least two 95% confidence intervals. You might want to save these for your most important findings. Authors tend to talk more about whether differences are statistically significant than to present confidence intervals, so you might want to follow that pattern except for important differences. (You do want to show me that you know how to interpret a 95% confidence interval, of course, so do at least a couple.)
If you decide to do a multivariate analysis, look at how the main difference changes once you add your control variable (e.g., how does the gender difference in support for Clinton change when you compare people with the same level of liberalism or conservatism?). Think about both the sample and population differences. Remember that the width of confidence intervals depends heavily on sample size, and some of your sub-groups may be much smaller than your overall sample.
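The effect of sub-group size on interval width can be seen with prtesti, the immediate form of prtest, which takes sample sizes and proportions typed in directly. The numbers below are hypothetical and chosen only for illustration: the same 20-point gap (60% versus 40%) at two different group sizes.

```stata
* Hypothetical numbers: the same 0.60 vs. 0.40 gap at two sample sizes

* 1,000 respondents per group: narrow interval around the difference
prtesti 1000 0.60 1000 0.40

* 50 respondents per group (e.g., one small sub-group): much wider interval
prtesti 50 0.60 50 0.40
```

Because the standard error shrinks with the square root of the sample size, the interval for the 50-person groups is roughly 4.5 times as wide as the one for the 1,000-person groups, even though the sample difference is identical.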
In your conclusion, briefly refer back to your hypotheses/expectations. Were you correct?
Include your prtest or ttest tables after your text but before your appendix.
Add an appendix that includes your Stata commands. This should be included within the same Word document as your writing assignment; do not submit extra Word or Stata files. Be sure to show that you checked your work when you recoded a variable.
Try to make this as easy and pleasant as possible for your supervisor to read. Use some logical order to present the variables. Use simple, active-voice language. Be grammatically correct.
Criteria: This is a low-stakes exercise, worth 2% of your final grade, so it is primarily an opportunity to practice your understanding of this module. Writing assignments are worth 10 points. To get the full 10 points:
Correctly code your variables.
Correctly interpret the sample percentage or mean and the 95% confidence interval for the full sample.
Correctly interpret your differences of sample means or proportions/percentages.
Correctly interpret at least two 95% confidence intervals.
When you do not include a 95% confidence interval, correctly state whether the difference is statistically significant or not. Explain once, briefly, how you know whether a difference is statistically significant.
Include all necessary tables.
Provide an appropriate appendix.
I will also provide feedback on your writing. Try to make this as easy and pleasant to read as possible. Use some logical order in your presentation. Use simple, active-voice language. Be grammatically correct.