These are writing assignments for a graduate-level applied research methods (statistics 1) class. You must be knowledgeable in the statistical software Stata. Please find attached.

Writing Assignment 5:

Multivariate Analysis

Due date: Friday, February 19, 6:00 a.m.

Purpose: This assignment applies your understanding of using contingency tables, bar charts, and/or line graphs to perform bivariate and multivariate analyses. It reinforces concepts and coding skills that help prepare you for the first paper assignment.

Knowledge: This exercise

  • Reinforces your understanding of contingency tables, bar charts, and/or line graphs

  • Tests your ability to run analyses using one and two independent variables

Skills: This exercise also

  • Uses Stata to create and label variables and to run bar charts or line graphs

  • Develops your ability to write about statistics

Choose one of these two options:

Option 1: Write a one-page paper on factors that influenced earnings or managerial status among full-time workers in the U.S. in 2018. Use the “ACS 2018, fulltime” data set, which is part of the American Community Survey, which the U.S. Census Bureau collects annually. This subset is a random sample of employees who worked at least 30 hours per week in 2018. Use bar charts as well as (or in place of) tables of numbers.

  • Your dependent variable should be either incwage or MANAGER.

    • Incwage is annual earnings from wages, an interval-level variable measured in dollars.

    • MANAGER is a 0-100 dummy variable, coded 100 for those who are managers and 0 for everyone else.

  • Report the mean earnings or the percentage who were managers for the data set as a whole.

  • A wide variety of independent variables could affect earnings. Obvious examples are educational attainment (you should probably use educ5), sex, race, citizenship, sector of employment (classwkrd), age and hours worked (you would need to convert age and uhrswork into ordinal-level variables), English ability (speakeng), and occupation (probably use occcat or occgrp).
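
As a rough sketch of the conversion mentioned for age and uhrswork, one way to collapse them into ordinal-level variables is recode with the generate() option. The cut points and the new variable names (agecat, hrscat) are illustrative assumptions, not part of the assignment, and the code assumes the ACS data set is already open in Stata.

```stata
* Assumes the "ACS 2018, fulltime" data set is already open in Stata.
* Cut points and new variable names (agecat, hrscat) are illustrative only.
recode age (min/29 = 1 "Under 30") (30/44 = 2 "30-44") ///
           (45/59 = 3 "45-59") (60/max = 4 "60 and over"), ///
           generate(agecat)
label variable agecat "Age group"

recode uhrswork (min/39 = 1 "Under 40 hours") (40 = 2 "40 hours") ///
                (41/max = 3 "41 or more hours"), generate(hrscat)
label variable hrscat "Hours worked per week (grouped)"

* Inspect the new categories
tabulate agecat
tabulate hrscat
```

Because min and max refer only to nonmissing values, any missing values in age or uhrswork stay missing in the new variables.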

  • Look at the impact of at least three independent variables on average earnings or the probability of being a manager.

    • Include at least one table of means (from tab indvar, sum(incwage)) or percentages (using tab indvar, sum(MANAGER)).

    • Use at least one bar chart that reports the mean of your dependent variable for each value of your independent variable.
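
A minimal sketch of the bivariate steps, assuming the ACS data set is already open and using incwage with educ5 purely as an example pairing. The titles are placeholders to edit.

```stata
* Assumes the "ACS 2018, fulltime" data set is already open in Stata.
* incwage and educ5 are one possible pairing; substitute your own variables.

* Overall mean of the dependent variable
summarize incwage

* Table of means for each value of one independent variable
tabulate educ5, summarize(incwage)

* Bar chart of the mean of the dependent variable by that variable
graph bar (mean) incwage, over(educ5) ///
    ytitle("Mean annual earnings (dollars)") ///
    title("Mean earnings by education, full-time workers, 2018")
```

If MANAGER is the dependent variable, the same commands work with MANAGER in place of incwage. Because MANAGER is coded 0 and 100, each mean reads directly as the percentage who are managers.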

  • Choose one key independent variable. Does the impact of that variable remain about the same when you add a second independent variable?

    • For instance, does the impact of working for government or the private sector (classwkrd) shrink when you make comparisons among people with the same level of education (educ5)?

    • Does the impact of citizenship change when you add English ability (speakeng) or race/ethnicity (race) as a second independent variable?

    • Do race or gender gaps in earnings or managerial authority change when you add age, education, average weekly work hours (uhrswork), sector of employment, or occupation as a second independent variable?

  • Create a table of means using tab indvar1 indvar2, sum(incwage) nofreq nost (or sum(MANAGER) in place of sum(incwage) for percentages).

    • This table will probably be too complicated to include in your paper (unless you copy it into Excel and simplify it).

    • You might just include it in your appendix and pull a few numbers for your text.
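
As one possible illustration of the two-way table of means, the sketch below uses classwkrd as the key independent variable and educ5 as the control. That pairing is only an example taken from the questions above, and the code assumes the ACS data set is already open.

```stata
* Assumes the "ACS 2018, fulltime" data set is already open in Stata.
* Rows are sectors (classwkrd), columns are education levels (educ5);
* each cell shows mean earnings for that combination.
tabulate classwkrd educ5, summarize(incwage) nofreq nostandard

* Same layout with MANAGER: each cell is the percentage who are managers
tabulate classwkrd educ5, summarize(MANAGER) nofreq nostandard
```

To see whether the sector gap shrinks, compare the rows within a single education column rather than comparing the overall row means.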

  • Create at least one bar chart with two independent variables.

    • The basic structure of the command should be:

      • graph bar depvar, over(keyindvar) over(indvar2) asyvar

    • Since I want you to focus on how the impact of your key independent variable changes when you add a second independent variable, you should put your key independent variable in the first over() and the control (second independent) variable in the second over().

    • You might want to use hbar instead of bar to make your bar chart look better.

    • Add titles and edit labels as necessary to make your graph as attractive as possible.

    • If you are feeling adventurous, you can add a second bar chart controlling for a different second independent variable – or even add a third variable to the bar chart (it would go in a third over()).
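
The two-variable bar chart might look something like the sketch below, which again assumes the ACS data set is open and uses classwkrd as the key independent variable and educ5 as the control. The titles and the choice of hbar are suggestions only.

```stata
* Assumes the "ACS 2018, fulltime" data set is already open in Stata.
* Key independent variable goes in the first over(); the control goes in
* the second. asyvars gives each sector its own bar color and legend entry.
graph hbar (mean) incwage, over(classwkrd) over(educ5) asyvars ///
    ytitle("Mean annual earnings (dollars)") ///
    title("Mean earnings by sector and education") ///
    legend(title("Sector of employment"))
```

With this layout the bars for the key variable sit side by side within each education level, which makes it easier to judge whether the gap changes once education is held constant.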

Write approximately one page (double-spaced, 12-point font, in Word), plus tables and graphs.

  • In the first paragraph, briefly state what you are going to do (e.g., I am examining the impact of these three independent variables on this dependent variable, and then I am going to see whether the race/gender/citizenship/education gaps shrink when I compare people who have the same value of the second independent (control) variable).

    • You might want to briefly present your expectations (e.g., I hypothesize that earnings will be higher for men than women and for private-sector than for government workers; I also expect pay to rise with education and hours worked; finally, I expect gender differences to shrink when I compare men and women with the same level of education).

  • Present the mean or percentage for the data set as a whole.

  • Describe how the mean or percentage varies across the values of your three independent variables.

    • For dummy and nominal-level independent variables, how does the mean or percentage differ across groups?

    • For ordinal-level independent variables, does the mean or percentage steadily increase or decrease as the independent variable increases?

    • Refer to your tables and bar charts in describing your findings and include a few numbers.

      • You might want to number your tables and bar chart to make it easier to refer to them.

  • Describe what happens to the relationship between your key independent variable and your dependent variable when you add a second independent (control) variable. Do the differences shrink, grow, or stay about the same?

  • Save all your commands in a do-file and copy it into your Word document at the end of the paper as an appendix.

  • Insert your tables and bar chart in the text or add them at the end of the paper but before the appendix.

Criteria: This is a low-stakes exercise, worth approximately 1% of your final grade, so it is primarily an opportunity to practice your understanding of this module. Writing assignments are worth 10 points. To get the full 10 points, you must:

  • Create at least one table of means.

  • Create at least one bivariate bar chart (that is, one dependent and one independent variable).

  • Create at least one multivariate bar chart (that is, one dependent and two independent variables).

  • Present the overall mean or percentage.

  • Interpret the relationships shown in the tables and bar charts.

  • Provide an appropriate appendix.

I will also provide feedback on your writing.

Option 2: It is the fourth day of your internship at the congressional campaign. Your supervisor wants a new analysis of public opinion, but this time she wants you to look at how public opinion has changed, using the GSS 1972-2018 data set. She also wants to know whether some groups favor the policy more than others and whether support is changing at different rates for different groups.

  • You can continue with a previous topic or choose a different one. Some possibilities:

    • legalization of marijuana (grass),

    • capital punishment (cappun),

    • gun control (gunlaw),

    • same-sex marriage (marhomo),

    • abortion (any of the variables that start with ab),

    • birth control (pillok),

    • government spending preferences (the variables that start with nat),

    • confidence in various institutions (any variable that starts with con),

    • the acceptability of police hitting people (variables starting with pol),

    • political tolerance (whether members of unpopular/dangerous groups (atheists, communists, homosexuals, militarists, racists) should be allowed to teach college, give speeches, and have their books in the public library; the variables start with col, lib, and spk),

    • various types of sex outside marriage (premarsx, teensex, xmarsex, homosex).

  • Use GSS Data Explorer (https://gssdataexplorer.norc.org/) to search for variables and question wording.

    • Click on SEARCH VARIABLES

    • Put a variable name (e.g., cappun) or search term (e.g., death penalty) in the Keyword box.

    • Click on the variable name to find the exact question wording, to see what years the question was asked, and to see how many respondents gave each response in each year. (The question wording is all you really need, but if you see that a question was only asked a couple of times, it’s probably not a good choice.)

  • Create a new version of the opinion variable.

    • Dichotomize it so that the value you are interested in is coded 100 and the other values are coded 0.

    • Any values that were missing should stay missing. (If the variable is coded “.” in the GSS, the respondent was probably not asked the question.)

    • Decide what you want to do with “Don’t Know” or “No Answer” responses. You can make them missing values or code them 0. (In my research on same-sex marriage, I decided that it was the percentage of the population (not the percentage of those with an opinion) that mattered. I coded SSM supporters as 100 and both opponents and those without an opinion as 0.)
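
A hedged sketch of the dichotomizing step, using cappun as the example. The coding shown (1 = favor, 2 = oppose) and the new variable name favorcp are assumptions to check against your own copy of the data, as is the way "Don't Know" responses are stored.

```stata
* Assumes the GSS 1972-2018 data set is already open in Stata.
* First check how the variable is actually coded in your file:
tabulate cappun, missing nolabel

* Assumed coding: 1 = favor, 2 = oppose. favorcp is a hypothetical name.
generate favorcp = .
replace favorcp = 100 if cappun == 1
replace favorcp = 0   if cappun == 2
label variable favorcp "Favors capital punishment (0/100)"

* Optional: count "Don't Know" as not supporting. The code for "Don't Know"
* depends on your file (.d is assumed here), so confirm it before running.
* replace favorcp = 0 if cappun == .d

* Confirm that missing values stayed missing
tabulate cappun favorcp, missing
```

Starting from a variable that is entirely missing and filling in only the known responses keeps respondents who were not asked the question out of the percentages.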

  • Using bysort year: egen pctvar = mean(DEPVAR), calculate the percentage who said Yes in each year.

  • Using twoway (line pctvar year), create a line graph showing how opinion on this issue changed over time. Use appropriate titles and labels.

    • You can cut, paste, and edit from the lecture do-file.
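
The two steps above could be sketched as follows. The code assumes the GSS data set is open and that a 0/100 opinion variable already exists; favorcp is a hypothetical name for it, and the titles are placeholders.

```stata
* Assumes the GSS 1972-2018 data set is open and that favorcp (hypothetical
* name) is a 0/100 variable with missing values left as missing.

* Percentage in favor in each survey year, stored on every observation
bysort year: egen cappct = mean(favorcp)

* Line graph of the yearly percentage
twoway (line cappct year, sort), ///
    title("Support for capital punishment, 1972-2018") ///
    ytitle("Percent in favor") xtitle("Year")
```

In years when the question was not asked, the yearly mean is missing and the line simply connects the surrounding years.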

  • Choose one or two independent variables that you think might affect opinion on this topic. Obvious choices include sex, race, age, education (educ), religion (use relig), religious attendance (attend), party identification (partyid), and liberalism-conservatism (polviews).

    • Make sure that you understand how each variable is coded. Some coding differs from that in the 2018 General Social Survey.

    • If the independent variable has more than 3 or 4 values, you may want to recode it to something simpler (e.g., convert polviews into a variable with the values Liberal, Moderate, and Conservative).
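
For the polviews example, a recode along these lines is one option. It assumes the standard GSS seven-point coding (1 = extremely liberal through 7 = extremely conservative), which you should confirm in your file; polviews3 is a hypothetical variable name.

```stata
* Assumes the GSS 1972-2018 data set is already open in Stata and that
* polviews runs from 1 (extremely liberal) to 7 (extremely conservative).
recode polviews (1/3 = 1 "Liberal") (4 = 2 "Moderate") ///
                (5/7 = 3 "Conservative"), generate(polviews3)
label variable polviews3 "Political views (3 categories)"

* Cross-check the new variable against the original
tabulate polviews polviews3, missing
```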

  • Using bysort year indvar: egen pctvar = mean(DEPVAR), calculate the percentage of each group who said Yes in each year.

  • Create a new variable for each value of your independent variable.

    • Again, copy, paste, and edit from the lecture do-file. For instance, I used gen Men = cappct if sex==1. Do something similar.

  • Using twoway (line pctvar1 year) (line pctvar2 year), etc., create a line graph showing how opinion on this issue changed for each value in your independent variable.

    • If sex is your independent variable, you will have two lines – one for men and one for women. The pctvars will be Men and Women.

    • If a recoded version of polviews is your independent variable, you will have three lines – one each for Liberals, Moderates, and Conservatives.
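
Putting the group steps together for sex, a sketch might look like this. It assumes the GSS data set is open, that favorcp (a hypothetical name) is the 0/100 opinion variable, and that sex is coded 1 for men and 2 for women; check the coding before relying on it.

```stata
* Assumes the GSS 1972-2018 data set is open, favorcp (hypothetical name)
* is a 0/100 opinion variable, and sex is coded 1 = men, 2 = women.

* Percentage in favor within each year-by-sex group
bysort year sex: egen cappct_sex = mean(favorcp)

* One variable per group, so that each can be drawn as its own line
generate Men   = cappct_sex if sex == 1
generate Women = cappct_sex if sex == 2

* One line per group
twoway (line Men year, sort) (line Women year, sort), ///
    title("Support for capital punishment by sex") ///
    ytitle("Percent in favor") xtitle("Year") ///
    legend(order(1 "Men" 2 "Women"))
```

For a three-category variable such as a recoded polviews, the same pattern applies with three generate lines and three line plots.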

  • Describe your findings. What percentage of GSS respondents have that opinion? Has the level of support changed over time? Does opinion vary across groups? Have all the groups been changing their opinions at reasonably similar rates?

  • Write as much as you need to, but I am still thinking approximately one page (double-spaced, 12-point font, in Word), not including your graphs.