**********************************************
*                                            *
*  Tanya Byker                               *
*  Economics 211 Fall 2024                   *
*  Lab #10                                   *
*  Tables and Figures                        *
*                                            *
**********************************************

snapshot erase _all

** Set your working directory and open the data:
cd filepath
use filepath.file.dta, clear

*** Descriptives ***

* A trend graph by [education] category
gen inlf=empstat<3
gen edcat=.
replace edcat=1 if educ<6
replace edcat=2 if educ==6
replace edcat=3 if educ>6 & educ<10
replace edcat=4 if educ==10
replace edcat=5 if educ==11
label define education 1 "Less Than HS" 2 "High School" 3 "Some College" 4 "Bachelors" 5 "Graduate", replace
label val edcat education

* Including weights
tab year edcat if sex==1 [w=perwt], sum(inlf) mean noobs
tab year edcat if sex==2 [w=perwt], sum(inlf) mean noobs

* procedure:
* Copy > Table
* Paste into excel
* Insert Line graph
* Copy/Paste excel graph into Document

* A (small) summary stats table
gen LTHS=educ<6
gen HS=educ==6
gen SC=educ>6 & educ<10
gen BA=educ==10
gen GRAD=educ==11
sum LTHS HS SC BA GRAD inlf [w=perwt]
sum LTHS HS SC BA GRAD inlf [w=perwt] if year==1970
sum LTHS HS SC BA GRAD inlf [w=perwt] if year==2017

* another strategy for making a summary table -- learning some more advanced code: collapse!
* collapsing will literally collapse your data, so if you want to be able to get it back, you need to "save" it in memory
snapshot save
collapse (mean) LTHS HS SC BA GRAD inlf [w=perwt], by(year)
snapshot restore 1

* you could also do a two way collapse (by year and sex) to produce summary stats by gender
snapshot save
collapse (mean) LTHS HS SC BA GRAD inlf [w=perwt], by(year sex)
snapshot restore 1

* Descriptive Table considerations
* ABSOLUTELY NO STATA OUTPUT COPIED INTO YOUR PAPER
* Don't need 8+ decimal places
* Don't need to list the sample size over and over

* A scatter plot:
* examples in:
* Lab 7: Phillips Curve (labeling points)
* Lab 9: Income and Democracy
* Export Stata graphs as .png to easily insert into document
graph export scatter.png, replace

*** Regression Results ***

gen fulltime_year= 0
replace fulltime_year=1 if uhrswork>=40 & wkswork2>=4
replace fulltime_year=1 if hrswork2>=5 & year==1970 & wkswork2>=4
replace fulltime_year=0 if hrswork2<5 & year==1970 & wkswork2>=4
gen ln_wage=ln(incwage)
gen male=sex==1
gen age2=age^2
tab edcat, gen(ed)

** LOOPS are SUPER useful!
forval i=1/10 {
    di `i'
}
forval i=1970(10)2010 {
    di `i'
}
foreach i of numlist 1970 1980 1990 2000 2010 2017 {
    di `i'
}

*ssc install outreg2
local r replace
foreach y of numlist 1970 1980 1990 2000 2010 2017 {
    reg ln_wage male if fulltime_year==1 & year==`y' [w=perwt], robust
    outreg2 using test1, excel noaster `r' ctitle(`y')
    estimates store _`y'_n
    local r append
    reg ln_wage male ed2-ed5 age age2 if fulltime_year==1 & year==`y' [w=perwt], robust
    outreg2 using test1, excel noaster `r' ctitle(`y')
    estimates store _`y'
}

** Getting Excel Tables into Document
* Click view > Unclick Gridlines
* Copy
* In Word, paste special > Microsoft Excel Binary Worksheet Object

*** Checklist of things in the notes of a Figure or Table:
* What is going on in this figure or table
* Source of data
* Age range? years?
* what is in parentheses (standard errors?)
* what do the stars mean

** Other (cool) visualizations of regression results
** continuing with the regression above
label var male "gender gap coefficient"

** coefplot is a relatively new user written command that makes visualizations from regression results
** more info here: http://repec.sowi.unibe.ch/stata/coefplot/getting-started.html
** you may need to install it:
*ssc install coefplot
coefplot _1970 _1980 _1990 _2000 _2010 _2017, keep(male) vertical

#delimit ;
coefplot _1970 _1980 _1990 _2000 _2010 _2017, keep(male) vertical
    legend(col(1))
    title("Evolution of the US Gender Gap")
    graphregion(color(white)) bgcolor(white);
#delimit cr

* the notes to this figure should explain what variables are controlled for in the regression
* and note that there are 95% confidence intervals shown

** For this example let's code some variables for race and ethnicity (this is code from Lab 4):

**** Coding categorical variable for race and ethnicity using Census variables.
* NOTE: you need to use both the race and the hispan variables *
gen white_nh=0
replace white_nh=1 if race==1 & hispan==0
gen black_nh=0
replace black_nh=1 if race==2 & hispan==0
gen other_nh=0
replace other_nh=1 if race>2 & hispan==0
gen hispanic=0
replace hispanic=1 if hispan>0

** we should check that the proportions of the groups sum to one
sum white_nh black_nh other_nh hispanic

gen racecat=.
replace racecat=1 if white_nh==1
replace racecat=2 if black_nh==1
replace racecat=3 if other_nh==1
replace racecat=4 if hispanic==1
label define racecat 1 "white_nh" 2 "black_nh" 3 "other_nh" 4 "hispanic", replace
label values racecat racecat

gen college=edcat>=4

** the following command gives the predicted line for each race group
reg ln_wage i.college##i.racecat if year==2017
margins college, over(racecat)
marginsplot, name(predicted_lines, replace) title("Predicted Ln(wages) by BA status and Race") subtitle(US ACS 2017)
graph export predicted_lines.png, replace

reg ln_wage i.college##i.racecat if year==2017
** the following command gives us the marginal effect of college for each race group - it adds beta_1 to beta_interaction for each race
margins, dydx(college) over(racecat)
marginsplot, recast(scatter) horizontal title("Marginal effects (returns to college) by Race") subtitle(US ACS 2017) xtitle(return to a college education) ytitle("") name(marginal_effects, replace)
graph export returns.png, replace
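
** A minimal sketch (not part of the original lab) for checking the "beta_1 + beta_interaction" claim by hand.
** Assumption: the coefficients are named 1.college and 1.college#2.racecat, which is how Stata
** labels factor-variable terms when the base levels are college==0 and racecat==1.
** You can confirm the names on your own output with: reg, coeflegend
quietly reg ln_wage i.college##i.racecat if year==2017
* return to college for the base group (racecat==1, white_nh): beta_1 on its own
lincom 1.college
* return to college for racecat==2 (black_nh): beta_1 plus that group's interaction term
lincom 1.college + 1.college#2.racecat
* each lincom estimate should line up with the matching row of: margins, dydx(college) over(racecat)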