Borders with st()

Question

opened this issue 4 years ago · comments

title: "sectoral wage gap"
author: "Merfeld"
date: "5/18/2020"
output: pdf_document

rm(list = ls())

library(AER)
library(plm)
library(stargazer)
library(haven)
library(plm)
library(tidyverse)
library(broom)
library(knitr)
library(pastecs)
library(ggplot2)
library(estimatr)
library(lfe)  # panel data
library(vtable)
library(ggpubr)
library(kableExtra)

setwd("/Users/Josh/Dropbox/Papers/Sectoral Wage Gap") # sets the working directory
# Tibble!
data <- as_tibble(read_dta("Clean/Employment.dta"))
# Make some of these into factors
data <- mutate(data, monthly_asfac = as.factor(monthly))
data$pidfe <- as.factor(data$pidfe)
data$nonfarm <- as.factor(data$nonfarm)
data$month <- as.factor(data$month)
data <- mutate(data, pidfe_monthly = interaction(pidfe, monthly_asfac), hhfe_monthly = interaction(hhid, monthly_asfac), log_inv_unemp = log(inv_unemp_days + 1))
data$both_as <- as.factor(data$both)
# I want to create a variable that is the cumulative sum of how many months the individuals has been working in each job type
data <- data %>%
         arrange(pidfe, monthly) %>%
         group_by(pidfe, nonfarm) %>%
         mutate(nonfarm_count = cumsum(!is.na(hourly_wage)))
# And this variable will be the number of times the individual has worked BOTH in the same month
data <- data %>%
         arrange(pidfe, monthly) %>%
         group_by(pidfe, nonfarm) %>%
         mutate(nonfarm_both_count = cumsum(!is.na(hourly_wage) & both==1))
# Want it to be missing
data$nonfarm_both_count[data$both==0] <- NA

#Let's get rid of the code in the output
knitr::opts_chunk$set(echo = FALSE)


# Some different labels just for the output
data <- data %>% mutate(nonfarmfac = NA)
data$nonfarmfac[data$nonfarm=="0"] <- "Agriculture"
data$nonfarmfac[data$nonfarm=="1"] <- "Non-farm"

labs <- data.frame(nonfarmfac = "Job type", job = "Worked job type in month", wage_income = "Total income (log Rs)", 
                   wage_income_cash = "Total income in cash (log Rs)", wage_income_inkind = "Total income inkind (log Rs)",
                   daily_wage = "Daily wage (log Rs)", hourly_wage = "Hourly wage (log Rs)", wage_days = "Total work days (log)", 
                   wage_hours = "Total work hours (log)", age = "Age", educ = "Education", female = "Female (yes = 1)")

st(data, 
   vars = c('job', 'wage_income', 'wage_income_cash', 'wage_income_inkind', 'daily_wage', 'hourly_wage', 'wage_days', 
            'wage_hours', 'age', 'educ', 'female'),
   labels = labs,
   summ = c('mean(x)', 'sd(x)', 'notNA(x)'),
   summ.names = c("Mean", "SD", "N"),
   group = 'nonfarmfac',
   digits = 3,
   title = 'All Observations by Non-Farm Status') %>% kable_styling()

st(filter(data, both==1), 
   vars = c('wage_income', 'wage_income_cash', 'wage_income_inkind', 'daily_wage', 'hourly_wage', 'wage_days', 
            'wage_hours'),
   labels = labs,
   summ = c('mean(x)', 'sd(x)', 'notNA(x)'),
   summ.names = c("Mean", "SD", "N"),
   group = 'nonfarmfac',
   digits = 3,
   title = "Observations with Both Types of Employment in Same Month")

NickCH-K · Answer 1 · Thu Jun 04 2020 13:13:39 GMT+0800 (China Standard Time)

Fixed with 7ae301c

Grant McDermott · Answer 2 · Thu Jun 04 2020 14:23:55 GMT+0800 (China Standard Time)

@josh-merfeld

Nick's latest commit should solve the issue for you. (I just tried the dev version with your code and the tables look good to me.)

But since you're here, I'll just quickly say there are a bunch of things in your code that you probably want to avoid.

For example, you're loading the plm package twice (and then, later, lfe, which is also a dedicated panel regression package). In general, try to load only the packages you actually need, lest you trigger an unexpected namespace conflict. Other examples:

I'd strongly recommend avoiding rm(list = ls()) and setwd(). The former is redundant in an R Markdown doc (which creates a separate session for knitting anyway) and the latter is dangerous from a reproducibility standpoint. You might not be able to get around this for your current project, but the here package provides an elegant alternative using only relative paths. See this post.
data <- as_tibble(read_dta("Clean/Employment.dta")): The read_dta() function already coerces to tibble so you shouldn't need to convert it again. (Even better if you can use read_dta(here(...)) as per the above.)
I'd normally recommend putting any knitr::opts_chunk options at the very top of your document in their own chunk. But in this case it looks like you just want to turn the code echo off for that chunk, right? If so, it's better to specify this in the chunk header, e.g. {r chunk_name, echo = FALSE}.
I'm not sure why you're creating interactions manually. But if you just want to run them in a regression then it's better to use one of the formula shorthands (e.g. x1*x2). See here.
The latest dplyr release (you'll need to update the tidyverse to get it) will help you cut down on writing. E.g. you could start your conversion-to-factors section as:

 data <-
  data %>%
  mutate(monthly_asfac = as.factor(monthly)) %>%
  mutate(across(c(pidfe, nonfarm, month), as.factor)) %>%
  ...
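
To make a few of the suggestions above concrete, here is a minimal sketch on a toy data frame. The column names follow the original post, but the values are made up and the object names toy and term_labels are purely illustrative. The commented-out read_dta() line assumes the project root is the folder that contains Clean/, which may not match the actual setup.

```r
library(dplyr)  # across() needs dplyr >= 1.0.0

# Relative-path alternative to setwd(); left commented out because it assumes
# the project root is the folder containing Clean/ (an assumption, not
# something confirmed in the thread):
# data <- haven::read_dta(here::here("Clean", "Employment.dta"))

# Toy stand-in for the real data; values are invented for illustration only.
toy <- tibble(
  pidfe       = c(1, 1, 2, 2),
  nonfarm     = c(0, 1, 0, 1),
  month       = c(1, 2, 1, 2),
  monthly     = c(601, 602, 601, 602),
  hourly_wage = c(2.1, 2.5, 1.9, 2.8)
)

# Convert several columns to factors in one step instead of one line each.
toy <- toy %>%
  mutate(monthly_asfac = as.factor(monthly)) %>%
  mutate(across(c(pidfe, nonfarm, month), as.factor))

# Formula shorthand: x1 * x2 expands to both main effects plus their
# interaction, so there is no need to build the interaction column by hand.
term_labels <- attr(terms(hourly_wage ~ nonfarm * monthly_asfac), "term.labels")
print(term_labels)
# "nonfarm" "monthly_asfac" "nonfarm:monthly_asfac"
```

The same formula can be passed straight to a regression function, which builds the interaction terms internally.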
