Equivalent Commands in Stata and R

Published: 2022-10-13

Author: 任建辉

Editor's note: This article is adapted from the RStudio Cheat Sheets, with thanks.


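1. Introduction

This article summarizes common Stata commands for econometric analysis and gives their equivalents in R. For more on importing and cleaning data, transforming variables, and other basic commands, see Econometrics with R by Hanck et al. (2019) and R for Data Science by Wickham and Grolemund (2017).

The example data come from Wooldridge's Introductory Econometrics: A Modern Approach. In Stata the datasets can be loaded with the bcuse command; in R they become available after installing the wooldridge package. The datasets can also be downloaded from the "Wooldridge data sets" page.

Unless noted otherwise, all R commands come from base R. Each example below has two parts: a Stata code block first, followed by the equivalent R code block.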

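2. Installation

In Stata, commands and output are saved in a log file. In R, the usual counterpart is an R Markdown file, which captures both code and output (R Markdown's main author is Yihui Xie).

Install the outreg2 package. Note that a Stata package, once installed, does not need to be loaded each time it is used. In R, a package must be loaded with library(package_name) in every session where it is used.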

. ssc install outreg2

install.packages("wooldridge")  # install wooldridge package
library(wooldridge)             # load the package for this session
data(package = "wooldridge")    # list datasets in wooldridge package
data("wage1")                   # load wage1 dataset into session
?wage1                          # consult documentation on wage1 dataset

3. Basic Plots

This section demonstrates histograms, scatter plots, scatter plots with a fitted line, and grouped box plots, using the wage1.dta dataset.

. bcuse wage1, clear
. histogram wage                              // histogram of wage
. scatter wage educ                           // scatter plot of wage by educ
. twoway (scatter wage educ) (lfit wage educ) // scatter plot with fitted line
. graph box wage, by(nonwhite)                // boxplot of wage by nonwhite

library(wooldridge)                               # later R blocks assume wooldridge is already loaded
hist(wage1$wage)                                  # histogram of wage
plot(y = wage1$wage, x = wage1$educ)              # scatter plot of wage by educ
abline(lm(wage1$wage ~ wage1$educ), col = "red")  # add fitted line to scatterplot
boxplot(wage1$wage ~ wage1$nonwhite)              # boxplot of wage by nonwhite

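4. Summarizing Data

A limitation of Stata is that it traditionally keeps only one dataset in memory at a time (frames, added in Stata 16, relax this). R can hold several datasets at once, so the dataset must be named in each function call. R has no direct equivalent of Stata's codebook command. Installing the AER package in R also installs other useful packages it depends on: car, lmtest, and sandwich.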

. browse               // open browser for loaded data
. describe             // describe structure of loaded data
. summarize            // display summary statistics for all variables in dataset
. list in 1/6          // display first 6 rows
. tabulate educ        // tabulate educ variable frequencies
. tabulate educ female // cross-tabulate educ and female frequencies

View(wage1)        # open browser for loaded wage1 data
str(wage1)         # describe structure of wage1 data
summary(wage1)     # display summary statistics for wage1 variables
head(wage1)        # display first 6 (default) rows
tail(wage1)        # display last 6 rows
table(wage1$educ)  # tabulate educ frequencies
table("yrs_edu" = wage1$educ, "female" = wage1$female)
# cross-tabulate educ and female, naming the table dimensions
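
Because base R has no direct counterpart to codebook, a rough substitute can be assembled from base functions. The following is only a minimal sketch: codebook_lite is a hypothetical helper name, not an existing function, and the built-in mtcars data stands in for wage1 so that the block runs without extra packages.

```r
# Hypothetical helper: one row per variable with its class, number of
# missing values, and number of distinct non-missing values.
codebook_lite <- function(df) {
  data.frame(
    variable  = names(df),
    class     = vapply(df, function(x) class(x)[1], character(1)),
    n_missing = vapply(df, function(x) sum(is.na(x)), integer(1)),
    n_unique  = vapply(df, function(x) length(unique(x[!is.na(x)])), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

cb <- codebook_lite(mtcars)  # mtcars ships with R; pass wage1 instead if loaded
print(cb)
```

Calling codebook_lite(wage1) after loading the wooldridge package should give the same kind of overview for the article's data.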

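5. Generating or Editing Variables

This section covers creating new variables, computing the mean of a variable, keeping a subset of variables, and creating dummy variables.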

. gen exper2 = exper^2                 // create exper squared variable
. egen wage_avg = mean(wage)           // create average wage variable
. drop tenursq                         // drop tenursq variable
. keep wage educ exper nonwhite numdep // keep selected variables
. tab numdep, gen(numdep)              // create dummy variables for numdep
. recode exper (1/20 = 1 "1 to 20 years") (21/40 = 2 "21 to 40 years") ///
>     (41/max = 3 "41+ years"), gen(experlvl) // recode exper and gen new variable

wage1$exper2 <- wage1$exper^2       # create exper squared variable
wage1$wage_avg <- mean(wage1$wage)  # create average wage variable
wage1$tenursq <- NULL               # drop tenursq
wage1 <- wage1[, c("wage", "educ", "exper", "nonwhite", "numdep")]
# keep selected variables (numdep is needed for the next step)
wage1 <- fastDummies::dummy_cols(wage1, select_columns = "numdep")
# create dummy variables for numdep, use {fastDummies} package
wage1$experlvl <- 3                    # recode exper: start with 41+ years
wage1$experlvl[wage1$exper < 41] <- 2  # 21 to 40 years
wage1$experlvl[wage1$exper < 21] <- 1  # 1 to 20 years
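
The recode and dummy-variable steps can also be done with base R alone. This is a sketch under stated assumptions: the exper values below are made up for illustration rather than taken from wage1, and cut() with model.matrix() is one possible alternative, not the approach used in the code above.

```r
# Illustrative experience values (not taken from wage1)
exper <- c(1, 20, 21, 40, 41, 51)

# Bin into the same three ranges as the recode above.
# With the default right = TRUE the intervals are (0,20], (20,40], (40,Inf].
experlvl <- cut(exper,
                breaks = c(0, 20, 40, Inf),
                labels = c("1 to 20 years", "21 to 40 years", "41+ years"))
table(experlvl)

# One indicator column per level, similar in spirit to `tab ..., gen()`
dummies <- model.matrix(~ experlvl - 1)
head(dummies)
```

Unlike the numeric codes 1 to 3, cut() returns a labelled factor, which R modelling functions treat as categorical automatically.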

6. Estimating Models (Cross-Sectional Data)

6.1 OLS

. reg wage educ                        // simple regression of wage by educ
. reg wage educ if nonwhite==1         // add condition with if statement
. reg wage educ exper, robust          // multiple regression using HC1 robust standard errors
. reg wage educ exper, cluster(numdep) // use clustered standard errors

mod1 <- lm(wage ~ educ, data = wage1)
# simple regression of wage by educ, store results in mod1
summary(mod1)
# print summary of mod1 results
mod2 <- lm(wage ~ educ, data = wage1[wage1$nonwhite == 1, ])
# add condition by subsetting rows
mod3 <- estimatr::lm_robust(wage ~ educ + exper, data = wage1, se_type = "stata")
# multiple regression with HC1 robust standard errors, use {estimatr} package
mod4 <- estimatr::lm_robust(wage ~ educ + exper, data = wage1, clusters = numdep, se_type = "stata")
# use clustered standard errors; se_type = "stata" matches Stata (the estimatr default is CR2)
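
To make clear what the HC1 robust option computes, the sandwich formula can be written out in base R. This is a sketch for illustration, with the built-in mtcars data standing in for wage1; in practice estimatr or sandwich would normally be used.

```r
# mtcars ships with R and stands in for wage1 here
fit <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(fit)  # n x k design matrix
e <- residuals(fit)     # OLS residuals
n <- nrow(X)
k <- ncol(X)

bread <- solve(crossprod(X))  # (X'X)^{-1}
meat  <- crossprod(X * e)     # X' diag(e^2) X (each row of X scaled by its residual)

# HC1: HC0 sandwich times the degrees-of-freedom correction n / (n - k)
vcov_hc1 <- (n / (n - k)) * bread %*% meat %*% bread
se_hc1   <- sqrt(diag(vcov_hc1))

cbind(estimate = coef(fit), robust_se = se_hc1)
```

The point estimates are those of ordinary lm(); only the standard errors change.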

6.2 MLE (Logit/Probit/Tobit)

. bcuse mroz, clear
. logit inlf nwifeinc educ         // estimate logistic regression
. probit inlf nwifeinc educ        // estimate probit regression
. tobit hours nwifeinc educ, ll(0) // estimate tobit regression

mod_log <- glm(inlf ~ nwifeinc + educ, family = binomial(link = "logit"), data = mroz)
# estimate logistic regression
mod_pro <- glm(inlf ~ nwifeinc + educ, family = binomial(link = "probit"), data = mroz)
# estimate probit regression
mod_tob <- AER::tobit(hours ~ nwifeinc + educ, left = 0, data = mroz)
# estimate tobit regression, use {AER} package

7. Statistical Tests and Diagnostics

This section covers heteroskedasticity tests, omitted-variable tests, and two-group t-tests.

. bcuse wage1, clear
. reg lwage educ exper     // estimation used for examples below
. estat hettest            // Breusch-Pagan/Cook-Weisberg test for heteroskedasticity
. estat ovtest             // Ramsey RESET test for omitted variables
. ttest wage, by(nonwhite) // compare means of same variable between groups

mod <- lm(lwage ~ educ + exper, data = wage1)
# estimation used for examples below
lmtest::bptest(mod)
# Breusch-Pagan test for heteroskedasticity using the {lmtest} package;
# bptest() studentizes by default, so the statistic can differ from Stata's default
lmtest::resettest(mod)
# Ramsey RESET test
t.test(wage ~ nonwhite, data = wage1, var.equal = TRUE)
# independent group t-test; var.equal = TRUE matches Stata's default equal-variance test
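
The Breusch-Pagan statistic can be reproduced by hand with an auxiliary regression. The sketch below computes the studentized (Koenker) form, n times the R-squared from regressing squared residuals on the regressors. It uses the built-in mtcars data as a stand-in for wage1 and is meant only to show the mechanics.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)  # mtcars stands in for wage1

# Auxiliary regression: squared residuals on the original regressors
e2  <- residuals(fit)^2
aux <- lm(e2 ~ wt + hp, data = mtcars)

n       <- nobs(fit)
bp_stat <- n * summary(aux)$r.squared  # LM statistic, studentized form
df      <- length(coef(aux)) - 1       # number of slope coefficients
p_value <- pchisq(bp_stat, df = df, lower.tail = FALSE)

c(statistic = bp_stat, df = df, p.value = p_value)
```

A small p-value suggests the error variance depends on the regressors, which is the case where robust standard errors matter.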

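8. Interactions and Categorical/Continuous Variables

In Stata, special operators mark a variable as continuous (c.) or categorical (i.). Likewise, the # and ## operators specify different ways of forming interactions between variables. Here we show common uses of these operators and their R equivalents.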

. reg lwage i.numdep          // treat numdep as a factor variable
. reg lwage c.educ#c.exper    // return interaction term only
. reg lwage c.educ##c.exper   // return full factorial specification
. reg lwage c.exper##i.numdep // return full, interact continuous and categorical

lm(lwage ~ as.factor(numdep), data = wage1)
# treat numdep as factor
lm(lwage ~ educ:exper, data = wage1)
# return interaction term only
lm(lwage ~ educ*exper, data = wage1)
# return full factorial specification
lm(lwage ~ exper*as.factor(numdep), data = wage1)
# return full, interact continuous and categorical
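
The difference between : and * in an R formula can be checked without fitting anything, by asking R which terms a formula expands to. In this small sketch the names y, a, and b are placeholders, not variables from wage1.

```r
# term.labels lists the terms R will include for each formula
labels_colon <- attr(terms(y ~ a:b), "term.labels")  # interaction only
labels_star  <- attr(terms(y ~ a*b), "term.labels")  # main effects plus interaction

labels_colon
labels_star
```

So a:b plays the role of Stata's # and a*b the role of ##.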

9. Estimating Models (Panel Data)

9.1 Panel Regression

. bcuse murder, clear
. xtset id year          // set id as entities (panel) and year as time variable
. xtdescribe             // describe pattern of xt data
. xtsum                  // summarize xt data
. xtreg mrdrte unem, fe  // fixed effects regression

plm::is.pbalanced(murder$id, murder$year)
# check panel balance with {plm} package
modfe <- plm::plm(mrdrte ~ unem, index = c("id", "year"), model = "within", data = murder)
# estimate fixed effects (within) model
summary(modfe)
# display results
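
The "within" model subtracts each unit's mean from every variable before running OLS. The sketch below shows this in base R on a simulated panel (made-up data, not the murder dataset) and checks that it gives the same slope as a regression with one dummy per unit.

```r
set.seed(1)
# Simulated balanced panel: 10 units observed over 5 years (illustrative only)
panel <- data.frame(id = rep(1:10, each = 5), year = rep(2001:2005, times = 10))
alpha <- rnorm(10)  # unit fixed effects
panel$x <- rnorm(50) + alpha[panel$id]
panel$y <- 2 * panel$x + alpha[panel$id] + rnorm(50)

# Within transformation: subtract each unit's mean from y and x
panel$y_dm <- panel$y - ave(panel$y, panel$id)
panel$x_dm <- panel$x - ave(panel$x, panel$id)
b_within <- coef(lm(y_dm ~ x_dm - 1, data = panel))[["x_dm"]]

# Equivalent least-squares dummy variable (LSDV) regression
b_lsdv <- coef(lm(y ~ x + factor(id), data = panel))[["x"]]

c(within = b_within, lsdv = b_lsdv)
```

The two slopes coincide, but standard errors from the demeaned lm() are not adjusted for the estimated unit means, which is one reason to prefer plm or xtreg in practice.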

9.2 Instrumental Variables

. bcuse mroz, clear
. ivreg lwage (educ = fatheduc), first // show results of first-stage regression
. ivreg lwage (educ = fatheduc)        // show results of 2SLS directly

modiv <- AER::ivreg(lwage ~ educ | fatheduc, data = mroz)
# estimate 2SLS with {AER} package
summary(modiv, diagnostics = TRUE)
# get diagnostic tests of IV and endogenous variable
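
The two stages behind 2SLS can be written out explicitly. This sketch uses simulated data (not mroz) with one endogenous regressor and one instrument, and all variable names are illustrative.

```r
set.seed(2)
n <- 500
z <- rnorm(n)                    # instrument
u <- rnorm(n)                    # unobserved confounder
x <- 0.8 * z + u + rnorm(n)      # endogenous regressor
y <- 1 + 0.5 * x + u + rnorm(n)  # outcome; true coefficient on x is 0.5

# Stage 1: project the endogenous regressor on the instrument
x_hat <- fitted(lm(x ~ z))
# Stage 2: regress the outcome on the fitted values
b_2sls <- coef(lm(y ~ x_hat))[["x_hat"]]

# With a single instrument, 2SLS reduces to a ratio of covariances
b_ratio <- cov(z, y) / cov(z, x)

c(two_stage = b_2sls, ratio = b_ratio, ols = coef(lm(y ~ x))[["x"]])
```

The manual second stage gives the right point estimate but not the right standard errors, so a dedicated routine such as AER::ivreg remains the better choice for inference.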

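10. Postestimation

In Stata, postestimation commands act on the most recent estimation results, so they must follow the regression directly unless stored results are restored. R keeps each fitted model as an object, so this constraint does not arise. This section covers reporting regression results and displaying marginal effects.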

. reg lwage educ c.exper##c.exper // estimation used for following postestimation commands
. estimates store mod1         // stores in memory the last estimation results to mod1
. margins                      // get average predictive margins
. margins, dydx(*)             // get average marginal effects for all variables
. marginsplot                  // plot marginal effects
. margins, dydx(exper)         // average marginal effects of experience
. margins, at(exper=(1(10)40)) // average predictive margins over exper range at 10-year increments
. est restore mod1             // loads mod1 back into working memory
. estimates table mod1         // display table with stored estimation results

mod1 <- lm(lwage ~ educ + exper + I(exper^2), data = wage1)
# Note: in R, mathematical expressions inside a formula call must be isolated with I()
margins::prediction(mod1)
# get average predictive margins with {margins} package
m <- margins::margins(mod1)
# get average marginal effects for all variables
plot(m)
# plot marginal effects
summary(m)
# get detailed summary of marginal effects
margins::prediction(mod1, at = list(exper = seq(1, 31, 10)))
# predictive margins at exper = 1, 11, 21, 31, matching Stata's 1(10)40
stargazer::stargazer(mod1, mod2, type = "text")
# use {stargazer} package, with type = "text" to display results within R; mod2 is the model from section 6.1
# Note: type = can also be set to "latex" or "html" for LaTeX and HTML output.
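
For a model with a squared term, the average marginal effect follows directly from the coefficients: the derivative with respect to exper is b1 + 2 * b2 * exper, averaged over the sample. The sketch below uses simulated data (not wage1) and checks the analytic value against a numerical derivative.

```r
set.seed(3)
# Simulated data (illustrative only, not wage1)
exper <- runif(200, 1, 40)
lwage <- 1 + 0.04 * exper - 0.0007 * exper^2 + rnorm(200, sd = 0.3)
fit <- lm(lwage ~ exper + I(exper^2))

b <- coef(fit)
# d lwage / d exper = b1 + 2 * b2 * exper, averaged over the sample
ame_analytic <- mean(b[["exper"]] + 2 * b[["I(exper^2)"]] * exper)

# Numerical check: average central finite difference of the predictions
h <- 1e-5
pred <- function(v) predict(fit, newdata = data.frame(exper = v))
ame_numeric <- mean((pred(exper + h) - pred(exper - h)) / (2 * h))

c(analytic = ame_analytic, numeric = ame_numeric)
```

This is why the squared term has to be declared inside the model, with I(exper^2) in R or c.exper##c.exper in Stata: a separately generated exper2 variable would be treated as unrelated to exper when computing marginal effects.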
