Stata连享会 (Lianxh)
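New! The lianxh command has been released: search lianxh posts and Stata resources at any time. To install:
. ssc install lianxh
For details, see the help file (with a few surprises):
. help lianxh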
Author: 任建辉 (Ren Jianhui)
Editor's note: This post is adapted from the RStudio Cheat Sheets, to which we are grateful.
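This post summarizes common Stata commands for econometric analysis and gives their R equivalents. For importing and cleaning data, transforming variables and other basic commands, see Hanck et al. (2019), Econometrics with R, and Wickham and Grolemund (2017), R for Data Science.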
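The example data come from Wooldridge's Introductory Econometrics: A Modern Approach. In Stata, the datasets can be loaded with the bcuse command (install it with ssc install bcuse); in R, they are available after installing the wooldridge package. The datasets can also be downloaded via the "Wooldridge data sets" link.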
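Unless otherwise noted (functions from add-on packages are called as package::function), all R commands come from base R. Each topic below is presented as two code blocks: first the Stata code, then the equivalent R code.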
In Stata, a log file records commands and output. In R, code and output are captured by writing an R Markdown file, using the R Markdown format developed by Yihui Xie (谢益辉).
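R Markdown is the usual way to keep code and results together. If you only want something closer to a Stata log file, base R's sink() diverts printed output to a text file; unlike a Stata log, it records the output but not the commands. A minimal sketch, using a temporary file path and R's built-in mtcars data as stand-ins:

```r
log_file <- tempfile(fileext = ".log")  # temporary file standing in for a .log file
sink(log_file)                          # start diverting printed output to the file
print(summary(mtcars$mpg))              # this output goes to the file, not the console
sink()                                  # stop diverting; output returns to the console
log_lines <- readLines(log_file)        # read the captured output back in
cat(log_lines, sep = "\n")
```

Calling sink(log_file, split = TRUE) instead would keep printing to the console while also writing to the file.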
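Install the outreg2 package (Stata) and the wooldridge package (R). Note that an installed Stata package can be used without any further loading step, whereas in R an installed package must be loaded with library(packagename) in each session before its functions can be called directly.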
ssc install outreg2
install.packages("wooldridge")
# install wooldridge package
data(package = "wooldridge")
# list datasets in wooldridge package
data("wage1", package = "wooldridge")
# load wage1 dataset into session
?wage1
# consult documentation on wage1 dataset
The basic-plots part demonstrates a histogram, a scatter plot, a scatter plot with a fitted line, and box plots by group, using wage1.dta.
. bcuse wage1, clear
. histogram wage // histogram of wage
. scatter wage educ // scatter plot of wage against educ
. twoway (scatter wage educ) (lfit wage educ) // scatter plot with fitted line
. graph box wage, by(nonwhite) // boxplot of wage by nonwhite
library(wooldridge)
# All remaining R code assumes the wooldridge package has been loaded as above; this is not repeated.
hist(wage1$wage)
# histogram of wage
plot(y = wage1$wage, x = wage1$educ)
# scatter plot of wage against educ
abline(lm(wage ~ educ, data = wage1), col = "red")
# add fitted line to scatterplot
boxplot(wage1$wage~wage1$nonwhite)
# boxplot of wage by nonwhite
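A limitation of Stata is that it works with one dataset in memory at a time (Stata 16 added frames, which relax this), whereas R can hold several datasets at once, so the dataset must be named in every function call. R has no direct equivalent of Stata's codebook command. Installing the AER package in R also installs several useful dependencies automatically: car, lmtest and sandwich.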
. browse // open browser for loaded data
. describe // describe structure of loaded data
. summarize // display summary statistics for all variables in dataset
. list in 1/6 // display first 6 rows
. tabulate educ // tabulate educ variable frequencies
. tabulate educ female // cross-tabulate educ and female frequencies
View(wage1)
# open browser for loaded wage1 data
str(wage1)
# describe structure of wage1 data
summary(wage1)
# display summary statistics for wage1 variables
head(wage1)
# display the first 6 rows (the default)
tail(wage1)
# display last 6 rows
table(wage1$educ)
# tabulate educ frequencies
table("yrs_edu" = wage1$educ, "female" = wage1$female)
# cross-tabulate educ and female, naming the table dimensions
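Because R keeps several data frames in memory, each call has to say which one it refers to. Besides the df$var form used above, with() and the data = argument of modelling functions are common ways to do this. A minimal sketch using R's built-in mtcars and iris data (stand-ins for wage1, so it runs without extra packages):

```r
# Two built-in data frames are available at the same time; no "use" step is needed
mpg_mean   <- mean(mtcars$mpg)                # name the data frame explicitly with $
mpg_mean2  <- with(mtcars, mean(mpg))         # with() evaluates the expression inside mtcars
petal_mean <- with(iris, mean(Petal.Length))  # a second dataset, no switching required
fit <- lm(mpg ~ wt, data = mtcars)            # modelling functions take a data = argument
c(mpg_mean = mpg_mean, mpg_mean2 = mpg_mean2, petal_mean = petal_mean)
```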
This part covers creating new variables, computing a variable's mean, keeping selected variables and generating dummy variables.
. gen exper2 = exper^2 // create exper squared variable
. egen wage_avg = mean(wage) // create average wage variable
. drop tenursq // drop tenursq variable
. keep wage educ exper nonwhite numdep // keep selected variables
. tab numdep, gen(numdep) // create dummy variables for numdep
. recode exper (1/20 = 1 "1 to 20 years") (21/40 = 2 "21 to 40 years") ///
> (41/max = 3 "41+ years"),gen(experlvl) // recode exper and gen new variable
wage1$exper2 <- wage1$exper^2
# create exper squared variable
wage1$wage_avg <- mean(wage1$wage)
# create average wage variable
wage1$tenursq <- NULL
# drop tenursq
wage1 <- wage1[ , c("wage", "educ", "exper", "nonwhite", "numdep")]
# keep selected variables
wage1 <- fastDummies::dummy_cols(wage1, select_columns = "numdep")
# create dummy variables for numdep, using the {fastDummies} package
wage1$experlvl <- 3
# recode exper
wage1$experlvl[wage1$exper < 41] <- 2
wage1$experlvl[wage1$exper < 21] <- 1
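The three assignments above can also be written with base R's cut(), which bins a numeric vector into a labelled factor, much like the value labels Stata's recode attaches. A sketch on a hypothetical vector of experience values; it assumes exper is a whole number of at least 1 (a value of 0 would fall into the first bin here, whereas Stata's recode would leave it unchanged):

```r
# Hypothetical experience values standing in for wage1$exper
exper <- c(1, 20, 21, 40, 41, 51)
# cut() uses right-closed intervals by default: (-Inf, 20], (20, 40], (40, Inf]
experlvl <- cut(exper,
                breaks = c(-Inf, 20, 40, Inf),
                labels = c("1 to 20 years", "21 to 40 years", "41+ years"))
table(experlvl)       # frequencies by labelled category
as.integer(experlvl)  # underlying codes 1, 2, 3, as in Stata's recode
```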
. reg wage educ // simple regression of wage on educ
. reg wage educ if nonwhite==1 // add condition with if statement
. reg wage educ exper, robust // multiple regression using HC1 robust standard errors
. reg wage educ exper,cluster(numdep) // use clustered standard errors
mod1 <- lm(wage ~ educ, data =wage1)
# simple regression of wage on educ, store results in mod1
summary(mod1)
# print summary of mod1 results
mod2 <- lm(wage ~ educ, data = wage1[wage1$nonwhite == 1, ])
# restrict the sample, like Stata's if condition
mod3 <- estimatr::lm_robust(wage ~ educ + exper, data = wage1, se_type = "stata")
# multiple regression with HC1 robust standard errors, using the {estimatr} package
mod4 <- estimatr::lm_robust(wage ~ educ + exper, data = wage1, clusters = numdep, se_type = "stata")
# clustered standard errors; se_type = "stata" matches Stata's cluster() (the estimatr default with clusters is CR2)
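For reference, the HC1 estimator behind se_type = "stata" (and Stata's reg, robust) is V = n/(n - k) (X'X)^(-1) X' diag(e_i^2) X (X'X)^(-1). The sketch below computes it by hand in base R, with R's built-in mtcars data standing in for wage1; the helper name hc1_se is hypothetical. In practice, estimatr::lm_robust() or sandwich::vcovHC(fit, type = "HC1") should be preferred.

```r
# Hypothetical helper: HC1 ("Stata-style") robust standard errors for an lm fit
hc1_se <- function(fit) {
  X <- model.matrix(fit)                 # n x k design matrix
  e <- residuals(fit)                    # OLS residuals
  n <- nrow(X)
  k <- ncol(X)
  bread <- solve(crossprod(X))           # (X'X)^{-1}
  meat  <- crossprod(X * e)              # sum over i of e_i^2 * x_i x_i'
  V <- bread %*% meat %*% bread * n / (n - k)  # HC1 small-sample scaling
  sqrt(diag(V))
}

fit <- lm(mpg ~ wt + hp, data = mtcars)  # mtcars stands in for wage1
cbind(estimate = coef(fit), robust_se = hc1_se(fit))
```

As a sanity check, for an intercept-only model the HC1 standard error reduces to sd(y) / sqrt(n).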
. bcuse mroz, clear
. logit inlf nwifeinc educ // estimate logistic regression
. probit inlf nwifeinc educ // estimate probit regression
. tobit hours nwifeinc educ, ll(0) // estimate tobit regression
mod_log <- glm(inlf ~ nwifeinc + educ, family = binomial(link = "logit"), data = mroz)
# estimate logistic regression
mod_pro <- glm(inlf ~ nwifeinc + educ, family = binomial(link = "probit"), data = mroz)
# estimate probit regression
mod_tob <- AER::tobit(hours ~ nwifeinc + educ, left = 0, data = mroz)
# estimate tobit regression
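This part covers the heteroskedasticity test, the omitted-variable (RESET) test and the comparison of means between groups.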
. bcuse wage1, clear
. reg lwage educ exper // estimation used for examples below
. estat hettest // Breusch-Pagan/Cook-Weisberg test for heteroskedasticity
. estat ovtest // Ramsey RESET test for omitted variables
. ttest wage, by(nonwhite) //compare means of same variable between groups
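data("wage1", package = "wooldridge")
# reload the original wage1 data, as bcuse wage1, clear does in Stata
mod <- lm(lwage ~ educ + exper, data = wage1)
# estimation used for examples below
lmtest::bptest(mod)
# Breusch-Pagan test for heteroskedasticity using the {lmtest} package; the default
# (studentized, all regressors) corresponds to Stata's estat hettest, rhs iid
lmtest::resettest(mod, power = 2:4)
# Ramsey RESET test; power = 2:4 uses the same fitted-value powers as Stata's estat ovtest
t.test(wage ~ nonwhite, data = wage1, var.equal = TRUE)
# independent two-group t-test; var.equal = TRUE matches Stata's default equal-variance ttest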
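The var.equal choice matters because Stata's ttest, by() assumes equal variances by default (its unequal option requests Welch's test), while R's t.test() runs Welch's test unless told otherwise. A sketch on R's built-in mtcars data comparing the two; the pooled test has n1 + n2 - 2 degrees of freedom, whereas Welch's are fractional and smaller:

```r
# mpg compared between automatic (am = 0) and manual (am = 1) cars
pooled <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)  # like Stata's default ttest, by()
welch  <- t.test(mpg ~ am, data = mtcars)                    # R's default: Welch test
c(pooled_df = unname(pooled$parameter), welch_df = unname(welch$parameter))
c(pooled_p = pooled$p.value, welch_p = welch$p.value)
```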
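In Stata, special prefixes mark a variable as continuous (c.) or categorical (i.), and the # and ## operators build interaction terms in different ways: # returns only the interaction, while ## returns the main effects plus the interaction (full factorial). Here we show common uses of these operators and their R equivalents.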
. reg lwage i.numdep // treat numdep as a factor variable
. reg lwage c.educ#c.exper // return interaction term only
. reg lwage c.educ##c.exper // return full factorial specification
. reg lwage c.exper##i.numdep // return full, interact continuous and categorical
lm(lwage ~ as.factor(numdep), data= wage1)
# treat numdep as factor
lm(lwage ~ educ:exper, data =wage1)
# return interaction term only
lm(lwage ~ educ*exper, data = wage1)
# return full factorial specification
lm(lwage ~ exper*as.factor(numdep), data = wage1)
# return full, interact continuous and categorical
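To see what R's formula operators expand to, terms() lists the model terms a formula generates. A sketch on R's built-in mtcars data (a stand-in for wage1): a * b gives the same fit as a + b + a:b, mirroring Stata's ## versus #, and factor() creates dummies with the first level as the base category, as i. does.

```r
# Term expansion of the two interaction operators
attr(terms(mpg ~ wt * hp), "term.labels")  # "wt" "hp" "wt:hp"  (like Stata's ##)
attr(terms(mpg ~ wt:hp), "term.labels")    # "wt:hp" only         (like Stata's #)

# a * b and a + b + a:b are the same specification
full <- lm(mpg ~ wt * hp, data = mtcars)
expl <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)
all.equal(coef(full), coef(expl))

# factor() adds dummies for cyl = 6 and cyl = 8; cyl = 4 is the base (like i.cyl)
names(coef(lm(mpg ~ factor(cyl), data = mtcars)))
```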
. bcuse murder, clear
. xtset id year // set id as entities (panel) and year as time variable
. xtdescribe // describe pattern of xt data
. xtsum // summarize xt data
. xtreg mrdrte unem, fe // fixed effects regression
plm::is.pbalanced(murder$id,murder$year)
# check panel balance with {plm} package
modfe <- plm::plm(mrdrte ~ unem, index = c("id", "year"), model = "within", data = murder)
# estimate fixed effects (within) model
summary(modfe)
# display results
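The within estimator computed by model = "within" and xtreg, fe can be reproduced in base R by subtracting each entity's mean from every variable and running OLS on the demeaned data; the slope equals the one from a regression with entity dummies (LSDV). A sketch on R's built-in Orange data (five trees measured at several ages), used here only as a stand-in for murder:

```r
d <- as.data.frame(Orange)  # plain data frame: Tree = entity, age = regressor
# Within transformation: subtract each tree's own mean
d$y_dm <- d$circumference - ave(d$circumference, d$Tree)
d$x_dm <- d$age - ave(d$age, d$Tree)
within_fit <- lm(y_dm ~ 0 + x_dm, data = d)  # OLS on demeaned data, no intercept
# Equivalent dummy-variable (LSDV) regression with one dummy per tree
lsdv_fit <- lm(circumference ~ age + factor(Tree, ordered = FALSE), data = d)
b_within <- unname(coef(within_fit)["x_dm"])
b_lsdv   <- unname(coef(lsdv_fit)["age"])
c(within = b_within, lsdv = b_lsdv)
```

Only the point estimates are comparable: lm() on demeaned data does not subtract the absorbed entity means from the degrees of freedom, so its standard errors are too small, which plm handles correctly.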
. bcuse mroz, clear
. ivreg lwage (educ = fatheduc), first // show results of first-stage regression
. ivreg lwage (educ = fatheduc) // show results of 2SLS directly
modiv <-AER::ivreg(lwage ~ educ |fatheduc, data = mroz)
# estimate 2SLS with {AER} package
summary(modiv, diagnostics = TRUE)
# add IV diagnostics: weak-instruments F test, Wu-Hausman test (and Sargan test when overidentified)
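In the just-identified case, 2SLS can be reproduced with two OLS steps: regress the endogenous variable on the instrument, then regress the outcome on the first-stage fitted values. The sketch uses R's built-in mtcars data purely mechanically (wt treated as endogenous, disp as a hypothetical instrument; no economic claim is intended) and checks the slope against the ratio cov(z, y) / cov(z, x):

```r
y <- mtcars$mpg   # outcome
x <- mtcars$wt    # regressor treated as endogenous (illustration only)
z <- mtcars$disp  # hypothetical instrument
stage1 <- lm(x ~ z)             # first stage
x_hat  <- fitted(stage1)        # first-stage fitted values
stage2 <- lm(y ~ x_hat)         # second stage
b_2sls  <- unname(coef(stage2)["x_hat"])
b_ratio <- cov(z, y) / cov(z, x)  # just-identified IV estimator
c(two_step = b_2sls, ratio = b_ratio)
```

The second-stage standard errors reported by lm() are not valid 2SLS standard errors, since they use residuals built from x_hat rather than x; this is one reason to use AER::ivreg() as above.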
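In Stata, postestimation commands act on the most recent estimation results, so they normally have to follow the regression directly (or the results must be brought back with estimates restore). R is object-oriented: each fitted model is an object that can be queried at any time, so this constraint does not arise. This part covers reporting regression results and displaying marginal effects.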
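As a small illustration of that object-oriented style, fitted models can simply be kept in a named list, which plays roughly the role of estimates store and estimates restore. A sketch with R's built-in mtcars data (the list and model names are hypothetical):

```r
# Store several fitted models in a named list (akin to Stata's estimates store)
models <- list(
  short = lm(mpg ~ wt, data = mtcars),
  long  = lm(mpg ~ wt + hp, data = mtcars)
)
# Any stored model can be queried later, in any order, without re-estimating
r2 <- sapply(models, function(m) summary(m)$r.squared)
r2
coef(models$short)
nobs(models$long)
```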
. reg lwage educ c.exper##c.exper // estimation used for following postestimation commands
. estimates store mod1 // stores the last estimation results in memory as mod1
. margins // get average predictive margins
. margins, dydx(*) // get average marginal effects for all variables
. marginsplot // plot marginal effects
. margins, dydx(exper) // average marginal effects of experience
. margins, at(exper=(1(10)40)) // average predictive margins over exper range at 10-year increments
. est restore mod1 // loads mod1 back into working memory
. estimates table mod1 // display table with stored estimation results
mod1 <- lm(lwage ~ educ + exper + I(exper^2), data = wage1)
# Note: in R, mathematical expressions inside a formula call must be isolated with I()
prediction::prediction(mod1)
# get average predictive margins with the {prediction} package (installed with {margins})
m1 <- margins::margins(mod1)
# get average marginal effects for all variables
plot(m1)
# plot marginal effects
summary(m1)
# get detailed summary of marginal effects
summary(margins::margins(mod1, variables = "exper"))
# average marginal effect of experience only
prediction::prediction(mod1, at = list(exper = seq(1, 40, 10)))
# predictive margins at exper = 1, 11, 21, 31, matching Stata's 1(10)40
stargazer::stargazer(mod1, mod2, type = "text")
# use {stargazer} package, with type = "text" to display results within R.
# Note: type = can also be set to "latex" or "html" for LaTeX and HTML output.
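The note on I() matters because ^ is a formula operator in R: inside a formula, wt^2 means wt crossed with itself, which reduces to wt, so no squared term is fitted. A quick check with R's built-in mtcars data (chosen only so the example runs without extra packages):

```r
# Without I(), ^ is interpreted as a formula operator, so wt^2 collapses to wt
no_I   <- lm(mpg ~ wt + wt^2, data = mtcars)
# With I(), wt^2 is computed arithmetically and enters as a separate regressor
with_I <- lm(mpg ~ wt + I(wt^2), data = mtcars)
names(coef(no_I))    # "(Intercept)" "wt"
names(coef(with_I))  # "(Intercept)" "wt" "I(wt^2)"
```

poly(wt, 2, raw = TRUE) is another way to enter a raw quadratic.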