# Equivalent Commands in Stata and R

Stata连享会 (Lianxh): Home || Videos || Posts || Zhihu || Bilibili

New! The `lianxh` command has been released:

`. ssc install lianxh`

`. help lianxh`


## 2. Installation

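``````stata
. ssc install outreg2   // install outreg2 from SSC
. ssc install bcuse     // install bcuse, used below to load Wooldridge datasets
``````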
``````r
install.packages("wooldridge")
# install wooldridge package
data(package = "wooldridge")
# list datasets in wooldridge package
data(wage1, package = "wooldridge")
# load wage1 dataset into session
?wage1
# consult documentation on wage1 dataset
``````

## 3. Basic Plotting

``````stata
. bcuse wage1, clear
. histogram wage                              // histogram of wage
. scatter wage educ                           // scatter plot of wage by educ
. twoway (scatter wage educ) (lfit wage educ) // scatter plot with fitted line
. graph box wage, by(nonwhite)                // boxplot of wage by nonwhite
``````
``````r
library(wooldridge)
# All remaining R code blocks assume the wooldridge package is already loaded; this line is not repeated.
hist(wage1$wage)
# histogram of wage
plot(y = wage1$wage, x = wage1$educ)
# scatter plot of wage by educ
abline(lm(wage1$wage ~ wage1$educ), col = "red")
# add fitted line to scatterplot
boxplot(wage1$wage ~ wage1$nonwhite)
# boxplot of wage by nonwhite
``````

## 4. Summarizing Data

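One drawback of Stata is that it traditionally holds only one dataset in memory at a time. Stata 16 added frames for working with several. R, by contrast, can keep many datasets loaded at once, so every R function call must specify which dataset it uses. Base R has no direct equivalent of Stata's `codebook`. Installing the `AER` package in R also installs several useful packages it depends on: `car`, `lmtest`, and `sandwich`.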
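Because R can hold several data frames at once, each call names its data source. It can do this through the `$` operator, through `with()`, or through a `data =` argument. The following minimal base-R sketch uses two small, made-up data frames; `df_a` and `df_b` are hypothetical and do not come from the wooldridge package:

```r
# Two hypothetical data frames held in memory at the same time
df_a <- data.frame(educ = c(12, 16, 12), wage = c(10, 20, 12))
df_b <- data.frame(educ = c(8, 12), wage = c(7, 9))

# Each call names its data source explicitly
mean_a <- mean(df_a$wage)               # `$` selects a column of df_a
mean_b <- with(df_b, mean(wage))        # with() evaluates the expression inside df_b
fit_a  <- lm(wage ~ educ, data = df_a)  # data = tells lm() which data frame to use

print(c(mean_a = mean_a, mean_b = mean_b))
print(coef(fit_a))
```

In Stata you would switch datasets (or frames) before running the same commands. In R the data frame is simply an argument.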

``````stata
. browse               // open browser for loaded data
. describe             // describe structure of loaded data
. summarize            // display summary statistics for all variables in dataset
. list in 1/6          // display first 6 rows
. tabulate educ        // tabulate educ variable frequencies
. tabulate educ female // cross-tabulate educ and female frequencies
``````
``````r
View(wage1)
# open browser for loaded wage1 data
str(wage1)
# describe structure of wage1 data
summary(wage1)
# display summary statistics for wage1 variables
head(wage1)
# display first 6 (default) rows of data
tail(wage1)
# display last 6 rows
table(wage1$educ)
# tabulate educ frequencies
table("yrs_edu" = wage1$educ, "female" = wage1$female)
# cross-tabulate educ and female, naming the table dimensions
``````

## 5. Generating or Editing Variables

``````stata
. gen exper2 = exper^2                 // create exper squared variable
. egen wage_avg = mean(wage)           // create average wage variable
. drop tenursq                         // drop tenursq variable
. keep wage educ exper nonwhite numdep // keep selected variables
. tab numdep, gen(numdep)              // create dummy variables for numdep
. recode exper (1/20 = 1 "1 to 20 years") (21/40 = 2 "21 to 40 years") ///
>     (41/max = 3 "41+ years"), gen(experlvl) // recode exper and gen new variable
``````
``````r
wage1$exper2 <- wage1$exper^2
# create exper squared variable
wage1$wage_avg <- mean(wage1$wage)
# create average wage variable
wage1$tenursq <- NULL
# drop tenursq
wage1 <- wage1[ , c("wage", "educ", "exper", "nonwhite", "numdep")]
# keep selected variables (numdep must be kept for the next step)
wage1 <- fastDummies::dummy_cols(wage1, select_columns = "numdep")
# create dummy variables for numdep, using the {fastDummies} package
wage1$experlvl <- 3
# recode exper: start at 3 (41+ years), then overwrite the lower groups
wage1$experlvl[wage1$exper < 41] <- 2
wage1$experlvl[wage1$exper < 21] <- 1
``````
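The manual recode above builds `experlvl` by overwriting a default value. An alternative is base R's `cut()`, which the original does not use. `cut()` bins a variable into intervals and can attach labels like Stata's `recode` value labels. This is a sketch that assumes integer-valued `exper`, as in wage1. The `exper` values below are hypothetical and are chosen to hit each boundary:

```r
# Hypothetical experience values (years); in practice use wage1$exper
exper <- c(1, 20, 21, 40, 41, 51)

# Manual recode, as in the block above
experlvl <- rep(3, length(exper))
experlvl[exper < 41] <- 2
experlvl[exper < 21] <- 1

# cut() with right-closed intervals (-Inf, 20], (20, 40], (40, Inf]
experlvl_cut <- cut(exper, breaks = c(-Inf, 20, 40, Inf), labels = FALSE)

# Labelled factor, similar to the value labels created by Stata's recode
experlvl_f <- cut(exper, breaks = c(-Inf, 20, 40, Inf),
                  labels = c("1 to 20 years", "21 to 40 years", "41+ years"))

print(data.frame(exper, experlvl, experlvl_cut, experlvl_f))
```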

## 6. Estimating Models (Cross-Sectional Data)

### 6.1 OLS

``````stata
. reg wage educ                        // simple regression of wage on educ
. reg wage educ if nonwhite==1         // add condition with if statement
. reg wage educ exper, robust          // multiple regression using HC1 robust standard errors
. reg wage educ exper, cluster(numdep) // use clustered standard errors
``````
``````r
mod1 <- lm(wage ~ educ, data = wage1)
# simple regression of wage on educ, store results in mod1
summary(mod1)
# print summary of mod1 results
mod2 <- lm(wage ~ educ, data = wage1[wage1$nonwhite == 1, ])
# restrict the sample, like Stata's if condition
mod3 <- estimatr::lm_robust(wage ~ educ + exper, data = wage1, se_type = "stata")
# multiple regression with HC1 robust standard errors, using the {estimatr} package
mod4 <- estimatr::lm_robust(wage ~ educ + exper, data = wage1, clusters = numdep, se_type = "stata")
# clustered standard errors; se_type = "stata" matches Stata (the default with clusters is CR2)
``````
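`se_type = "stata"` requests HC1 standard errors. The base-R sketch below shows what HC1 computes on simulated data; the variables `x1`, `x2`, and `y` are hypothetical. It first forms the HC0 sandwich (X'X)^-1 X' diag(e^2) X (X'X)^-1 and then scales it by n/(n - k). If the `sandwich` package is installed, `sandwich::vcovHC(fit, type = "HC1")` should return the same matrix.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
# hypothetical heteroskedastic errors: the spread grows with |x1|
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n, sd = 0.5 + abs(x1))

fit <- lm(y ~ x1 + x2)
X   <- model.matrix(fit)
e   <- residuals(fit)
k   <- ncol(X)

bread <- solve(crossprod(X))       # (X'X)^-1
meat  <- crossprod(X * e)          # X' diag(e^2) X: row i of X is scaled by e_i
V_hc0 <- bread %*% meat %*% bread  # White / HC0 sandwich
V_hc1 <- V_hc0 * n / (n - k)       # HC1: degrees-of-freedom scaling used by Stata's robust

se_hc1 <- sqrt(diag(V_hc1))
se_ols <- sqrt(diag(vcov(fit)))    # conventional (homoskedastic) standard errors
print(cbind(coef = coef(fit), se_ols, se_hc1))
```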

### 6.2 MLE (Logit/Probit/Tobit)

``````stata
. bcuse mroz, clear
. logit inlf nwifeinc educ         // estimate logistic regression
. probit inlf nwifeinc educ        // estimate probit regression
. tobit hours nwifeinc educ, ll(0) // estimate tobit regression
``````
``````r
mod_log <- glm(inlf ~ nwifeinc + educ, family = binomial(link = "logit"), data = mroz)
# estimate logistic regression
mod_pro <- glm(inlf ~ nwifeinc + educ, family = binomial(link = "probit"), data = mroz)
# estimate probit regression
mod_tob <- AER::tobit(hours ~ nwifeinc + educ, left = 0, data = mroz)
# estimate tobit regression, left-censored at 0
``````

## 7. Statistical Tests and Diagnostics

``````stata
. bcuse wage1, clear
. reg lwage educ exper     // estimation used for examples below
. estat hettest            // Breusch-Pagan/Cook-Weisberg test for heteroskedasticity
. estat ovtest             // Ramsey RESET test for omitted variables
. ttest wage, by(nonwhite) // compare means between groups (equal variances unless -unequal-)
``````
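``````r
data(wage1, package = "wooldridge")
# reload the original wage1 data (mirrors bcuse wage1, clear)
mod <- lm(lwage ~ educ + exper, data = wage1)
# estimation used for examples below
lmtest::bptest(mod)
# Breusch-Pagan test with the {lmtest} package; by default it uses the model regressors
# and Koenker's studentized statistic, so it can differ from Stata's default estat hettest
lmtest::resettest(mod, power = 2:4)
# Ramsey RESET test; power = 2:4 matches estat ovtest (the R default is 2:3)
t.test(wage ~ nonwhite, data = wage1, var.equal = TRUE)
# independent group t-test; var.equal = TRUE matches Stata's default (R defaults to Welch)
``````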
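Stata's `ttest wage, by(nonwhite)` assumes equal variances unless you give the `unequal` option. R's `t.test()` instead defaults to Welch's unequal-variance test (`var.equal = FALSE`). The sketch below uses two made-up samples, `g1` and `g2`, which are hypothetical. It computes the pooled t statistic by hand and compares it with both R variants:

```r
# Hypothetical samples for two groups
g1 <- c(5.1, 6.3, 7.2, 5.8, 6.9, 7.5)
g2 <- c(4.2, 5.0, 4.8, 6.1, 3.9)

n1 <- length(g1); n2 <- length(g2)
# pooled variance, the equal-variance assumption behind Stata's default ttest
sp2 <- ((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2)
t_pooled <- (mean(g1) - mean(g2)) / sqrt(sp2 * (1 / n1 + 1 / n2))

res_pooled <- t.test(g1, g2, var.equal = TRUE)  # equal variances, like Stata's default
res_welch  <- t.test(g1, g2)                    # R's default: Welch

print(c(by_hand = t_pooled,
        pooled  = unname(res_pooled$statistic),
        welch   = unname(res_welch$statistic)))
print(c(df_pooled = unname(res_pooled$parameter),
        df_welch  = unname(res_welch$parameter)))
```

The pooled test uses n1 + n2 - 2 degrees of freedom. Welch's test uses a fractional Satterthwaite approximation, so its p-values differ even when the means are the same.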

## 8. Interaction Terms and Categorical/Continuous Variables

``````stata
. reg lwage i.numdep          // treat numdep as a factor variable
. reg lwage c.educ#c.exper    // return interaction term only
. reg lwage c.educ##c.exper   // return full factorial specification
. reg lwage c.exper##i.numdep // return full, interact continuous and categorical
``````
``````r
lm(lwage ~ as.factor(numdep), data = wage1)
# treat numdep as a factor
lm(lwage ~ educ:exper, data = wage1)
# return interaction term only
lm(lwage ~ educ*exper, data = wage1)
# return full factorial specification
lm(lwage ~ exper*as.factor(numdep), data = wage1)
# return full specification, interacting continuous and categorical variables
``````
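In R's formula language, `a*b` expands to `a + b + a:b`, which matches Stata's `##`. By contrast, `a:b` keeps only the product term, like Stata's `#`. The sketch below confirms this by inspecting coefficient names. It uses simulated data; `d`, `educ`, `exper`, and `lwage` here are hypothetical stand-ins, not the wooldridge data:

```r
set.seed(42)
# hypothetical simulated data
d <- data.frame(educ = rnorm(50, 12, 2), exper = rnorm(50, 15, 5))
d$lwage <- 0.5 + 0.08 * d$educ + 0.02 * d$exper + rnorm(50, sd = 0.3)

full <- lm(lwage ~ educ * exper, data = d)               # like c.educ##c.exper
only <- lm(lwage ~ educ:exper, data = d)                 # like c.educ#c.exper
expl <- lm(lwage ~ educ + exper + educ:exper, data = d)  # explicit expansion of educ*exper

print(names(coef(full)))
print(names(coef(only)))
# the two full specifications give identical estimates
print(all.equal(coef(full), coef(expl)))
```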

## 9. Estimating Models (Panel Data)

### 9.1 Panel Regression

``````stata
. bcuse murder, clear
. xtset id year          // set id as entities (panel) and year as time variable
. xtdescribe             // describe pattern of xt data
. xtsum                  // summarize xt data
. xtreg mrdrte unem, fe  // fixed effects regression
``````
``````r
plm::is.pbalanced(murder$id, murder$year)
# check panel balance with the {plm} package
modfe <- plm::plm(mrdrte ~ unem, index = c("id", "year"), model = "within", data = murder)
# estimate fixed effects (within) model
summary(modfe)
# display results
``````
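Like `xtreg, fe`, `model = "within"` estimates the slope after subtracting each entity's mean from every variable. The base-R sketch below uses a small simulated balanced panel; `panel`, `id`, `year`, `x`, and `y` are hypothetical. It shows that OLS on the demeaned data gives the same slope as a regression with one dummy per entity (LSDV):

```r
set.seed(7)
n_id <- 10; n_t <- 3
# hypothetical balanced panel: 10 entities observed in 3 years
panel <- data.frame(id = rep(1:n_id, each = n_t), year = rep(1:n_t, times = n_id))
alpha <- rnorm(n_id)                              # entity-specific effects
panel$x <- rnorm(nrow(panel)) + alpha[panel$id]   # x is correlated with the effect
panel$y <- 2 + 1.5 * panel$x + alpha[panel$id] + rnorm(nrow(panel))

# within transformation: subtract each entity's mean (ave() defaults to mean)
panel$y_dm <- panel$y - ave(panel$y, panel$id)
panel$x_dm <- panel$x - ave(panel$x, panel$id)
b_within <- coef(lm(y_dm ~ x_dm - 1, data = panel))[["x_dm"]]

# LSDV: include one dummy per entity
b_lsdv <- coef(lm(y ~ x + factor(id), data = panel))[["x"]]

print(c(within = b_within, lsdv = b_lsdv))
```

The slopes agree. However, the standard errors from the demeaned `lm()` ignore the degrees of freedom used up by the entity means, so inference should come from `plm` or `xtreg, fe`.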

### 9.2 Instrumental Variables

``````stata
. bcuse mroz, clear
. ivregress 2sls lwage (educ = fatheduc), first // 2SLS, also showing the first-stage regression
. ivregress 2sls lwage (educ = fatheduc)        // show 2SLS results directly
``````
``````r
modiv <- AER::ivreg(lwage ~ educ | fatheduc, data = mroz)
# estimate 2SLS with the {AER} package
summary(modiv, diagnostics = TRUE)
# get diagnostic tests of the instrument and the endogenous variable
``````
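With one endogenous regressor and one instrument, the 2SLS point estimate can be reproduced with two `lm()` stages. The first stage regresses the endogenous variable on the instrument; the second regresses the outcome on the first-stage fitted values. The sketch below uses simulated data in which `z`, `x`, `u`, and `y` are hypothetical stand-ins for `fatheduc`, `educ`, an unobserved confounder, and `lwage`. It checks the two-stage estimate against the just-identified IV ratio cov(z, y) / cov(z, x). The standard errors from a manual second stage are not valid, which is why `AER::ivreg` or `ivregress` should be used for inference.

```r
set.seed(123)
n <- 500
z <- rnorm(n)                    # instrument (stands in for fatheduc)
u <- rnorm(n)                    # unobserved confounder
x <- 0.8 * z + u + rnorm(n)      # endogenous regressor (stands in for educ)
y <- 1 + 0.5 * x + u + rnorm(n)  # outcome (stands in for lwage)

# stage 1: regress the endogenous variable on the instrument
x_hat <- fitted(lm(x ~ z))
# stage 2: regress the outcome on the first-stage fitted values
b_2sls <- coef(lm(y ~ x_hat))[["x_hat"]]

# just-identified IV estimator in closed form
b_iv  <- cov(z, y) / cov(z, x)
b_ols <- coef(lm(y ~ x))[["x"]]  # biased, because x is correlated with u

print(c(ols = b_ols, two_stage = b_2sls, iv_ratio = b_iv))
```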

## 10. Postestimation

``````stata
. reg lwage educ c.exper##c.exper // estimation used for the postestimation commands below
. estimates store mod1            // store the last estimation results in memory as mod1
. margins                         // get average predictive margin
. margins, dydx(*)                // get average marginal effects for all variables
. marginsplot                     // plot marginal effects
. margins, dydx(exper)            // average marginal effect of experience
. margins, at(exper=(1(10)40))    // average predictive margins at exper = 1, 11, 21, 31
. estimates restore mod1          // load mod1 back into working memory
. estimates table mod1            // display table with stored estimation results
``````
``````r
mod1 <- lm(lwage ~ educ + exper + I(exper^2), data = wage1)
# Note: in R, arithmetic inside a formula must be wrapped in I()
prediction::prediction(mod1)
# get average predictive margin with the {prediction} package (a dependency of {margins})
m1 <- margins::margins(mod1)
# get average marginal effects for all variables
plot(m1)
# plot marginal effects
summary(m1)
# get detailed summary of marginal effects
margins::margins(mod1, variables = "exper")
# average marginal effect of experience
prediction::prediction(mod1, at = list(exper = seq(1, 40, 10)))
# predictive margins at exper = 1, 11, 21, 31
stargazer::stargazer(mod1, type = "text")
# use the {stargazer} package; type = "text" displays results within R
# Note: type = "latex" or type = "html" produces LaTeX or HTML output instead
``````
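Inside a formula, `^` is the crossing operator, so `exper^2` without `I()` collapses to `exper` and the squared term silently disappears. In the quadratic model, the average marginal effect of `exper` is the sample mean of b_exper + 2 * b_exper2 * exper. This is the quantity that `margins, dydx(exper)` or `margins::margins()` should report. The base-R sketch below uses simulated data; `d` and its columns are hypothetical and only mirror the wage1 names. It checks the analytic result against a numerical derivative:

```r
set.seed(2024)
# hypothetical simulated data mirroring wage1 variable names
d <- data.frame(educ = round(rnorm(100, 12, 2)), exper = round(runif(100, 1, 40)))
d$lwage <- 0.3 + 0.09 * d$educ + 0.04 * d$exper - 0.0007 * d$exper^2 +
  rnorm(100, sd = 0.4)

no_I   <- lm(lwage ~ educ + exper + exper^2, data = d)     # exper^2 means exper here
with_I <- lm(lwage ~ educ + exper + I(exper^2), data = d)  # I() forces arithmetic

print(names(coef(no_I)))    # no squared term
print(names(coef(with_I)))  # includes "I(exper^2)"

b <- coef(with_I)
# analytic derivative of the fitted value with respect to exper, averaged over the sample
ame_exper <- mean(b[["exper"]] + 2 * b[["I(exper^2)"]] * d$exper)

# numerical check: central difference of predictions (exact for a quadratic)
h <- 1e-4
d_up <- d; d_up$exper <- d$exper + h
d_dn <- d; d_dn$exper <- d$exper - h
ame_num <- mean((predict(with_I, newdata = d_up) - predict(with_I, newdata = d_dn)) / (2 * h))

print(c(analytic = ame_exper, numerical = ame_num))
```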

## 11. Related Posts

Note: the following Stata command generates the list of related posts:
`lianxh R语言, m`

`ssc install lianxh, replace`

## Related Courses

### Latest Courses: Live Classes

• Note: materials for some courses, such as slides (PPT), can be viewed and downloaded on the 连享会-直播课 (Lianxh live classes) homepage.

### About Us

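• Stata连享会 (Lianxh) was founded by Professor Lian Yujun's team at Sun Yat-sen University and regularly shares experience in empirical analysis.
• The 连享会 homepage and Zhihu column host 700+ posts, making empirical analysis less maddening. The live-stream room has many video courses that can be watched at any time.
• Keyword search/reply is now available on the WeChat official account. Tap the keyboard icon at the lower left of the account and enter a short keyword to quickly bring up past posts and download tools, software, and data. Common keywords (course, live, video, customer service, model specification, research design, stata, plus, plotting, programming, panel, paper replication, visualization, RDD, DID, PSM, synthetic control): `课程, 直播, 视频, 客服, 模型设定, 研究设计, stata, plus, 绘图, 编程, 面板, 论文重现, 可视化, RDD, DID, PSM, 合成控制法`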

✏ 连享会 FAQ:
https://gitee.com/lianxh/Course/wikis
