Stata: Producing Polished LaTeX Tables-T222

Published: 2021-04-30


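New! The lianxh command has been released:
Search posts and Stata resources at any time. To install it:
. ssc install lianxh
See the help file for details (it includes a few surprises):
. help lianxh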


Author: Yuan Ziqing (袁子晴), The University of Hong Kong
Email: yzq0612@foxmail.com




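1. Background

LaTeX has a particular appeal for scientific typesetting: besides producing attractive PDF documents, it supports version control and automatic updating. In econometric work, if Stata can write tex files that LaTeX compiles directly, then after changing a regression specification we only need to rerun the Stata code and recompile the LaTeX document to obtain an updated PDF. This removes much of the tedious manual work in later revisions.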

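2. Stata Examples

2.1 Regression tables with interaction terms

Stata commands: estout/esttab

Source: Lindsey and Stein (2019 WP)

This example shows how to output regression results that include interaction terms, how to use check marks to indicate concisely which groups of variables are controlled for, and how to add the results of statistical tests. Because the source data for this example have not been released, we focus on the code that exports the regression results with esttab: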

Stata code:
eststo  clear
eststo: areg empend_normsqi               after##c.frac lnpop lnpercap lnvc chHPI i.yq i.industry [weight=pa]  if ${SAMPLEIF} & age_buckets == 1, absorb(state) cluster(state)

eststo: areg empend_normsqi  lowsectorvc##after##c.frac lnpop lnpercap lnvc chHPI i.yq i.industry [weight=pa]  if ${SAMPLEIF} & age_buckets == 1, absorb(state) cluster(state)
test 1.after#c.frac + 1.lowsectorvc#1.after#c.frac = 0
estadd scalar sum_afterfrac_p = r(p)

eststo: areg empend_normsqi    empconc50##after##c.frac lnpop lnpercap lnvc chHPI i.yq i.industry [weight=pa]  if ${SAMPLEIF} & age_buckets == 1, absorb(state) cluster(state)
test 1.after#c.frac + 1.empconc50#1.after#c.frac = 0
estadd scalar sum_afterfrac_p = r(p)

eststo: areg empend_normsqi      highcap##after##c.frac lnpop lnpercap lnvc chHPI i.yq i.industry [weight=pa]  if ${SAMPLEIF} & age_buckets == 1, absorb(state) cluster(state)
test 1.after#c.frac + 1.highcap#1.after#c.frac = 0
estadd scalar sum_afterfrac_p = r(p)

*------------ Export the regression results ---------------------*
esttab using "${OUTPATH}emp_entrants_industry", ///
nomtitles booktabs replace ///
order(1.after#c.frac ///
	1.lowsectorvc#1.after  1.lowsectorvc#c.frac 1.lowsectorvc#1.after#c.frac ///
	1.empconc50#1.after  1.empconc50#c.frac 1.empconc50#1.after#c.frac ///
	1.highcap#1.after  1.highcap#c.frac 1.highcap#1.after#c.frac ///
	lnpop lnpercap lnvc chHPI) ///
drop(frac 1.after  1.empconc50 1.lowsectorvc 1.highcap) ///
indicate("Annual state-level controls = lnpop lnpercap lnvc chHPI"   "State FE = _cons" "Quarterly FE = *.yq" "Industry FE = *.industry", labels("\checkmark" "")) ///
stats(N sum_afterfrac_p, labels("Observations" "\$p\$-val: \$\beta_{\text{Aft}\times\text{Frac}} + \beta_{\text{\ldots industry}\times\text{Aft}\times\text{Frac}} = 0 \$")) ///
label nobaselevels interaction("\$\times\$") substitute("=1" "") nonotes se star(* 0.10 ** 0.05 *** 0.01)
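  • nomtitles suppresses the model titles in the column headers, which would otherwise show the dependent variable;
  • booktabs generates a LaTeX table that uses the rules of the booktabs package (\toprule, \midrule, \bottomrule); the LaTeX document that includes the resulting tex file must load the booktabs package in its preamble;
  • order sets the order of the variables in the regression table, and drop omits the coefficients of the listed variables from the table;
  • indicate reports whether a group of variables is included in each model, here the control variables and the fixed effects; labels("\checkmark" "") marks included groups with a check mark, because LaTeX typesets \checkmark (provided by the amssymb package) as a check mark;
  • stats lists the statistics to report, and LaTeX math can be entered in its labels() suboption;
  • label prints variable labels instead of variable names;
  • nobaselevels drops the base levels of factor variables;
  • interaction("\$\times\$") sets the symbol for interaction terms to a multiplication sign. LaTeX typesets $\times$ as a multiplication sign, and it must be wrapped in $...$ to enter math mode. In the Stata code each $ is written as \$ so that Stata does not read it as the start of a global macro.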
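Because the data for this example are not public, the options above can be tried on Stata's built-in auto dataset. The following is only a minimal sketch under that assumption: the variables, the output file name demo_interaction.tex, and the scalar name sum_p are illustrative and do not come from the original paper.

```stata
* Minimal sketch on the built-in auto data (not the paper's data).
* Requires estout:  ssc install estout
sysuse auto, clear
eststo clear

* Column 1: interaction of mpg with the foreign dummy
eststo: regress price c.mpg##i.foreign weight

* Column 2: add repair-record dummies, which indicate() will summarize
eststo: regress price c.mpg##i.foreign weight i.rep78

* Test that the mpg effect for foreign cars is zero and attach the p-value
test mpg + 1.foreign#c.mpg = 0
estadd scalar sum_p = r(p)

* demo_interaction.tex and sum_p are hypothetical names chosen for this sketch
esttab using "demo_interaction.tex", replace booktabs nomtitles label ///
    nobaselevels interaction("\$\times\$") ///
    indicate("Repair record FE = *.rep78", labels("\checkmark" "")) ///
    stats(N sum_p, labels("Observations" "\$p\$-value: sum of mpg terms = 0")) ///
    se star(* 0.10 ** 0.05 *** 0.01) nonotes
```

The document that inputs demo_interaction.tex would need to load booktabs, and amssymb for \checkmark.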

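2.2 Comparing 2SLS (instrumental variables) and OLS results

This example uses instrumental variables to estimate the returns to college education. Whether there is a 2-year or 4-year college near the NLS respondent ("college in the county") serves as the instrument for college attendance. The example uses the card.dta dataset, which the code below downloads from file.lianxh.cn.

First install the user-written estout package with ssc install estout. Its options prehead(strlist), posthead(strlist), prefoot(strlist), and postfoot(strlist) add text before the table header, after the table header, before the table footer, and after the table footer, respectively, so they can be used to customize the layout of a LaTeX table.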
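As a stripped-down illustration of these four options, the sketch below builds a small two-column table on Stata's built-in auto dataset. The dataset, the regressions, and the file name demo_layout.tex are assumptions made for this sketch; only the option names come from estout itself.

```stata
* Minimal sketch: custom table layout with estout (auto data, hypothetical file name)
* Requires estout:  ssc install estout
sysuse auto, clear
eststo clear
eststo: regress price mpg weight
eststo: regress price mpg weight foreign

* prehead : opens the tabular and writes a merged header cell
* posthead: rule between the header and the coefficients
* prefoot : rule between the coefficients and the statistics
* postfoot: closes the table
estout * using "demo_layout.tex", replace style(tex) label ///
    cells(b(star fmt(%9.3f)) se(par fmt(%9.3f))) ///
    stats(N r2, labels("Observations" "R-squared") fmt(%9.0fc 3)) ///
    collabels(none) mlabels(none) ///
    prehead("\begin{tabular}{l*{@M}{c}}" "\toprule" ///
            "& \multicolumn{2}{c}{Price} \\" "& (1) & (2) \\") ///
    posthead("\midrule") ///
    prefoot("\midrule") ///
    postfoot("\bottomrule" "\end{tabular}")
```

Here @M is estout's placeholder for the number of models, so the column specification adapts if more models are stored.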

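In this example, the author uses these options to merge header cells and to add extra rows and table notes. LaTeX declares environments with \begin{} and \end{}; this example uses the table, tabular, and threeparttable environments. The Stata code is shown below; running it writes a tex file to the working directory.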

Stata code:
* This file will estimate the returns to college education using the "college in the county" instrumental variable
* which instruments for college attendance with whether there is a 2 or 4 year college near the respondent of the NLS.
* You will first need to install estout (ssc install estout), and you will need to pull the card data from my Mixtape
* in the cloud.  

copy "https://file.lianxh.cn/data/card.dta" card.dta, replace
use "card", clear
*---------------- Regression analysis -------------------*
cap n tempvar tempsample
cap n local specname=`specname'+1
reg  lwage  educ  exper black south married   smsa
cap n estadd ysumm
cap n estimates store ols_`specname'

cap n local specname=`specname'+1
reg educ nearc4 exper black south married   smsa
cap n local biv = _b[nearc4]
cap n local seiv = _se[nearc4]
cap n unab ivs: nearc4
cap n local xlist: colnames e(b)
cap n local ivs: list ivs & xlist
cap n test `ivs'
cap n local F_iv=r(F)
cap n local specname=`specname'+1

cap n ivregress 2sls lwage (educ=nearc4) exper black south married   smsa, first
cap n estadd ysumm
cap n estadd scalar biv  = `biv'
cap n estadd scalar seiv = `seiv'
cap n estadd scalar F_iv = `F_iv'
* rivtest is a user-written command; it must be installed before this line will run
cap n rivtest
n return list
cap n local ar_chi2=r(ar_chi2)
cap n local ar_p=r(ar_p)
cap n estadd scalar ar_chi2 = `ar_chi2'
cap n estadd scalar ar_p = `ar_p'
cap n estimates store tsls_`specname'

*---------------- Export the regression table -------------------*
#delimit ;
	cap n estout * using card.tex,
		style(tex) label notype
		cells((b(star fmt(%9.3f))) (se(fmt(%9.3f)par))) 		
		stats(biv seiv F_iv ar_p N ymean ysd, star(biv)
		labels("College in the county" "Robust standard error " "F statistic for IV in first stage" "Anderson-Rubin test" "N" "Mean Dependent Variable" "Std. Dev. Dependent Variable")
			fmt(3 3 3 2 %9.0fc 3 3))
		keep(educ exper black south married smsa) replace noabbrev starlevels(* 0.10 ** 0.05 *** 0.01)
		title(OLS and 2SLS regressions of Log Earnings on Schooling)   
		collabels(none) eqlabels(none) mlabels(none) mgroups(none)
		prehead("\begin{table}[htbp]\centering" "\scriptsize" "\caption{@title}" "\label{2sls_1}" "\begin{center}" "\begin{threeparttable}" "\begin{tabular}{l*{@E}{c}}"
"\toprule"
"\multicolumn{1}{l}{\textbf{Dependent variable}}&"
"\multicolumn{2}{c}{\textbf{Log wage}}\\"
"\multicolumn{1}{c}{}&"
"\multicolumn{1}{c}{OLS}&"
"\multicolumn{1}{c}{2SLS}\\")
		posthead("\midrule")
		prefoot("\\" "\midrule" "\multicolumn{1}{c}{First Stage Instrument}\\")  
		postfoot("\bottomrule" "\end{tabular}" "\begin{tablenotes}" "\tiny" "\item Standard errors in parentheses. * p$<$0.10, ** p$<$0.05, *** p$<$0.01" "\end{tablenotes}" "\end{threeparttable}" "\end{center}" "\end{table}");
#delimit cr

Before compiling this tex file, the required packages must be loaded in LaTeX. The main.tex file used for compilation is as follows:

% Document class
\documentclass{article}
% Load the required packages
\usepackage[utf8]{inputenc}
\usepackage{booktabs}
\usepackage{threeparttable}
% Title, author, date
\title{Stata-LaTeX Workflow}
\author{}
\date{}
% Body
\begin{document}

\maketitle

\section{Introduction}
% Insert the tex file of the regression table written by Stata
\input{card}
\end{document}

The compiled result can be viewed in the online LaTeX editor Overleaf.

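2.3 Descriptive statistics tables

To output a table of descriptive statistics from Stata, the basic idea is to use estpost to store the summary statistics in memory as if they were estimation results, and then export them with esttab. The label option makes esttab print variable labels instead of variable names. Note that to print a percent sign % in LaTeX, it must be escaped with a preceding backslash, \%.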
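The example in this section relies on data and global macros that are not included here, so the same idea is sketched below on Stata's built-in auto dataset. The variable foreign_pct and the file name demo_summstat.tex are hypothetical names introduced only for this sketch.

```stata
* Minimal sketch: descriptive statistics via estpost + esttab (auto data)
* Requires estout:  ssc install estout
sysuse auto, clear

* Express a share in percent and escape the percent sign for LaTeX
gen foreign_pct = foreign * 100
label var foreign_pct "Foreign cars (\%)"

* estpost stores the summary statistics in e() like estimation results
eststo clear
eststo: quietly estpost summarize price mpg foreign_pct, detail

* Each entry in cells() picks one stored statistic as a column
esttab using "demo_summstat.tex", replace ///
    cells("mean(fmt(2)) sd(fmt(2)) p50(fmt(2)) p25(fmt(2)) p75(fmt(2))") ///
    label booktabs nonumber nomtitles noobs
eststo clear
```

The detail option is what makes the percentiles (p25, p50, p75) available to cells().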

Stata code:
gen t_entry_norm1 = entry_norm1 * 100
label var t_entry_norm1	"Firm entry rate (\%)"

gen t_frac = frac * 100
label var t_frac	"Frac (\%)"

gen t_chHPI = chHPI * 100
label var t_chHPI	"House price index change (\%)"

eststo clear
eststo: quietly estpost summarize	t_entry_norm1 ///
									t_frac lnpop lnpercap lnvc t_chHPI ///
								if ${SAMPLEIF} & (age_buckets == 1) & (pa > 0), detail

esttab using "${OUTPATH}summstat_bds_sy.tex", replace ///
	cells("mean(fmt(2)) sd(fmt(2)) p50(fmt(2)) p25(fmt(2)) p75(fmt(2))") label booktabs nonumber nomtitles
eststo clear

2.4 Descriptive statistics tables with a custom column

Stata commands: estpost, esttab

Source: Doleac and Stein (2013)

Stata code:
label var responses		"Responses"
label var scams			"Scams"
label var nonscams		"Non-scams"
label var offers		"Offers"


estpost tabstat anyresponse anyscam anynonscam anyoffer [aw = stateweight], statistics(mean) columns(statistics)
matrix anys = e(mean)
matrix colnames anys = responses scams nonscams offers	// Rename the columns to the main table's variable names so the matrix lines up with the rows for responses, scams, nonscams, and offers
matrix rownames anys = anys

eststo clear
estpost tabstat responses scams nonscams offers [aw = stateweight], statistics(mean sd p25 p50 p75 p95 max count) columns(statistics)

estadd matrix anys

esttab using "${OUTPATH}numberresponsesw.tex", ///
	cells("mean(fmt(a2) label(Mean)) sd(fmt(a2) label(Std.\ Dev.)) p25(fmt(a2) label(25\%)) p50(fmt(a2) label(50\%)) p75(fmt(a2) label(75\%))  p95(fmt(a2) label(95\%)) max(fmt(a2) label(Max.)) anys(fmt(a2) label(Frac.\ $>0$))") ///
	nostar nonumbers nomtitle label booktabs width(38em) replace
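The code above adds a custom column in three steps: compute the extra statistic with a first estpost tabstat call and save it as a matrix, rename the matrix columns to match the variables of the main table, then attach it with estadd matrix so that cells() can list it like a built-in statistic. A sketch of the same pattern on Stata's built-in auto dataset follows; the cutoffs (6000 and 25), the variable names, and the file name demo_customcol.tex are illustrative assumptions.

```stata
* Sketch only: auto data and the 6000 / 25 cutoffs are illustrative assumptions
* Requires estout:  ssc install estout
sysuse auto, clear
gen byte any_highprice = price > 6000
gen byte any_highmpg   = mpg > 25

* Step 1: compute the custom statistic (share above the cutoff) as a matrix
estpost tabstat any_highprice any_highmpg, statistics(mean) columns(statistics)
matrix share = e(mean)
matrix colnames share = price mpg   // match the variables of the main table
matrix rownames share = share

* Step 2: main summary statistics, then attach the matrix as an extra e() result
eststo clear
estpost tabstat price mpg, statistics(mean sd p50) columns(statistics)
estadd matrix share

* Step 3: list the custom matrix in cells() next to the built-in statistics
esttab using "demo_customcol.tex", replace booktabs ///
    cells("mean(fmt(a2) label(Mean)) sd(fmt(a2) label(Std.\ Dev.)) p50(fmt(a2) label(Median)) share(fmt(2) label(Share above cutoff))") ///
    nostar nonumbers nomtitle label
```

Renaming the matrix columns is the step that matters most: esttab aligns the extra column with the table rows by name, so the names must be those of the variables in the second tabstat call.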

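2.5 Exporting LaTeX tables with tabout and texdoc

2.5.1 Two-way cross-tabulation with percentages

Stata command: tabout

Source: Magdalena Bennett

The cl1 and cl2 options of tabout apply only to LaTeX output and require the booktabs package to be loaded in the LaTeX document. cl1 draws horizontal rules between the first and second header rows, and cl2 draws them between the second and third header rows. The column numbers to span go inside the parentheses; for example, to draw a rule under columns 2 and 3, enter cl2(2-3).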

Stata code:
global ADCHARS			"highquality price"
label var highquality "Ad.\ quality"

tabout ${ADCHARS} type using ${OUTPATH}adchars.tex, c(freq col) f(0 1) ///
	cl1(2-11) cl2(2-3 4-5 6-7 8-9 10-11) topstr(Advertisement Characteristics\label{tab:adchars}|\textwidth) ///
	replace style(tex) bt font(bold) topf(top.tex) botf(bot.tex)

2.5.2 Demonstrating texdoc with simulated data

Stata command: texdoc

Source: Magdalena Bennett

Stata code:
/**********************************************************************************************
* Topic: Examples of exporting LaTeX tables with texdoc
* Author: M. Bennett
* Created: 04/02/19
* Purpose: this do-file generates simulated data and shows how to use texdoc to write different LaTeX tables
**********************************************************************************************/

*------- Generate simulated data -------------*
* Start with an empty dataset
clear

* Set a seed so results are replicable (I'm using Stata 15, just in case)
set seed 123

* Set number of obs
set obs 1000

* Generate an error term u normally distributed
gen u=rnormal(0,100)

* Generate a random treatment variable:
gen treat = runiform(0,1)>0.5
label var treat "Treatment"

* Generate a binary covariate (equal to 1 for about 30% of the population)
gen x1=runiform(0,1)>0.7
label var x1 "High income"

* Generate a continuous covariate
gen x2=rnormal(400,110)
label var x2 "Test score in 4th grade"

* Generate an outcome:
gen y= 100 + 50*x1 + 1*x2 + 20*treat + u
label var y "Test score in 8th grade"

* Generate a simple covariate table (there are different ways to run this, but here's one):
foreach var of varlist x1 x2{
	* Run a regression (using , robust to account for heteroskedasticity)
	reg `var' treat, robust

	* Store whatever you want (in this case, we are going to save the means for each group and the difference)
	global m`var'_0: di %6.3fc _b[_cons]
	global m`var'_1: di %6.3fc _b[_cons] + _b[treat]
	global dif_`var': di %6.3fc _b[treat]

	* Store the label of the variable, because it's easier then to put it in the tables:
	global lbe_`var' : var label `var'

	* Because I also want to know if the difference is significant, I'm going to store the p-value (and the star level)
	qui test treat=0
	global p_`var': di %12.3fc r(p)
	glo star_`var'=cond(${p_`var'}<.01,"***",cond(${p_`var'}<.05,"**",cond(${p_`var'}<.1,"*","")))
}

*----------- Export the descriptive statistics (balance) table ---------------*
* Now, lets generate our balance table! (heads up, if you haven't set a current directory (cd), it's going to store this table wherever is the default).
* PS: It's not necessary to do this with a loop, but if you have a lot of covariates, might be useful.

texdoc init balance_table.tex, replace force
tex \begin{tabular}{lccc} \toprule \toprule
tex Variable			& 	Control	& Treatment & Difference \\
tex \addlinespace \hline \\
foreach var of varlist x1 x2{
tex ${lbe_`var'} & ${m`var'_0} & ${m`var'_1} & ${dif_`var'}${star_`var'}\\
}
tex \hline \hline
tex \end{tabular}

texdoc close

*------------- Regression analysis -------------*
* Let's run a regression now! (with and without controls)

global covs0 "treat"
global covs1 "treat x*"

global lbe_y : var label y
global lbe_treat : var label treat

forvalues spec=0(1)1{

	*Run the regression
	reg y ${covs`spec'}, robust
	*Save some outputs
	scalar N=e(N)
	scalar r2=e(r2)
	global N_`spec'=N
	global r2_`spec': di %6.3fc r2

	*Store the variable of interest
	global b_`spec': di %6.3fc _b[treat]
	global se_`spec': di %6.3fc _se[treat]

	qui test treat=0
	global p_`spec': di %12.3fc r(p)
	glo star_`spec'=cond(${p_`spec'}<.01,"***",cond(${p_`spec'}<.05,"**",cond(${p_`spec'}<.1,"*","")))
}

*----------- Export the regression table ---------------*
texdoc init treatment_effect.tex, replace force
tex \begin{tabular}{lcc} \toprule \toprule
tex Variable			& 	${lbe_y}	& ${lbe_y} \\
tex \addlinespace \hline \\
tex ${lbe_treat} & ${b_0}${star_0} & ${b_1}${star_1}\\
tex 			& (${se_0}) & (${se_1})\\ \addlinespace
tex Controls & No & Yes \\
tex Observations & $N_0 & $N_1 \\
tex R-square & $r2_0 & $r2_1 \\
tex \hline \hline
tex \end{tabular}

texdoc close

The compiled result can be viewed in the online LaTeX editor Overleaf.

3. References and related posts

Note: a list of related posts can be generated in Stata with:
lianxh latex
To install the latest version of the lianxh command:
ssc install lianxh, replace
