温馨提示: 定期 清理浏览器缓存,可以获得最佳浏览体验。
4 June 2020
Enrique Pinzon, Associate Director Econometrics
I care about reproducible research. Anyone who has ever been a research assistant or tried to follow the path set by other researchers also cares. Sometimes, reproducing others’ results is a frustrating task; sometimes, it is outright impossible. Yet sometimes, it is satisfyingly simple. In my experience, reproducing results is easy when it involves a Stata do-file. I believe this is true even beyond my personal bias (I work for Stata and used the software regularly before that). A recent article published by the American Economic Association (AEA), Vilhuber, Turrito, and Welch (2020), shows that Stata is the preferred package among economists, and I believe reproducibility is a big reason why.
The AEA established reproducibility guidelines in 2008. Recently, it updated its guidelines to require authors not only to make data and analysis available but also to provide the code used to clean the data and the raw data, whenever feasible. Now, the editorial process includes an AEA data editor who verifies that the information provided by the authors is sufficient to replicate the results in the paper.
Vilhuber, Turrito, and Welch (2020) show that since the inception of the policy, Stata has been used in 73% of the supplements provided by the authors. The usage has been increasing over the span of the policy. The graph below shows the percentage of data supplements in which different software packages are used. These percentages may add up to more than 100% because content from more than one software package may be submitted in each supplement.
This is not a surprise to anyone who has used Stata. I believe one important reason researchers choose Stata is that reproducing your results is easy. Case in point is the graph above. To get the data and reproduce the graph, you just need to run the do-file, which I discuss in Appendix I. If you want to create a reproducible report, see my discussion in Appendix II.
The do-file below mainly uses three commands: import delimited, which I use to get the comma-separated value dataset used for the graph; xtline, which I use to generate the graph; and egen, which helps me to generate a numeric categorical variable from a string variable using the group function. The other commands I use simplify readability, allow me to modify the code, and help me display results.
Line 1 is there for reproducibility. Stata is the only package I am aware of with integrated version control ensuring that scripts written as long ago as 2008, and indeed, even earlier, can still be used to reproduce their results in the modern version of the software. Lines 5 to 7 create locals for the location of the files. I use them in my call to import delimited. Line 16 creates a categorical variable from a string variable. Each category corresponds to a software package. In the next line, I keep a subset of the data. The data I keep correspond to the time period since the AEA implemented its data policy. In line 21, I use xtset to declare data as a panel and then to be able to use xtline. In line 22, I change the default Stata graphic scheme. Stata has multiple schemes, and you could even write one that best represents your preferences. I like the simplicity of s1color. The remaining code reproduces figure 1.
version 16
// Simplyfing readability and allowing for possibly other data downloads
local where "https://raw.githubusercontent.com/AEADataEditor"
local folder "aea-supplement-migration/master/data/generated"
local data "table.aea.software_by_year.csv"
// Importing dataset
import delimited using "`where'/`folder'/`data'", ///
clear varnames(1) colrange(2:9)
// Generating a numeric variable to graph
egen software_num = group(software_collapsed), label
keep if year > 2007
// Graphing
xtset software_num year
set scheme s1color
xtline percent, overlay legend(position(10) ring(0) rows(2) ///
region(lstyle(none)) order(9 3 4 7 5 6 8 1 2)) ///
ytitle("") xtitle("") xlabel(2008(3)2017 2019) ///
ylabel(0(20)100) plotregion(margin(medium)) ///
plot9(lcolor(blue)) ///
note("Note: A supplement can be used in more than one software." ///
"Note: Data for 2019 ends in October.")
You can create a reproducible report in Word, Excel, PDF, and HTML using Stata. By reproducible, I mean that you can run a do-file that produces Stata results and outputs them into the format you want.
I want to create an HTML document. I save a do-file as a text file and give it the name aea.txt. Then, I type dyndoc “aea.txt”, replace. I generate an HTML document. Aside from aea.txt, I use two other files to produce my HTML document. First, I create a header.txt file. The header.txt file contains HTML code to include at the top of the target HTML file. The header refers to the second file, stmarkdown.css, which is a style sheet that defines how the HTML document is to be formatted.
Now that I have my HTML documents, with Stata’s reporting tools, I can migrate from HTML to Word using one line of code, html2docx aea.html, saving(aea.docx) replace, and from Word to PDF, again, with one line of code, docx2pdf aea.docx, saving (aea.pdf) replace.
连享会-直播课 上线了!
http://lianxh.duanshu.com
免费公开课:
直击面板数据模型 - 连玉君,时长:1小时40分钟 Stata 33 讲 - 连玉君, 每讲 15 分钟. 部分直播课 课程资料下载 (PPT,dofiles等)
支持回看,所有课程可以随时购买观看。
专题 | 嘉宾 | 直播/回看视频 |
---|---|---|
⭐ Stata暑期班 | 连玉君 江艇 |
线上直播 9 天 2020.7.28-8.7 |
效率分析-专题 | 连玉君 鲁晓东 张 宁 |
视频-TFP-SFA-DEA 已上线,3天 |
文本分析/爬虫 | 游万海 司继春 |
视频-文本分析与爬虫 已上线,4天 |
空间计量系列 | 范巧 | 空间全局模型, 空间权重矩阵 空间动态面板, 空间DID |
研究设计 | 连玉君 | 我的特斯拉-实证研究设计,-幻灯片- |
面板模型 | 连玉君 | 动态面板模型,-幻灯片- |
直击面板数据模型 [免费公开课,2小时] |
Note: 部分课程的资料,PPT 等可以前往 连享会-直播课 主页查看,下载。
关于我们
课程, 直播, 视频, 客服, 模型设定, 研究设计, 暑期班
stata, plus,Profile, 手册, SJ, 外部命令, profile, mata, 绘图, 编程, 数据, 可视化
DID,RDD, PSM,IV,DID, DDD, 合成控制法,内生性, 事件研究
, 交乘, 平方项, 缺失值, 离群值, 缩尾, R2, 乱码, 结果
Probit, Logit, tobit, MLE, GMM, DEA, Bootstrap, bs, MC, TFP
, 面板, 直击面板数据, 动态面板, VAR, 生存分析, 分位数
空间, 空间计量, 连老师, 直播, 爬虫, 文本, 正则, python
Markdown, Markdown幻灯片, marp, 工具, 软件, Sai2, gInk, Annotator, 手写批注
, 盈余管理, 特斯拉, 甲壳虫, 论文重现
, 易懂教程, 码云, 教程, 知乎
连享会小程序:扫一扫,看推文,看视频……
扫码加入连享会微信群,提问交流更方便
✏ 连享会学习群-常见问题解答汇总:
✨ https://gitee.com/arlionn/WD