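New! The lianxh command has been released: search Lianxh (连享会) posts and Stata resources at any time. To install it:
. ssc install lianxh
See the help file for details (it has a few surprises):
. help lianxh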
Author: Huang Guobin (Guangzhou University)
Email: HuangGuobin@gzhu.edu.cn
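Duflo (2001) proposed the cohort DID, also called cross-sectional DID, in a study of how school construction in Indonesia affected education and wages. The method has since been widely applied, for example in Chen and Zhou (200), Cheng and Zhang (2011), and Chen et al. (2020). Lianxh (连享会) has also published a number of posts introducing it.
Unlike those earlier posts, this one walks through Chen et al. (2020), a study of how the "send-down" (上山下乡) of educated youth affected educational attainment, from three angles: model specification, placebo tests, and the parallel trends test. The corresponding Stata code is included.
A conventional DID has two dimensions, region and time. Cross-sectional data have no time dimension, so an individual's birth cohort (birth year) takes its place. This is where the name "cohort DID" comes from.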
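A sketch of the baseline cohort DID specification, written to match the reghdfe call used in this post. The notation is an assumption made for illustration and is not copied from Chen et al. (2020); in particular, the subscripts assume that sdy_density, primary_base, and junior_base vary at the region level.

```latex
\begin{equation}
\text{yedu}_{i,g,p,c}
  = \beta \,\bigl(\text{sdy\_density}_{g} \times \text{treat}_{c}\bigr)
  + \delta_{1}\,\text{male}_{i} + \delta_{2}\,\text{han\_ethn}_{i}
  + \lambda_{g} + \mu_{p,c}
  + \theta_{c}\,\text{primary\_base}_{g}
  + \phi_{c}\,\text{junior\_base}_{g}
  + \varepsilon_{i,g,p,c}
\end{equation}
```

Here $i$ indexes individuals, $g$ regions, $p$ provinces, and $c$ birth cohorts. $\lambda_{g}$ is a region fixed effect, $\mu_{p,c}$ a province-by-cohort fixed effect, and $\theta_{c}$, $\phi_{c}$ are cohort-specific slopes, mirroring the four terms passed to absorb(). $\beta$ is the cohort DID coefficient.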
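In the baseline specification estimated below, the coefficient of interest is on the interaction between sdy_density (sent-down youth density) and treat, an indicator for the cohorts born 1956 to 1969, with the cohorts born 1946 to 1955 as the comparison group.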
Next, we use the data and code provided by Chen et al. (2020) to demonstrate cohort DID in Stata. The data and code can be downloaded from OPENICPSR.
. use census_1990_clean.dta, clear
. global var_abs_cohort "region1990 prov#year_birth ///
c.primary_base#year_birth c.junior_base#year_birth"
. gen treat = inrange(year_birth,1956,1969) if ///
inrange(year_birth,1946,1969)
. reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==1, ///
absorb($var_abs_cohort) cluster(region1990)
. est store m1
. reghdfe yedu c.sdy_density#c.treat male han_ethn if rural==0, ///
absorb($var_abs_cohort) cluster(region1990)
. est store m2
. esttab m1 m2
--------------------------------------------
(1) (2)
yedu yedu
--------------------------------------------
c.sdy_dens~t 3.237*** 0.151
(4.62) (0.29)
male 1.874*** 0.668***
(66.05) (26.08)
han_ethn 0.150** 0.0000334
(2.66) (0.00)
_cons 5.436*** 9.564***
(98.56) (121.74)
--------------------------------------------
N 2775858 417883
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
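The estimates match columns (1) and (2) of Table 3 in Chen et al. (2020). In column (1), the rural==1 sample, the coefficient on the interaction of sdy_density and treat is 3.237 and significant at the 0.1% level (t = 4.62). In column (2), the rural==0 sample, it is 0.151 and statistically insignificant (t = 0.29).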
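To see the mechanics without the census files, here is a minimal sketch on simulated data. Everything in it is hypothetical: the variable names region and density, the sample sizes, and the true effect of 3 are assumptions chosen for illustration. Built-in regress with region and cohort dummies stands in for reghdfe so that nothing needs to be installed.

```stata
* Toy cohort DID on simulated data (all names and numbers are hypothetical)
clear
set seed 12345
set obs 40                                   // 40 hypothetical regions
gen region  = _n
gen density = runiform()                     // region-level treatment intensity
expand 24                                    // one row per region and cohort
bysort region: gen year_birth = 1945 + _n    // birth cohorts 1946-1969
expand 20                                    // 20 individuals per region-cohort cell
gen treat = inrange(year_birth, 1956, 1969)  // exposed cohorts

* The true effect of density on the exposed cohorts is set to 3
gen yedu = 5 + 0.5*density + 0.1*(year_birth - 1946) ///
    + 3*density*treat + rnormal()

* Region dummies absorb density; cohort dummies absorb treat
regress yedu c.density#c.treat i.region i.year_birth, vce(cluster region)
display "Estimated interaction coefficient: " %6.3f _b[c.density#c.treat]
```

The main effects of density and treat are not entered separately because they are collinear with the region and cohort dummies. Only the interaction is identified, which is the same logic as the absorb() terms in the real specification.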
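Chen et al. (2020) design two placebo tests by redefining the treatment and control groups, using the 1990 and the 2000 census data respectively. In the code below, the first test keeps the rural sample of the 1990 census born between 1946 and 1955 and treats the 1951 to 1955 cohorts as if they had been exposed. The second keeps the 2000 census cohorts born between 1970 and 1979 and treats the 1975 to 1979 cohorts as if they had been exposed. Both re-estimate the baseline specification with treat replaced by treat_placebo: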
. keep if rural==1
. drop rural
. gen treat_placebo = inrange(year_birth,1951,1955) ///
if inrange(year_birth,1946,1955)
. reghdfe yedu c.sdy_density#c.treat_placebo male han_ethn, ///
absorb($var_abs_cohort) cluster(region1990)
. est store m1
. use census_2000_clean.dta, clear
. global var_abs_cohort "region2000 prov#year_birth c.primary_base#year_birth c.junior_base#year_birth"
. gen treat_placebo = inrange(year_birth,1975,1979) ///
if inrange(year_birth,1970,1979)
. reghdfe yedu c.sdy_density#c.treat_placebo male han_ethn, ///
absorb($var_abs_cohort) cluster(region2000)
. est store m2
. esttab m1 m2
--------------------------------------------
(1) (2)
yedu yedu
--------------------------------------------
c.sdy_dens~o -0.817 -0.432
(-1.42) (-1.36)
male 2.286*** 0.665***
(76.09) (44.31)
han_ethn 0.0802 0.477***
(1.45) (11.89)
_cons 4.148*** 7.161***
(76.46) (193.19)
--------------------------------------------
N 960123 947025
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
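The estimates match columns (7) and (8) of Table 3 in Chen et al. (2020). In both tests the coefficient on the placebo interaction is statistically insignificant: -0.817 (t = -1.42) in the 1990 census and -0.432 (t = -1.36) in the 2000 census.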
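Both treat and treat_placebo above are built with the pattern gen x = inrange(...) if inrange(...). The short sketch below, on a handful of made-up birth years, shows what that pattern produces: 1 inside the inner range, 0 in the rest of the outer range, and missing everywhere else, so that observations outside the outer range drop out of any regression that uses the variable.

```stata
* How "gen x = inrange(a) if inrange(b)" codes cohorts (birth years are made up)
clear
input year_birth
1940
1946
1955
1956
1969
1975
end

* 1 = born 1956-1969, 0 = born 1946-1955, missing = outside 1946-1969
gen treat = inrange(year_birth, 1956, 1969) if inrange(year_birth, 1946, 1969)

list year_birth treat, noobs
count if missing(treat)    // cohorts excluded from estimation
```

Because Stata drops observations with missing regressors, the if qualifier on gen restricts the estimation sample without a separate keep or drop step.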
To test for parallel trends, Chen et al. (2020) estimate a dynamic reduced-form version of the cohort DID model (Duflo, 2001).
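A sketch of the dynamic specification, written to match the regressions on I1946, I1947, and so on in this post. As before, the notation is an assumption for illustration and is not copied from the paper.

```latex
\begin{equation}
\text{yedu}_{i,g,p,c}
  = \sum_{k} \gamma_{k}\,\bigl(\text{sdy\_density}_{g} \times \mathbf{1}[c = k]\bigr)
  + \delta_{1}\,\text{male}_{i} + \delta_{2}\,\text{han\_ethn}_{i}
  + \lambda_{g} + \mu_{p,c}
  + \theta_{c}\,\text{primary\_base}_{g}
  + \phi_{c}\,\text{junior\_base}_{g}
  + \varepsilon_{i,g,p,c}
\end{equation}
```

The sum runs over the birth cohorts $k$ available in each census: 1946 to 1969 for 1990, 1946 to 1962 for 1982, and 1946 to 1979 for 2000. In the 1982 and 2000 regressions the code uses primary_base_older and junior_base_older in place of primary_base and junior_base. Each $\gamma_{k}$ is the cohort-specific counterpart of the single DID coefficient.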
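In the code below, the single interaction term is replaced by a full set of interactions between sdy_density and a dummy for each birth cohort (I1946, I1947, and so on), so that one coefficient is estimated per cohort.
Chen et al. (2020) run this parallel trends test separately on the 1982, 1990, and 2000 census data.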
. * 1990 census data
. use census_1990_clean.dta, clear
. global var_abs_cohort "region1990 prov#year_birth ///
c.primary_base#year_birth c.junior_base#year_birth"
. gen treat = inrange(year_birth,1956,1969) if inrange(year_birth,1946,1969)
. keep if rural==1
. drop rural
. compress
. * Interact sent-down youth density with a dummy for each birth cohort (year)
. forvalues y = 1946/1969 {
gen I`y' = sdy_density*(year_birth==`y')
}
. reghdfe yedu I1946-I1969 male han_ethn, ///
absorb($var_abs_cohort) cluster(region1990)
. outreg2 using "Figure3.txt", replace sideway ///
noparen se nonotes nocons noaster ///
nolabel text keep(I1946-I1969) ///
sortvar(I1946-I1969) // export the coefficients to Figure3.txt
. * 1982 census data
. use census_1982_clean.dta, clear
. global var_abs_cohort1 "region1982 prov#year_birth ///
c.primary_base_older#year_birth c.junior_base_older#year_birth"
. forvalues y = 1946/1962 {
gen I`y' = sdy_density*(year_birth==`y')
}
. reghdfe yedu I1946-I1962 male han_ethn, ///
absorb($var_abs_cohort1) cluster(region1982)
. outreg2 using "Figure3.txt", append sideway ///
noparen se nonotes nocons noaster ///
nolabel text keep(I1946-I1962) ///
sortvar(I1946-I1962) // export the coefficients to Figure3.txt
. * 2000 census data
. use census_2000_clean.dta, clear
. global var_abs_cohort2 "region2000 prov#year_birth ///
c.primary_base_older#year_birth c.junior_base_older#year_birth"
. forvalues y = 1946/1979 {
gen I`y' = sdy_density*(year_birth==`y')
}
. reghdfe yedu I1946-I1979 male han_ethn, ///
absorb($var_abs_cohort2) cluster(region2000)
. outreg2 using "Figure3.txt", append sideway ///
noparen se nonotes nocons noaster ///
nolabel text keep(I1946-I1979) ///
sortvar(I1946-I1979) // export the coefficients to Figure3.txt
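The loop-and-regress pattern used for the three censuses can be tried on simulated data. The sketch below is illustrative only: the data, the variable names region and density, and the effect size of 2 are hypothetical, built-in regress replaces reghdfe, and the coefficients are read directly from _b[] and _se[] rather than exported with outreg2. The 1951 cohort is left out as the reference category.

```stata
* Toy dynamic cohort DID on simulated data (all names and values hypothetical)
clear
set seed 2020
set obs 50                                   // 50 hypothetical regions
gen region  = _n
gen density = runiform()
expand 10
bysort region: gen year_birth = 1950 + _n    // birth cohorts 1951-1960
expand 30                                    // 30 individuals per cell

* Effect of density is 0 for cohorts born before 1956 and 2 from 1956 on
gen yedu = 6 + 2*density*(year_birth >= 1956) + rnormal()

* One interaction per cohort; 1951 is the omitted reference cohort
forvalues y = 1952/1960 {
    gen I`y' = density*(year_birth == `y')
}
regress yedu I1952-I1960 i.region i.year_birth, vce(cluster region)

* Print each cohort's coefficient with a 95% interval
forvalues y = 1952/1960 {
    local lo = _b[I`y'] - 1.96*_se[I`y']
    local hi = _b[I`y'] + 1.96*_se[I`y']
    display `y' "  coef = " %6.3f _b[I`y'] "  CI = [" %6.3f `lo' ", " %6.3f `hi' "]"
}
```

In this simulation the coefficients for 1952 to 1955 should hover around zero and those for 1956 to 1960 around 2, which is the pattern a parallel trends plot is meant to reveal.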
* Plotting
insheet using "Figure3.txt", clear
keep if inrange(_n,5,38)
gen year = substr(v1,2,4)
rename (v2 v3 v4 v5 v6 v7)(coef1990 se1990 coef1982 se1982 coef2000 se2000)
destring, force replace
keep year coef* se*
reshape long coef se, i(year) j(data)
drop if coef == .
gen lb = coef - 1.96*se
gen ub = coef + 1.96*se
gen y_overlap = min(max(year-1955,0),max(1970-year,0),6)
sort data year
twoway line lb year if data==1982, sort lpattern(dash) ///
lcolor(gs8) yaxis(1) || line ub year if data==1982, ///
sort lpattern(dash) lcolor(gs8) || line coef year ///
if data==1982, lwidth(thick) lcolor(black) yaxis(1) ///
|| line y_overlap year if data==1982, sort ///
lpattern(dash_dot) lwidth(thick) lcolor(gs8) ///
yaxis(2) ||, graphregion(fcolor(gs16) lcolor(gs16)) ///
plotregion(lcolor(gs16) margin(zero)) ylabel(-4(2)8, ///
labsize(small) angle(0) format(%12.0f) axis(1)) ///
ytitle("Coefficients", size(small) axis(1)) ///
ylabel(0(2)6, labsize(small) angle(0) format(%12.0f) ///
axis(2)) ytick(-6 0(1)6 12,axis(2)) ///
ytitle("Years of Overlap", size(small) axis(2)) ///
xlabel(1945(5)1980, labsize(small)) xtick(1945(5)1980) ///
xtitle("Birth Cohort", size(small)) xline(1955 1970, ///
lpattern(solid) lwidth(thin) lcolor(black)) ///
title("Panel A - Census 1982", size(small) margin(small)) ///
yline(0, lpattern(solid) lwidth(thin) lcolor(black)) ///
legend(off) fxsize(70) fysize(60)
graph save a,replace //Figure3-PanelA: Census 1982
twoway line lb year if data==1990, lpattern(dash) lcolor(gs8) ///
yaxis(1) || line ub year if data==1990, lpattern(dash) ///
lcolor(gs8) || line coef year if data==1990, lwidth(thick) ///
lcolor(black) yaxis(1) || line y_overlap year if data==1990, ///
lpattern(dash_dot) lwidth(thick) lcolor(gs8) yaxis(2) ||, ///
graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) ///
margin(zero)) ylabel(-4(2)8,labsize(small) angle(0) ///
format(%12.0f) axis(1)) ytitle("Coefficients", size(small) ///
axis(1)) ylabel(0(2)6, labsize(small) angle(0) format(%12.0f) ///
axis(2)) ytick(-6 0(1)6 12,axis(2)) ytitle("Years of Overlap", ///
size(small) axis(2)) xlabel(1945(5)1980, labsize(small)) ///
xtick(1945(5)1980) xtitle("Birth Cohort", size(small)) ///
xline(1955 1970,lpattern(solid) lwidth(thin) lcolor(black)) ///
title("Panel B - Census 1990", size(small) margin(small)) ///
yline(0, lpattern(solid) lwidth(thin) lcolor(black)) ///
legend(off) fxsize(70) fysize(60)
graph save b,replace //Figure3-PanelB: Census 1990
twoway line lb year if data==2000, lpattern(dash) lcolor(gs8) yaxis(1) ///
|| line ub year if data==2000, lpattern(dash) lcolor(gs8) ///
|| line coef year if data==2000, lwidth(thick) lcolor(black) ///
yaxis(1) || line y_overlap year if data==2000, lpattern(dash_dot) ///
lwidth(thick) lcolor(gs8) yaxis(2) ||, graphregion(fcolor(gs16) ///
lcolor(gs16)) plotregion(lcolor(gs16) margin(zero)) ylabel(-3(1)6, ///
labsize(small) angle(0) format(%12.0f) axis(1)) ///
ytitle("Coefficients", size(small) axis(1)) ///
ylabel(0(2)6, labsize(small) angle(0) format(%12.0f) ///
axis(2)) ytick(-6 0(1)6 12,axis(2)) ///
ytitle("Years of Overlap", size(small) axis(2)) ///
xlabel(1945(5)1980, labsize(small)) xtick(1945(5)1980) ///
xtitle("Birth Cohort", size(small)) legend(order(3 1 4) ///
label(3 "Coefficient") label(1 "95% CI") ///
label(4 "Overlapped Years in" "Primary Schools") col(2) ///
size(small) margin(tiny)) xline(1955 1970, lpattern(solid) ///
lwidth(thin) lcolor(black)) title("Panel C - Census 2000", ///
size(small) margin(small)) yline(0, lpattern(solid) lwidth(thin) ///
lcolor(black)) fxsize(65) fysize(80)
graph save c,replace //Figure3-PanelC: Census 2000
twoway || connected coef year if data==1982, lwidth(medthick) ///
msymbol(triangle) color(black) || line coef year if data==1990, ///
lwidth(medthick) color(gs6) || connected coef year if data==2000, ///
lwidth(medthick) msymbol(square) color(gs12) ||, ///
graphregion(fcolor(gs16) lcolor(gs16)) plotregion(lcolor(gs16) ///
margin(zero)) ylabel(-2(1)5, labsize(small) angle(0) ///
format(%12.0f)) ytitle("Coefficients", size(small)) ///
xlabel(1945(5)1980, labsize(small)) xtick(1945(5)1980) ///
xtitle("Birth Cohort", size(small)) legend(label(1 "Census 1982") ///
label(2 "Census 1990") label(3 "Census 2000") col(2) size(small)) ///
xline(1955 1970, lpattern(solid) lwidth(thin) lcolor(black)) ///
title("Panel D - Three Censuses in One Graph", size(small) ///
margin(small)) yline(0, lpattern(solid) lwidth(thin) ///
lcolor(black)) fxsize(70) fysize(80)
graph save d,replace //Figure3-PanelD: Three censuses in one graph
graph combine a.gph b.gph c.gph d.gph, ///
graphregion(fcolor(gs16) lcolor(gs16)) //Figure3
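Two formulas in the plotting code are easy to misread: the 95% bounds (coef plus or minus 1.96 standard errors) and y_overlap, which the legend labels "Overlapped Years in Primary Schools". The worked example below applies both to a few made-up rows. The coefficient and standard-error values are hypothetical; only the formulas come from the code above.

```stata
* Worked example of the bound and overlap formulas (coef and se are made up)
clear
input year coef se
1950 0.10 0.20
1958 0.90 0.30
1962 2.00 0.40
1968 1.50 0.50
1972 0.20 0.25
end

* Years since 1955 and years until 1970, each floored at 0, capped at 6
gen y_overlap = min(max(year-1955,0), max(1970-year,0), 6)

* Normal-approximation 95% confidence bounds
gen lb = coef - 1.96*se
gen ub = coef + 1.96*se

list year y_overlap lb ub, noobs
```

For these rows y_overlap is 0 for 1950 and 1972, 3 for 1958, 6 for 1962, and 2 for 1968. The series is zero for cohorts born in 1955 or earlier and in 1970 or later, and it rises to a plateau of 6 in between.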
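In the resulting figure, the coefficients for cohorts born before 1956 speak to the parallel trends assumption, which is supported when they are statistically indistinguishable from zero.
The coefficients for cohorts born from 1956 onward trace the effect on the exposed cohorts.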