Author: Ding Yawen (丁雅文), Peking University
Email: 1901111380@pku.edu.cn
Editor's note: parts of this post draw on the following material, which we gratefully acknowledge.
Source: Joerg Luedicke. 2019. Performing and interpreting discrete choice analyses in Stata.
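Discrete choice models (DCMs) are a powerful tool for studying individual choice behavior. Widely used Stata commands in this area include logit, probit, mlogit, nlogit, and ologit; for details, see the 连享会 (Lianxh) topic posts under "Probit-Logit".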
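The conditional logit model relies on the IIA assumption, which often does not hold in practice, and it has difficulty accommodating heterogeneity in individual preferences. The mixed logit model extends the standard conditional logit model by allowing one or more parameters to follow a random distribution. Stata 16 introduced a new suite of cm commands for discrete choice models, which includes commands for fitting mixed logit models and supports a wide range of flexible marginal-effects analyses. This post gives a systematic, hands-on introduction to the cm commands.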
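To make "parameters follow a random distribution" concrete, the choice probability in a mixed logit model can be written as below. This is the standard textbook form, not a formula taken from the source post, and the notation (individual i, alternatives a and b out of A, attributes x, mixing density f with parameters θ) is assumed here for illustration.

```latex
P_{ia} \;=\; \int \frac{\exp\!\left(x_{ia}^{\top}\beta\right)}
                       {\sum_{b=1}^{A}\exp\!\left(x_{ib}^{\top}\beta\right)}
        \, f(\beta \mid \theta)\, d\beta
```

When f places all of its mass on a single value of β, the integral collapses to the conditional logit probability. With a nondegenerate f the integral generally has no closed form and is approximated by simulation, which is consistent with the "Log simulated-likelihood" and "Integration points" entries in the estimation output later in this post.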
The cm suite introduced in Stata 16 mainly consists of cmclogit, cmmprobit, cmroprobit, cmrologit, cmmixlogit, and cmxtmixlogit, which together make it convenient to fit and interpret a broad range of choice models.
Before starting the empirical analysis, the data must first be declared as choice model data with cmset. Two forms are available:
cmset caseidvar altvar [, force]
declares cross-sectional choice model data, and
cmset panelvar timevar altvar [, tsoptions force]
declares panel choice model data. The example below uses the panel form:
. use http://www.stata-press.com/data/r16/transport.dta, clear
(Transportation choice data)
. cmset id t alt
note: case identifier _caseid generated from id and t.
note: panel by alternatives identifier _panelaltid generated from id and alt.
Panel data: Panels id and time t
Case ID variable: _caseid
Alternatives variable: alt
Panel by alternatives variable: _panelaltid (strongly balanced)
Time variable: t, 1 to 3
Delta: 1 unit
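The session above uses the panel form of cmset. For comparison, here is a minimal sketch of the cross-sectional form. It assumes, purely for illustration, that we keep only the first time period of transport.dta so that id alone identifies a case; this subsetting step is not part of the original post.

```stata
* Sketch: cross-sectional cmset on a single time period (illustrative assumption)
webuse transport, clear
keep if t == 1        // one case per individual remains
cmset id alt          // caseidvar = id, altvar = alt
```

With only one period left, there is no time variable to declare, so the two-variable form applies.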
Next, commands such as cmchoiceset, cmtab, and cmsample can be used to describe the data.
. **tabulate choice sets
. cmchoiceset
Tabulation of choice-set possibilities
Choice set | Freq. Percent Cum.
------------+-----------------------------------
1 2 3 4 | 1,500 100.00 100.00
------------+-----------------------------------
Total | 1,500 100.00
Note: Total is number of cases.
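As a complement to cmchoiceset, the following sketch tabulates the chosen alternatives with cmtab and summarizes the alternative-specific variables by chosen alternative with cmsummarize. The variable names come from transport.dta as used in this post; no output is shown because it was not part of the source.

```stata
* Sketch: further descriptive commands for cmset data
webuse transport, clear
cmset id t alt
cmtab, choice(choice)                       // frequency of each chosen alternative
cmsummarize trcost trtime, choice(choice)   // summary statistics by chosen alternative
```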
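cmsample reports why observations would be excluded from the estimation sample. In the session below, the replace statements deliberately introduce problems into the data so that cmsample has something to report: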
. preserve
. replace trcost=. in 5
. replace alt=. in 2
. replace choice=0 if t==3 & id==1
. replace income=1 in 1
. cmsample trcost trtime, choice(choice) casevars(age income)
Reason for exclusion | Freq. Percent Cum.
-----------------------------------+-----------------------------------
observations included | 5,988 99.80 99.80
alternatives variable missing | 4 0.07 99.87
choice variable all 0 | 4 0.07 99.93
casevars not constant within case* | 4 0.07 100.00
-----------------------------------+-----------------------------------
Total | 6,000 100.00
* indicates an error
. restore
Once these checks are done, the following commands can be used to fit the various discrete choice models:
cmclogit: conditional logit model (McFadden's choice model)
cmmixlogit: mixed logit model
cmxtmixlogit: panel-data mixed logit model
cmmprobit: multinomial probit model
cmroprobit: rank-ordered probit model
cmrologit: rank-ordered logit model
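To show how two of the commands listed above are called, here is a minimal sketch that fits a conditional logit and a cross-sectional mixed logit to the same data. It assumes, for illustration only, that the first time period of transport.dta is treated as a cross-section; the post itself goes on to use the panel command cmxtmixlogit instead.

```stata
* Sketch: conditional logit vs. mixed logit on one cross-section (illustrative)
webuse transport, clear
keep if t == 1
cmset id alt

* Conditional logit: fixed coefficients on trcost and trtime
cmclogit choice trcost trtime, casevars(age income)

* Mixed logit: trtime gets a random (normally distributed) coefficient
cmmixlogit choice trcost, random(trtime) casevars(age income)
```

The calls differ only in where trtime appears: in the main variable list for a fixed coefficient, or inside random() for a random one.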
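This section uses cmxtmixlogit as the worked example. New in Stata 16, cmxtmixlogit fits mixed logit models to panel data. Using the transport.dta data, we first run cmxtmixlogit to examine how the cost of each mode of transport affects people's choice of travel mode; trtime is placed in random() and therefore receives a random coefficient: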
. webuse transport.dta, clear
. cmset id t alt
. cmxtmixlogit choice trcost, casevars(age income) random(trtime) nolog
Mixed logit choice model Number of obs = 6,000
Number of cases = 1,500
Panel variable: id Number of panels = 500
Time variable: t Cases per panel: min = 3
avg = 3.0
max = 3
Alternatives variable: alt Alts per case: min = 4
avg = 4.0
max = 4
Integration sequence: Hammersley
Integration points: 594 Wald chi2(8) = 432.68
Log simulated-likelihood = -1005.9899 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
choice | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
alt |
trcost | -0.839 0.044 -19.13 0.000 -0.925 -0.753
trtime | -1.509 0.264 -5.71 0.000 -2.026 -0.991
-------------+----------------------------------------------------------------
/Normal |
sd(trtime)| 1.946 0.259 1.498 2.527
-------------+----------------------------------------------------------------
Car | (base alternative)
-------------+----------------------------------------------------------------
Public |
age | 0.154 0.067 2.29 0.022 0.022 0.286
income | -0.382 0.035 -10.98 0.000 -0.450 -0.313
_cons | -0.576 0.352 -1.64 0.102 -1.265 0.113
-------------+----------------------------------------------------------------
Bicycle |
age | 0.206 0.085 2.43 0.015 0.040 0.373
income | -0.523 0.046 -11.28 0.000 -0.613 -0.432
_cons | -1.137 0.446 -2.55 0.011 -2.012 -0.263
-------------+----------------------------------------------------------------
Walk |
age | 0.310 0.107 2.89 0.004 0.100 0.519
income | -0.902 0.069 -13.14 0.000 -1.036 -0.767
_cons | -0.418 0.561 -0.75 0.456 -1.517 0.681
------------------------------------------------------------------------------
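One common way to read the /Normal block is to ask what share of individuals has a positive coefficient on trtime. With a normally distributed coefficient, that share is Φ(mean/sd). The sketch below plugs in the rounded estimates from the table above (mean -1.509, sd 1.946); this calculation is an illustrative addition, not part of the original post.

```stata
* Sketch: implied share of individuals with a positive trtime coefficient,
* using the rounded estimates reported in the table above
display normal(-1.509/1.946)
```

The result is roughly 0.22, which suggests that the model implies a noticeable minority of individuals whose utility does not fall with travel time, alongside a majority for whom it does.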
Next, we can run margins to analyze marginal effects. margins is very flexible; the examples below illustrate how it is used.
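Example 1: Expected probabilities of choosing each mode of transport when annual income is $30,000 (entered as income=3, so income appears to be measured in units of $10,000).
. margins, at(income=3)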
Predictive margins Number of obs = 6,000
Model VCE: OIM
Expression: Pr(alt), predict()
At: income = 3
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
_outcome |
Car | 0.333 0.020 16.93 0.000 0.295 0.372
Public | 0.221 0.018 12.00 0.000 0.185 0.257
Bicycle | 0.168 0.018 9.23 0.000 0.132 0.203
Walk | 0.278 0.024 11.41 0.000 0.230 0.326
------------------------------------------------------------------------------
Example 2: The change in the expected probability of choosing each mode of transport, at each time point, when annual income is $40,000 rather than $30,000.
. margins, at(income=(3 4)) contrast(at(r) nowald) over(t)
Contrasts of predictive margins Number of obs = 6,000
Model VCE: OIM
Expression: Pr(alt), predict()
Over: t
1._at: 1.t
income = 3
1._at: 2.t
income = 3
1._at: 3.t
income = 3
2._at: 1.t
income = 4
2._at: 2.t
income = 4
2._at: 3.t
income = 4
---------------------------------------------------------------------
| Delta-method
| Contrast std. err. [95% conf. interval]
--------------------+------------------------------------------------
_at@_outcome#t |
(2 vs 1) Car#1 | 0.079 0.004 0.071 0.087
(2 vs 1) Car#2 | 0.083 0.004 0.074 0.091
(2 vs 1) Car#3 | 0.079 0.004 0.071 0.087
(2 vs 1) Public#1 | 0.007 0.005 -0.003 0.016
(2 vs 1) Public#2 | 0.005 0.005 -0.004 0.015
(2 vs 1) Public#3 | 0.008 0.005 -0.001 0.017
(2 vs 1) Bicycle#1 | -0.009 0.006 -0.020 0.002
(2 vs 1) Bicycle#2 | -0.008 0.005 -0.019 0.002
(2 vs 1) Bicycle#3 | -0.007 0.005 -0.018 0.004
(2 vs 1) Walk#1 | -0.077 0.010 -0.097 -0.058
(2 vs 1) Walk#2 | -0.079 0.010 -0.099 -0.060
(2 vs 1) Walk#3 | -0.080 0.010 -0.099 -0.060
---------------------------------------------------------------------
With marginsplot, we can go one step further and visualize how these changes in expected probabilities vary over time.
. marginsplot
Variables that uniquely identify margins: t _outcome
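If the time dimension is not of interest, the same income contrast can be computed without over(t), which yields one contrast per outcome averaged over all periods. The sketch below refits the model from this post so that it runs on its own; dropping over(t) is the only change relative to Example 2.

```stata
* Sketch: Example 2 without the breakdown by time period
webuse transport, clear
cmset id t alt
cmxtmixlogit choice trcost, casevars(age income) random(trtime) nolog

* Contrast income = 4 against income = 3, averaged over all periods
margins, at(income=(3 4)) contrast(at(r) nowald)
```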
Example 3: Average expected probabilities of choosing each mode of transport across the whole income range.
. margins, at(income=(1(1)16))
Predictive margins Number of obs = 6,000
Model VCE: OIM
Expression: Pr(alt), predict()
1._at: income = 1
2._at: income = 2
3._at: income = 3
......
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
_outcome#_at |
Car# 1 | 0.187 0.021 8.85 0.000 0.145 0.228
Car# 2 | 0.256 0.021 12.13 0.000 0.215 0.297
Car# 3 | 0.333 0.020 16.93 0.000 0.295 0.372
......
Walk#14 | 0.001 0.000 1.90 0.058 -0.000 0.002
Walk#15 | 0.000 0.000 1.66 0.096 -0.000 0.001
Walk#16 | 0.000 0.000 1.48 0.140 -0.000 0.000
------------------------------------------------------------------------------
. marginsplot, recast(line) ciopts(recast(rarea) color(%20))
Variables that uniquely identify margins: income _outcome
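Example 4: If the cost of travelling by car rises by 25%, how does this affect the probability that people choose the car? And how does it affect the probabilities of choosing the other modes?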
. margins, alternative(Car) at(trcost=generate(trcost)) ///
> at(trcost=generate(1.25*trcost)) subpop(if t==1)
Predictive margins Number of obs = 6,000
Model VCE: OIM Subpop. no. obs = 2,000
Expression: Pr(alt), predict()
Alternative: Car
1._at: trcost = trcost
2._at: trcost = 1.25*trcost
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
_outcome#_at |
Car#1 | 0.544 0.011 47.71 0.000 0.522 0.566
Car#2 | 0.441 0.010 43.61 0.000 0.421 0.460
Public#1 | 0.201 0.010 19.26 0.000 0.181 0.221
Public#2 | 0.255 0.012 21.60 0.000 0.232 0.278
Bicycle#1 | 0.126 0.010 13.14 0.000 0.107 0.144
Bicycle#2 | 0.157 0.011 14.21 0.000 0.135 0.178
Walk#1 | 0.130 0.010 12.76 0.000 0.110 0.149
Walk#2 | 0.148 0.011 13.43 0.000 0.126 0.169
------------------------------------------------------------------------------
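Going further, we can contrast the probabilities of choosing each mode after the 25% increase in car travel cost with the probabilities when the cost is unchanged.
. margins, alternative(Car) at(trcost=generate(trcost)) ///
> at(trcost=generate(1.25*trcost)) contrast(at(r) nowald) ///
> subpop(if t==1)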
Contrasts of predictive margins Number of obs = 6,000
Model VCE: OIM Subpop. no. obs = 2,000
Expression: Pr(alt), predict()
Alternative: Car
1._at: trcost = trcost
2._at: trcost = 1.25*trcost
-------------------------------------------------------------------
| Delta-method
| Contrast std. err. [95% conf. interval]
------------------+------------------------------------------------
_at@_outcome |
(2 vs 1) Car | -0.103 0.003 -0.108 -0.098
(2 vs 1) Public | 0.054 0.002 0.049 0.058
(2 vs 1) Bicycle | 0.031 0.002 0.027 0.035
(2 vs 1) Walk | 0.018 0.002 0.015 0.022
-------------------------------------------------------------------
. marginsplot, recast(dot) yline(0) plotopts(msymbol(square))
Variables that uniquely identify margins: _outcome
Multiple at() options specified:
_atoption=1: trcost=generate(trcost)
_atoption=2: trcost=generate(1.25*trcost)
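The contrasts in this example are simply the differences between the two sets of predictive margins reported earlier in Example 4 (scenario 2 minus scenario 1). As a quick arithmetic check, using the rounded values printed in that table:

```stata
* Sketch: reproduce the Example 4 contrasts from the rounded margins
display %6.3f 0.441 - 0.544   // Car
display %6.3f 0.255 - 0.201   // Public
display %6.3f 0.157 - 0.126   // Bicycle
display %6.3f 0.148 - 0.130   // Walk
```

These give -0.103, 0.054, 0.031, and 0.018, matching the contrast table. The four changes sum to zero, as they must, because the probabilities across alternatives always sum to one.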
Example 5: How does the probability of choosing the car change as car travel time changes?
. margins, dydx(trtime) outcome(Car) alternative(Car)
Average marginal effects Number of obs = 6,000
Model VCE: OIM
Expression: Pr(alt), predict()
Alternative: Car
Outcome: Car
dy/dx wrt: trtime
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
trtime |
_cons | -0.158 0.027 -5.88 0.000 -0.211 -0.105
------------------------------------------------------------------------------
Example 6: How does the probability of choosing public transport change as car travel time changes?
. margins, dydx(trtime) outcome(Public) alternative(Car)
Average marginal effects Number of obs = 6,000
Model VCE: OIM
Expression: Pr(alt), predict()
Alternative: Car
Outcome: Public
dy/dx wrt: trtime
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
trtime |
_cons | 0.106 0.017 6.15 0.000 0.072 0.139
------------------------------------------------------------------------------
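Example 7: How does the probability of choosing the car change as the travel time of each mode changes? With outcome(Car) and no alternative() option, margins reports the effect of every alternative's trtime on the probability of the Car outcome.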
. margins, dydx(trtime) outcome(Car)
Average marginal effects Number of obs = 6,000
Model VCE: OIM
Expression: Pr(alt), predict()
Outcome: Car
dy/dx wrt: trtime
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
trtime |
alt |
Car | -0.158 0.027 -5.88 0.000 -0.211 -0.105
Public | 0.106 0.017 6.15 0.000 0.072 0.139
Bicycle | 0.037 0.007 5.11 0.000 0.023 0.052
Walk | 0.015 0.004 3.52 0.000 0.007 0.024
------------------------------------------------------------------------------
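The command above fixes the outcome at Car, as the "Outcome: Car" line in its header shows, and varies trtime for each alternative in turn. For the reverse view, the effect of car travel time on the probability of every outcome, one option is to keep alternative(Car) and drop outcome(). The sketch below refits the model from this post so that it runs on its own; the output is not reproduced here because it was not part of the source.

```stata
* Sketch: effect of car travel time on the probability of each outcome
webuse transport, clear
cmset id t alt
cmxtmixlogit choice trcost, casevars(age income) random(trtime) nolog

* Vary trtime for the Car alternative only; report all outcomes
margins, dydx(trtime) alternative(Car)
```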