# Stata：一文读懂计数模型和因变量受限模型

Stata连享会   主页 || 视频 || 推文 || 知乎 || Bilibili 站

New！ `lianxh` 命令发布了：

`. ssc install lianxh`

`. help lianxh`

⛳ Stata 系列推文：

​

## 1. 计数模型

### 1.1 泊松回归

``````. cnssc install lxhuse, lianxh replace
. lxhuse poisson_sim.dta, clear
. sum num_awards math

Variable |  Obs    Mean    Std. dev.   Min   Max
-------------+--------------------------------------
num_awards |  200     .63    1.052921      0     6
math |  200  52.645    9.368448     33    75

. tabstat num_awards, by(prog) stats(mean sd n)

Summary for variables: num_awards
Group variable: prog (type of program)

prog |   Mean        SD         N
---------+---------------------------
general |     .2  .4045199        45
vocation |    .24  .5174506        50
---------+---------------------------
Total |    .63  1.052921       200
-------------------------------------

. histogram num_awards, discrete freq scheme(s1mono)
``````
``````. poisson num_awards i.prog math, vce(robust)

Iteration 0:   log pseudolikelihood = -182.75759
Iteration 1:   log pseudolikelihood = -182.75225
Iteration 2:   log pseudolikelihood = -182.75225
Poisson regression                                Number of obs =    200
Wald chi2(3)  =  80.15
Prob > chi2   = 0.0000
Log pseudolikelihood = -182.75225                 Pseudo R2     = 0.2118
------------------------------------------------------------------------
|               Robust
num_awards | Coefficient  std. err.    z    P>|z|   [95% conf. interval]
-----------+------------------------------------------------------------
prog |
academic  |      1.084      0.322   3.37   0.001      0.453       1.715
vocation  |      0.370      0.401   0.92   0.357     -0.417       1.157
|
math |      0.070      0.010   6.71   0.000      0.050       0.091
_cons |     -5.247      0.648  -8.10   0.000     -6.516      -3.978
------------------------------------------------------------------------
``````

math 的系数是 0.070，表明数学分数每增加一单位，得奖的平均次数是原来的 ${e}^{0.07}$ 倍。academic 的系数为 1.084，表明与 general 组相比，academic 组的 log(奖励次数) 平均增加了约 1.084。类似地，vocation 组比 general 组的 log(奖励次数) 平均增加了约 0.370。

``````. estat gof

Deviance goodness-of-fit =  189.4496
Prob > chi2(196)         =    0.6182

Pearson goodness-of-fit  =  212.1437
Prob > chi2(196)         =    0.2040
``````

``````. poisson, irr

Poisson regression                           Number of obs =    200
Wald chi2(3)  =  80.15
Prob > chi2   = 0.0000
Log pseudolikelihood = -182.75225            Pseudo R2     = 0.2118
-------------------------------------------------------------------
|           Robust
num_awards |    IRR   std. err.    z    P>|z|  [95% conf. interval]
-----------+-------------------------------------------------------
prog |
academic  |  2.956      0.951   3.37   0.001     1.573       5.555
vocation  |  1.447      0.581   0.92   0.357     0.659       3.179
|
math |  1.073      0.011   6.71   0.000     1.051       1.095
_cons |  0.005      0.003  -8.10   0.000     0.001       0.019
-------------------------------------------------------------------
Note: _cons estimates baseline incidence rate.
``````

``````. margins prog, atmeans

Adjusted predictions                              Number of obs = 200
Model VCE: Robust
Expression: Predicted number of events, predict()
At: 1.prog =   .225 (mean)
2.prog =   .525 (mean)
3.prog =    .25 (mean)
math   = 52.645 (mean)
---------------------------------------------------------------------
|        Delta-method
| Margin   std. err.      z    P>|z|  [95% conf. interval]
-----------+---------------------------------------------------------
prog |
general  |  0.211      0.063     3.37   0.001     0.088       0.334
academic  |  0.625      0.089     7.05   0.000     0.451       0.799
vocation  |  0.306      0.083     3.69   0.000     0.144       0.468
---------------------------------------------------------------------
``````

``````. margins, at(math=(35(10)75)) vsquish

Predictive margins                         Number of obs = 200
Model VCE: Robust
Expression: Predicted number of events, predict()
1._at: math = 35
2._at: math = 45
3._at: math = 55
4._at: math = 65
5._at: math = 75
--------------------------------------------------------------
|        Delta-method
| Margin   std. err.     z    P>|z|  [95% conf. interval]
-----+--------------------------------------------------------
_at |
1  |  0.131      0.036    3.66   0.000     0.061       0.201
2  |  0.264      0.048    5.57   0.000     0.171       0.358
3  |  0.533      0.058    9.27   0.000     0.421       0.646
4  |  1.076      0.122    8.82   0.000     0.837       1.315
5  |  2.170      0.412    5.27   0.000     1.363       2.976
--------------------------------------------------------------
``````

``````. fitstat

Measures of Fit for poisson of num_awards
Log-Lik Intercept Only:     -231.864     Log-Lik Full Model:  -182.752
D(195):                      365.505     LR(3):                 98.223
Prob > LR:              0.000
Maximum Likelihood R2:         0.388     Cragg & Uhler's R2:     0.430
AIC:                           1.878     AIC*n:                375.505
BIC:                        -667.667     BIC':                 -82.328
``````

``````. twoway scatter c1 c2 c3 math, connect(l l l) sort ///
>        ytitle("Predicted Count") ylabel( ,nogrid) legend(rows(3)) ///
>        legend(ring(0) position(10)) scheme(s1mono)
=
``````

### 1.2 负二项回归

``````lxhuse nb_data.dta, clear
summarize daysabs math
histogram daysabs, discrete freq scheme(s1mono)
nbreg daysabs math i.prog
test 2.prog 3.prog
nbreg, irr
margins prog, atmeans
margins, at(math=(0(20)100)) vsquish
fitstat
predict c
sort math
twoway (line c math if prog==1) ///
(line c math if prog==2) ///
(line c math if prog==3), ///
ytitle("Predicted Mean Value of Response") ylabel(0(3)15 ,nogrid) ///
xtitle("Math Score") legend(order(1 "prog=1" 2 "prog=2" 3 "prog=3")) scheme(s1mono)
``````

## 2.受限被解释变量

### 2.1 断尾回归

``````. lxhuse truncreg.dta, clear
. summarize achiv langscore

Variable |  Obs      Mean   Std. dev.  Min  Max
-----------+-------------------------------------
achiv |  178  54.23596    8.96323    41   76
langscore |  178  54.01124   8.944896    31   67

. tabstat achiv, by(prog) stats(n mean sd)

Summary for variables: achiv
Group variable: prog (type of program)
prog |         N      Mean        SD
---------+------------------------------
general |        40    51.575   7.97074
vocation |        37  49.86486  7.276912
---------+------------------------------
Total |       178  54.23596   8.96323
----------------------------------------

. histogram achiv, bin(6) start(40) freq normal
``````

``````. truncreg achiv langscore i.prog, ll(40)
(0 obs truncated)

Fitting full model:
Iteration 0:   log likelihood = -598.11669
Iteration 1:   log likelihood = -591.68358
Iteration 2:   log likelihood = -591.31208
Iteration 3:   log likelihood = -591.30981
Iteration 4:   log likelihood = -591.30981
Truncated regression
Limit: Lower =   40                        Number of obs     =        178
Upper = +inf                        Wald chi2(3)      =      54.76
Log likelihood = -591.30981                Prob > chi2       =     0.0000
-------------------------------------------------------------------------
achiv | Coefficient  Std. err.      z    P>|z|  [95% conf. interval]
-----------+-------------------------------------------------------------
langscore |      0.713      0.114     6.22   0.000     0.488       0.937
|
prog |
academic  |      4.065      2.055     1.98   0.048     0.038       8.093
vocation  |     -1.136      2.670    -0.43   0.671    -6.369       4.097
|
_cons |     11.302      6.773     1.67   0.095    -1.973      24.576
-----------+-------------------------------------------------------------
/sigma |      8.755      0.667    13.13   0.000     7.448      10.062
-------------------------------------------------------------------------
``````

``````. test 2.prog 3.prog

(1)  [eq1]2.prog = 0
(2)  [eq1]3.prog = 0
chi2(  2) =    7.19
Prob > chi2 =    0.0274
``````

``````. margins prog

Predictive margins                              Number of obs = 178
Model VCE: OIM
Expression: Linear prediction, predict()
-------------------------------------------------------------------
|        Delta-method
| Margin   std. err.      z    P>|z| [95% conf. interval]
----------+--------------------------------------------------------
prog |
general  | 49.789      1.897    26.24   0.000   46.070      53.507
academic  | 53.854      1.150    46.83   0.000   51.600      56.108
vocation  | 48.653      2.140    22.73   0.000   44.458      52.848
--------------------------------------------------------------------

. marginsplot
``````

`truncreg` 命令输出既不包含 ${R}^{2}$ 也不包含伪 ${R}^{2}$。通过将 achiv 与预测值关联起来，然后将结果平方，从而计算出关联程度的粗略估计值。

``````. predict p
. correlate p achiv
. display r(rho)^2

.30519203
``````

### 2.2 Tobit模型

tobit 模型又称归并回归 (censored regression) 模型，用于估计因变量中存在左归并或右归并时变量之间的线性关系(也称为上下归并)。具体来说，当某个值大于或等于某一阈值时，就会出现上述归并，因此真实值可能等于某一阈值，但也可能更高。那些高于或低于某个阈值的值将被归并。例如，在 20 世纪 80 年代，联邦法律限制速度计读数不超过每小时 85 英里。所以，如果你想通过马力和引擎大小来预测一辆车的最高时速，你得到的读数不会高于 85，即使这辆车的实际行驶速度更快。这是一个典型的数据右截 (右截) 的例子。我们唯一能确定的是这些车辆的时速至少为 85 英里。

``````. lxhuse tobitreg.dta, clear

Variable |   Obs      Mean    Std. dev.    Min    Max
----------+-------------------------------------------
apt |   200   640.035    99.21903     352    800
read |   200     52.23    10.25294      28     76
math |   200    52.645    9.368448      33     75

. tabulate prog

type of |
program |      Freq.     Percent        Cum.
------------+-----------------------------------
general |        105       52.50       75.00
vocational |         50       25.00      100.00
------------+-----------------------------------
Total |        200      100.00

. histogram apt, normal bin(10) xline(800)
``````

``````lxhuse tobitreg.dta, clear
tobit apt read math i.prog, ul(800)
test 2.prog 3.prog
test 2.prog = 3.prog
predict yhat
correlate apt yhat
fitstat
``````

## 3. 参考资料

• 陈强. 高级计量经济学及 Stata 应用[M]. 高等教育出版社, 2014.

## 4. 相关推文

Note：产生如下推文列表的 Stata 命令为：
`lianxh logit probit`

`ssc install lianxh, replace`

## 相关课程

### 最新课程-直播课

• Note: 部分课程的资料，PPT 等可以前往 连享会-直播课 主页查看，下载。

### 关于我们

• Stata连享会 由中山大学连玉君老师团队创办，定期分享实证分析经验。
• 连享会-主页知乎专栏，400+ 推文，实证分析不再抓狂。直播间 有很多视频课程，可以随时观看。
• 公众号关键词搜索/回复 功能已经上线。大家可以在公众号左下角点击键盘图标，输入简要关键词，以便快速呈现历史推文，获取工具软件和数据下载。常见关键词：`课程, 直播, 视频, 客服, 模型设定, 研究设计, stata, plus, 绘图, 编程, 面板, 论文重现, 可视化, RDD, DID, PSM, 合成控制法`

✏ 连享会-常见问题解答：
https://gitee.com/lianxh/Course/wikis

New！ `lianxh` 命令发布了：

`. ssc install lianxh`

`. help lianxh`