IV Series: Endogeneity Tests and Overidentification Tests

Published: 2022-06-16


Author: Yang Liu (杨柳), Northwest University (西北大学)
E-Mail: philoyl@163.com




1. Identification of the Equation

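When enough valid instruments are available, the parameters of the equation can be identified, and 2SLS then yields a unique estimate. In econometrics, we say that an equation is identified when its parameters are identified. Consider the IV (2SLS) estimator, where $X$ is the $N \times K$ matrix of regressors and $Z$ is the $N \times L$ matrix of instruments:

$$\hat{\beta}_{2SLS} = \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1} X'Z(Z'Z)^{-1}Z'y$$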

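$\hat{\beta}_{2SLS}$ is unique only if both of the following conditions hold: (a) $Z'Z$ is an $L \times L$ nonsingular matrix, where $L$ is the number of instruments; (b) $Z'X$ has rank $K$ (full column rank), where $K$ is the number of regressors.

Condition (a): as long as the instruments are linearly independent, $Z'Z$ is an $L \times L$ nonsingular matrix, so this requirement is usually taken to hold.

Condition (b): $\mathrm{rank}(Z'X) = K$ is called the "rank condition", and $L \geq K$ is called the "order condition". Because the exogenous variables in $X$ serve as their own instruments, the order condition is usually stated as "there must be at least as many instruments as endogenous variables", which guarantees $L \geq K$. The order condition is necessary but not sufficient; the rank condition must hold as well.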

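If $\mathrm{rank}(Z'X) < K$, the equation is underidentified, and no econometric method can deliver a consistent estimate. If $\mathrm{rank}(Z'X) = K$ and $L = K$, the equation is exactly (just) identified. If $\mathrm{rank}(Z'X) = K$ and $L > K$, the equation is overidentified. (The rank of $Z'X$ can never exceed $K$, so the distinction between exact identification and overidentification comes from the number of instruments $L$.)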
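As an illustration of the order condition, the count below uses the wage equation from the Stata example in Section 2.3, where the output lists exper, expersq, motheduc and fatheduc as instruments and educ as the instrumented variable. It only checks the order condition; the rank condition still has to be assessed separately (for example through the first-stage regression).

```latex
\begin{aligned}
K &= \underbrace{1}_{\text{educ}} + \underbrace{3}_{\text{exper},\ \text{expersq},\ \text{constant}} = 4,\\
L &= \underbrace{3}_{\text{exper},\ \text{expersq},\ \text{constant}} + \underbrace{2}_{\text{motheduc},\ \text{fatheduc}} = 5,\\
L - K &= 1 \quad\Longrightarrow\quad \text{one overidentifying restriction.}
\end{aligned}
```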

2. Testing Overidentifying Restrictions

2.1 Homoskedastic Errors

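The test of overidentifying restrictions is a test of the exogeneity of the instruments. When the equation is exactly identified, instrument exogeneity cannot be tested directly, but when it is overidentified we can test whether the extra instruments are uncorrelated with the error $u_1$. To explain the procedure, we start from the equation:

$$y_1 = z_1\delta_1 + y_2\alpha_1 + u_1$$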

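In the equation above, $y_2$ is the $1 \times G_1$ vector of endogenous variables, $z_1$ is the $1 \times L_1$ vector of included exogenous variables, and $z$ is the $1 \times L$ vector of all exogenous variables. Partition $z$ as $z = (z_1, z_2)$, where $z_2$ is $1 \times L_2$ and $L = L_1 + L_2$. Under overidentification, $L_2 > G_1$. Under the usual identification conditions, we could use any $1 \times G_1$ subset of $z_2$ as instruments for $y_2$ to estimate the equation (with $z_1$ acting as its own instruments).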

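Hausman (1978) proposed comparing the 2SLS estimate that uses only a just-identifying subset of the instruments with the 2SLS estimate that uses all instruments: if all instruments are valid, the two estimates should differ only by sampling error. As with testing whether a variable is endogenous, constructing the original Hausman statistic is computationally cumbersome. A simple regression-based procedure can be used instead, as follows.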

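Under homoskedasticity:

(1). Estimate the equation by 2SLS using all instruments $z$ and obtain the residuals $\hat{u}_1$;
(2). Regress $\hat{u}_1$ on $z$ by OLS (including a constant) and obtain $R_u^2$ (this assumes that $z_1$, and hence $z$, contains a constant; otherwise use the uncentered $R^2$);
(3). Under the null hypothesis $E(z'u_1) = 0$ and the assumption $E(u_1^2 z'z) = \sigma_1^2 E(z'z)$ with $\sigma_1^2 = E(u_1^2)$ (Assumption 2SLS.3), $N R_u^2 \overset{a}{\sim} \chi^2_{Q_1}$, where $Q_1 \equiv L_2 - G_1$ is the number of overidentifying restrictions (extra instruments). $N R_u^2$ is the Sargan statistic.
(4). If we reject the null, we must re-examine the choice of instruments. If we cannot reject it, we can have some confidence in the overall validity of the instruments. Note, however, that the test can have low power for detecting the endogeneity of individual instruments.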

2.2 Heteroskedastic Errors

The computation is somewhat more involved under heteroskedasticity.

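Under heteroskedasticity:

(1). Obtain the fitted values $\hat{y}_2$ from the first stage of 2SLS;
(2). Choose any subset $h_2$ of $z_2$ with dimension $1 \times Q_1$ (it does not matter which elements are chosen, as long as there are $Q_1$ of them);
(3). Regress each element of $h_2$ on $z_1$ and $\hat{y}_2$ and collect the residuals in $\hat{r}_2$, which is $1 \times Q_1$;
(4). Regress 1 on $\hat{u}_1\hat{r}_2$ (without a constant) and compute the sum of squared residuals $SSR_0$ and the statistic $N - SSR_0$;
(5). $N - SSR_0$ is asymptotically distributed as $\chi^2_{Q_1}$, which is used to decide whether to reject the null. As in the homoskedastic case, rejecting the null means that at least some of the instruments are not exogenous; failing to reject means there is no evidence against the joint exogeneity of the instruments.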
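The five steps above can be sketched in Stata as follows. This is a minimal illustration on simulated data, not the author's example: the data-generating process, the seed, and all variable names (z1, z2, z3, x, y1, y2, touse and so on) are assumptions made up for the sketch. It has one endogenous regressor and three excluded instruments, so $Q_1 = 2$, and every auxiliary regression is restricted to the 2SLS estimation sample.

```stata
* Simulated data; every name and coefficient here is hypothetical
clear
set seed 2022
set obs 500
gen z1 = rnormal()
gen z2 = rnormal()
gen z3 = rnormal()
gen x  = rnormal()                          // included exogenous regressor
gen u  = rnormal()*sqrt(0.5 + 0.5*x^2)      // heteroskedastic structural error
gen y2 = 0.5*z1 + 0.4*z2 + 0.3*z3 + 0.3*x + 0.5*u + rnormal()
gen y1 = 1 + 0.8*y2 + 0.4*x + u

* 2SLS residuals using all instruments (Q1 = 3 - 1 = 2)
ivregress 2sls y1 x (y2 = z1 z2 z3)
gen byte touse = e(sample)                  // keep one common sample throughout
predict double uhat if touse, residuals

* Step 1: first-stage fitted values
regress y2 x z1 z2 z3 if touse
predict double y2hat if touse, xb

* Steps 2-3: pick Q1 = 2 instruments and partial out (x, y2hat)
regress z1 x y2hat if touse
predict double r1 if touse, residuals
regress z2 x y2hat if touse
predict double r2 if touse, residuals

* Step 4: regress 1 on uhat*r1 and uhat*r2, no constant
gen double ur1 = uhat*r1
gen double ur2 = uhat*r2
gen byte one = 1
regress one ur1 ur2 if touse, noconstant
scalar N_SSR = e(N) - e(rss)

* Step 5: compare with the chi2(Q1) distribution
scalar pval = chi2tail(2, N_SSR)
display "N - SSR0 = " N_SSR "   p-value = " pval
```

Restricting every step with the same touse flag matters in real data with missing values, because the first-stage and auxiliary regressions would otherwise be estimated on a different sample from the 2SLS residuals.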

2.3 Stata Example

Stata can carry out these tests automatically. After a 2SLS regression, the command estat overid tests the exogeneity of the instruments. We illustrate with the data from Case 1.

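We want to test whether the mother's and father's years of education (motheduc and fatheduc) are correlated with the error term $u_1$ of the structural equation. The null hypothesis $H_0$ is that all instruments are jointly uncorrelated with $u_1$.

(1) Under homoskedasticity, following the steps above, the Stata commands and results are: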

.    use "D:\stata15\ado\personal\IV_2SLS\Data\mroz.dta", clear
 
.  *-Test of overidentifying restrictions (homoskedastic case, manual computation)
.  *-overidentifying restriction

.  *-2SLS
.    ivregress 2sls lwage $aa (educ = motheduc fatheduc)

Instrumental variables (2SLS) regression          Number of obs   =        428
                                                  Wald chi2(3)    =      24.65
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.1357
                                                  Root MSE        =     .67155

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0613966   .0312895     1.96   0.050     .0000704    .1227228
       exper |   .0441704   .0133696     3.30   0.001     .0179665    .0703742
     expersq |   -.000899   .0003998    -2.25   0.025    -.0016826   -.0001154
       _cons |   .0481003    .398453     0.12   0.904    -.7328532    .8290538
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq motheduc fatheduc

.  *-1. Compute the residuals (uhat, the estimate of u)
.    cap drop uhat
.    predict uhat, residual
(325 missing values generated)
  
.  *-2. Regress the residuals on all exogenous variables and instruments to obtain R2
.    reg uhat $aa motheduc fatheduc

      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(4, 423)       =      0.09
       Model |  .170502977         4  .042625744   Prob > F        =    0.9845
    Residual |  192.849519       423  .455909028   R-squared       =    0.0009
-------------+----------------------------------   Adj R-squared   =   -0.0086
       Total |  193.020022       427  .452037522   Root MSE        =    .67521

------------------------------------------------------------------------------
        uhat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |  -.0000183   .0133291    -0.00   0.999    -.0262179    .0261813
     expersq |   7.34e-07   .0003985     0.00   0.999    -.0007825     .000784
    motheduc |  -.0066065   .0118864    -0.56   0.579    -.0299704    .0167573
    fatheduc |   .0057823   .0111786     0.52   0.605    -.0161902    .0277547
       _cons |   .0109641   .1412571     0.08   0.938    -.2666892    .2886173
------------------------------------------------------------------------------
 
.  *-3. Compute N*R2, the Sargan statistic
.    gen sargan = e(N)*e(r2)

.  *-Or store it as a scalar and display it on screen
.    scalar Sargan = e(N)*e(r2)
.    dis "Sargan = " Sargan   
Sargan = .37807101

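.  *-4. Decide whether to reject H0: all instruments are uncorrelated with the structural error u
.  *-  N*R2 is distributed chi2(q), where q is the number of overidentifying restrictions (extra instruments)
.  *-  Here there is one endogenous variable and two instruments (motheduc, fatheduc), so q = 1
.  *-  At the 5% level the chi2(1) critical value is 3.84, while N*R2 = 0.378071
.  *-  H0 cannot be rejected: there is no evidence against the joint exogeneity of the instruments
.    scalar pvalue = chiprob(1, Sargan) // equivalent to the next line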
.    * scalar pvalue = 1-chi2(1, Sargan)
.    dis "p-value = " pvalue
p-value = .53863741

.  *-Test of overidentifying restrictions (homoskedastic case, computed by Stata)
.  *-1. Run 2SLS
.    ivregress 2sls lwage $aa (educ = motheduc fatheduc)

Instrumental variables (2SLS) regression          Number of obs   =        428
                                                  Wald chi2(3)    =      24.65
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.1357
                                                  Root MSE        =     .67155

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0613966   .0312895     1.96   0.050     .0000704    .1227228
       exper |   .0441704   .0133696     3.30   0.001     .0179665    .0703742
     expersq |   -.000899   .0003998    -2.25   0.025    -.0016826   -.0001154
       _cons |   .0481003    .398453     0.12   0.904    -.7328532    .8290538
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq motheduc fatheduc

.  *-2. Test of overidentifying restrictions
.    estat overid
  Tests of overidentifying restrictions:

  Sargan (score) chi2(1) =  .378071  (p = 0.5386)
  Basmann chi2(1)        =  .373985  (p = 0.5408)
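The agreement between the manual $N R_u^2$ and the Sargan statistic reported by estat overid can also be checked programmatically. The sketch below uses simulated data so that it runs without mroz.dta; the data-generating process and all variable names are hypothetical. It relies on estat overid storing the Sargan statistic in r(sargan) after ivregress 2sls with the default (non-robust) VCE.

```stata
* Simulated data; every name and coefficient here is hypothetical
clear
set seed 12345
set obs 500
gen z1 = rnormal()                 // excluded instrument 1
gen z2 = rnormal()                 // excluded instrument 2
gen x  = rnormal()                 // included exogenous regressor
gen u  = rnormal()                 // structural error
gen y2 = 0.5*z1 + 0.5*z2 + 0.3*x + 0.5*u + rnormal()
gen y1 = 1 + 0.8*y2 + 0.4*x + u

* One endogenous regressor, two instruments: one overidentifying restriction
ivregress 2sls y1 x (y2 = z1 z2)
predict double uhat, residuals

* Stata's built-in Sargan statistic
estat overid
scalar sargan_auto = r(sargan)

* Manual version: N*R2 from regressing the residuals on all exogenous variables
regress uhat x z1 z2
scalar sargan_manual = e(N)*e(r2)
scalar p_manual = chi2tail(1, sargan_manual)

display "manual = " sargan_manual "   estat overid = " sargan_auto "   p = " p_manual
```

Storing the residuals as double avoids small rounding differences between the two computations.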

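(2) Under heteroskedasticity, add the robust option to the 2SLS regression in the first step and then run estat overid. The Stata commands and results are shown below. Note that the manual computation adds huseduc as a third instrument, so it tests $Q_1 = 2$ overidentifying restrictions, whereas the automatic computation uses only motheduc and fatheduc ($Q_1 = 1$). In addition, the auxiliary regressions in the manual computation use all 753 observations rather than the 428 in the 2SLS sample. The two statistics are therefore not directly comparable.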

.  *-Test of overidentifying restrictions (heteroskedastic case, manual computation)
.  *-Compute educ_hat
.    reg educ motheduc fatheduc huseduc $aa

      Source |       SS           df       MS      Number of obs   =       753
-------------+----------------------------------   F(5, 747)       =    130.16
       Model |  1820.49038         5  364.098077   Prob > F        =    0.0000
    Residual |  2089.54946       747  2.79725496   R-squared       =    0.4656
-------------+----------------------------------   Adj R-squared   =    0.4620
       Total |  3910.03984       752  5.19952106   Root MSE        =    1.6725

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    motheduc |    .130004   .0223789     5.81   0.000      .086071    .1739371
    fatheduc |   .1013613   .0214423     4.73   0.000      .059267    .1434556
     huseduc |   .3715645   .0220465    16.85   0.000     .3282839     .414845
       exper |   .0532406   .0218443     2.44   0.015     .0103571    .0961241
     expersq |  -.0007403    .000708    -1.05   0.296    -.0021303    .0006497
       _cons |   5.115778    .298017    17.17   0.000     4.530727    5.700828
------------------------------------------------------------------------------
.    predict educ_hat, xb 

.  *-Compute the residuals r1 and r2
.    reg motheduc $aa educ_hat

      Source |       SS           df       MS      Number of obs   =       753
-------------+----------------------------------   F(3, 749)       =    187.97
       Model |  3662.73295         3  1220.91098   Prob > F        =    0.0000
    Residual |  4864.82881       749  6.49509854   R-squared       =    0.4295
-------------+----------------------------------   Adj R-squared   =    0.4272
       Total |  8527.56175       752  11.3398428   Root MSE        =    2.5485

------------------------------------------------------------------------------
    motheduc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |  -.1051582   .0337061    -3.12   0.002    -.1713279   -.0389885
     expersq |   .0015231   .0010851     1.40   0.161     -.000607    .0036532
    educ_hat |   1.425138   .0608066    23.44   0.000     1.305767     1.54451
       _cons |  -7.412718   .7419038    -9.99   0.000    -8.869176   -5.956259
------------------------------------------------------------------------------
.    predict r1, res

.    reg fatheduc $aa educ_hat

      Source |       SS           df       MS      Number of obs   =       753
-------------+----------------------------------   F(3, 749)       =    197.80
       Model |  4242.00749         3   1414.0025   Prob > F        =    0.0000
    Residual |  5354.45466       749  7.14880462   R-squared       =    0.4420
-------------+----------------------------------   Adj R-squared   =    0.4398
       Total |  9596.46215       752  12.7612529   Root MSE        =    2.6737

------------------------------------------------------------------------------
    fatheduc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |  -.1068543   .0353617    -3.02   0.003    -.1762741   -.0374346
     expersq |   .0014908   .0011384     1.31   0.191    -.0007439    .0037256
    educ_hat |   1.534126   .0637932    24.05   0.000     1.408891     1.65936
       _cons |  -9.170285   .7783437   -11.78   0.000    -10.69828    -7.64229
------------------------------------------------------------------------------
.    predict r2, res

.  *-Compute the 2SLS residuals uhat
.    ivregress 2sls lwage $aa (educ = motheduc fatheduc huseduc)

Instrumental variables (2SLS) regression          Number of obs   =        428
                                                  Wald chi2(3)    =      34.90
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.1495
                                                  Root MSE        =     .66616

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0803918    .021672     3.71   0.000     .0379155    .1228681
       exper |   .0430973   .0132027     3.26   0.001     .0172204    .0689742
     expersq |  -.0008628   .0003943    -2.19   0.029    -.0016357   -.0000899
       _cons |  -.1868574   .2840591    -0.66   0.511     -.743603    .3698883
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq motheduc fatheduc huseduc

.    cap drop uhat
.    predict uhat, res
(325 missing values generated)

.  *-Generate the new variables uhat*r1 and uhat*r2
.    gen uhat_r1 = uhat * r1
(325 missing values generated)
.    gen uhat_r2 = uhat * r2  
(325 missing values generated)

.  *-Compute SSR
.    gen one = 1
.    reg one uhat_r1 uhat_r2, noconstant

      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(2, 426)       =      0.51
       Model |    1.018745         2  .509372498   Prob > F        =    0.6019
    Residual |  426.981255       426  1.00230342   R-squared       =    0.0024
-------------+----------------------------------   Adj R-squared   =   -0.0023
       Total |         428       428           1   Root MSE        =    1.0012

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     uhat_r1 |  -.0270098    .028959    -0.93   0.352    -.0839302    .0299106
     uhat_r2 |  -.0004977   .0307894    -0.02   0.987    -.0610157    .0600203
------------------------------------------------------------------------------

.    predict e, res 
(325 missing values generated)
.    gen e2 = e*e
(325 missing values generated)
.    sum e2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          e2 |        428    .9976198    .0942201   .3649098   1.558913

.    scalar SSR = r(N)*r(mean)
.    dis "SSR = " SSR
SSR = 426.98126

.  *-Compute N-SSR and the p-value
.  *-Result: H0 cannot be rejected, i.e. there is no evidence against exogeneity of the instruments
.    scalar N_SSR = r(N)- SSR
.    scalar pvalue = chiprob(2, N_SSR)
.    dis "N_SSR = " N_SSR
N_SSR = 1.018745
.    dis "p-value = " pvalue  
p-value = .60087251


.  *-Test of overidentifying restrictions (heteroskedastic case, computed by Stata)
.  *-1. Run 2SLS with the robust option
.    ivregress 2sls lwage $aa (educ = motheduc fatheduc), robust

Instrumental variables (2SLS) regression          Number of obs   =        428
                                                  Wald chi2(3)    =      18.61
                                                  Prob > chi2     =     0.0003
                                                  R-squared       =     0.1357
                                                  Root MSE        =     .67155

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0613966   .0331824     1.85   0.064    -.0036397     .126433
       exper |   .0441704   .0154736     2.85   0.004     .0138428     .074498
     expersq |   -.000899   .0004281    -2.10   0.036     -.001738     -.00006
       _cons |   .0481003   .4277846     0.11   0.910    -.7903421    .8865427
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq motheduc fatheduc

.  *-2. Test of overidentifying restrictions
.    estat overid
  Test of overidentifying restrictions:
  Score chi2(1)          =  .443461  (p = 0.5055)

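In both the homoskedastic and the heteroskedastic versions, the test fails to reject the null hypothesis, so there is no evidence against the joint exogeneity of the instruments (the parents' years of education).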

3. Testing for Endogeneity

3.1 A Single Endogenous Variable

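We start with a linear model that contains a single possibly endogenous variable. For notational clarity, let $y_1$ denote the dependent variable and $y_2$ the potentially endogenous explanatory variable. As in all 2SLS settings, $y_2$ can be continuous, binary, or have both continuous and discrete features; there is no restriction. The population model is:

$$y_1 = z_1\delta_1 + \alpha_1 y_2 + u_1$$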

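In the equation above, $z_1$ is $1 \times L_1$ (including a constant), $\delta_1$ is $L_1 \times 1$, and $u_1$ is the unobserved error. The full set of exogenous variables is denoted by the $1 \times L$ vector $z$, of which $z_1$ is a strict subset. The exogeneity assumption is:

$$E(z'u_1) = 0$$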

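It is important to keep in mind that this assumption is maintained throughout the section. We also assume that the population model is identified when $E(y_2 u_1) \neq 0$. This requires that $z$ has at least one element not in $z_1$ (the order condition); the rank condition is that at least one element of $z$ not in $z_1$ is partially correlated with $y_2$ (after netting out $z_1$). Under these assumptions, we test the null hypothesis that $y_2$ is actually exogenous.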

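Hausman (1978) suggested comparing the OLS and 2SLS estimators of $\beta_1 \equiv (\delta_1', \alpha_1)'$ as a formal test of endogeneity: if $y_2$ is uncorrelated with $u_1$, the OLS and 2SLS estimators should differ only by sampling error. This reasoning underlies the Hausman test for endogeneity. The original form of the statistic is computationally cumbersome, because the matrix appearing in the quadratic form is singular except when the population model contains no exogenous variables. Hausman (1978, 1983) proposed a regression-based test that is asymptotically equivalent to the original form. The regression-based test also extends easily to other settings, including some nonlinear models.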

To derive the regression-based test, write the linear projection of $y_2$ on $z$ in error form:

$$y_2 = z\pi_2 + v_2, \qquad E(z'v_2) = 0$$

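In the equation above, $\pi_2$ is $L \times 1$. Because $u_1$ is uncorrelated with $z$, we have $\mathrm{Cov}(y_2, u_1) = \mathrm{Cov}(z\pi_2 + v_2, u_1) = \mathrm{Cov}(v_2, u_1)$. **Therefore $y_2$ is endogenous if and only if $E(u_1 v_2) \neq 0$, and we can test whether the structural error $u_1$ is correlated with the reduced-form error $v_2$.** Write the linear projection of $u_1$ on $v_2$ in error form:

$$u_1 = \rho_1 v_2 + e_1$$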

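In the equation above, $\rho_1 = E(v_2 u_1)/E(v_2^2)$, $E(v_2 e_1) = 0$, and $E(z'e_1) = 0$ (because $u_1$ and $v_2$ are each orthogonal to $z$). Therefore $y_2$ is exogenous if and only if $\rho_1 = 0$. Substituting this projection into the population model gives:

$$y_1 = z_1\delta_1 + \alpha_1 y_2 + \rho_1 v_2 + e_1$$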

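The key point is that, by construction, $e_1$ is uncorrelated with $z_1$, $y_2$, and $v_2$. A test of $H_0: \rho_1 = 0$ can therefore be carried out as a standard t test on $v_2$ in an OLS regression that also includes $z_1$ and $y_2$. Because $v_2$ is unobserved, we use $\hat{v}_2$, the residuals from the first-stage reduced-form OLS regression of $y_2$ on $z$ ($y_2 = z\pi_2 + v_2$). Replacing $v_2$ with $\hat{v}_2$ gives:

$$y_1 = z_1\delta_1 + \alpha_1 y_2 + \rho_1 \hat{v}_2 + error$$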

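Since $z$ contains all exogenous variables and $\hat{v}_2$ estimates everything other than $z$ that affects $y_2$, the error term in the equation above no longer contains anything related to $y_2$; that is, error is uncorrelated with $y_2$. **OLS therefore yields consistent estimators of $\delta_1$, $\alpha_1$, and $\rho_1$.** Under the homoskedasticity assumption $E(u_1^2 \mid z, y_2) = \sigma_1^2$, the usual OLS t test on $\hat{\rho}_1$ is a valid test of $H_0: \rho_1 = 0$. (Recall that under $H_0$, $y_2$ is exogenous.) Under heteroskedasticity, we use the heteroskedasticity-robust t statistic.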

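The OLS estimates of $\delta_1$ and $\alpha_1$ from this regression are numerically identical to the 2SLS estimates. The regression therefore lets us compare the magnitudes of the OLS and 2SLS estimates and judge whether their difference matters in practice, rather than only whether the endogeneity of $y_2$ is statistically significant. It also provides a way to verify that the statistic has been computed correctly.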

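The OLS standard errors from this regression are not valid unless $\rho_1 = 0$. To obtain appropriate standard errors and test statistics, estimate the population model by 2SLS.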

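Stata can carry out this test automatically. After a 2SLS regression, the command estat endog (short for estat endogenous) tests whether the instrumented variable is in fact endogenous. We illustrate next.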
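Before turning to the real data, the regression-based test can be sketched on simulated data. Everything about the data-generating process and the variable names (z1, z2, x, y1, y2, v2hat) is an assumption made for illustration; y2 is endogenous by construction. The sketch checks two facts stated above: the coefficient on y2 in the augmented OLS regression coincides with the 2SLS estimate, and with a single endogenous regressor the squared t statistic on the reduced-form residual matches the Wu-Hausman F statistic that estat endogenous stores in r(wu).

```stata
* Simulated data; every name and coefficient here is hypothetical
clear
set seed 42
set obs 500
gen z1 = rnormal()
gen z2 = rnormal()
gen x  = rnormal()                 // included exogenous regressor
gen u  = rnormal()                 // structural error
gen y2 = 0.5*z1 + 0.5*z2 + 0.3*x + 0.5*u + rnormal()   // correlated with u
gen y1 = 1 + 0.8*y2 + 0.4*x + u

* 2SLS and Stata's built-in endogeneity tests (non-robust VCE)
ivregress 2sls y1 x (y2 = z1 z2)
scalar b_2sls = _b[y2]
estat endogenous
scalar wu_auto = r(wu)

* Reduced form: endogenous regressor on all exogenous variables
regress y2 x z1 z2
predict double v2hat, residuals

* Augmented structural equation: t test on v2hat is the endogeneity test
regress y1 x y2 v2hat
scalar b_cf = _b[y2]
scalar t_v2 = _b[v2hat]/_se[v2hat]

display "2SLS b = " b_2sls "   augmented OLS b = " b_cf
display "t^2 = " t_v2^2 "   Wu-Hausman F = " wu_auto
```

For the heteroskedasticity-robust version, the same augmented regression would be run with vce(robust), as the manual computation in the example below does with the robust option.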

  • Stata example (using the data from Case 1)

We want to test whether educ is endogenous. The null hypothesis $H_0$ is that educ is exogenous.

(1) Under homoskedasticity, following the steps above, the Stata commands and results are:

.  *************** Single endogenous variable *************************
.  *-Hausman test (homoskedastic case, manual computation)
.  *-1. Run 2SLS
.    ivregress 2sls lwage $aa (educ = motheduc fatheduc)

Instrumental variables (2SLS) regression          Number of obs   =        428
                                                  Wald chi2(3)    =      24.65
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.1357
                                                  Root MSE        =     .67155

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0613966   .0312895     1.96   0.050     .0000704    .1227228
       exper |   .0441704   .0133696     3.30   0.001     .0179665    .0703742
     expersq |   -.000899   .0003998    -2.25   0.025    -.0016826   -.0001154
       _cons |   .0481003    .398453     0.12   0.904    -.7328532    .8290538
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq motheduc fatheduc

.  *-2. Flag the estimation sample: sample_2sls = 1 for observations used in the regression
.    gen sample_2sls = e(sample)

.  *-3. Reduced-form regression: endogenous variable on all exogenous variables and instruments
.    reg educ $aa motheduc fatheduc if sample_2sls == 1

      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(4, 423)       =     28.36
       Model |  471.620998         4   117.90525   Prob > F        =    0.0000
    Residual |  1758.57526       423  4.15738833   R-squared       =    0.2115
-------------+----------------------------------   Adj R-squared   =    0.2040
       Total |  2230.19626       427  5.22294206   Root MSE        =     2.039

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0452254   .0402507     1.12   0.262    -.0338909    .1243417
     expersq |  -.0010091   .0012033    -0.84   0.402    -.0033744    .0013562
    motheduc |    .157597   .0358941     4.39   0.000      .087044    .2281501
    fatheduc |   .1895484   .0337565     5.62   0.000     .1231971    .2558997
       _cons |    9.10264   .4265614    21.34   0.000     8.264196    9.941084
------------------------------------------------------------------------------

.  *-4. Compute the residuals from the reduced-form regression
.    predict vhat_reducedeq, res

.  *-5. Add vhat_reducedeq to the structural equation and re-estimate
.  *-  H0: beta_vhat_reducedeq = 0 (educ is exogenous)
.  *-  If beta_vhat_reducedeq is significantly different from 0, reject H0: educ is endogenous
.    reg lwage $aa educ vhat_reducedeq if sample_2sls == 1

      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(4, 423)       =     20.50
       Model |  36.2573159         4  9.06432898   Prob > F        =    0.0000
    Residual |  187.070135       423  .442246183   R-squared       =    0.1624
-------------+----------------------------------   Adj R-squared   =    0.1544
       Total |  223.327451       427  .523015108   Root MSE        =    .66502

--------------------------------------------------------------------------------
         lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
         exper |   .0441704   .0132394     3.34   0.001     .0181471    .0701937
       expersq |   -.000899   .0003959    -2.27   0.024    -.0016772   -.0001208
          educ |   .0613966   .0309849     1.98   0.048      .000493    .1223003
vhat_reducedeq |   .0581666   .0348073     1.67   0.095    -.0102501    .1265834
         _cons |   .0481003   .3945753     0.12   0.903    -.7274721    .8236727
--------------------------------------------------------------------------------

.  *-Hausman test (homoskedastic case, computed by Stata)
.  *-1. Run 2SLS
.    ivregress 2sls lwage $aa (educ = motheduc fatheduc) 

Instrumental variables (2SLS) regression          Number of obs   =        428
                                                  Wald chi2(3)    =      24.65
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.1357
                                                  Root MSE        =     .67155

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0613966   .0312895     1.96   0.050     .0000704    .1227228
       exper |   .0441704   .0133696     3.30   0.001     .0179665    .0703742
     expersq |   -.000899   .0003998    -2.25   0.025    -.0016826   -.0001154
       _cons |   .0481003    .398453     0.12   0.904    -.7328532    .8290538
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq motheduc fatheduc

.  *-2. Hausman test
.    estat endog
  Tests of endogeneity
  Ho: variables are exogenous
  Durbin (score) chi2(1)          =  2.80707  (p = 0.0938)
  Wu-Hausman F(1,423)             =  2.79259  (p = 0.0954)

(2) Under heteroskedasticity, following the steps above, the Stata commands and results are:

.  *-Hausman test (heteroskedastic case, manual computation)
.  *-1. Run 2SLS
.    ivregress 2sls lwage $aa (educ = motheduc fatheduc)

Instrumental variables (2SLS) regression          Number of obs   =        428
                                                  Wald chi2(3)    =      24.65
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.1357
                                                  Root MSE        =     .67155

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0613966   .0312895     1.96   0.050     .0000704    .1227228
       exper |   .0441704   .0133696     3.30   0.001     .0179665    .0703742
     expersq |   -.000899   .0003998    -2.25   0.025    -.0016826   -.0001154
       _cons |   .0481003    .398453     0.12   0.904    -.7328532    .8290538
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq motheduc fatheduc

.  *-2. Flag the estimation sample: sample_2sls = 1 for observations used in the regression
.    cap drop sample_2sls 
.    gen sample_2sls = e(sample)

.  *-3. Reduced-form regression: endogenous variable on all exogenous variables and instruments
.    reg educ $aa motheduc fatheduc if sample_2sls == 1

      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(4, 423)       =     28.36
       Model |  471.620998         4   117.90525   Prob > F        =    0.0000
    Residual |  1758.57526       423  4.15738833   R-squared       =    0.2115
-------------+----------------------------------   Adj R-squared   =    0.2040
       Total |  2230.19626       427  5.22294206   Root MSE        =     2.039

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0452254   .0402507     1.12   0.262    -.0338909    .1243417
     expersq |  -.0010091   .0012033    -0.84   0.402    -.0033744    .0013562
    motheduc |    .157597   .0358941     4.39   0.000      .087044    .2281501
    fatheduc |   .1895484   .0337565     5.62   0.000     .1231971    .2558997
       _cons |    9.10264   .4265614    21.34   0.000     8.264196    9.941084
------------------------------------------------------------------------------

.  *-4. Compute the residuals from the reduced-form regression
.    cap drop vhat_reducedeq
.    predict vhat_reducedeq, res

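.  *-5. Add vhat_reducedeq to the structural equation and re-estimate
.  *-  H0: beta_vhat_reducedeq = 0 (educ is exogenous)
.  *-  Use robust standard errors to compute the t statistic
.  *-  If beta_vhat_reducedeq is significantly different from 0, reject H0: educ is endogenous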
.    reg lwage $aa educ vhat_reducedeq if sample_2sls == 1, robust

Linear regression                               Number of obs     =        428
                                                F(4, 423)         =      21.52
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1624
                                                Root MSE          =     .66502

--------------------------------------------------------------------------------
               |               Robust
         lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
         exper |   .0441704   .0151219     2.92   0.004     .0144469    .0738939
       expersq |   -.000899   .0004152    -2.16   0.031    -.0017152   -.0000828
          educ |   .0613966   .0326667     1.88   0.061    -.0028127     .125606
vhat_reducedeq |   .0581666   .0364135     1.60   0.111    -.0134073    .1297406
         _cons |   .0481003   .4221019     0.11   0.909    -.7815781    .8777787
--------------------------------------------------------------------------------

.  *-Hausman test (heteroskedastic case, computed by Stata)
.  *-1. Run 2SLS with the robust option
.    ivregress 2sls lwage $aa (educ = motheduc fatheduc), robust

Instrumental variables (2SLS) regression          Number of obs   =        428
                                                  Wald chi2(3)    =      18.61
                                                  Prob > chi2     =     0.0003
                                                  R-squared       =     0.1357
                                                  Root MSE        =     .67155

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0613966   .0331824     1.85   0.064    -.0036397     .126433
       exper |   .0441704   .0154736     2.85   0.004     .0138428     .074498
     expersq |   -.000899   .0004281    -2.10   0.036     -.001738     -.00006
       _cons |   .0481003   .4277846     0.11   0.910    -.7903421    .8865427
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq motheduc fatheduc

.  *-2. Hausman test
.    estat endog
  Tests of endogeneity
  Ho: variables are exogenous
  Robust score chi2(1)            =  2.52857  (p = 0.1118)
  Robust regression F(1,423)      =  2.55166  (p = 0.1109)

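Under homoskedasticity, the endogeneity test rejects the null hypothesis at the 10% significance level (p ≈ 0.09), which suggests that educ is endogenous at that level. Under heteroskedasticity, the robust test cannot reject the null at the 10% level (p ≈ 0.11), so at that level there is no evidence against the exogeneity of educ.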
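As a quick consistency check that uses only numbers printed in the output above: with a single endogenous regressor, the squared t statistic on vhat_reducedeq in the augmented regression should reproduce the F statistics reported by estat endog. The arithmetic below is rounded to the displayed digits.

```latex
\begin{aligned}
\text{non-robust:}\quad t^2 &= \left(\frac{0.0581666}{0.0348073}\right)^2 \approx 1.6711^2 \approx 2.793
  &&(\text{Wu-Hausman } F(1,423) = 2.79259),\\
\text{robust:}\quad t^2 &= \left(\frac{0.0581666}{0.0364135}\right)^2 \approx 1.5974^2 \approx 2.552
  &&(\text{Robust regression } F(1,423) = 2.55166).
\end{aligned}
```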

3.2 Multiple Endogenous Variables

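We now extend the regression-based Hausman test to the case of several endogenous variables. Let $y_2$ be a $1 \times G_1$ vector containing all possibly endogenous variables in the population model:

$$y_1 = z_1\delta_1 + y_2\alpha_1 + u_1$$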

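In the equation above, $\alpha_1$ is $G_1 \times 1$. As before, we assume that the rank condition for 2SLS holds. Write the reduced form as $y_2 = z\Pi_2 + v_2$, where $\Pi_2$ is $L \times G_1$ and $v_2$ is the $1 \times G_1$ vector of reduced-form errors. For a generic observation, let $\hat{v}_2$ be the $1 \times G_1$ vector of OLS residuals from the reduced-form regressions (that is, regress each element of $y_2$ on $z$ to obtain the RF residuals, then collect them in the row vector $\hat{v}_2$). We can then estimate:

$$y_1 = z_1\delta_1 + y_2\alpha_1 + \hat{v}_2\rho_1 + error$$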

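and carry out a standard F test of $H_0: \rho_1 = 0$, which tests the $G_1$ restrictions in the unrestricted model above. Setting $\rho_1 = 0$ gives the restricted model, which amounts to estimating the population model by OLS. The test can be made robust to heteroskedasticity in $u_1$ by using a heteroskedasticity-robust Wald statistic. In some regression packages, such as Stata, the robust test is implemented as an F-type test.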

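An LM-type test can be used instead of the F test. Let $\hat{u}_1$ be the OLS residuals from the regression of $y_1$ on $z_1$ and $y_2$ (these residuals are obtained under the null hypothesis that $y_2$ is exogenous). Then obtain the usual R-squared (assuming $z_1$ contains a constant), denoted $R_u^2$, from the regression:

$$\hat{u}_1 \ \text{on} \ z_1,\ y_2,\ \hat{v}_2$$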

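$N R_u^2$ is asymptotically distributed as $\chi^2_{G_1}$. This test is valid under homoskedasticity under $H_0$. Under heteroskedasticity, one can use equation (4.17) in Chapter 4 of Wooldridge's Econometric Analysis of Cross Section and Panel Data, setting $x_1 = (z_1, y_2)$ and $x_2 = \hat{v}_2$.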
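The F-type and LM-type versions of the test can be sketched as follows for $G_1 = 2$. The simulated data, the seed, and all variable names (y2a, y2b, v2a, v2b, u1hat and so on) are hypothetical and chosen only for illustration; both y2a and y2b are endogenous by construction. The sketch covers the homoskedastic versions only.

```stata
* Simulated data; every name and coefficient here is hypothetical
clear
set seed 7
set obs 1000
gen z1 = rnormal()
gen z2 = rnormal()
gen z3 = rnormal()
gen x  = rnormal()                 // included exogenous regressor
gen u  = rnormal()                 // structural error
gen y2a = 0.6*z1 + 0.2*z2 + 0.3*x + 0.5*u + rnormal()
gen y2b = 0.2*z1 + 0.6*z3 + 0.2*x + 0.4*u + rnormal()
gen y1  = 1 + 0.8*y2a - 0.5*y2b + 0.4*x + u

* Reduced-form residuals, one regression per endogenous regressor
regress y2a x z1 z2 z3
predict double v2a, residuals
regress y2b x z1 z2 z3
predict double v2b, residuals

* F-type test: joint significance of the reduced-form residuals
regress y1 x y2a y2b v2a v2b
test v2a v2b
scalar F_stat = r(F)
scalar F_p    = r(p)

* LM-type test: N*R2 from regressing the OLS residuals on (z1, y2, v2hat)
regress y1 x y2a y2b
predict double u1hat, residuals
regress u1hat x y2a y2b v2a v2b
scalar LM   = e(N)*e(r2)
scalar LM_p = chi2tail(2, LM)

display "F = " F_stat " (p = " F_p ")   LM = " LM " (p = " LM_p ")"
```

A heteroskedasticity-robust F-type version would add vce(robust) to the augmented regression before calling test.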

  • Stata example (using the data from Case 2)

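We use the card.dta data from Case 2. Adding the interaction term $black \cdot educ$ to the wage equation gives the model:

$$\log(wage) = \alpha_1\, educ + \alpha_2\, black \cdot educ + z_1\delta_1 + u_1$$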

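In the equation above, $z_1$ contains a constant, $exper$, $exper^2$, $black$, $south$, $smsa$, $reg661, \ldots, reg668$, and $smsa66$. If $educ$ is correlated with $u_1$, we expect $black \cdot educ$ to be correlated with $u_1$ as well. If $near4$ (whether the worker grew up near a four-year college) is a valid instrument for $educ$, then a natural instrument for $black \cdot educ$ is $black \cdot near4$. Note that $black \cdot near4$ is uncorrelated with $u_1$ under the conditional mean assumption $E(u_1 \mid z) = 0$, where $z$ contains all exogenous variables. The Stata commands and results for the OLS regression are: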

.   *************** Multiple endogenous variables *************************
.    use "D:\stata15\ado\personal\IV_2SLS\Data\card.dta", clear

.  *-Manual computation
.  *-Covariates set up
.    global cc "exper expersq south smsa reg661 reg662 reg663 reg664 reg665 reg666 reg667 reg668 smsa66"

.  *-OLS
.    reg lwage educ i.black##c.educ $cc 
note: educ omitted because of collinearity

      Source |       SS           df       MS      Number of obs   =     3,010
-------------+----------------------------------   F(16, 2993)     =     80.83
       Model |  178.817017        16  11.1760636   Prob > F        =    0.0000
    Residual |  413.824594     2,993  .138264148   R-squared       =    0.3017
-------------+----------------------------------   Adj R-squared   =    0.2980
       Total |  592.641611     3,009  .196956335   Root MSE        =    .37184

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0707788   .0037548    18.85   0.000     .0634165    .0781411
     1.black |  -.4191076   .0794021    -5.28   0.000    -.5747958   -.2634194
        educ |          0  (omitted)
             |
black#c.educ |
          1  |   .0178595    .006271     2.85   0.004     .0055636    .0301554
             |
       exper |   .0821556   .0066828    12.29   0.000     .0690522    .0952589
     expersq |  -.0021349   .0003207    -6.66   0.000    -.0027638    -.001506
       south |  -.1441927   .0259827    -5.55   0.000    -.1951384    -.093247
        smsa |   .1340695   .0200931     6.67   0.000     .0946718    .1734671
      reg661 |  -.1221745   .0388047    -3.15   0.002    -.1982611    -.046088
      reg662 |  -.0232881   .0282266    -0.83   0.409    -.0786336    .0320574
      reg663 |   .0230953   .0273506     0.84   0.399    -.0305325    .0767231
      reg664 |  -.0666851   .0356556    -1.87   0.062    -.1365971    .0032269
      reg665 |   .0032644     .03614     0.09   0.928    -.0675974    .0741261
      reg666 |   .0151249   .0401224     0.38   0.706    -.0635454    .0937952
      reg667 |  -.0074966   .0394073    -0.19   0.849    -.0847648    .0697716
      reg668 |  -.1757195   .0462851    -3.80   0.000    -.2664733   -.0849657
      smsa66 |   .0249824   .0194297     1.29   0.199    -.0131144    .0630793
       _cons |    4.80677   .0752604    63.87   0.000     4.659202    4.954337
------------------------------------------------------------------------------

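The OLS results indicate that the estimated return to education for Black workers is about 1.8 percentage points higher than for non-Black workers (coefficient on the interaction: 0.0179).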

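To test whether educ is exogenous, we must test whether educ and educ·black are both uncorrelated with $u_1$. We start with the first-stage regressions. First, regress educ on all exogenous variables plus the two instruments nearc4 and black·nearc4 (the interaction black·nearc4 is included because it may be partially correlated with educ), and let $\hat{v}_{21}$ denote the OLS residuals from this regression. Similarly, regress black·educ on all exogenous variables plus the same two instruments, and denote the residuals from this regression by $\hat{v}_{22}$. (Note: in this second reduced-form regression the dependent variable black·educ equals 0 for most of the sample, but this does not affect the endogeneity test.) Finally, add $\hat{v}_{21}$ and $\hat{v}_{22}$ to the structural equation as additional regressors and estimate it by OLS.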
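As a sketch of the augmented regression just described (the coefficient symbols $\beta_j$, $\boldsymbol{\delta}$, $\rho_1$, $\rho_2$ and the row vector $\mathbf{x}$ of the remaining exogenous controls are notation assumed here, not taken from the original), the equation estimated by OLS and the null hypothesis being tested can be written as:

$$
\text{lwage} = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{black} + \beta_3\,\text{black}\cdot\text{educ} + \mathbf{x}\boldsymbol{\delta} + \rho_1\,\hat{v}_{21} + \rho_2\,\hat{v}_{22} + \text{error},
$$

$$
H_0:\ \rho_1 = 0,\ \rho_2 = 0 .
$$

If educ and black·educ are uncorrelated with $u_1$, the two first-stage residuals should have no explanatory power in the structural equation, so a joint test of $\rho_1 = \rho_2 = 0$ serves as a test of exogeneity.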

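Next, we run a joint F test on $\hat{v}_{21}$ and $\hat{v}_{22}$. The result is F = 0.54 with a p-value of 0.581, so we cannot reject the null hypothesis that educ and educ·black are exogenous. The Stata commands and output are as follows: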

.   ***************Multiple endogenous variables*************************
.    use "D:\stata15\ado\personal\IV_2SLS\Data\card.dta", clear

.  *-Manual calculation
.  *-Covariates set up
.    global cc "exper expersq south smsa reg661 reg662 reg663 reg664 reg665 reg666 reg667  reg668 smsa66"

.  *-OLS
.    reg lwage educ i.black##c.educ $cc 
note: educ omitted because of collinearity

      Source |       SS           df       MS      Number of obs   =     3,010
-------------+----------------------------------   F(16, 2993)     =     80.83
       Model |  178.817017        16  11.1760636   Prob > F        =    0.0000
    Residual |  413.824594     2,993  .138264148   R-squared       =    0.3017
-------------+----------------------------------   Adj R-squared   =    0.2980
       Total |  592.641611     3,009  .196956335   Root MSE        =    .37184

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0707788   .0037548    18.85   0.000     .0634165    .0781411
     1.black |  -.4191076   .0794021    -5.28   0.000    -.5747958   -.2634194
        educ |          0  (omitted)
             |
black#c.educ |
          1  |   .0178595    .006271     2.85   0.004     .0055636    .0301554
             |
       exper |   .0821556   .0066828    12.29   0.000     .0690522    .0952589
     expersq |  -.0021349   .0003207    -6.66   0.000    -.0027638    -.001506
       south |  -.1441927   .0259827    -5.55   0.000    -.1951384    -.093247
        smsa |   .1340695   .0200931     6.67   0.000     .0946718    .1734671
      reg661 |  -.1221745   .0388047    -3.15   0.002    -.1982611    -.046088
      reg662 |  -.0232881   .0282266    -0.83   0.409    -.0786336    .0320574
      reg663 |   .0230953   .0273506     0.84   0.399    -.0305325    .0767231
      reg664 |  -.0666851   .0356556    -1.87   0.062    -.1365971    .0032269
      reg665 |   .0032644     .03614     0.09   0.928    -.0675974    .0741261
      reg666 |   .0151249   .0401224     0.38   0.706    -.0635454    .0937952
      reg667 |  -.0074966   .0394073    -0.19   0.849    -.0847648    .0697716
      reg668 |  -.1757195   .0462851    -3.80   0.000    -.2664733   -.0849657
      smsa66 |   .0249824   .0194297     1.29   0.199    -.0131144    .0630793
       _cons |    4.80677   .0752604    63.87   0.000     4.659202    4.954337
------------------------------------------------------------------------------

.  *-Regress each endogenous variable on all exogenous variables (including the instruments) by OLS and save the residuals v21 and v22
.    reg educ $cc nearc4 i.black##i.nearc4
note: 1.nearc4 omitted because of collinearity

      Source |       SS           df       MS      Number of obs   =     3,010
-------------+----------------------------------   F(16, 2993)     =    170.69
       Model |   10287.619        16  642.976186   Prob > F        =    0.0000
    Residual |  11274.4611     2,993  3.76694323   R-squared       =    0.4771
-------------+----------------------------------   Adj R-squared   =    0.4743
       Total |  21562.0801     3,009  7.16586243   Root MSE        =    1.9409

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |  -.4125542    .033728   -12.23   0.000    -.4786866   -.3464218
     expersq |   .0008699   .0016525     0.53   0.599    -.0023703    .0041101
       south |  -.0517208   .1356037    -0.38   0.703    -.3176067    .2141651
        smsa |   .4021227    .104889     3.83   0.000     .1964609    .6077845
      reg661 |  -.2102379   .2025002    -1.04   0.299    -.6072915    .1868158
      reg662 |  -.2888672   .1473834    -1.96   0.050    -.5778502    .0001158
      reg663 |  -.2382962   .1427517    -1.67   0.095    -.5181975    .0416051
      reg664 |  -.0932447   .1862439    -0.50   0.617    -.4584237    .2719343
      reg665 |  -.4828321   .1882474    -2.56   0.010    -.8519394   -.1137248
      reg666 |  -.5129027   .2099523    -2.44   0.015     -.924568   -.1012373
      reg667 |   -.427108   .2056584    -2.08   0.038    -.8303541    -.023862
      reg668 |   .3135707   .2417323     1.30   0.195    -.1604075     .787549
      smsa66 |   .0254418   .1058119     0.24   0.810    -.1820295    .2329132
      nearc4 |   .3191761   .0978211     3.26   0.001     .1273726    .5109796
     1.black |  -.9374537    .147931    -6.34   0.000     -1.22751   -.6473969
    1.nearc4 |          0  (omitted)
             |
black#nearc4 |
        1 1  |   .0029741   .1767953     0.02   0.987    -.3436786    .3496267
             |
       _cons |    16.8492   .2149486    78.39   0.000     16.42774    17.27066
------------------------------------------------------------------------------

.    predict v21, res
.    gen black_educ = black*educ  // interaction of black and educ
.    reg black_educ $cc nearc4 i.black##i.nearc4
note: 1.nearc4 omitted because of collinearity

      Source |       SS           df       MS      Number of obs   =     3,010
-------------+----------------------------------   F(16, 2993)     =   3680.14
       Model |  77916.1435        16  4869.75897   Prob > F        =    0.0000
    Residual |   3960.4957     2,993  1.32325282   R-squared       =    0.9516
-------------+----------------------------------   Adj R-squared   =    0.9514
       Total |  81876.6392     3,009  27.2105813   Root MSE        =    1.1503

------------------------------------------------------------------------------
  black_educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0533248   .0199902     2.67   0.008     .0141289    .0925207
     expersq |   -.007937   .0009794    -8.10   0.000    -.0098574   -.0060166
       south |   -.252799   .0803708    -3.15   0.002    -.4103867   -.0952114
        smsa |   .1952868   .0621665     3.14   0.002     .0733934    .3171803
      reg661 |    .162124   .1200196     1.35   0.177    -.0732053    .3974534
      reg662 |   .0056958   .0873525     0.07   0.948    -.1655812    .1769729
      reg663 |   .0860648   .0846073     1.02   0.309    -.0798296    .2519592
      reg664 |    .113297   .1103847     1.03   0.305    -.1031406    .3297345
      reg665 |   .2615297   .1115721     2.34   0.019     .0427638    .4802956
      reg666 |   .3347247   .1244364     2.69   0.007     .0907352    .5787143
      reg667 |   .2962538   .1218915     2.43   0.015     .0572543    .5352533
      reg668 |   .0995837   .1432721     0.70   0.487     -.181338    .3805054
      smsa66 |   .0469365   .0627135     0.75   0.454    -.0760295    .1699025
      nearc4 |  -.0908895   .0579775    -1.57   0.117    -.2045693    .0227903
     1.black |    11.5499   .0876771   131.73   0.000     11.37799    11.72182
    1.nearc4 |          0  (omitted)
             |
black#nearc4 |
        1 1  |    .874705   .1047846     8.35   0.000     .6692478    1.080162
             |
       _cons |   .0948535   .1273977     0.74   0.457    -.1549425    .3446494
------------------------------------------------------------------------------

.    predict v22, res

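.  *-Homoskedastic case
.  *-Add the residuals v21 and v22 to the structural equation and estimate it by OLS; store the results as m1
.  *-Estimate the structural equation by OLS without v21 and v22; store the results as m2
.  *-Use the ftest command for an F test of the joint significance of v21 and v22
.  *-ftest is only valid after regressions that use the default vce option
.  *-If the F test rejects, the coefficients on v21 and v22 are jointly significantly different from 0, indicating that educ and educ*black are endogenous
.  *-If the F test fails to reject, we cannot reject that the coefficients on v21 and v22 are jointly 0, which is consistent with educ and educ*black being exogenous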
.    reg lwage educ i.black##c.educ $cc v21 v22
note: educ omitted because of collinearity

      Source |       SS           df       MS      Number of obs   =     3,010
-------------+----------------------------------   F(18, 2991)     =     71.89
       Model |  178.967071        18  9.94261506   Prob > F        =    0.0000
    Residual |   413.67454     2,991  .138306433   R-squared       =    0.3020
-------------+----------------------------------   Adj R-squared   =    0.2978
       Total |  592.641611     3,009  .196956335   Root MSE        =     .3719

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1273556   .0547317     2.33   0.020       .02004    .2346712
     1.black |   -.282765   .4866263    -0.58   0.561    -1.236921    .6713912
        educ |          0  (omitted)
             |
black#c.educ |
          1  |   .0109036   .0387795     0.28   0.779    -.0651337    .0869408
             |
       exper |   .1059116   .0241963     4.38   0.000     .0584685    .1533547
     expersq |  -.0022406   .0004635    -4.83   0.000    -.0031493   -.0013318
       south |  -.1424762   .0272675    -5.23   0.000    -.1959412   -.0890112
        smsa |   .1111556   .0304028     3.66   0.000     .0515431    .1707681
      reg661 |  -.1103479   .0410557    -2.69   0.007    -.1908481   -.0298477
      reg662 |  -.0081783   .0317789    -0.26   0.797     -.070489    .0541325
      reg663 |   .0382414   .0314436     1.22   0.224    -.0234119    .0998946
      reg664 |  -.0600379   .0368007    -1.63   0.103    -.1321951    .0121194
      reg665 |   .0337805   .0479745     0.70   0.481     -.060286    .1278469
      reg666 |   .0498975   .0537534     0.93   0.353    -.0554998    .1552948
      reg667 |   .0216942   .0501526     0.43   0.665    -.0766428    .1200312
      reg668 |  -.1908353   .0485659    -3.93   0.000    -.2860613   -.0956092
      smsa66 |   .0180009   .0207769     0.87   0.386    -.0227375    .0587393
         v21 |  -.0568274   .0548612    -1.04   0.300    -.1643969    .0507422
         v22 |   .0070106   .0392971     0.18   0.858    -.0700415    .0840627
       _cons |   3.844991   .9314527     4.13   0.000     2.018638    5.671343
------------------------------------------------------------------------------

.    est store m1

.    reg lwage educ i.black##c.educ $cc
note: educ omitted because of collinearity

      Source |       SS           df       MS      Number of obs   =     3,010
-------------+----------------------------------   F(16, 2993)     =     80.83
       Model |  178.817017        16  11.1760636   Prob > F        =    0.0000
    Residual |  413.824594     2,993  .138264148   R-squared       =    0.3017
-------------+----------------------------------   Adj R-squared   =    0.2980
       Total |  592.641611     3,009  .196956335   Root MSE        =    .37184

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0707788   .0037548    18.85   0.000     .0634165    .0781411
     1.black |  -.4191076   .0794021    -5.28   0.000    -.5747958   -.2634194
        educ |          0  (omitted)
             |
black#c.educ |
          1  |   .0178595    .006271     2.85   0.004     .0055636    .0301554
             |
       exper |   .0821556   .0066828    12.29   0.000     .0690522    .0952589
     expersq |  -.0021349   .0003207    -6.66   0.000    -.0027638    -.001506
       south |  -.1441927   .0259827    -5.55   0.000    -.1951384    -.093247
        smsa |   .1340695   .0200931     6.67   0.000     .0946718    .1734671
      reg661 |  -.1221745   .0388047    -3.15   0.002    -.1982611    -.046088
      reg662 |  -.0232881   .0282266    -0.83   0.409    -.0786336    .0320574
      reg663 |   .0230953   .0273506     0.84   0.399    -.0305325    .0767231
      reg664 |  -.0666851   .0356556    -1.87   0.062    -.1365971    .0032269
      reg665 |   .0032644     .03614     0.09   0.928    -.0675974    .0741261
      reg666 |   .0151249   .0401224     0.38   0.706    -.0635454    .0937952
      reg667 |  -.0074966   .0394073    -0.19   0.849    -.0847648    .0697716
      reg668 |  -.1757195   .0462851    -3.80   0.000    -.2664733   -.0849657
      smsa66 |   .0249824   .0194297     1.29   0.199    -.0131144    .0630793
       _cons |    4.80677   .0752604    63.87   0.000     4.659202    4.954337
------------------------------------------------------------------------------

.    est store m2

.    ftest m1 m2
Assumption: m2 nested in m1
F(  2,    2991) =      0.54
       prob > F =    0.5814

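.  *-Heteroskedastic case
.  *-Add the residuals v21 and v22 to the structural equation and estimate it by OLS with the robust option
.  *-test serves the same purpose as ftest, but it is valid after a regression with any vce option
.  *-Use the test command for a joint F test of the significance of v21 and v22
.  *-If the F test rejects, the coefficients on v21 and v22 are jointly significantly different from 0, indicating that educ and educ*black are endogenous
.  *-If the F test fails to reject, we cannot reject that the coefficients on v21 and v22 are jointly 0, which is consistent with educ and educ*black being exogenous
.  *-NOTE: the global $ww is never defined in this log, so it expands to nothing and the controls in $cc
.  *-      are dropped from the regression below (compare F(5, 3004) here with F(18, 2991) above).
.  *-      The F = 117.35 reported below is therefore not the robust counterpart of the F = 0.54 test;
.  *-      the command should use $cc instead of $ww.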
.    reg lwage educ i.black##c.educ $ww v21 v22, robust
note: educ omitted because of collinearity

Linear regression                               Number of obs     =      3,010
                                                F(5, 3004)        =     166.95
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2174
                                                Root MSE          =     .39292

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.0598108   .0081385    -7.35   0.000    -.0757685   -.0438531
     1.black |  -3.005263   .3148406    -9.55   0.000    -3.622588   -2.387938
        educ |          0  (omitted)
             |
black#c.educ |
          1  |   .2162143   .0253083     8.54   0.000     .1665909    .2658377
             |
         v21 |    .130339   .0091265    14.28   0.000     .1124442    .1482339
         v22 |  -.1983001     .02635    -7.53   0.000     -.249966   -.1466343
       _cons |   7.153203   .1111008    64.38   0.000     6.935362    7.371044
------------------------------------------------------------------------------

.    test v21 v22
 ( 1)  v21 = 0
 ( 2)  v22 = 0
       F(  2,  3004) =  117.35
            Prob > F =    0.0000

.  *-Computed automatically by Stata
.    gen black_nearc4 = black*nearc4

.  *-Homoskedastic case
.    ivregress 2sls lwage $cc (educ black_educ = nearc4 black_nearc4)

Instrumental variables (2SLS) regression          Number of obs   =      3,010
                                                  Wald chi2(15)   =     677.65
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.2336
                                                  Root MSE        =     .38846

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1346034   .0537559     2.50   0.012     .0292437    .2399631
  black_educ |  -.0111829   .0043351    -2.58   0.010    -.0196794   -.0026863
       exper |    .110093    .022998     4.79   0.000     .0650178    .1551682
     expersq |  -.0024288   .0003313    -7.33   0.000    -.0030782   -.0017795
       south |  -.1465937   .0274104    -5.35   0.000    -.2003172   -.0928702
        smsa |   .1123811   .0318409     3.53   0.000     .0499741    .1747882
      reg661 |  -.1051369   .0416016    -2.53   0.011    -.1866746   -.0235993
      reg662 |  -.0062918   .0327826    -0.19   0.848    -.0705446     .057961
      reg663 |   .0419279   .0315592     1.33   0.184    -.0199269    .1037827
      reg664 |  -.0557743   .0375173    -1.49   0.137    -.1293069    .0177582
      reg665 |   .0391725   .0471075     0.83   0.406    -.0531565    .1315014
      reg666 |   .0558905   .0528237     1.06   0.290    -.0476421    .1594231
      reg667 |   .0283201   .0487958     0.58   0.562    -.0673178    .1239581
      reg668 |   -.190215     .05078    -3.75   0.000     -.289742    -.090688
      smsa66 |    .019601    .021732     0.90   0.367     -.022993     .062195
       _cons |   3.721227   .9143942     4.07   0.000     1.929047    5.513406
------------------------------------------------------------------------------
Instrumented:  educ black_educ
Instruments:   exper expersq south smsa reg661 reg662 reg663 reg664 reg665
               reg666 reg667 reg668 smsa66 nearc4 black_nearc4

.    estat endog
  Tests of endogeneity
  Ho: variables are exogenous
  Durbin (score) chi2(2)          =  1.75065  (p = 0.4167)
  Wu-Hausman F(2,2992)            =  .870599  (p = 0.4188)

.  *-Heteroskedastic case
.    ivregress 2sls lwage $cc (educ black_educ = nearc4 black_nearc4), robust

Instrumental variables (2SLS) regression          Number of obs   =      3,010
                                                  Wald chi2(15)   =     752.86
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.2336
                                                  Root MSE        =     .38846

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1346034   .0530173     2.54   0.011     .0306913    .2385155
  black_educ |  -.0111829   .0042026    -2.66   0.008    -.0194199   -.0029459
       exper |    .110093   .0227831     4.83   0.000     .0654388    .1547471
     expersq |  -.0024288   .0003479    -6.98   0.000    -.0031107    -.001747
       south |  -.1465937   .0290684    -5.04   0.000    -.2035668   -.0896207
        smsa |   .1123811   .0313138     3.59   0.000     .0510072     .173755
      reg661 |  -.1051369   .0409012    -2.57   0.010    -.1853019    -.024972
      reg662 |  -.0062918   .0337161    -0.19   0.852    -.0723742    .0597906
      reg663 |   .0419279   .0324651     1.29   0.197    -.0217024    .1055583
      reg664 |  -.0557743   .0392737    -1.42   0.156    -.1327493    .0212007
      reg665 |   .0391725    .049584     0.79   0.430    -.0580105    .1363554
      reg666 |   .0558905   .0526598     1.06   0.289    -.0473208    .1591019
      reg667 |   .0283201   .0499725     0.57   0.571    -.0696242    .1262645
      reg668 |   -.190215   .0509013    -3.74   0.000    -.2899798   -.0904503
      smsa66 |    .019601   .0206926     0.95   0.344    -.0209557    .0601577
       _cons |   3.721227   .9006163     4.13   0.000     1.956051    5.486402
------------------------------------------------------------------------------
Instrumented:  educ black_educ
Instruments:   exper expersq south smsa reg661 reg662 reg663 reg664 reg665
               reg666 reg667 reg668 smsa66 nearc4 black_nearc4

.    estat endog
  Tests of endogeneity
  Ho: variables are exogenous
  Robust score chi2(2)            =  1.79498  (p = 0.4076)
  Robust regression F(2,2992)     =  .892218  (p = 0.4099)
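
The manual robust step earlier in this log calls the global `$ww`, which is never defined, so that regression ran without the control variables. A minimal sketch of the step as it was presumably intended is given below. It assumes that `card.dta` sits in the current working directory (the file location is an assumption) and that the control list is the same `$cc` defined at the top of the log. The redundant extra `educ` term is dropped because `i.black##c.educ` already expands to black, educ and their interaction.

```stata
* Sketch: robust regression-based endogeneity test with the full control set.
* Assumption: card.dta is in the current working directory.
use card.dta, clear
global cc "exper expersq south smsa reg661 reg662 reg663 reg664 reg665 reg666 reg667 reg668 smsa66"
gen black_educ = black*educ

* First-stage (reduced-form) regressions and their residuals
quietly regress educ $cc i.black##i.nearc4
predict v21, residuals
quietly regress black_educ $cc i.black##i.nearc4
predict v22, residuals

* Augmented structural equation with heteroskedasticity-robust standard errors
regress lwage i.black##c.educ $cc v21 v22, vce(robust)

* H0: the coefficients on v21 and v22 are jointly zero (educ and black*educ exogenous)
test v21 v22
```

With the controls included, this test should be read alongside the robust `estat endog` output above, which does not reject exogeneity.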

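4. Summary: Steps in a 2SLS Analysis

(1) Argue from theory whether an endogeneity problem exists; if it does, explain where it comes from.
(2) Drawing on the earlier literature and your own analysis, choose suitable instruments (this is not easy).
(3) Run the endogeneity test to confirm that endogeneity is present; the result depends on the instruments you chose.
(4) Run the overidentification test to check the validity of the instruments (this is only possible when the equation is overidentified).
(5) After completing step (4), you may need to repeat step (3).
(6) Run the first-stage regression to check for weak instruments.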
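
Steps (3), (4) and (6) can be sketched with `ivregress` postestimation commands. The example below runs on simulated data so that it is self-contained; every variable name (`y1`, `y2`, `x1`, `z1`, `z2`, `u`) and every coefficient in the data-generating process is hypothetical. It uses one endogenous regressor and two instruments so that the equation is overidentified, which the Card example above is not.

```stata
* Sketch on simulated data; all variable names and coefficients are hypothetical
clear
set seed 20220616
set obs 2000
gen z1 = rnormal()                                      // instrument 1
gen z2 = rnormal()                                      // instrument 2
gen x1 = rnormal()                                      // exogenous control
gen u  = rnormal()                                      // structural error
gen y2 = 0.6*z1 + 0.4*z2 + 0.3*x1 + 0.5*u + rnormal()   // endogenous: depends on u
gen y1 = 1 + 0.5*y2 + 0.2*x1 + u                        // structural equation

* One endogenous regressor, two excluded instruments: one overidentifying restriction
ivregress 2sls y1 x1 (y2 = z1 z2)

estat endogenous      // step (3): Durbin and Wu-Hausman tests of exogeneity
estat overid          // step (4): Sargan and Basmann overidentification tests
estat firststage      // step (6): first-stage F statistic and partial R-squared
```

Adding `vce(robust)` to the `ivregress` call makes the same postestimation commands report heteroskedasticity-robust versions of the tests, as in the robust `estat endog` output shown earlier.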

5. References

  • Baum, C. F. (2006). An Introduction to Modern Econometrics Using Stata.
  • Wooldridge, J. M. (2012). Introductory Econometrics: A Modern Approach.
  • Lian, Yujun (连玉君). Stata 初级讲义 (Introductory Stata Lecture Notes).
