# Stata: Testing the Difference Between OLS and WLS

Source: OLS vs WLS: Dealing with heteroskedasticity

## 1. Introduction

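A recent tweet by Professor Wooldridge reminds us that if the model is correctly specified, both OLS and WLS are consistent, but WLS is efficient. If the model is misspecified, however, OLS and WLS may produce different results, and both sets of estimates are wrong.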
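To make the claim concrete, the two estimators can be written side by side. This is a sketch in standard textbook notation that does not appear in the source: $y_i$ is the outcome, $x_i$ the row vector of regressors (including the constant), $u_i$ the error, and $h(x_i) > 0$ an assumed variance function.

$$
y_i = x_i\beta + u_i, \qquad
\hat\beta_{OLS} = \Big(\sum_i x_i'x_i\Big)^{-1}\sum_i x_i'y_i, \qquad
\hat\beta_{WLS} = \Big(\sum_i \frac{x_i'x_i}{h(x_i)}\Big)^{-1}\sum_i \frac{x_i'y_i}{h(x_i)}
$$

If $E(u_i \mid x_i) = 0$, both estimators converge to the same $\beta$ for any positive weight that depends only on $x_i$, and WLS is the efficient choice when $\mathrm{Var}(u_i \mid x_i) = \sigma^2 h(x_i)$. If the conditional mean is misspecified, the two generally converge to different weighted linear approximations, which is why a systematic gap between them is informative.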

## 2. Options

``````
. use "http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta", clear
. keep if lnwage!=.
``````

``````
. reg lnwage educ exper tenure female age agesq

Source |       SS           df       MS      Number of obs   =     1,434
-------------+----------------------------------   F(6, 1427)      =    123.27
Model |  137.953858         6  22.9923096   Prob > F        =    0.0000
Residual |  266.165946     1,427  .186521336   R-squared       =    0.3414
Total |  404.119804     1,433  .282009633   Root MSE        =    .43188
------------------------------------------------------------------------------
lnwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ |      0.063      0.005    12.56   0.000        0.053       0.073
exper |     -0.000      0.002    -0.14   0.890       -0.004       0.003
tenure |      0.006      0.002     3.32   0.001        0.003       0.010
female |     -0.151      0.024    -6.27   0.000       -0.198      -0.104
age |      0.112      0.008    14.63   0.000        0.097       0.127
agesq |     -0.001      0.000   -13.17   0.000       -0.001      -0.001
_cons |      0.333      0.143     2.33   0.020        0.052       0.614
------------------------------------------------------------------------------

. estat hett, iid rhs

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: educ exper tenure female age agesq
chi2(6)      =    78.98
Prob > chi2  =   0.0000

. predict resid, resid
. gen resid2=resid^2
. regress resid2 educ exper tenure female age agesq

Source |       SS           df       MS      Number of obs   =     1,434
-------------+----------------------------------   F(6, 1427)      =     13.86
Model |  20.9491235         6  3.49152058   Prob > F        =    0.0000
Residual |  359.403977     1,427   .25185983   R-squared       =    0.0551
Total |  380.353101     1,433  .265424355   Root MSE        =    .50186
------------------------------------------------------------------------------
resid2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ |     -0.010      0.006    -1.67   0.095       -0.021       0.002
exper |     -0.007      0.002    -3.34   0.001       -0.011      -0.003
tenure |     -0.003      0.002    -1.25   0.211       -0.007       0.002
female |      0.116      0.028     4.16   0.000        0.061       0.171
age |     -0.047      0.009    -5.26   0.000       -0.064      -0.029
agesq |      0.001      0.000     6.03   0.000        0.000       0.001
_cons |      1.094      0.167     6.57   0.000        0.767       1.421
------------------------------------------------------------------------------

. display "Chi2: `=e(N)*e(r2)'"
Chi2: 78.9819853933182
``````
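The last commands in the log above reproduce the Breusch-Pagan statistic by hand. As a sketch in notation that is not in the source ($\hat u_i$ for the OLS residuals, $R^2_{\hat u^2}$ for the R-squared of the auxiliary regression of `resid2` on the regressors, $k$ for the number of regressors), the `iid` version of the test is:

$$
\hat u_i^2 = \delta_0 + x_i\delta + v_i, \qquad
LM = N \cdot R^2_{\hat u^2} \;\overset{a}{\sim}\; \chi^2_k
$$

With $N = 1434$, an auxiliary R-squared of 0.0551 and $k = 6$, this gives $1434 \times 0.0551 \approx 79$, in line with the `chi2(6) = 78.98` reported by `estat hettest` (the small gap comes from rounding the R-squared).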

``````
. gen logresid2=log(resid2)
. regress logresid2 educ exper tenure female age agesq

Source |       SS           df       MS      Number of obs   =     1,434
-------------+----------------------------------   F(6, 1427)      =     17.75
Model |  557.714323         6  92.9523871   Prob > F        =    0.0000
Residual |  7473.81328     1,427  5.23743047   R-squared       =    0.0694
Total |   8031.5276     1,433  5.60469477   Root MSE        =    2.2885
------------------------------------------------------------------------------
logresid2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ |     -0.010      0.027    -0.37   0.710       -0.062       0.042
exper |     -0.033      0.010    -3.50   0.000       -0.052      -0.015
tenure |     -0.008      0.010    -0.81   0.416       -0.028       0.012
female |      0.648      0.128     5.09   0.000        0.398       0.899
age |     -0.266      0.040    -6.57   0.000       -0.345      -0.187
agesq |      0.004      0.000     7.28   0.000        0.003       0.005
_cons |      1.286      0.759     1.69   0.091       -0.204       2.775
------------------------------------------------------------------------------

. predictnl h_x=exp(xb())
``````
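The log above fits a model for the error variance and turns the fitted values into weights. A minimal sketch of what those commands appear to implement, in notation that is not in the source ($\hat u_i$ for the OLS residuals, $\hat\delta$ for the coefficients of the `logresid2` regression):

$$
\log(\hat u_i^2) = \delta_0 + x_i\delta + e_i, \qquad
\hat h(x_i) = \exp(\hat\delta_0 + x_i\hat\delta), \qquad
w_i = \frac{1}{\hat h(x_i)}
$$

Modelling the log of the squared residuals guarantees $\hat h(x_i) > 0$, so the weights are always well defined. The variable `h_x` in the log corresponds to $\hat h(x_i)$.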

``````
. qui:reg lnwage educ exper tenure female age agesq
. est sto mols
. qui:reg lnwage educ exper tenure female age agesq [aw=1/h_x]
. est sto mwls
. foreach i in lnwage educ exper tenure female age agesq {
1.   gen `i'w=`i'*sqrt(1/h_x)
2.}
. gen one =sqrt(1/h_x)
. qui:reg lnwagew educw experw tenurew femalew agew agesqw one, nocons
. est sto mtls
. esttab mols mwls mtls,se nogaps b(4) mtitle(ols wls tls)

------------------------------------------------------------
                      (1)             (2)             (3)
                      ols             wls             tls
------------------------------------------------------------
educ               0.0633***       0.0558***
                 (0.0050)        (0.0044)
exper             -0.0002         -0.0015
                 (0.0018)        (0.0017)
tenure             0.0063***       0.0033*
                 (0.0019)        (0.0015)
female            -0.1508***      -0.1537***
                 (0.0241)        (0.0218)
age                0.1118***       0.0976***
                 (0.0076)        (0.0082)
agesq             -0.0012***      -0.0011***
                 (0.0001)        (0.0001)
educw                                              0.0558***
                                                 (0.0044)
experw                                            -0.0015
                                                 (0.0017)
tenurew                                            0.0033*
                                                 (0.0015)
femalew                                           -0.1537***
                                                 (0.0218)
agew                                               0.0976***
                                                 (0.0082)
agesqw                                            -0.0011***
                                                 (0.0001)
one                                                0.7189***
                                                 (0.1615)
_cons              0.3332*         0.7189***
                 (0.1433)        (0.1615)
------------------------------------------------------------
N                    1434            1434            1434
------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
``````
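Columns (2) and (3) coincide because weighting by $1/\hat h_i$ is algebraically the same as running OLS on data scaled by $1/\sqrt{\hat h_i}$. As a sketch, with $\hat h_i$ standing for `h_x` (the notation is assumed here, not taken from the source):

$$
\sum_i \frac{1}{\hat h_i}\,\big(y_i - \beta_0 - x_i\beta\big)^2
\;=\;
\sum_i \Big(\frac{y_i}{\sqrt{\hat h_i}} - \beta_0\,\frac{1}{\sqrt{\hat h_i}} - \frac{x_i}{\sqrt{\hat h_i}}\,\beta\Big)^2
$$

Both sides are minimized by the same $(\beta_0, \beta)$. The intercept becomes the coefficient on the transformed constant `one`, which is why the transformed regression is run with `nocons` and why `one` reproduces the WLS `_cons` of 0.7189.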

``````
. hausman mols mwls

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |      mols         mwls        Difference       Std. err.
-------------+----------------------------------------------------------------
        educ |    .0632923     .0557789        .0075134        .0024064
       exper |   -.0002496    -.0014931        .0012435        .0006326
      tenure |    .0062977     .0032773        .0030204        .0011022
      female |   -.1508285    -.1536601        .0028316        .0101524
         age |    .1117635     .0975919        .0141716               .
       agesq |   -.0012397    -.0010556       -.0001841               .
------------------------------------------------------------------------------
b = Consistent under H0 and Ha; obtained from regress.
B = Inconsistent under Ha, efficient under H0; obtained from regress.

Test of H0: Difference in coefficients not systematic

chi2(6) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= -10.54

Warning: chi2 < 0 ==> model fitted on these data
fails to meet the asymptotic assumptions
of the Hausman test; see suest for a
generalized test.
``````

``````
. suest mols mwls
inconsistent weighting types
r(322);
``````

``````
. suest mols mtls

Simultaneous results for mols, mtls
Number of obs     =      1,434
------------------------------------------------------------------------------
|               Robust
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
mols_mean    |
educ |      0.063      0.006    11.43   0.000        0.052       0.074
exper |     -0.000      0.002    -0.14   0.891       -0.004       0.003
tenure |      0.006      0.002     3.51   0.000        0.003       0.010
female |     -0.151      0.023    -6.52   0.000       -0.196      -0.106
age |      0.112      0.010    11.15   0.000        0.092       0.131
agesq |     -0.001      0.000   -10.01   0.000       -0.001      -0.001
_cons |      0.333      0.199     1.68   0.093       -0.056       0.722
-------------+----------------------------------------------------------------
mols_lnvar   |
_cons |     -1.679      0.073   -23.02   0.000       -1.822      -1.536
-------------+----------------------------------------------------------------
mtls_mean    |
educw |      0.056      0.005    10.91   0.000        0.046       0.066
experw |     -0.001      0.002    -0.92   0.355       -0.005       0.002
tenurew |      0.003      0.002     2.02   0.043        0.000       0.006
femalew |     -0.154      0.022    -7.10   0.000       -0.196      -0.111
agew |      0.098      0.008    11.75   0.000        0.081       0.114
agesqw |     -0.001      0.000   -10.50   0.000       -0.001      -0.001
one |      0.719      0.161     4.46   0.000        0.403       1.035
-------------+----------------------------------------------------------------
mtls_lnvar   |
_cons |      1.581      0.066    24.00   0.000        1.451       1.710
------------------------------------------------------------------------------

. test ([mols_mean]educ=[mtls_mean]educw) ([mols_mean]exper=[mtls_mean]experw)       ///
>      ([mols_mean]tenure=[mtls_mean]tenurew) ([mols_mean]female=[mtls_mean]femalew) ///
>      ([mols_mean]age=[mtls_mean]agew) ([mols_mean]agesq=[mtls_mean]agesqw)         ///
>      ([mols_mean]_cons=[mtls_mean]one)

( 1)  [mols_mean]educ - [mtls_mean]educw = 0
( 2)  [mols_mean]exper - [mtls_mean]experw = 0
( 3)  [mols_mean]tenure - [mtls_mean]tenurew = 0
( 4)  [mols_mean]female - [mtls_mean]femalew = 0
( 5)  [mols_mean]age - [mtls_mean]agew = 0
( 6)  [mols_mean]agesq - [mtls_mean]agesqw = 0
( 7)  [mols_mean]_cons - [mtls_mean]one = 0
chi2(  7) =   32.11
Prob > chi2 =    0.0000
``````

``````
. foreach i in educ exper tenure female age agesq {
1.    gen `i'z=`i'*(1/h_x)
2. }
. gen onez =(1/h_x)
. reg lnwage educ exper tenure female age agesq  educz experz tenurez femalez ///
>     agez agesqz onez, robust

Linear regression                               Number of obs     =      1,434
F(13, 1420)       =      50.98
Prob > F          =     0.0000
R-squared         =     0.3554
Root MSE          =     .42829
------------------------------------------------------------------------------
|               Robust
lnwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ |      0.084      0.014     6.24   0.000        0.058       0.111
exper |      0.006      0.014     0.45   0.651       -0.020       0.033
tenure |      0.015      0.006     2.55   0.011        0.004       0.027
female |     -0.116      0.232    -0.50   0.617       -0.572       0.339
age |      0.148      0.091     1.63   0.103       -0.030       0.326
agesq |     -0.002      0.001    -1.38   0.167       -0.004       0.001
educz |     -0.001      0.000    -1.81   0.070       -0.001       0.000
experz |     -0.000      0.000    -0.11   0.912       -0.001       0.001
tenurez |     -0.000      0.000    -2.20   0.028       -0.000      -0.000
femalez |     -0.003      0.007    -0.47   0.641       -0.018       0.011
agez |     -0.001      0.003    -0.26   0.795       -0.006       0.005
agesqz |      0.000      0.000     0.21   0.831       -0.000       0.000
onez |      0.023      0.074     0.31   0.755       -0.122       0.168
_cons |     -0.564      0.918    -0.61   0.539       -2.364       1.236
------------------------------------------------------------------------------

. est sto mcry
. test educz  experz  tenurez  femalez  agez  agesqz onez

( 1)  educz = 0
( 2)  experz = 0
( 3)  tenurez = 0
( 4)  femalez = 0
( 5)  agez = 0
( 6)  agesqz = 0
( 7)  onez = 0
F(  7,  1420) =    5.05
Prob > F =    0.0000
``````
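Reading off the commands above, the regression augments the original model with every regressor (and the constant) scaled by $1/\hat h_i$, then tests whether the added terms matter. A sketch, with $\gamma_0$ and $\gamma$ as assumed notation for the coefficients on `onez` and the other `z` variables:

$$
y_i = \beta_0 + x_i\beta + \gamma_0\,\frac{1}{\hat h_i} + \frac{x_i}{\hat h_i}\,\gamma + \varepsilon_i,
\qquad H_0:\ \gamma_0 = 0,\ \gamma = 0
$$

If the conditional mean is correctly specified, the rescaled regressors are functions of $x_i$ that should add nothing to the fit, so rejecting $H_0$ (here $F(7, 1420) = 5.05$) points in the same direction as the `suest` comparison.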

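1. Generate an id variable for each observation in the data;
2. Duplicate all the data and mark which rows are original and which are clones;
3. Construct the appropriate weights: 1 for the original data and 1/h(X) for the cloned data;
4. Estimate the model with interaction terms, possibly with clustered standard errors (because the data were duplicated);
5. Check the differences.

``````
. gen id=_n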
. expand 2, gen(clone)
. gen wgt=1 if clone==0
. replace wgt=(1/h_x) if clone==1
. reg lnwage c.(educ exper tenure female age agesq)##i.clone [w=wgt], cluster(id)

Linear regression                        Number of obs     =      2,868
F(13, 1433)       =      48.47
Prob > F          =     0.0000
R-squared         =     0.2875
Root MSE          =     .36353
(Std. err. adjusted for 1,434 clusters in id)
-----------------------------------------------------------------------------------
|               Robust
lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
------------------------+----------------------------------------------------------
educ |      0.063      0.006    11.40   0.000        0.052       0.074
exper |     -0.000      0.002    -0.14   0.891       -0.004       0.003
tenure |      0.006      0.002     3.50   0.000        0.003       0.010
female |     -0.151      0.023    -6.51   0.000       -0.196      -0.105
age |      0.112      0.010    11.13   0.000        0.092       0.131
agesq |     -0.001      0.000    -9.99   0.000       -0.001      -0.001
clone |      0.386      0.103     3.75   0.000        0.184       0.588
clone#c.educ |     -0.008      0.003    -2.51   0.012       -0.013      -0.002
clone#c.exper |     -0.001      0.001    -1.18   0.239       -0.003       0.001
clone#c.tenure |     -0.003      0.001    -2.79   0.005       -0.005      -0.001
clone#c.female |     -0.003      0.011    -0.25   0.803       -0.025       0.019
clone#c.age |     -0.014      0.005    -2.67   0.008       -0.025      -0.004
clone#c.agesq |      0.000      0.000     2.66   0.008        0.000       0.000
_cons |      0.333      0.199     1.67   0.094       -0.057       0.724
-----------------------------------------------------------------------------------

. test 1.clone#c.educ  1.clone#c.exper 1.clone#c.tenure 1.clone#c.female 1.clone#c.age ///
>      1.clone#c.agesq 1.clone

( 1)  1.clone#c.educ = 0
( 2)  1.clone#c.exper = 0
( 3)  1.clone#c.tenure = 0
( 4)  1.clone#c.female = 0
( 5)  1.clone#c.age = 0
( 6)  1.clone#c.agesq = 0
( 7)  1.clone = 0
F( 7,  1433) =    4.57
Prob > F =    0.0000
``````
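The stacked regression above can be sketched as one fully interacted model, where $c \in \{0, 1\}$ stands for `clone` and $\theta_0, \theta$ are assumed notation for the coefficients on `clone` and its interactions:

$$
y_{ic} = \beta_0 + x_i\beta + c\,(\theta_0 + x_i\theta) + \varepsilon_{ic},
\qquad
w_{ic} =
\begin{cases}
1 & \text{if } c = 0 \\
1/\hat h_i & \text{if } c = 1
\end{cases}
$$

Because every coefficient is interacted with $c$, the original rows identify the OLS coefficients and the cloned rows identify the WLS coefficients, so $\theta = \beta_{WLS} - \beta_{OLS}$. For example, the `clone#c.educ` estimate of $-0.008$ matches $0.0558 - 0.0633$, and the `clone` estimate of $0.386$ matches $0.7189 - 0.3332$. Clustering on `id` accounts for each observation appearing twice.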

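1. Should the WLS weights be treated as exogenous and fixed?
2. Or should they be treated as endogenous when estimating the variance-covariance matrix?

``````
. keep if clone==0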
. program bs_wls_ols, eclass
1.     reg lnwage educ exper tenure female age agesq
2.     matrix b1=e(b)
3.     capture drop lres
4.     predictnl lres=log((lnwage-xb())^2)
5.     reg lres educ exper tenure female age agesq
6.     capture drop nwgt
7.     predictnl nwgt=1/exp(xb())
8.     ** two steps assuming Weights change
.        reg lnwage educ exper tenure female age agesq  [w=nwgt]
9.     matrix b2=e(b)
10.     ** two steps assuming Weights do not change
.        reg lnwage educ exper tenure female age agesq  [w=1/h_x]
11.     matrix b3=e(b)
12.     ** Finally the differences
.        matrix db2=b1-b2
13.     matrix db3=b1-b3
14.     ** putting all together:
.        matrix coleq b2= wols_dw
15.     matrix coleq b3= wols_fw
16.     matrix coleq db2= dwols_dw
17.     matrix coleq db3= dwols_fw
18.     matrix b=b2,b3,db2,db3
19.     ereturn post b
20. end

. bootstrap, reps(500) seed(10) nodots: bs_wls_ols

Bootstrap results                                        Number of obs = 1,434
Replications  =   500
------------------------------------------------------------------------------
|   Observed   Bootstrap                         Normal-based
| coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
wols_dw      |
educ |      0.056      0.005    10.63   0.000        0.045       0.066
exper |     -0.001      0.002    -0.88   0.379       -0.005       0.002
tenure |      0.003      0.002     2.05   0.040        0.000       0.006
female |     -0.154      0.025    -6.16   0.000       -0.203      -0.105
age |      0.098      0.008    12.69   0.000        0.083       0.113
agesq |     -0.001      0.000   -11.12   0.000       -0.001      -0.001
_cons |      0.719      0.142     5.07   0.000        0.441       0.997
-------------+----------------------------------------------------------------
wols_fw      |
educ |      0.056      0.005    11.24   0.000        0.046       0.066
exper |     -0.001      0.002    -0.94   0.346       -0.005       0.002
tenure |      0.003      0.002     2.06   0.040        0.000       0.006
female |     -0.154      0.022    -6.97   0.000       -0.197      -0.110
age |      0.098      0.008    11.58   0.000        0.081       0.114
agesq |     -0.001      0.000   -10.44   0.000       -0.001      -0.001
_cons |      0.719      0.163     4.42   0.000        0.400       1.037
-------------+----------------------------------------------------------------
dwols_dw     |
educ |      0.008      0.004     2.02   0.044        0.000       0.015
exper |      0.001      0.001     1.07   0.286       -0.001       0.004
tenure |      0.003      0.001     2.51   0.012        0.001       0.005
female |      0.003      0.013     0.22   0.828       -0.023       0.028
age |      0.014      0.008     1.79   0.074       -0.001       0.030
agesq |     -0.000      0.000    -1.89   0.059       -0.000       0.000
_cons |     -0.386      0.163    -2.36   0.018       -0.706      -0.065
-------------+----------------------------------------------------------------
dwols_fw     |
educ |      0.008      0.003     2.48   0.013        0.002       0.013
exper |      0.001      0.001     1.14   0.254       -0.001       0.003
tenure |      0.003      0.001     2.92   0.003        0.001       0.005
female |      0.003      0.011     0.25   0.799       -0.019       0.025
age |      0.014      0.005     2.69   0.007        0.004       0.025
agesq |     -0.000      0.000    -2.75   0.006       -0.000      -0.000
_cons |     -0.386      0.103    -3.73   0.000       -0.588      -0.183
------------------------------------------------------------------------------
``````

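1. Treating the WLS weights as known has only a slight effect on the estimated WLS standard errors;
2. When the weights are treated as fixed, the bootstrap and robust standard errors for WLS are close;
3. As the tests that follow show, in both cases we can conclude that the WLS and OLS coefficients differ, especially when the weights are treated as fixed.

``````
. test [dwols_fw]educ [dwols_fw]exper [dwols_fw]tenure [dwols_fw]female [dwols_fw]age [dwols_fw]agesq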

( 1)  [dwols_fw]educ = 0
( 2)  [dwols_fw]exper = 0
( 3)  [dwols_fw]tenure = 0
( 4)  [dwols_fw]female = 0
( 5)  [dwols_fw]age = 0
( 6)  [dwols_fw]agesq = 0
chi2(  6) =   29.33
Prob > chi2 =    0.0001

. test [dwols_dw]educ [dwols_dw]exper [dwols_dw]tenure [dwols_dw]female [dwols_dw]age [dwols_dw]agesq

( 1)  [dwols_dw]educ = 0
( 2)  [dwols_dw]exper = 0
( 3)  [dwols_dw]tenure = 0
( 4)  [dwols_dw]female = 0
( 5)  [dwols_dw]age = 0
( 6)  [dwols_dw]agesq = 0
chi2(  6) =   14.46
Prob > chi2 =    0.0249
``````

``````
. program myolswls1
1.    args lnf xb1 xb2
2.    qui:{
3.    * OLS regression:
.       replace `lnf' = -($ML_y1-`xb1')^2
4.    * WLS where weights are added "manually"
.       replace `lnf' = `lnf'-(1/h_x)*($ML_y2-`xb2')^2
5.    }
6.  end
. ml model lf myolswls1 (ols:lnwage = educ exper tenure female age agesq) ///
>    (wls:lnwage = educ exper tenure female age agesq), robust maximize
. ml display

Number of obs =  1,434
Wald chi2(6)  = 506.14
Log pseudolikelihood = -7197.7961                       Prob > chi2   = 0.0000
------------------------------------------------------------------------------
|               Robust
| Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
ols          |
educ |      0.063      0.006    11.43   0.000        0.052       0.074
exper |     -0.000      0.002    -0.14   0.891       -0.004       0.003
tenure |      0.006      0.002     3.51   0.000        0.003       0.010
female |     -0.151      0.023    -6.52   0.000       -0.196      -0.106
age |      0.112      0.010    11.15   0.000        0.092       0.131
agesq |     -0.001      0.000   -10.01   0.000       -0.001      -0.001
_cons |      0.333      0.199     1.68   0.093       -0.056       0.722
-------------+----------------------------------------------------------------
wls          |
educ |      0.056      0.005    10.91   0.000        0.046       0.066
exper |     -0.001      0.002    -0.92   0.355       -0.005       0.002
tenure |      0.003      0.002     2.02   0.043        0.000       0.006
female |     -0.154      0.022    -7.10   0.000       -0.196      -0.111
age |      0.098      0.008    11.75   0.000        0.081       0.114
agesq |     -0.001      0.000   -10.50   0.000       -0.001      -0.001
_cons |      0.719      0.161     4.46   0.000        0.403       1.035
------------------------------------------------------------------------------
``````
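The evaluator `myolswls1` above is not a true likelihood. As a sketch in assumed notation ($\beta^{ols}$ and $\beta^{wls}$ for the two equations, $\hat h_i$ for `h_x`), it stacks the two least-squares criteria into one objective:

$$
Q(\beta^{ols}, \beta^{wls}) = \sum_i \Big[ -\big(y_i - x_i\beta^{ols}\big)^2 - \frac{1}{\hat h_i}\big(y_i - x_i\beta^{wls}\big)^2 \Big]
$$

The two terms share no parameters, so maximizing $Q$ returns the OLS and WLS coefficients separately. Fitting them in one `ml` call with `robust` yields a joint sandwich covariance matrix that includes the cross-equation terms, which is what makes a direct test of equality possible. The point estimates and robust standard errors shown here match the earlier `suest mols mtls` output.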

``````
. test [ols=wls]

( 1)  [ols]educ - [wls]educ = 0
( 2)  [ols]exper - [wls]exper = 0
( 3)  [ols]tenure - [wls]tenure = 0
( 4)  [ols]female - [wls]female = 0
( 5)  [ols]age - [wls]age = 0
( 6)  [ols]agesq - [wls]agesq = 0
chi2(  6) =   30.39
Prob > chi2 =    0.0000
``````

