Author: 陈滨志 (University of Birmingham, UK)
Email: Rickchen0910@163.com
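In practice, survey data are often incomplete for many reasons: respondents do not finish the questionnaire, follow-up questions go unanswered, storage devices fail, and so on. In statistics, imputation is the process of replacing missing data with substituted values. This article focuses on multiple imputation and its implementation in Stata.
Because missing data can cause problems for an analysis, imputation is seen as a way to avoid the pitfalls of listwise deletion. When a case is missing one or more values, most statistical packages drop the whole case by default, which can introduce bias or make the results less representative. Imputation keeps every case by replacing each missing value with an estimate based on the other available information. Once all missing values have been imputed, the data set can be analyzed with the standard techniques for complete data. Several theoretical frameworks for dealing with missing data are now widely accepted in the literature.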
*********************************
*** Single imputation methods ***
*********************************
. //Complete-case analysis//
. *-The missing() function
. sysuse nlsw88.dta, clear
(NLSW, 1988 extract)
. sum
Variable | Obs Mean Std. Dev. Min Max
-------------+-----------------------------------------------------
idcode | 2,246 2612.654 1480.864 1 5159
age | 2,246 39.15316 3.060002 34 46
race | 2,246 1.282725 .4754413 1 3
married | 2,246 .6420303 .4795099 0 1
never_marr~d | 2,246 .1041852 .3055687 0 1
-------------+-----------------------------------------------------
grade | 2,244 13.09893 2.521246 0 18
collgrad | 2,246 .2368655 .4252538 0 1
south | 2,246 .4194123 .4935728 0 1
smsa | 2,246 .7039181 .4566292 0 1
c_city | 2,246 .2916296 .4546139 0 1
-------------+-----------------------------------------------------
industry | 2,232 8.189516 3.010875 1 12
occupation | 2,237 4.642825 3.408897 1 13
union | 1,878 .2454739 .4304825 0 1
wage | 2,246 7.766949 5.755523 1.004952 40.74659
hours | 2,242 37.21811 10.50914 1 80
-------------+-----------------------------------------------------
ttl_exp | 2,246 12.53498 4.610208 .1153846 28.88461
tenure | 2,231 5.97785 5.510331 0 25.91667
. drop if missing(grade,indus,occup,union,hours,tenure)
(398 observations deleted)
. sum
Variable | Obs Mean Std. Dev. Min Max
-------------+-----------------------------------------------------
idcode | 1,848 2614.384 1486.31 1 5159
age | 1,848 39.21429 3.041416 34 46
race | 1,848 1.291667 .4823869 1 3
married | 1,848 .6515152 .4766194 0 1
never_marr~d | 1,848 .1087662 .31143 0 1
-------------+-----------------------------------------------------
grade | 1,848 13.17208 2.550548 0 18
collgrad | 1,848 .2478355 .4318727 0 1
south | 1,848 .4242424 .4943612 0 1
smsa | 1,848 .7083333 .4546527 0 1
c_city | 1,848 .2938312 .4556388 0 1
-------------+-----------------------------------------------------
industry | 1,848 8.255952 3.042377 1 12
occupation | 1,848 4.62013 3.479021 1 13
union | 1,848 .2467532 .4312386 0 1
wage | 1,848 7.60597 4.173447 1.344605 39.23074
hours | 1,848 37.61905 9.957783 1 80
-------------+-----------------------------------------------------
ttl_exp | 1,848 12.86178 4.576879 .4038461 28.88461
tenure | 1,848 6.582882 5.631957 0 25.91667
. *-A more concise command: -dropmiss- (user-written command)
. sysuse nlsw88.dta, clear
(NLSW, 1988 extract)
. dropmiss, any obs
(398 observations deleted)
. sum
Variable | Obs Mean Std. Dev. Min Max
-------------+-----------------------------------------------------
idcode | 1,848 2614.384 1486.31 1 5159
age | 1,848 39.21429 3.041416 34 46
race | 1,848 1.291667 .4823869 1 3
married | 1,848 .6515152 .4766194 0 1
never_marr~d | 1,848 .1087662 .31143 0 1
-------------+-----------------------------------------------------
grade | 1,848 13.17208 2.550548 0 18
collgrad | 1,848 .2478355 .4318727 0 1
south | 1,848 .4242424 .4943612 0 1
smsa | 1,848 .7083333 .4546527 0 1
c_city | 1,848 .2938312 .4556388 0 1
-------------+-----------------------------------------------------
industry | 1,848 8.255952 3.042377 1 12
occupation | 1,848 4.62013 3.479021 1 13
union | 1,848 .2467532 .4312386 0 1
wage | 1,848 7.60597 4.173447 1.344605 39.23074
hours | 1,848 37.61905 9.957783 1 80
-------------+-----------------------------------------------------
ttl_exp | 1,848 12.86178 4.576879 .4038461 28.88461
tenure | 1,848 6.582882 5.631957 0 25.91667
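The descriptive statistics show that dropmiss removes the observations with missing values just as effectively: the same 398 observations are deleted and 1,848 complete cases remain.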
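Deleting rows cannot be undone within the session, so it can be useful to flag the complete cases first and compare them with the full sample. The following is a minimal sketch, assuming the built-in nlsw88 data; the variable names nmiss and complete are introduced here purely for illustration.

```stata
sysuse nlsw88, clear
* number of missing values per observation across the incomplete variables
egen nmiss = rowmiss(grade industry occupation union hours tenure)
* complete-case indicator: 1 if nothing is missing, 0 otherwise
generate byte complete = (nmiss == 0)
tabulate complete
* compare a key variable between the full sample and the complete cases
summarize wage
summarize wage if complete
```

In the logs above the mean of wage moves from 7.77 in the full sample to 7.61 among the complete cases, which is the kind of shift this comparison is meant to reveal.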
. //Mean imputation//
. sysuse nlsw88.dta, clear
(NLSW, 1988 extract)
. summarize
Variable | Obs Mean Std. Dev. Min Max
-------------+-----------------------------------------------------
idcode | 2,246 2612.654 1480.864 1 5159
age | 2,246 39.15316 3.060002 34 46
race | 2,246 1.282725 .4754413 1 3
married | 2,246 .6420303 .4795099 0 1
never_marr~d | 2,246 .1041852 .3055687 0 1
-------------+-----------------------------------------------------
grade | 2,244 13.09893 2.521246 0 18
collgrad | 2,246 .2368655 .4252538 0 1
south | 2,246 .4194123 .4935728 0 1
smsa | 2,246 .7039181 .4566292 0 1
c_city | 2,246 .2916296 .4546139 0 1
-------------+-----------------------------------------------------
industry | 2,232 8.189516 3.010875 1 12
occupation | 2,237 4.642825 3.408897 1 13
union | 1,878 .2454739 .4304825 0 1
wage | 2,246 7.766949 5.755523 1.004952 40.74659
hours | 2,242 37.21811 10.50914 1 80
-------------+-----------------------------------------------------
ttl_exp | 2,246 12.53498 4.610208 .1153846 28.88461
tenure | 2,231 5.97785 5.510331 0 25.91667
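. quietly summarize grade
. replace grade = r(mean) if grade==.
variable grade was byte now float
(2 real changes made)
Because summarize ignores missing values, r(mean) is the mean of the observed values and can be used directly as the replacement. Note that r(mean) always refers to the last variable summarized, so summarize is run on grade alone immediately before replace.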
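Mean imputation keeps the sample size but shrinks the variance, because every filled-in value sits exactly at the mean. The sketch below keeps an indicator of which values were imputed so that the effect can be inspected afterwards; grade_flag and grade_mean are illustrative names, not part of the original code.

```stata
sysuse nlsw88, clear
* remember which observations are about to be filled in
generate byte grade_flag = missing(grade)
* egen mean() ignores missing values, like summarize
egen double grade_mean = mean(grade)
replace grade = grade_mean if grade_flag
* standard deviation is now slightly smaller than before imputation
summarize grade
tabulate grade_flag
```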
. //Regression imputation//
.
. sysuse nlsw88.dta, clear
(NLSW, 1988 extract)
. sum
Variable | Obs Mean Std. Dev. Min Max
-------------+-----------------------------------------------------
idcode | 2,246 2612.654 1480.864 1 5159
age | 2,246 39.15316 3.060002 34 46
race | 2,246 1.282725 .4754413 1 3
married | 2,246 .6420303 .4795099 0 1
never_marr~d | 2,246 .1041852 .3055687 0 1
-------------+-----------------------------------------------------
grade | 2,244 13.09893 2.521246 0 18
collgrad | 2,246 .2368655 .4252538 0 1
south | 2,246 .4194123 .4935728 0 1
smsa | 2,246 .7039181 .4566292 0 1
c_city | 2,246 .2916296 .4546139 0 1
-------------+-----------------------------------------------------
industry | 2,232 8.189516 3.010875 1 12
occupation | 2,237 4.642825 3.408897 1 13
union | 1,878 .2454739 .4304825 0 1
wage | 2,246 7.766949 5.755523 1.004952 40.74659
hours | 2,242 37.21811 10.50914 1 80
-------------+-----------------------------------------------------
ttl_exp | 2,246 12.53498 4.610208 .1153846 28.88461
tenure | 2,231 5.97785 5.510331 0 25.91667
.
. reg wage grade hours
Source | SS df MS Number of obs = 2,240
----------+--------------------------------- F(2, 2237) = 157.14
Model | 9149.60544 2 4574.80272 Prob > F = 0.0000
Residual | 65124.5132 2,237 29.1124333 R-squared = 0.1232
----------+--------------------------------- Adj R-squared = 0.1224
Total | 74274.1187 2,239 33.172898 Root MSE = 5.3956
----------------------------------------------------------------------
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------+-----------------------------------------------------------
grade | .7176616 .0454271 15.80 0.000 .628578 .8067452
hours | .072271 .0108872 6.64 0.000 .0509209 .0936211
_cons | -4.315149 .6995475 -6.17 0.000 -5.686979 -2.943319
----------------------------------------------------------------------
. list wage hours if grade==.
+------------------+
| wage hours |
|------------------|
496. | 7.045088 40 |
2210. | 4.146536 40 |
+------------------+
. replace grade = (wage[496] - _b[_cons] - _b[hours]*hours[496])/_b[grade] ///
if (wage == wage[496]&grade==.&hours==hours[496])
. replace grade = (wage[2210] - _b[_cons] - _b[hours]*hours[2210])/_b[grade] ///
if (wage == wage[2210]&grade==.&hours==hours[2210])
Regression imputation is somewhat more involved. First, reg estimates the coefficients; next, list locates the rows with missing values; finally, the fitted equation wage = _b[_cons] + _b[grade]*grade + _b[hours]*hours is solved for grade using the coefficients Stata stores in _b[]. With the estimates above, the two imputed values are approximately 11.8 and 7.8.
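A more common form of regression imputation models the incomplete variable directly and fills in its predicted value, which avoids solving the wage equation by hand and does not depend on observation numbers. The sketch below takes that route as an alternative to the approach above; grade_flag and grade_hat are illustrative names.

```stata
sysuse nlsw88, clear
generate byte grade_flag = missing(grade)
* model the incomplete variable using the observed cases
regress grade wage hours
* linear prediction for every observation with non-missing predictors
predict double grade_hat, xb
replace grade = grade_hat if grade_flag & !missing(grade_hat)
list idcode wage hours grade if grade_flag
```

The filled-in values lie exactly on the regression line, so this still understates uncertainty; that limitation is what multiple imputation addresses.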
*== Forward and backward filling ==
*-Forward fill
. sysuse nlsw88, clear
(NLSW, 1988 extract)
. sum grade
Variable | Obs Mean Std. Dev. Min Max
-------------+------------------------------------------
grade | 2,244 13.09893 2.521246 0 18
. sort grade
. replace grade = grade[_n-1] if mi(grade)
(2 real changes made)
. sum grade
Variable | Obs Mean Std. Dev. Min Max
-------------+-------------------------------------------
grade | 2,246 13.10329 2.524361 0 18
*-Backward fill
. sysuse nlsw88, clear
(NLSW, 1988 extract)
. sum grade
Variable | Obs Mean Std. Dev. Min Max
-------------+------------------------------------------
grade | 2,244 13.09893 2.521246 0 18
. sort grade
. replace grade = grade[_n+1] if mi(grade)
(0 real changes made)
. sum grade
Variable | Obs Mean Std. Dev. Min Max
-------------+-----------------------------------------
grade | 2,244 13.09893 2.521246 0 18
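The two logs above behave differently because sort places missing values last: the forward fill copies the largest observed grade (18) into both gaps, while the backward fill finds nothing after the final rows to copy from. The toy example below, with made-up values of t and x, sketches how the fills work when the data have a meaningful order. It relies on replace processing observations from top to bottom, so a run of consecutive gaps is filled in a single pass.

```stata
clear
input t x
1 5
2 .
3 .
4 8
5 .
end
* forward fill: carry the last observed value down
sort t
generate x_ffill = x
replace x_ffill = x_ffill[_n-1] if missing(x_ffill)
* backward fill: reverse the order, carry down, then restore the order
gsort -t
generate x_bfill = x
replace x_bfill = x_bfill[_n-1] if missing(x_bfill)
sort t
list
```

Here x_ffill becomes 5, 5, 5, 8, 8, and x_bfill becomes 5, 8, 8, 8, missing, since nothing follows the last observation.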
*-Filling in panel data
. use "http://www.stata-press.com/data/r13/nlswork", clear
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. misstable sum
Obs<.
+--------------------------
| | Unique
Variable | Obs=. Obs>. Obs<. | values Min Max
-----------+-------------------------+--------------------------
age | 24 28,510 | 33 14 46
msp | 16 28,518 | 2 0 1
nev_mar | 16 28,518 | 2 0 1
grade | 2 28,532 | 19 0 18
not_smsa | 8 28,526 | 2 0 1
c_city | 8 28,526 | 2 0 1
south | 8 28,526 | 2 0 1
ind_code | 341 28,193 | 12 1 12
occ_code | 121 28,413 | 13 1 13
union | 9,296 19,238 | 2 0 1
wks_ue | 5,704 22,830 | 61 0 76
tenure | 433 28,101 | 270 0 25.91667
hours | 67 28,467 | 85 1 168
wks_work | 703 27,831 | 105 0 104
----------------------------------------------------------------
. xtset idcode year
panel variable: idcode (unbalanced)
time variable: year, 68 to 88, but with gaps
delta: 1 unit
. by idcode: replace grade = grade[_n+1] if mi(grade) // why was nothing filled in?
(0 real changes made)
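Nothing was filled in for the panel data because the individuals with a missing grade have only one period of data, so there is no earlier or later period within the same panel to borrow a value from.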
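One way to check this explanation is to count, for each person, the number of periods and the number of observed grade values. The sketch below does so and then shows a within-person forward fill on union, purely as an illustration of the syntax; n_obs and n_grade are names introduced here.

```stata
use "http://www.stata-press.com/data/r13/nlswork", clear
* per-person counts: all periods, and periods with an observed grade
bysort idcode (year): generate n_obs = _N
by idcode: egen n_grade = count(grade)
* for the people with a missing grade: is there any observed value to borrow?
list idcode year n_obs n_grade if missing(grade)
* carry the previous period's value forward within each person
bysort idcode (year): replace union = union[_n-1] if missing(union)
```

If n_grade is 0 for the listed rows, no fill within the panel can succeed, whichever direction is used.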
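Most of the single imputation methods above produce only one replacement value for each missing entry. Rubin (1987) developed multiple imputation (MI), a flexible, simulation-based statistical technique for handling missing data. As an imputation technique for missing data, MI has two main features: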
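MCAR (missing completely at random): Consider a study comparing different blood-pressure treatments. Suppose some subjects moved to another area, so their follow-up blood-pressure measurements were never collected. As long as the decision to move was unrelated to anything in the study, these missing measurements can be treated as MCAR.
MAR (missing at random): In the same study, suppose some subjects dropped out because of severe side effects of the high-dose treatment they were assigned. Here the missing measurements are unlikely to be MCAR, because subjects on the higher dose are more likely than subjects on the lower dose to suffer severe side effects and therefore to drop out. Missingness depends on the dose received, which is observed, so the data are MAR.
MNAR (missing not at random): In the same study, suppose subjects with extremely high blood pressure were withdrawn for ethical reasons. The missing measurements are then not MAR, because missingness depends on the unobserved blood-pressure values themselves.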
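The three mechanisms can be mimicked on simulated data. The sketch below is illustrative only: the variable names, the sample size, the effect of dose on blood pressure, and every dropout probability are assumptions made up for the example.

```stata
clear
set seed 12345
set obs 1000
generate byte dose = runiform() < 0.5                  // 1 = high dose
generate double bp = 120 + 10*dose + rnormal(0, 8)     // true blood pressure
* MCAR: every measurement has the same 20% chance of being lost
generate double bp_mcar = bp if runiform() > 0.20
* MAR: high-dose subjects drop out more often (40% vs 10%); dose is observed
generate double bp_mar = bp if runiform() > cond(dose, 0.40, 0.10)
* MNAR: very high readings are themselves more likely to be missing
generate double bp_mnar = bp if runiform() > cond(bp > 135, 0.60, 0.05)
summarize bp bp_mcar bp_mar bp_mnar
* under MAR, conditioning on the observed cause of dropout removes the distortion
tabstat bp bp_mar, by(dose) statistics(mean n)
```

Comparing the overall means with the means by dose shows why MAR data can be handled by conditioning on observed variables, whereas MNAR data cannot.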
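Next, we illustrate these ideas in Stata. We first load the data and run a baseline regression. We use the Fictional heart attack data from the Stata help file (help mi) for univariate imputation; the variables are described below: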
. use "http://www.stata-press.com/data/r15/mheart0", clear
*-or
. webuse "mheart0", clear
. describe
Contains data from http://www.stata-press.com/data/r15/mheart0.dta
obs: 154 Fictional heart attack data; bmi missing
vars: 9 19 Jun 2016 10:50
size: 2,310
------------------------------------------------
value
variable label variable label
------------------------------------------------
attack Outcome (heart attack)
smokes Current smoker
age Age, in years
bmi Body Mass Index, kg/m^2
female Gender
hsgrad High school graduate
marstatus mar Marital status: single, married, divorced
alcohol alc Alcohol consumption: none, <2 drinks/day, >=2 drinks/day
hightar Smokes high tar cigarettes
------------------------------------------------
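The summary statistics from summarize show that bmi has missing values (which missing-data assumption do the missing bmi values satisfy?). The logit output shows that the regression is a complete-case analysis: it uses only the 132 observations with no missing values, and only smokes and bmi are significant at the 5% level.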
. summarize
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------
attack | 154 .4480519 .4989166 0 1
smokes | 154 .4155844 .4944304 0 1
age | 154 56.48829 11.73051 20.73613 87.14446
bmi | 132 25.24136 4.027137 17.22643 38.24214
female | 154 .2467532 .4325285 0 1
-------------+--------------------------------------------------
hsgrad | 154 .7532468 .4325285 0 1
marstatus | 154 1.941558 .8183916 1 3
alcohol | 154 1.181818 .6309506 0 2
hightar | 154 .2077922 .407051 0 1
. logit attack smokes age bmi hsgrad female
Iteration 0: log likelihood = -91.359017
Iteration 1: log likelihood = -79.374749
Iteration 2: log likelihood = -79.342218
Iteration 3: log likelihood = -79.34221
Logistic regression Number of obs = 132
LR chi2(5) = 24.03
Prob > chi2 = 0.0002
Log likelihood = -79.34221 Pseudo R2 = 0.1315
-----------------------------------------------------------------------
attack | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+-------------------------------------------------------------
smokes | 1.544053 .3998329 3.86 0.000 .7603945 2.327711
age | .026112 .017042 1.53 0.125 -.0072898 .0595137
bmi | .1129938 .0500061 2.26 0.024 .0149837 .211004
hsgrad | .4048251 .4446019 0.91 0.363 -.4665786 1.276229
female | .2255301 .4527558 0.50 0.618 -.6618549 1.112915
_cons | -5.408398 1.810603 -2.99 0.003 -8.957115 -1.85968
-----------------------------------------------------------------------
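Stata's misstable command gives a quick overview of the number and type of missing values. misstable summarize shows that bmi is the only variable with missing values, and misstable patterns shows that bmi is missing in 14% of the observations (22 of 154).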
. misstable summarize
Obs<.
+------------------------------
| | Unique
Variable | Obs=. Obs>. Obs<. | values Min Max
----------+------------------------+------------------------------
bmi | 22 132 | 132 17.22643 38.24214
------------------------------------------------------------------
. misstable patterns
Missing-value patterns
(1 means complete)
| Pattern
Percent | 1
------------+-------------
86% | 1
|
14 | 0
------------+-------------
100% |
Variables are (1) bmi
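Whether missingness is MAR or MNAR cannot be verified from the observed data, but it is possible to check whether missingness is related to the fully observed variables, which would argue against MCAR. A minimal sketch, run on the original data before mi set; the stub miss_ is an arbitrary choice.

```stata
webuse mheart0, clear
* create the indicator miss_bmi, equal to 1 where bmi is missing
misstable summarize bmi, generate(miss_)
* does the share of missing bmi differ between smokers and non-smokers?
tabulate miss_bmi smokes, column
* joint check against all fully observed variables
logit miss_bmi attack smokes age female hsgrad
```

Clear associations in these checks would suggest that the observed variables carry information about missingness and belong in the imputation model.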
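Next, we use mi set and mi register to declare the variables to be imputed, and then call mi impute regress for univariate imputation, which fills in the missing values of a continuous variable using Gaussian linear regression. Here we create 20 imputations.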
. mi set wide
. mi register imputed bmi age
. mi impute regress bmi attack smokes age hsgrad female, add(20) rseed(2232)
note: variable age registered as imputed and used to model variable bmi;
this may cause some observations to be omitted from the estimation and may lead to missing imputed values
Univariate imputation Imputations = 20
Linear regression added = 20
Imputed: m=1 through m=20 updated = 0
---------------------------------------------------------
| Observations per m
|----------------------------------------------
Variable | Complete Incomplete Imputed | Total
----------+-----------------------------------+----------
bmi | 132 22 22 | 154
---------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
of the number of filled-in observations.)
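The table above shows that all 22 missing values were filled in for each of the 20 imputations. To check individual imputations we can use mi xeq; here we look at m=0 (the original data), m=1, and m=20.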
. mi xeq 0 1 20: summarize bmi
m=0 data:
-> summarize bmi
Variable | Obs Mean Std. Dev. Min Max
-------------+-------------------------------------------------
bmi | 132 25.24136 4.027137 17.22643 38.24214
m=1 data:
-> summarize bmi
Variable | Obs Mean Std. Dev. Min Max
-------------+-------------------------------------------------
bmi | 154 25.28134 3.969649 17.22643 38.24214
m=20 data:
-> summarize bmi
Variable | Obs Mean Std. Dev. Min Max
-------------+-------------------------------------------------
bmi | 154 25.30992 4.05665 16.44644 38.24214
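In the wide style, each imputation is stored as its own variable named _m_varname, so the filled-in values can also be inspected directly. The sketch below repeats the setup and then summarizes only the 22 imputed values for m=1 and m=20; it is a quick plausibility check, not a formal diagnostic.

```stata
webuse mheart0, clear
mi set wide
mi register imputed bmi
mi impute regress bmi attack smokes age hsgrad female, add(20) rseed(2232)
mi describe
* observed values (m=0)
summarize bmi
* values filled in for the originally missing observations
summarize _1_bmi _20_bmi if missing(bmi)
```

Imputed values far outside the observed range of bmi (17.2 to 38.2 here) would be a reason to reconsider the imputation model.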
Finally, we use mi estimate to rerun the regression and see whether multiple imputation changes the results.
. mi estimate, dots:logit attack smokes age bmi hsgrad female
Imputations (20):
.........10.........20 done
Multiple-imputation estimates Imputations = 20
Logistic regression Number of obs = 154
Average RVI = 0.0611
Largest FMI = 0.2518
DF adjustment: Large sample DF: min = 311.30
avg = 116,139.89
max = 252,553.06
Model F test: Equal FMI F( 5,19590.7) = 3.52
Within VCE type: OIM Prob > F = 0.0035
--------------------------------------------------------------------
attack | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------+-----------------------------------------------------------
smokes | 1.222431 .3608138 3.39 0.001 .5152409 1.92962
age | .0358403 .0154631 2.32 0.020 .0055329 .0661476
bmi | .1094125 .0518803 2.11 0.036 .0073322 .2114929
hsgrad | .1740094 .4055789 0.43 0.668 -.6209156 .9689344
female | -.0985455 .4191946 -0.24 0.814 -.9201594 .7230684
_cons | -5.625926 1.782136 -3.16 0.002 -9.124984 -2.126867
--------------------------------------------------------------------
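Compared with the logit regression on the original data, the multiple-imputation analysis uses all 154 observations and finds age to be significant at the 5% level.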
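mi estimate fits the logit model separately in each of the 20 completed data sets and pools the results using Rubin's combining rules. In the notation below, which is introduced here and does not appear in the Stata output, $\hat{Q}_m$ is a coefficient estimated in imputation $m$, $\hat{U}_m$ is its estimated variance, and $M = 20$.

```latex
\[
\bar{Q} = \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m, \qquad
\bar{U} = \frac{1}{M}\sum_{m=1}^{M}\hat{U}_m, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat{Q}_m-\bar{Q}\bigr)^2, \qquad
T = \bar{U} + \Bigl(1+\frac{1}{M}\Bigr)B
\]
```

The pooled coefficient is $\bar{Q}$ and its standard error is $\sqrt{T}$; the between-imputation variance $B$ carries the extra uncertainty caused by the missing values.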
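Next, we again use mi set and mi register to declare the variables to be imputed. mi impute mvn performs multivariate imputation of continuous variables, filling in the missing values of one or more continuous variables using multivariate normal regression; mi impute chained performs multivariate imputation by chained equations and can also handle discrete variables. Here we create 10 imputations.
. //Multivariate multiple imputation//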
. use https://www.stata-press.com/data/r16/mheart5s0, clear
(Fictional heart attack data)
.
. mi describe
Style: mlong
last mi update 19apr2019 14:00:11, 222 days ago
Obs.: complete 126
incomplete 28 (M = 0 imputations)
---------------------
total 154
Vars.: imputed: 2; bmi(28) age(12)
passive: 0
regular: 4; attack smokes female hsgrad
system: 3; _mi_m _mi_id _mi_miss
(there are no unregistered variables)
.
. mi misstable patterns
Missing-value patterns
(1 means complete)
| Pattern
Percent | 1 2
------------+-------------
82% | 1 1
|
10 | 1 0
8 | 0 0
------------+-------------
100% |
Variables are (1) age (2) bmi
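In this data set two variables have missing values: bmi has 28 and age has 12. In 8% of all observations both variables are missing. The two variables follow a monotone missing pattern: variable 1 (age) has no more missing values than variable 2 (bmi), and its missing values are nested within those of bmi, that is, whenever age is missing, bmi is missing too.
[Figure: illustration of the monotone missing-data pattern]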
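The nesting can be checked directly. A short sketch, assuming the same mheart5s0 data, which contain no imputations yet (M = 0):

```stata
use https://www.stata-press.com/data/r16/mheart5s0, clear
* observations where age is missing but bmi is observed
count if missing(age) & !missing(bmi)
* let Stata report how the missing values are nested
mi misstable nested
```

The pattern table above has no row in which age is missing and bmi is observed, so the count should be 0, which is what a monotone pattern requires.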
When the missing pattern is monotone, mi impute monotone, mi impute mvn, and mi impute chained can all be used:
. mi impute monotone (regress) age bmi = attack smokes hsgrad female, add(10)
Conditional models:
age: regress age attack smokes hsgrad female
bmi: regress bmi age attack smokes hsgrad female
Multivariate imputation Imputations = 10
Monotone method added = 10
Imputed: m=1 through m=10 updated = 0
age: linear regression
bmi: linear regression
---------------------------------------------------------
| Observations per m
|----------------------------------------------
Variable | Complete Incomplete Imputed | Total
----------+-----------------------------------+----------
age | 142 12 12 | 154
bmi | 126 28 28 | 154
---------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
of the number of filled-in observations.)
.
. mi impute mvn age bmi = attack smokes hsgrad female, replace nolog
Multivariate imputation Imputations = 10
Multivariate normal regression added = 0
Imputed: m=1 through m=10 updated = 10
Prior: uniform Iterations = 1000
burn-in = 100
between = 100
-----------------------------------------------------------
| Observations per m
|----------------------------------------------
Variable | Complete Incomplete Imputed | Total
------------+-----------------------------------+----------
age | 142 12 12 | 154
bmi | 126 28 28 | 154
-----------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
of the number of filled-in observations.)
.
. mi impute chained (regress) age bmi = attack smokes hsgrad female, replace
note: missing-value pattern is monotone; no iteration performed
Conditional models (monotone):
age: regress age attack smokes hsgrad female
bmi: regress bmi age attack smokes hsgrad female
Performing chained iterations ...
Multivariate imputation Imputations = 10
Chained equations added = 0
Imputed: m=1 through m=10 updated = 10
Initialization: monotone Iterations = 0
burn-in = 0
age: linear regression
bmi: linear regression
---------------------------------------------------------
| Observations per m
|----------------------------------------------
Variable | Complete Incomplete Imputed | Total
----------+-----------------------------------+----------
age | 142 12 12 | 154
bmi | 126 28 28 | 154
---------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
of the number of filled-in observations.)
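The imputation commands above draw random numbers, so their results change from run to run unless a seed is set. The sketch below adds rseed() and then pools a logit model across the 10 imputations with mi estimate, mirroring the univariate example; the seed value is arbitrary.

```stata
use https://www.stata-press.com/data/r16/mheart5s0, clear
* fix the seed so the imputations can be reproduced
mi impute chained (regress) age bmi = attack smokes hsgrad female, add(10) rseed(2232)
* fit the analysis model in each imputation and combine the estimates
mi estimate, dots: logit attack smokes age bmi hsgrad female
```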
webdoc init Example, replace logall plain md
*************************************
***** Single imputation methods *****
*************************************
//Complete-case analysis//
*-The missing() function
sysuse nlsw88.dta, clear
sum
drop if missing(grade,indus,occup,union,hours,tenure)
sum
*-A more concise command: -dropmiss- (user-written command)
sysuse nlsw88.dta, clear
dropmiss, any obs // this may be what we need
sum
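//Mean imputation//
sysuse nlsw88.dta, clear
summarize
quietly summarize grade // r(mean) must refer to grade, not to the last variable listed
replace grade = r(mean) if grade==.
//Regression imputation//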
sysuse nlsw88.dta, clear
sum
reg wage grade hours
list wage hours if grade==.
replace grade = (wage[496] - _b[_cons] - _b[hours]*hours[496])/_b[grade] ///
if (wage == wage[496]&grade==.&hours==hours[496])
replace grade = (wage[2210] - _b[_cons] - _b[hours]*hours[2210])/_b[grade] ///
if (wage == wage[2210]&grade==.&hours==hours[2210])
sum grade
//Forward and backward filling
*-Forward fill
sysuse nlsw88, clear
sum grade
sort grade
replace grade = grade[_n-1] if mi(grade)
sum grade
*-Backward fill
sysuse nlsw88, clear
sum grade
sort grade
replace grade = grade[_n+1] if mi(grade)
sum grade
*-Filling in panel data
use http://www.stata-press.com/data/r13/nlswork,clear
misstable sum
xtset idcode year
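by idcode: replace grade = grade[_n+1] if mi(grade) // why was nothing filled in?
***************************************
***** Multiple imputation methods *****
***************************************
//Univariate multiple imputation//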
use http://www.stata-press.com/data/r15/mheart0,clear
describe
summarize
logit attack smokes age bmi hsgrad female
misstable summarize
misstable patterns
mi set wide
mi register imputed bmi
mi impute regress bmi attack smokes age hsgrad female, add(20) rseed(2232)
mi xeq 0 1 20: summarize bmi
mi estimate, dots:logit attack smokes age bmi hsgrad female
//Multivariate multiple imputation//
use https://www.stata-press.com/data/r16/mheart5s0, clear
mi describe
mi misstable patterns
mi impute monotone (regress) age bmi = attack smokes hsgrad female, add(10)
mi impute mvn age bmi = attack smokes hsgrad female, replace nolog
mi impute chained (regress) age bmi = attack smokes hsgrad female, replace