New!
lianxh
command released:
Search lianxh posts and Stata resources at any time. Install it with:
. ssc install lianxh
For details, see the help file (there are surprises):
. help lianxh
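Author: Xia Shuhao (Sun Yat-sen University)
Email: 1012021558@qq.com
Editor's note: This post is mainly an abridged translation of the following article, to which we are grateful.
Source: Cox N J. Speaking Stata: A set of utilities for managing missing values[J]. The Stata Journal, 2015, 15(4): 1174-1185.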
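Many datasets, even high-quality ones, contain missing values for a variety of reasons. Stata provides a number of commands for examining and handling missing values, for example:
- codebook: reports the number of missing values;
- egen: creates variables that count missing values, e.g. with rowmiss();
- ipolate: fills in missing values by interpolation;
- misstable: reports on missing values;
- mvdecode: converts numeric codes such as -99 into missing values (mvencode does the reverse);
- tabulate: can include missing values in its tabulations (with the missing option).
In this post we introduce missings (by Nick J. Cox), a more powerful command for managing missing values.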
* Install the command
ssc install missings, replace
* Syntax
missings report [varlist] [if] [in] [, common_options observations minimum(#) percent format(format) sort show(#) list_options]
missings list [varlist] [if] [in] [, common_options minimum(#) list_options]
missings table [varlist] [if] [in] [, common_options minimum(#) tabulate_options]
missings tag [varlist] [if] [in], generate(newvar) [common_options]
missings dropvars [varlist] [, common_options force]
missings dropobs [varlist] [if] [in] [, common_options force]
common_options are numeric, string, and sysmiss.
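Subcommands
- missings report: reports the number of missing values in varlist (by default, for each variable);
- missings list: lists the observations that have missing values in varlist;
- missings table: tabulates observations by the number of missing values they contain;
- missings tag: generates a variable containing the number of missing values in each observation;
- missings dropvars: drops variables whose values are all missing;
- missings dropobs: drops observations in which all variables in varlist are missing.
Options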
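- numeric (all subcommands): include numeric variables only;
- string (all subcommands): include string variables only;
- sysmiss (all subcommands): count only the system missing value . as missing, so the extended missing values .a to .z are not counted. This option has no effect on string variables, for which missing always means the empty string "";
- observations (missings report): count missing values for each observation rather than for each variable (the default);
- minimum(#) (missings report, missings list, and missings table): show only results with at least # missing values;
- percent (missings report): report percentages as well as counts of missing values;
- format(format) (missings report): specify the display format for the percentages;
- generate(newvar) (missings tag): specify the name of the new variable (required);
- force (missings dropvars and missings dropobs): required; confirms that you want to change the dataset in memory.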
. webuse nlswork.dta, clear
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. missings report
Checking missings in all variables:
15082 observations with missing values
-------------------
| #
----------+--------
age | 24
msp | 16
nev_mar | 16
grade | 2
not_smsa | 8
c_city | 8
south | 8
ind_code | 341
occ_code | 121
union | 9296
wks_ue | 5704
tenure | 433
hours | 67
wks_work | 703
-------------------
. missings report, minimum(1000)
Checking missings in all variables:
15082 observations with missing values
-----------------
| #
--------+--------
union | 9296
wks_ue | 5704
-----------------
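To make the counting behind missings report concrete, here is a minimal sketch in plain Python (not Stata, and not Cox's code). It assumes a toy dataset stored as a dict of equal-length columns with None standing for a missing value; the function name missing_report and the data are hypothetical. It counts missing values per variable, applies a minimum(#)-style threshold, and counts the observations with at least one missing value, the figure printed as "observations with missing values" above.

```python
# Plain-Python sketch of the counts reported by "missings report".
# Assumption (hypothetical encoding): the dataset is a dict of equal-length
# columns, and None marks a missing value.

def missing_report(data, minimum=1):
    """Return ({variable: number missing} for variables with at least
    `minimum` missing values, number of observations with any missing value)."""
    counts = {var: sum(v is None for v in col) for var, col in data.items()}
    shown = {var: n for var, n in counts.items() if n >= minimum}
    n_obs = len(next(iter(data.values())))
    any_missing = sum(
        any(col[i] is None for col in data.values()) for i in range(n_obs)
    )
    return shown, any_missing


toy = {
    "age":   [26, None, 32, 40],
    "union": [None, None, 1, None],
    "grade": [12, 16, 12, 12],
}
print(missing_report(toy))             # ({'age': 1, 'union': 3}, 3)
print(missing_report(toy, minimum=2))  # ({'union': 3}, 3)
```

Variables with no missing values (grade here) are left out, just as complete variables do not appear in the Stata report.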
. local xx "age nev_mar c_city south ind_code occ_code union wks_u"
. missings list `xx', minimum(4)
Checking missings in age nev_mar c_city south ind_code occ_code union wks_ue:
14800 observations with missing values
+----------------------------------------------------------------+
| age nev_mar c_city south ind_code occ_code union wks_ue |
|----------------------------------------------------------------|
21220. | 26 0 . . 7 3 . . |
22493. | 32 0 0 0 . . . . |
+----------------------------------------------------------------+
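missings list looks at observations instead: it lists those with at least minimum(#) missing values among the variables in varlist. A minimal plain-Python sketch under the same assumptions (dict of columns, None as missing, hypothetical function name missing_list) follows; observation numbers start at 1, as in Stata.

```python
# Plain-Python sketch of "missings list varlist, minimum(#)".
# Assumption (hypothetical encoding): dict of equal-length columns, None = missing.

def missing_list(data, varlist, minimum=1):
    """Return (observation number, missing count) pairs for observations with at
    least `minimum` missing values among `varlist`; numbering starts at 1."""
    n_obs = len(data[varlist[0]])
    rows = []
    for i in range(n_obs):
        n_miss = sum(data[var][i] is None for var in varlist)
        if n_miss >= minimum:
            rows.append((i + 1, n_miss))
    return rows


toy = {
    "age":   [26, None, 32, 40],
    "union": [None, None, 1, None],
    "grade": [12, 16, 12, 12],
}
print(missing_list(toy, ["age", "union"]))             # [(1, 1), (2, 2), (4, 1)]
print(missing_list(toy, ["age", "union"], minimum=2))  # [(2, 2)]
```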
. missings table
Checking missings in all variables:
15082 observations with missing values
# of |
missing |
values | Freq. Percent Cum.
------------+-----------------------------------
0 | 13,452 47.14 47.14
1 | 13,790 48.33 95.47
2 | 964 3.38 98.85
3 | 291 1.02 99.87
4 | 32 0.11 99.98
5 | 2 0.01 99.99
6 | 3 0.01 100.00
------------+-----------------------------------
Total | 28,534 100.00
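The table above counts, for each observation, how many variables are missing, and then tabulates those counts with percentages and cumulative percentages. A minimal plain-Python sketch of that two-step logic, assuming a dict of columns with None as missing (missing_table is a hypothetical name, not part of missings):

```python
# Plain-Python sketch of "missings table": tabulate observations by how many
# missing values they contain. Assumption: dict of columns, None = missing.
from collections import Counter

def missing_table(data):
    """Return rows (number missing, freq., percent, cum. percent)."""
    n_obs = len(next(iter(data.values())))
    # Step 1: number of missing values in each observation.
    per_obs = [sum(col[i] is None for col in data.values()) for i in range(n_obs)]
    # Step 2: frequency table of those counts.
    freq = Counter(per_obs)
    rows, cum = [], 0.0
    for k in sorted(freq):
        pct = 100 * freq[k] / n_obs
        cum += pct
        rows.append((k, freq[k], round(pct, 2), round(cum, 2)))
    return rows


toy = {
    "age":   [26, None, 32, 40],
    "union": [None, None, 1, None],
    "grade": [12, 16, 12, 12],
}
for row in missing_table(toy):
    print(row)
# (0, 1, 25.0, 25.0)
# (1, 2, 50.0, 75.0)
# (2, 1, 25.0, 100.0)
```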
. bysort race: missings table
------------------------------------------------------
-> race = white
Checking missings in all variables:
10576 observations with missing values
# of |
missing |
values | Freq. Percent Cum.
------------+-----------------------------------
0 | 9,604 47.59 47.59
1 | 9,672 47.93 95.52
2 | 677 3.35 98.88
3 | 199 0.99 99.86
4 | 25 0.12 99.99
5 | 1 0.00 99.99
6 | 2 0.01 100.00
------------+-----------------------------------
Total | 20,180 100.00
------------------------------------------------------
-> race = black
Checking missings in all variables:
4342 observations with missing values
# of |
missing |
values | Freq. Percent Cum.
------------+-----------------------------------
0 | 3,709 46.07 46.07
1 | 3,966 49.26 95.33
2 | 278 3.45 98.78
3 | 89 1.11 99.89
4 | 7 0.09 99.98
5 | 1 0.01 99.99
6 | 1 0.01 100.00
------------+-----------------------------------
Total | 8,051 100.00
------------------------------------------------------
-> race = other
Checking missings in all variables:
164 observations with missing values
# of |
missing |
values | Freq. Percent Cum.
------------+-----------------------------------
0 | 139 45.87 45.87
1 | 152 50.17 96.04
2 | 9 2.97 99.01
3 | 3 0.99 100.00
------------+-----------------------------------
Total | 303 100.00
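The bysort race: prefix repeats missings table within each group of race. The grouping step can be sketched in plain Python as follows. The data and the name missing_table_by are hypothetical, None marks a missing value, and, as in the "all variables" case above, the by-variable itself is included in the count. Groups are ordered here by their string labels, not by Stata's underlying numeric codes.

```python
# Plain-Python sketch of "bysort byvar: missings table": the per-observation
# count of missing values is tabulated separately within each group.
# Assumptions: dict of columns, None = missing, byvar itself has no missing values.
from collections import Counter, defaultdict

def missing_table_by(data, byvar):
    """Return {group: {number missing: freq.}}; all variables, including byvar, are checked."""
    n_obs = len(data[byvar])
    result = defaultdict(Counter)
    for i in range(n_obs):
        n_miss = sum(col[i] is None for col in data.values())
        result[data[byvar][i]][n_miss] += 1
    return {g: dict(sorted(c.items())) for g, c in sorted(result.items())}


toy = {
    "race":  ["white", "white", "black", "black", "white"],
    "age":   [26, None, 30, None, 41],
    "union": [None, None, 1, 0, 1],
}
print(missing_table_by(toy, "race"))
# {'black': {0: 1, 1: 1}, 'white': {0: 1, 1: 1, 2: 1}}
```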
. missings tag, generate(nmissing) // similar to egen var = rowmiss()
Checking missings in all variables:
15082 observations with missing values
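missings tag stores the per-observation count as a new variable rather than tabulating it. A minimal plain-Python sketch, assuming a dict of columns with None as missing; missing_tag is a hypothetical name, and refusing an existing variable name is an assumption modelled on generate() expecting a new variable.

```python
# Plain-Python sketch of "missings tag, generate(newvar)": add a variable holding
# the number of missing values in each observation (compare egen's rowmiss()).
# Assumptions: dict of columns, None = missing; refusing an existing name mimics
# generate() expecting a new variable.

def missing_tag(data, newvar):
    if newvar in data:
        raise ValueError(f"variable {newvar} already defined")
    n_obs = len(next(iter(data.values())))
    # The counts are computed before newvar is added, so it is not counted itself.
    data[newvar] = [sum(col[i] is None for col in data.values()) for i in range(n_obs)]
    return data


toy = {"age": [26, None, 32, 40], "union": [None, None, 1, None]}
missing_tag(toy, "nmissing")
print(toy["nmissing"])  # [1, 2, 0, 1]
```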
. generate newt = ""
. generate frog = .
. generate toad = .a
. missings dropvars newt frog toad, force sysmiss
Checking missings in newt frog toad:
28534 observations with system missing values
note: newt frog dropped
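In this example, newt (a string variable that is entirely empty) and frog (entirely system missing) are dropped, but toad is not. Its values are the extended missing value .a, and with the sysmiss option only the system missing value . counts as missing. Without sysmiss, toad is dropped too: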
. missings dropvars toad, force
Checking missings in toad:
28534 observations with missing values
note: toad dropped
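The rule that decides whether newt, frog, and toad count as entirely missing can be sketched in plain Python. The encoding is hypothetical (it is not how Stata stores data): numeric values are numbers or the codes ".", ".a", ..., ".z", and string variables, named in string_vars, use "" as their only missing value. is_missing and dropvars are illustrative names, not part of missings.

```python
# Plain-Python sketch of "missings dropvars varlist, force [sysmiss]".
# Hypothetical encoding (not how Stata stores data):
#   numeric values are numbers, "." (system missing) or ".a" ... ".z" (extended missing);
#   string variables are listed in string_vars and use "" as their only missing value.

def is_missing(value, is_string, sysmiss=False):
    if is_string:
        return value == ""                  # sysmiss has no effect on strings
    if isinstance(value, str):              # a numeric missing-value code
        return value == "." if sysmiss else True
    return False

def dropvars(data, string_vars, varlist, sysmiss=False, force=False):
    """Drop the variables in varlist that are missing in every observation."""
    if not force:
        raise ValueError("force option required")  # mimics the required force option
    dropped = [var for var in varlist
               if all(is_missing(v, var in string_vars, sysmiss) for v in data[var])]
    for var in dropped:
        del data[var]
    return dropped


data = {"newt": ["", "", ""], "frog": [".", ".", "."],
        "toad": [".a", ".a", ".a"], "age": [26, ".", 32]}
print(dropvars(data, {"newt"}, ["newt", "frog", "toad"], sysmiss=True, force=True))
# ['newt', 'frog']
print(dropvars(data, {"newt"}, ["toad"], force=True))  # ['toad']
print(list(data))                                       # ['age']
```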
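Increasing the number of observations to 30,000 with set obs adds 1,466 (30,000 - 28,534) observations in which every variable is missing, and missings dropobs then deletes them: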
. set obs 30000
number of observations (_N) was 28,534, now 30,000
. missings dropobs, force
Checking missings in idcode year birth_yr age race msp nev_mar grade collgrad not_smsa c_city south ind_code occ_code union wks_ue
ttl_exp tenure hours wks_work ln_wage:
16548 observations with missing values
(1,466 observations deleted)
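missings dropobs keeps an observation as long as at least one variable in varlist is non-missing. A minimal plain-Python sketch, assuming a dict of columns with None as missing; dropobs is a hypothetical name, and the two all-missing rows stand in for the observations added by set obs.

```python
# Plain-Python sketch of "missings dropobs [varlist], force": drop observations
# in which every variable in varlist (default: all variables) is missing.
# Assumption: dict of equal-length columns, None = missing.

def dropobs(data, varlist=None, force=False):
    if not force:
        raise ValueError("force option required")  # mimics the required force option
    varlist = list(data) if varlist is None else varlist
    n_obs = len(next(iter(data.values())))
    keep = [i for i in range(n_obs)
            if not all(data[var][i] is None for var in varlist)]
    for var in data:                  # subset every column, not only varlist
        data[var] = [data[var][i] for i in keep]
    return n_obs - len(keep)          # number of observations deleted


# Two extra all-missing rows play the role of the observations added by "set obs".
data = {"age": [26, 32, None, None], "union": [None, 1, None, None]}
print(dropobs(data, force=True))  # 2
print(data)                        # {'age': [26, 32], 'union': [None, 1]}
```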