# Normalization and Standardization: Stop Mixing Them Up

New! The `lianxh` command has been released:

`. ssc install lianxh`

`. help lianxh`

## 2. What Standardization Is For

### 2.1 Comparability Across Variables

We use the standardized values so that we can compare the relative magnitudes of ${\beta }_{ozone}$ and ${\beta }_{PM}$, since the different pollutants have different units of measurement. The coefficients can be interpreted as the standard deviation change in bird abundance from a 1 standard deviation increase in ozone or particulate matter.
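The interpretation above can be sketched in Stata. This is only an illustration on Stata's built-in `auto` data, not the data or specification of the quoted study; the `z_` variable names are hypothetical. Each variable is standardized with `egen`'s `std()` function, so the slopes read as standard-deviation changes in the outcome per one-standard-deviation change in a regressor.

```stata
* Illustration only: built-in auto data, not the quoted study's data
sysuse auto, clear

* Standardize the outcome and the regressors to mean 0, sd 1
* (z_ names are hypothetical, chosen for this sketch)
foreach v of varlist price weight mpg {
    egen z_`v' = std(`v')
}

* Slopes are now measured in standard-deviation units
regress z_price z_weight z_mpg

* Shortcut: the beta option reports the same standardized slopes
regress price weight mpg, beta
```

Because both regressors are on the same unit-free scale, their coefficients can be compared directly, which is the point made in the quoted passage.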

### 2.2 Comparability Across Groups

The flood counts are provided separately for the Yellow River, the Yangtze River, and the Pearl River. To make the three counts comparable (rivers have different frequencies of floods for natural and institutional reasons) and aggregatable, we first conducted normalization on each of them.

### 2.3 Comparability Over Time

Valid comparisons over time require that the scaling of the latent factors is comparable between periods. One way to meet this condition is to normalize each factor on the same measure every period.

In order to make test scores comparable across grades and years, I use a common convention in the literature and standardize these test scores within grade and school year so that the grade-by-year test scores have means of zero and standard deviations of one.
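Standardizing within grade and school year, as described above, can be sketched as follows. The data are simulated and every variable name (`grade`, `year`, `score`, `cell_mean`, `cell_sd`, `z_score`, `cell`) is a hypothetical choice for this example, not taken from the quoted paper.

```stata
* Simulated data: all variable names are hypothetical
clear
set seed 1234
set obs 1200
gen grade = 3 + mod(_n, 3)                      // grades 3, 4, 5
gen year  = 2015 + mod(floor((_n - 1) / 3), 4)  // years 2015 to 2018
gen score = rnormal(200 + 10 * grade + 2 * (year - 2015), 15)

* Mean and standard deviation inside each grade-by-year cell
bysort grade year: egen cell_mean = mean(score)
bysort grade year: egen cell_sd   = sd(score)

* Standardized score: mean 0 and sd 1 within every cell
gen z_score = (score - cell_mean) / cell_sd

* One row per grade-by-year cell to check the result
egen cell = group(grade year)
tabstat z_score, by(cell) statistics(mean sd)
```

The same `bysort` pattern would apply to any grouping variable, for example one normalization per river in Section 2.2.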

## 4. Implementation in Stata

``````
clear
set seed 1234
set obs 1000

*- Generate two normally distributed variables
gen dist1 = rnormal(3, 2)
gen dist2 = rnormal(21, 3)

*- Plot histograms and kernel densities
twoway (histogram dist1)(histogram dist2) ///
       (kdensity dist1)(kdensity dist2),  ///
       xlabel(-1(3)28) legend(off) graphregion(color(white))
``````

### 4.1 Implementing Z-score Normalization

``````
*- Z-score normalization
ssc install norm, replace  // install the user-written norm command
norm dist1 dist2, method(zee)

*- Plot the standardized variables
twoway (histogram zee_dist1) ///
       (histogram zee_dist2, color(white) lstyle(foreground)) ///
       (kdensity zee_dist1)(kdensity zee_dist2), ///
       legend(off) graphregion(color(white))
``````

Running `sum` confirms that, after standardization, both variables have mean 0 and standard deviation 1.

``````
. sum

    Variable |    Obs       Mean   Std. Dev.       Min        Max
-------------+---------------------------------------------------
       dist1 |  1,000   2.998486   2.035739  -3.435818   9.040881
       dist2 |  1,000   20.85914   2.870078   11.35959   32.52251
   zee_dist1 |  1,000  -1.95e-15          1  -3.160673   2.968159
   zee_dist2 |  1,000   2.71e-15          1  -3.309858   4.063779
``````

``````
*- center with the standardize option is equivalent to method(zee)
ssc install center, replace  // center is a user-written command from SSC
center dist1 dist2, prefix(z_) standardize

. sum zee_dist1 zee_dist2 z_dist1 z_dist2

    Variable |    Obs       Mean   Std. Dev.       Min        Max
-------------+---------------------------------------------------
   zee_dist1 |  1,000  -1.95e-15          1  -3.160673   2.968159
   zee_dist2 |  1,000   2.71e-15          1  -3.309858   4.063779
     z_dist1 |  1,000  -1.95e-15          1  -3.160673   2.968159
     z_dist2 |  1,000   2.71e-15          1  -3.309858   4.063779
``````
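For readers who prefer not to install user-written commands, the z-score can also be computed by hand or with the official `egen` function `std()`. The sketch below regenerates `dist1` so that it runs on its own; `z_manual` and `z_egen` are hypothetical names chosen here, and the exact values depend on the Stata version's random-number generator.

```stata
* Self-contained sketch: regenerate one of the example variables
clear
set seed 1234
set obs 1000
gen dist1 = rnormal(3, 2)

* z-score by hand: subtract the mean, divide by the standard deviation
quietly summarize dist1
gen double z_manual = (dist1 - r(mean)) / r(sd)

* Official alternative: egen's std() function
egen double z_egen = std(dist1)

* Both versions should show mean 0 and sd 1
summarize z_manual z_egen
```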

### 4.2 Implementing Min-Max Normalization

``````
*- Min-Max normalization
norm dist1 dist2, method(mmx)

*- Plot the rescaled variables
twoway (histogram mmx_dist1) ///
       (histogram mmx_dist2, color(white) lstyle(foreground)) ///
       (kdensity mmx_dist1)(kdensity mmx_dist2), ///
       legend(off) graphregion(color(white))
``````

``````
. sum dist1 dist2 mmx_dist1 mmx_dist2

    Variable |    Obs       Mean   Std. Dev.       Min        Max
-------------+---------------------------------------------------
       dist1 |  1,000   2.998486   2.035739  -3.435818   9.040881
       dist2 |  1,000   20.85914   2.870078   11.35959   32.52251
   mmx_dist1 |  1,000   .5157056   .1631632          0          1
   mmx_dist2 |  1,000   .4488773   .1356183          0          1
``````
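Min-max scaling is also easy to write by hand with the results that `summarize` leaves in `r(min)` and `r(max)`. The following is a minimal, self-contained sketch; the `mm_` prefix is a hypothetical name chosen to avoid clashing with the `mmx_` variables created by `norm`.

```stata
* Self-contained sketch: regenerate the two example variables
clear
set seed 1234
set obs 1000
gen dist1 = rnormal(3, 2)
gen dist2 = rnormal(21, 3)

* Rescale each variable to the [0, 1] interval
foreach v of varlist dist1 dist2 {
    quietly summarize `v'
    gen double mm_`v' = (`v' - r(min)) / (r(max) - r(min))
}

* Minimum should be 0 and maximum 1 for both variables
summarize mm_dist1 mm_dist2
```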

### 4.3 Implementing Mean Normalization

Stata has no dedicated command for mean normalization, but a simple loop does the job (in fact, the first two methods can also be implemented with a hand-written loop):

``````
*- Mean normalization
foreach v in dist1 dist2 {
    * summarize stores the mean, max, and min in r()
    sum `v'
    gen de_`v' = (`v' - r(mean)) / (r(max) - r(min))
}

*- Plot the mean-normalized variables
twoway (histogram de_dist1) ///
       (histogram de_dist2, color(white) lstyle(foreground)) ///
       (kdensity de_dist1)(kdensity de_dist2), ///
       legend(off) graphregion(color(white))
``````
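Written out, the loop above applies the transformation below, where $\bar{x}$, $x_{\max}$, and $x_{\min}$ stand for the values that `sum` stores in `r(mean)`, `r(max)`, and `r(min)`. The notation is introduced here for illustration only.

$$
x_i' = \frac{x_i - \bar{x}}{x_{\max} - x_{\min}}
$$

Two properties follow directly. The transformed variable has mean zero, and its range has width one:

$$
\frac{1}{n}\sum_{i=1}^{n} x_i' = \frac{\bar{x} - \bar{x}}{x_{\max} - x_{\min}} = 0,
\qquad
x_{\max}' - x_{\min}' = \frac{x_{\max} - x_{\min}}{x_{\max} - x_{\min}} = 1
$$

So every value of `de_dist1` and `de_dist2` lies strictly between $-1$ and $1$, centered on zero, whereas min-max normalization maps the data onto $[0, 1]$.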

## 6. References

• Attanasio, O., Meghir, C., & Nix, E. (2020). Human capital development and parental investment in India. The Review of Economic Studies, 87(6), 2511-2541.
• Bietenbeck, J., Piopiunik, M., & Wiederhold, S. (2018). Africa’s Skill Tragedy: Does Teachers’ Lack of Knowledge Lead to Low Student Performance? Journal of Human Resources, 53(3), 553-578.
• Lee, W. S., & Li, B. G. (2021). Extreme weather and mortality: Evidence from two millennia of Chinese elites. Journal of Health Economics, 76, 102401.
• Liang, Y., Rudik, I., Zou, E. Y., Johnston, A., Rodewald, A. D., & Kling, C. L. (2020). Conservation cobenefits from air pollution regulation: Evidence from birds. Proceedings of the National Academy of Sciences, 117(49), 30900-30906.
• Fort, M., Ichino, A., & Zanella, G. (2020). Cognitive and noncognitive costs of day care at age 0–2 for children in advantaged families. Journal of Political Economy, 128(1), 158-205.
• Thompson, P. N. (2021). Is four less than five? Effects of four-day school weeks on student achievement in Oregon. Journal of Public Economics, 193, 104308.

## 7. Appendix: Code Used in This Post

``````
clear
set seed 1234
set obs 1000
gen dist1 = rnormal(3, 2)
gen dist2 = rnormal(21, 3)

twoway (histogram dist1)(histogram dist2) ///
(kdensity dist1)(kdensity dist2),	  ///
xlabel(-3(2)32) legend(off)           ///
graphregion(color(white))

*-  Z-score normalization
ssc install norm, replace
norm dist1 dist2, method(zee)

twoway ///
(histogram zee_dist1)	///
(histogram zee_dist2, color(white) lstyle(foreground)) ///
(kdensity zee_dist1)(kdensity zee_dist2),	  ///
legend(off) graphregion(color(white))

*- Min-Max Normalization
norm dist1 dist2, method(mmx)

twoway (histogram mmx_dist1)	///
(histogram mmx_dist2, color(white) lstyle(foreground)) ///
(kdensity mmx_dist1)(kdensity mmx_dist2),	  ///
legend(off) graphregion(color(white))

*- center with standardize is equivalent to method(zee)
ssc install center, replace  // center is a user-written command from SSC
center dist1 dist2, prefix(z_) standardize

sum zee_dist1 zee_dist2 z_dist1 z_dist2

*- Mean normalization
foreach v in dist1 dist2 {
sum `v'
gen de_`v' = (`v' - r(mean)) / (r(max) - r(min))
}

twoway (histogram de_dist1)	///
(histogram de_dist2, color(white) lstyle(foreground)) ///
(kdensity de_dist1)(kdensity de_dist2),	  ///
legend(off) graphregion(color(white))
``````

## 8. Related Posts

Note: a list of related posts can be generated in Stata with the command:
`lianxh 离群值 对数 时间趋势 稳健性检验 安慰剂检验`

To install the latest version of the command:
`ssc install lianxh, replace`

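## About Us

• Stata连享会 was founded by the team of Professor Lian Yujun (连玉君) at Sun Yat-sen University and regularly shares hands-on experience in empirical analysis.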