# Normalization and Standardization: Stop Mixing Them Up

## 2. What Standardization Is For

### 2.1 Comparability Across Variables

We use the standardized values so that we can compare the relative magnitudes of ${\beta }_{ozone}$ and ${\beta }_{PM}$, since the different pollutants have different units of measurement. The coefficients can be interpreted as the standard deviation change in bird abundance from a 1 standard deviation increase in ozone or particulate matter.
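
As a rough illustration of standardized coefficients, the sketch below uses Stata's bundled `auto` data as a stand-in, since the pollution data behind the quoted passage is not available here. The variables `price`, `mpg`, and `weight` and the `z_` prefix are assumptions made for this example only.

``````stata
*- Sketch: standardized coefficients on the bundled auto data (a stand-in example)
sysuse auto, clear

*- Option 1: let regress report beta (standardized) coefficients
regress price mpg weight, beta

*- Option 2: standardize every variable first, then run a plain regression
foreach v in price mpg weight {
    egen z_`v' = std(`v')
}
regress z_price z_mpg z_weight
``````

Because none of the three variables has missing values, the slopes from the second regression should match the beta column of the first, and the constant should be numerically zero. Each slope then reads as the standard deviation change in the outcome per 1 standard deviation increase in that regressor.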

### 2.2 Comparability Across Groups

The flood counts are provided separately for the Yellow River, the Yangtze River, and the Pearl River. To make the three counts comparable (rivers have different frequencies of floods for natural and institutional reasons) and aggregatable, we first normalized each of them.
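
One way to picture this step is the sketch below. It runs on simulated counts, the variable names (`flood_yellow`, `flood_yangtze`, `flood_pearl`, `flood_index`) are hypothetical, and it assumes a Z-score for the normalization and a row mean for the aggregation, neither of which the quoted passage specifies.

``````stata
*- Sketch with simulated counts; all variable names are hypothetical
clear
set seed 1234
set obs 200
gen flood_yellow  = rpoisson(5)
gen flood_yangtze = rpoisson(9)
gen flood_pearl   = rpoisson(2)

*- Z-score each river's count separately
foreach v of varlist flood_* {
    egen z_`v' = std(`v')
}

*- Aggregate the now unit-free series, here with a simple row mean
egen flood_index = rowmean(z_flood_*)
summarize z_flood_* flood_index
``````

After the loop, each river's series has mean 0 and standard deviation 1, so no single river dominates the aggregate simply because it floods more often.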

### 2.3 Comparability Over Time

Valid comparisons over time require that the scaling of the latent factors is comparable between periods. One way to meet this condition is to normalize each factor on the same measure every period.

In order to make test scores comparable across grades and years, I use a common convention in the literature and standardize these test scores within grade and school year so that the grade-by-year test scores have means of zero and standard deviations of one.
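
Standardizing within grade and school year could look roughly like the following sketch. The data are simulated, and `grade`, `year`, `score`, and the helper variables are hypothetical names introduced only for illustration.

``````stata
*- Sketch with simulated scores; grade, year, and score are hypothetical names
clear
set seed 1234
set obs 3000
gen grade = 3 + mod(_n, 3)                       // grades 3, 4, 5
gen year  = 2001 + mod(floor((_n - 1) / 3), 4)   // years 2001 to 2004
gen score = rnormal(50 + 10 * grade, 5 + grade)

*- Cell-level mean and SD, then the Z-score within each grade-by-year cell
bysort grade year: egen cell_mean = mean(score)
bysort grade year: egen cell_sd   = sd(score)
gen z_score = (score - cell_mean) / cell_sd

*- Each grade-by-year cell should now show mean 0 and SD 1
egen cell = group(grade year)
tabstat z_score, by(cell) statistics(mean sd)
``````

The sketch computes the cell mean and SD with `egen` under `bysort` and then forms the Z-score by hand, which keeps every step visible.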

## 4. Stata Implementation

``````
clear
set seed 1234
set obs 1000

*- Generate two normally distributed variables
gen dist1 = rnormal(3, 2)
gen dist2 = rnormal(21, 3)

*- Plot both distributions
twoway (histogram dist1)(histogram dist2) ///
       (kdensity dist1)(kdensity dist2),  ///
       xlabel(-1(3)28) legend(off) graphregion(color(white))
``````

### 4.1 Implementing Z-score Normalization

``````
*- Z-score normalization
ssc install norm, replace  // install the norm command
norm dist1 dist2, method(zee)

*- Plot
twoway (histogram zee_dist1) ///
       (histogram zee_dist2, color(white) lstyle(foreground)) ///
       (kdensity zee_dist1)(kdensity zee_dist2), ///
       legend(off) graphregion(color(white))
``````

Running `sum` confirms that, after standardization, both variables have a mean of 0 (up to floating-point error) and a standard deviation of 1.

``````
. sum

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       dist1 |      1,000    2.998486    2.035739  -3.435818   9.040881
       dist2 |      1,000    20.85914    2.870078   11.35959   32.52251
   zee_dist1 |      1,000   -1.95e-15           1  -3.160673   2.968159
   zee_dist2 |      1,000    2.71e-15           1  -3.309858   4.063779
``````
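
As a quick arithmetic check, the extremes of `zee_dist1` can be reproduced from the rounded summary statistics of `dist1` in the table above, assuming the usual Z-score definition $z = (x - \bar{x}) / s$:

$$
z_{\min} = \frac{-3.435818 - 2.998486}{2.035739} \approx -3.1607, \qquad
z_{\max} = \frac{9.040881 - 2.998486}{2.035739} \approx 2.9682
$$

Both values agree with the reported minimum and maximum of `zee_dist1` up to rounding.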

``````
*- center with the standardize option is equivalent to norm, method(zee)
*- (center is a user-written command; if it is missing, try: ssc install center)
center dist1 dist2, prefix(z_) standardize

. sum zee_dist1 zee_dist2 z_dist1 z_dist2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   zee_dist1 |      1,000   -1.95e-15           1  -3.160673   2.968159
   zee_dist2 |      1,000    2.71e-15           1  -3.309858   4.063779
     z_dist1 |      1,000   -1.95e-15           1  -3.160673   2.968159
     z_dist2 |      1,000    2.71e-15           1  -3.309858   4.063779
``````

### 4.2 Implementing Min-Max Normalization

``````
*- Min-Max normalization
norm dist1 dist2, method(mmx)

*- Plot
twoway (histogram mmx_dist1) ///
       (histogram mmx_dist2, color(white) lstyle(foreground)) ///
       (kdensity mmx_dist1)(kdensity mmx_dist2), ///
       legend(off) graphregion(color(white))
``````

``````
. sum dist1 dist2 mmx_dist1 mmx_dist2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       dist1 |      1,000    2.998486    2.035739  -3.435818   9.040881
       dist2 |      1,000    20.85914    2.870078   11.35959   32.52251
   mmx_dist1 |      1,000    .5157056    .1631632          0          1
   mmx_dist2 |      1,000    .4488773    .1356183          0          1
``````
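
The same kind of check works here, assuming the usual Min-Max definition $x' = (x - x_{\min}) / (x_{\max} - x_{\min})$. Because the transformation is linear, the mean is rescaled the same way as each observation, and the standard deviation is simply divided by the range. Using the rounded statistics of `dist1` from the table above:

$$
\overline{x'} = \frac{2.998486 - (-3.435818)}{9.040881 - (-3.435818)} \approx 0.5157, \qquad
s_{x'} = \frac{2.035739}{9.040881 - (-3.435818)} \approx 0.1632
$$

Both figures match the reported mean and standard deviation of `mmx_dist1` up to rounding.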

### 4.3 Implementing Mean Normalization

Stata has no dedicated command for mean normalization, so we can write a simple loop to do it (in fact, the first two methods can also be implemented with a hand-written loop):

``````
*- Mean normalization
foreach v in dist1 dist2 {
    sum `v'
    * subtract the mean, then divide by the range (max - min)
    gen de_`v' = (`v' - r(mean)) / (r(max) - r(min))
}

*- Plot
twoway (histogram de_dist1) ///
       (histogram de_dist2, color(white) lstyle(foreground)) ///
       (kdensity de_dist1)(kdensity de_dist2), ///
       legend(off) graphregion(color(white))
``````
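
The remark that the first two methods can also be coded by hand could be sketched as follows. The `zs_` and `mm_` prefixes are hypothetical names chosen so as not to clash with the variables created by `norm`, and the data are regenerated so that the block runs on its own.

``````stata
*- Sketch: Z-score and Min-Max without norm; zs_ and mm_ are hypothetical prefixes
clear
set seed 1234
set obs 1000
gen dist1 = rnormal(3, 2)
gen dist2 = rnormal(21, 3)

foreach v in dist1 dist2 {
    quietly summarize `v'
    gen zs_`v' = (`v' - r(mean)) / r(sd)             // Z-score
    gen mm_`v' = (`v' - r(min)) / (r(max) - r(min))  // Min-Max
}

summarize zs_* mm_*
``````

The `zs_` variables should come out with mean 0 and standard deviation 1, and the `mm_` variables should run from 0 to 1.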

## 6. References

## 7. Appendix: Code Used in This Article

``````
clear
set seed 1234
set obs 1000
gen dist1 = rnormal(3, 2)
gen dist2 = rnormal(21, 3)

twoway (histogram dist1)(histogram dist2) ///
(kdensity dist1)(kdensity dist2),	  ///
xlabel(-3(2)32) legend(off)           ///
graphregion(color(white))

*-  Z-score normalization
ssc install norm, replace
norm dist1 dist2, method(zee)

twoway ///
(histogram zee_dist1)	///
(histogram zee_dist2, color(white) lstyle(foreground)) ///
(kdensity zee_dist1)(kdensity zee_dist2),	  ///
legend(off) graphregion(color(white))

*- Min-Max Normalization
norm dist1 dist2, method(mmx)

twoway (histogram mmx_dist1)	///
(histogram mmx_dist2, color(white) lstyle(foreground)) ///
(kdensity mmx_dist1)(kdensity mmx_dist2),	  ///
legend(off) graphregion(color(white))

*- center with the standardize option is equivalent to method(zee)
center dist1 dist2, prefix(z_) standardize

sum zee_dist1 zee_dist2 z_dist1 z_dist2

*- Mean normalization
foreach v in dist1 dist2 {
sum `v'
gen de_`v' = (`v' - r(mean)) / (r(max) - r(min))
}

twoway (histogram de_dist1)	///
(histogram de_dist2, color(white) lstyle(foreground)) ///
(kdensity de_dist1)(kdensity de_dist2),	  ///
legend(off) graphregion(color(white))
``````

