Regression error

by wycho 2021. 7. 4.

- Mean Absolute Error (MAE), np.mean(np.abs(y_true - y_pred)), is related to Least Absolute Deviations and the L1-norm: it is the L1-norm of the error vector divided by n.

- Mean Squared Error, np.mean(np.square((y_true - y_pred))).

- Root Mean Squared Error (RMSE), np.sqrt(MSE(y_true, y_pred)), is related to the Euclidean norm, or L2-norm: it is the L2-norm of the error vector divided by sqrt(n).

- Mean Absolute Percentage Error, np.mean(np.abs((y_true - y_pred) / y_true)) * 100.

- Mean Percentage Error, np.mean((y_true - y_pred) / y_true) * 100.
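
The five definitions above can be collected into small NumPy functions. This is a minimal sketch rather than a reference implementation: the function names (mae, mse, rmse, mape, mpe) and the sample arrays are chosen here for illustration, and the percentage metrics assume y_true contains no zeros.

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean Absolute Error: average magnitude of the errors
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    # Mean Squared Error: average of the squared errors
    return np.mean(np.square(y_true - y_pred))

def rmse(y_true, y_pred):
    # Root Mean Squared Error: square root of MSE, same units as y
    return np.sqrt(mse(y_true, y_pred))

def mape(y_true, y_pred):
    # Mean Absolute Percentage Error (assumes no zeros in y_true)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def mpe(y_true, y_pred):
    # Mean Percentage Error: signed, so over- and under-predictions cancel
    return np.mean((y_true - y_pred) / y_true) * 100

# Illustrative data (made up for this example)
y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 180.0, 400.0])

for name, fn in [("MAE", mae), ("MSE", mse), ("RMSE", rmse), ("MAPE", mape), ("MPE", mpe)]:
    print(f"{name}: {fn(y_true, y_pred):.4f}")
```

With this data the relative errors are -10%, +10% and 0%, so MAPE is about 6.67 while MPE is 0: the signed errors cancel, which is why MPE indicates bias rather than error size.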

 

 

* footnote

Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance, https://www.jstor.org/stable/24869236

 

Measures of average error (such as RMSE) that are based on the sum of squared errors (i.e. on the sum of e_i^2; note that e_i^2, or |e_i|^2, is not a metric because it does not satisfy the triangle inequality) are functions of the average error (MAE), the distribution of error magnitudes (or squared errors), and n^{1/2}; therefore, they do not describe average error alone. Among the disturbing characteristics of RMSE are: it tends to become increasingly larger than MAE (but not necessarily in a monotonic fashion) as the distribution of error magnitudes becomes more variable; and, it tends to grow larger than MAE with n^{1/2}, since its lower limit is fixed at MAE and its upper limit (n^{1/2} · MAE) increases with n^{1/2}. For these reasons, it seems to us that there is no clear interpretation of RMSE or related measures, and we recommend that such measures no longer be reported in the literature. It also occurs to us that previous model-performance evaluations and inter-comparisons, which were based primarily on RMSE or related measures, are questionable and should be reconsidered. Other commonly used bivariate statistics that share RMSE’s reliance on the sum of squares (e.g. certain correlation and skill measures) also are questionable model-performance statistics.

Our analysis indicates that MAE is the most natural measure of average error magnitude, and that (unlike RMSE) it is an unambiguous measure of average error magnitude. It seems to us that all dimensioned evaluations and inter-comparisons of average model-performance error should be based on MAE.
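
The point that RMSE depends on how the error magnitudes are distributed, and not only on their average, can be illustrated with a small sketch. The three error sets below are made up for this example; each has the same MAE, and the helper names mae_of and rmse_of are hypothetical.

```python
import numpy as np

def mae_of(errors):
    # Average error magnitude
    return np.mean(np.abs(errors))

def rmse_of(errors):
    # Square root of the mean squared error
    return np.sqrt(np.mean(np.square(errors)))

# Three illustrative error sets with identical MAE (= 2)
# but increasingly variable error magnitudes
error_sets = {
    "even":     np.array([2.0, 2.0, 2.0, 2.0]),
    "moderate": np.array([1.0, 1.0, 3.0, 3.0]),
    "uneven":   np.array([0.0, 1.0, 1.0, 6.0]),
}

for name, e in error_sets.items():
    print(f"{name:9s} MAE={mae_of(e):.3f} RMSE={rmse_of(e):.3f}")
# even      MAE=2.000 RMSE=2.000
# moderate  MAE=2.000 RMSE=2.236
# uneven    MAE=2.000 RMSE=3.082
```

MAE reports the same average error for all three sets, while RMSE rises as the errors become more uneven, which is the behavior the quoted passage objects to.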

 

[MAE] (lower limit) ≤ [RMSE] ≤ [MAE * sqrt(n)] (upper limit)
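
The two limits in this inequality can be checked numerically. The sketch below uses made-up error vectors: the lower limit is reached when every error has the same magnitude, and the upper limit when the whole error is concentrated in a single sample. The helper mae_rmse and the choice n = 9 are assumptions for illustration only.

```python
import numpy as np

def mae_rmse(errors):
    # Return (MAE, RMSE) for a vector of errors
    errors = np.asarray(errors, dtype=float)
    return np.mean(np.abs(errors)), np.sqrt(np.mean(np.square(errors)))

n = 9

# Lower limit: all errors have equal magnitude, so RMSE == MAE
equal = np.full(n, 3.0)
mae_eq, rmse_eq = mae_rmse(equal)

# Upper limit: all error sits in one sample, so RMSE == sqrt(n) * MAE
single = np.zeros(n)
single[0] = 27.0
mae_single, rmse_single = mae_rmse(single)

print(mae_eq, rmse_eq)          # 3.0 3.0
print(mae_single, rmse_single)  # 3.0 9.0

# Any other error vector falls between the two limits
rng = np.random.default_rng(0)
mae_rand, rmse_rand = mae_rmse(rng.normal(size=n))
in_bounds = mae_rand <= rmse_rand <= np.sqrt(n) * mae_rand
print(in_bounds)
```

Both extreme cases have MAE = 3, yet RMSE ranges from 3 to 9 = sqrt(9) · 3, so the gap between the two limits widens as n grows.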

 

 

 

Reference

- 회귀의 오류 지표 알아보기 (Exploring regression error metrics), https://partrita.github.io/posts/regression-error/

- Tutorial: Understanding Regression Error Metrics in Python, https://www.dataquest.io/blog/understanding-regression-error-metrics/

- https://medium.com/human-in-a-machine-world/mae-and-rmse-which-metric-is-better-e60ac3bde13d

 

 

 
