https://stat.ethz.ch/R-manual/R-devel/library/stats/html/predict.glm.html
데이터를 통해 GLM으로 모델링한 model이 있다고 해보자. 함수 predict 는 모델로부터 값을 예측해주는 기능을 한다.
predict(model, newdata, type=c("link", "response", "terms"), se.fit=FALSE, dispersion=NULL, term=NULL, na.action=na.pass,..)
- Parameter
* model : glm fittef model from data.
* newdata : 예측하고자 하는 값이 있는 데이터.
* type : { link : log-odds, response : expected value, terms : constant when the given value is 0 }.
* se.fit : standard error.
* dispersion : standard error 를 구할때 사용하는 dispersion. NULL 일 경우 summary(model)$dispersion 사용.
* term : type="terms" 로 했을 경우, 꼭 들어가야 하는 term. NULL 일 경우 모든 term이 들어감.
* na.action : 데이터의 NA 값을 어떻게 처리할 것인지 판단하는 값.
- Example : https://stats.idre.ucla.edu/r/dae/negative-binomial-regression/
m1 <- glm.nb(daysabs ~ math + prog, data = dat) # 2 independent variable, math and prog
newdata1 <- data.frame(math = mean(dat$math), prog = factor(1:3, levels = 1:3,
labels = levels(dat$prog)))
newdata1$phat <- predict(m1, newdata1, type = "response")
## math prog phat
## 1 48.27 General 10.237
## 2 48.27 Academic 6.588
## 3 48.27 Vocational 2.850
newdata2 <- data.frame(
math = rep(seq(from = min(dat$math), to = max(dat$math), length.out = 100), 3),
prog = factor(rep(1:3, each = 100), levels = 1:3, labels =
levels(dat$prog)))
newdata2 <- cbind(newdata2, predict(m1, newdata2, type = "link", se.fit=TRUE))
newdata2 <- within(newdata2, {
DaysAbsent <- exp(fit)
LL <- exp(fit - 1.96 * se.fit)
UL <- exp(fit + 1.96 * se.fit)
})
## math prog fit se.fit residual.scale UL LL DaysAbsent
## 1 1.000000 General 2.609272 0.19606060 1 19.956427 9.253424 13.589161
## 2 1.989899 General 2.603340 0.19469699 1 19.785436 9.223309 13.508782
## 3 2.979798 General 2.597408 0.19335557 1 19.616763 9.192893 13.428879
## 4 3.969697 General 2.591475 0.19203679 1 19.450392 9.162169 13.349449
...
## 122 21.787879 Academic 2.043931 0.09627755 1 9.324403 6.393149 7.720900
## 123 22.777778 Academic 2.037998 0.09499133 1 9.245912 6.371376 7.675232
## 124 23.767677 Academic 2.032066 0.09375306 1 9.168943 6.349081 7.629833
## 125 24.757576 Academic 2.026134 0.09256468 1 9.093504 6.326245 7.584704
summary(m1)
##
## Call:
## glm.nb(formula = daysabs ~ math + prog, data = dat, init.theta = 1.032713156,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.155 -1.019 -0.369 0.229 2.527
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.61527 0.19746 13.24 < 2e-16 ***
## math -0.00599 0.00251 -2.39 0.017 *
## progAcademic -0.44076 0.18261 -2.41 0.016 *
## progVocational -1.27865 0.20072 -6.37 1.9e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(1.033) family taken to be 1)
##
## Null deviance: 427.54 on 313 degrees of freedom
## Residual deviance: 358.52 on 310 degrees of freedom
## AIC: 1741
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 1.033
## Std. Err.: 0.106
##
## 2 x log-likelihood: -1731.258
* dispersion parameter = ( sum of squared Pearson Residuals ) / ( degrees of freedom )
Reference
- https://feliperego.github.io/blog/2015/10/23/Interpreting-Model-Output-In-R
'R' 카테고리의 다른 글
[R] TCGAbiobanks (0) | 2022.03.10 |
---|---|
[R] table information (0) | 2022.01.06 |
[R] Enhanced Volcano plot (0) | 2021.10.12 |
[R] boxplot options (0) | 2021.10.06 |
[R] install packages (0) | 2021.10.05 |
댓글