I am trying to get r-squared, or explained variation, in a complex survey data using a linear regression (OLS).

In Stata, this can be done by using svy: regress. In R, however, when I use ‘survey’ package, there is no option for OLS linear regression. There is svyglm, which is generalized linear model (GLM), but this does not provide a value for explained variation (r-squared) because it isn’t OLS. Is there a way to get r-squared for complex survey data in R?

```
library(survey)
design <- svydesign(id = ~psu, strata = ~strata, weight = ~w_mec, nest = TRUE, data = sample)
model1 <- svyglm(design = design, bmi ~ 1 + age + black + hispanics + others + female + edu2 + edu3 + edu4 + near_poor + middle + high, family = gaussian(link = "identity"), data = sample)
summary(model1)
```

Above is an example of what I did in R. This doesn’t give r-squared because it’s GLM. You don’t really need to reproduce anything; this isn’t a code issue, I just want to know if there is a way to get r-squared for complex survey data in R.

Cross Validated Asked on November 14, 2021

4 AnswersTo get the multiple R2 of a svyglm model, you could also use the *summ* function in the jtools package.

Answered by David F on November 14, 2021

Because "svyglm" objects are also "lm" objects by default -- see `class(model1)`

in your example -- you can just run `summary.lm(model1)`

to get the R^2.

Answered by user697473 on November 14, 2021

For a Gaussian glm (where the population parameter is the OLS parameter) you can just divide the dispersion parameter by the population variance and subtract from 1

Using one of the examples from the `svyglm`

help page:

```
> data(api)
> dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
> api.reg <- svyglm(api.stu~enroll, design=dstrat)
> summary(api.reg)
Call:
svyglm(formula = api.stu ~ enroll, design = dstrat)
Survey design:
dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.34383 11.46399 1.164 0.246
enroll 0.81454 0.02459 33.120 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 7331.633)
Number of Fisher Scoring iterations: 2
> nullmodel<-svyglm(api.stu~1,design=dstrat)
> summary(nullmodel)
Call:
svyglm(formula = api.stu ~ 1, design = dstrat)
Survey design:
dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 498.23 16.06 31.02 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 137086.3)
Number of Fisher Scoring iterations: 2
> 1-7331.633/137086.3
[1] 0.9465181
```

You could also get the null-model variance using `svyvar`

```
> svyvar(~api.stu,design=dstrat)
variance SE
api.stu 137086 19197
```

And in this case we have the whole population, so we can run `lm`

on the population and compare the survey estimate of rsquared with the population value

```
> summary(lm(api.stu ~ enroll,data=apipop))
Call:
lm(formula = api.stu ~ enroll, data = apipop)
Residuals:
Min 1Q Median 3Q Max
-1021.20 -13.76 6.13 29.56 498.98
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.613245 1.953709 6.968 3.55e-12 ***
enroll 0.813556 0.002522 322.581 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 92.16 on 6155 degrees of freedom
(37 observations deleted due to missingness)
Multiple R-squared: 0.9442, Adjusted R-squared: 0.9441
F-statistic: 1.041e+05 on 1 and 6155 DF, p-value: < 2.2e-16
```

As an added bonus answer: if you want the Nagelkerke or Cox-Snell r-squared for binary or count data, there's a function `psrsq`

Answered by Thomas Lumley on November 14, 2021

You can get the amount of explained variation just by looking at the residual variance of your model and the residual variance of a null model (model with the explanatory covariates removed, including just an intercept). So the formula you need is:

R2_var = 1 - residual_variance / residual_variance_of_a_null_model

Answered by Tomas on November 14, 2021

1 Asked on December 11, 2020 by sue-doh-nimh

0 Asked on December 11, 2020 by luchonacho

1 Asked on December 11, 2020 by rumtscho

assumptions exchangeability permutation test random variable terminology

1 Asked on December 10, 2020 by quant_dev

1 Asked on December 10, 2020

asymptotics convergence estimation expected value normality assumption

1 Asked on December 10, 2020 by st4co4

2 Asked on December 10, 2020 by yaynikkiprograms

0 Asked on December 10, 2020 by statboy_41

data transformation generalized linear model interpretation link function sas

0 Asked on December 9, 2020 by aite97

0 Asked on December 9, 2020 by pinocchio

machine learning mean non independent standard deviation standard error

1 Asked on December 9, 2020 by andrew-kirk

0 Asked on December 9, 2020 by s-ramagokula-krishnan

0 Asked on December 9, 2020 by johanna

backpropagation gradient descent neural networks regression sgd

0 Asked on December 9, 2020 by ayberk-yavuz

1 Asked on December 9, 2020 by roromario

0 Asked on December 8, 2020 by soham-bagchi

1 Asked on December 8, 2020 by daniela-rodrigues

0 Asked on December 8, 2020 by rain

Get help from others!

Recent Questions

Recent Answers

- Jaime on How to do transaction on a dapp without any wallet
- user38075 on How to do transaction on a dapp without any wallet
- Riven on validar datos de un objeto JSON
- Patrick Collins on I am following Ethereum Book, but I am getting errors in the Terminal
- kjroger94 on How to do transaction on a dapp without any wallet

© 2021 InsideDarkWeb.com. All rights reserved.