06-05-2015
Explain a variable y as a linear function of x (and z, etc.).
model <- lm(dist ~ speed, data=cars)
head(my.cars) #See first observations
## speed dist id ## 1 4 2 1 ## 2 4 10 2 ## 3 7 4 3 ## 4 7 22 4 ## 5 8 16 5 ## 6 9 10 6
str(my.cars) #See datatypes in dataframe
## 'data.frame': 50 obs. of 3 variables: ## $ speed: num 4 4 7 7 8 9 10 10 10 11 ... ## $ dist : num 2 10 4 22 16 10 18 26 34 17 ... ## $ id : Factor w/ 50 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
head(my.cars) #See first observations
## speed dist id ## 1 4 2 1 ## 2 4 10 1 ## 3 7 4 1 ## 4 7 22 1 ## 5 8 16 1 ## 6 9 10 1
str(my.cars) #See datatypes in dataframe
## 'data.frame': 50 obs. of 3 variables: ## $ speed: num 4 4 7 7 8 9 10 10 10 11 ... ## $ dist : num 2 10 4 22 16 10 18 26 34 17 ... ## $ id : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
A dataset: How boring is it to participate in a psychophysics experiment?
summary(data) #summary of data
## rating id ## Min. :1.000 1 :10 ## 1st Qu.:3.933 2 :10 ## Median :5.065 3 :10 ## Mean :5.133 4 :10 ## 3rd Qu.:6.624 5 :10 ## Max. :9.843 6 :10 ## (Other):40
Calculate mean of rating per subject, then model.
summary.data1 <- tapply(rating2,id,mean) summary.model1 <- lm(summary.data~1)
Make a linear mixed model using subject as random effect
library(lme4) #First load package "lme4" mixed.model1 <- lmer(rating2~1+(1|id))
display(summary.model1)
## lm(formula = summary.data ~ 1) ## coef.est coef.se ## (Intercept) 4.91 0.43 ## --- ## n = 10, k = 1 ## residual sd = 1.36, R-Squared = 0.00
display(mixed.model1)
## lmer(formula = rating2 ~ 1 + (1 | id)) ## coef.est coef.se ## 4.91 0.43 ## ## Error terms: ## Groups Name Std.Dev. ## id (Intercept) 1.23 ## Residual 1.83 ## --- ## number of obs: 100, groups: id, 10 ## AIC = 426.4, DIC = 420.6 ## deviance = 420.5
Calculate mean of rating per subject, then model.
summary.data2 <- aggregate(rating3,by=list(id=id),mean,na.rm=T) summary.model2 <- lm(summary.data2$x~1)
Make a linear mixed model using subject as random effect
mixed.model2 <- lmer(rating3~1+(1|id))
display(summary.model2)
## lm(formula = summary.data2$x ~ 1) ## coef.est coef.se ## (Intercept) 4.81 0.52 ## --- ## n = 10, k = 1 ## residual sd = 1.65, R-Squared = 0.00
display(mixed.model2)
## lmer(formula = rating3 ~ 1 + (1 | id)) ## coef.est coef.se ## 4.85 0.53 ## ## Error terms: ## Groups Name Std.Dev. ## id (Intercept) 1.51 ## Residual 1.82 ## --- ## number of obs: 69, groups: id, 10 ## AIC = 299.8, DIC = 294.8 ## deviance = 294.3
table(id)
## id ## 1 2 3 4 5 6 7 8 9 10 ## 3 10 6 5 7 4 7 9 8 10
"Because of the conflicting definitions and advice, we avoid the terms "fixed" and "random" entirely, and focus on the description of the model itself […]"
- Andrew Gelman & Jennifer Hill
Fixed effects are specified similar to lm/glm
formula <- y ~ x ## y as a function of x
y ~ 1 ## model the intercept for "y" y ~ x ## model the main effect of x and the intercept for y y ~ x + 1 ## the same as above (+ 1 is implicit) y ~ x + 0 ## model the main effect of x and no intercept y ~ x - 1 ## the same as above y ~ 0 ## doesn't model anything (Thanks Lau for including this!) y ~ x + z ## model the main effects x and z (and an intercept) y ~ x:z ## model interaction of x and z y ~ x * z ## model the main effects x and z and their interaction y ~ x + z + x:z ## the same as above
intercept.model <- lmer(y ~ x + (1|id)) slope.and.intercept.model <- lmer(y ~ x + (x|id)) slope.model <- lmer(y ~ x + (x-1|id))
str(time.data)
## 'data.frame': 1200 obs. of 3 variables: ## $ rating: num 4.93 4.96 4.29 5.74 6.01 ... ## $ time : int 1 2 3 4 5 6 7 8 9 10 ... ## $ id : Factor w/ 20 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
summary(time.data)
## rating time id ## Min. : 1.094 Min. : 1.00 1 : 60 ## 1st Qu.: 3.599 1st Qu.:15.75 2 : 60 ## Median : 5.337 Median :30.50 3 : 60 ## Mean : 5.236 Mean :30.50 4 : 60 ## 3rd Qu.: 6.771 3rd Qu.:45.25 5 : 60 ## Max. :10.000 Max. :60.00 6 : 60 ## (Other):840
incp.model <- lmer(rating~time+(1|id),data=time.data) slop.model <- lmer(rating~time+(time-1|id),data=time.data) in.sl.model <- lmer(rating~time+(time|id),data=time.data)
Get a overview of the model using summary()
or display()
(from the package "arm"
)
display(in.sl.model)
## lmer(formula = rating ~ time + (time | id), data = time.data) ## coef.est coef.se ## (Intercept) 4.34 0.29 ## time 0.03 0.00 ## ## Error terms: ## Groups Name Std.Dev. Corr ## id (Intercept) 1.30 ## time 0.02 0.44 ## Residual 0.52 ## --- ## number of obs: 1200, groups: id, 20 ## AIC = 2019.2, DIC = 1988 ## deviance = 1997.6
fixef(in.sl.model)
## (Intercept) time ## 4.34258072 0.02928874
ranef(incp.model) #Random intercept model
## $id ## (Intercept) ## 1 0.7704406 ## 2 0.8992311 ## 3 -0.2229189 ## 4 -1.6904649 ## 5 -1.4370763 ## 6 1.2749988 ## 7 1.5859591 ## 8 1.9865373 ## 9 2.5570740 ## 10 0.5912011 ## 11 2.3379619 ## 12 0.4545416 ## 13 -0.1013762 ## 14 0.1978353 ## 15 -1.9695634 ## 16 -1.4452323 ## 17 -2.3199298 ## 18 -2.9998556 ## 19 -2.1552827 ## 20 1.6859195
coef(incp.model) #Random intercept model
## $id ## (Intercept) time ## 1 5.113021 0.02928874 ## 2 5.241812 0.02928874 ## 3 4.119662 0.02928874 ## 4 2.652116 0.02928874 ## 5 2.905504 0.02928874 ## 6 5.617580 0.02928874 ## 7 5.928540 0.02928874 ## 8 6.329118 0.02928874 ## 9 6.899655 0.02928874 ## 10 4.933782 0.02928874 ## 11 6.680543 0.02928874 ## 12 4.797122 0.02928874 ## 13 4.241204 0.02928874 ## 14 4.540416 0.02928874 ## 15 2.373017 0.02928874 ## 16 2.897348 0.02928874 ## 17 2.022651 0.02928874 ## 18 1.342725 0.02928874 ## 19 2.187298 0.02928874 ## 20 6.028500 0.02928874 ## ## attr(,"class") ## [1] "coef.mer"
ranef(in.sl.model) #random intercept and slope model
## $id ## (Intercept) time ## 1 0.8159900 -0.0015251012 ## 2 0.3586853 0.0178025097 ## 3 0.0103611 -0.0076877896 ## 4 -1.0967770 -0.0195319730 ## 5 -0.9099170 -0.0173451118 ## 6 0.9520621 0.0106139401 ## 7 1.2720089 0.0103083587 ## 8 1.7054609 0.0092122524 ## 9 1.6832843 0.0287451194 ## 10 -0.1886973 0.0257062569 ## 11 2.3117742 0.0007947901 ## 12 0.6110393 -0.0051752375 ## 13 -0.9709554 0.0286846633 ## 14 -0.8937112 0.0359970724 ## 15 -0.8856078 -0.0356944373 ## 16 -0.9620559 -0.0158941655 ## 17 -1.6196800 -0.0230281794 ## 18 -2.4672793 -0.0174776806 ## 19 -1.0789018 -0.0354391179 ## 20 1.3529166 0.0109338305
coef(in.sl.model) #random intercept and slope model
## $id ## (Intercept) time ## 1 5.158571 0.027763644 ## 2 4.701266 0.047091255 ## 3 4.352942 0.021600955 ## 4 3.245804 0.009756772 ## 5 3.432664 0.011943633 ## 6 5.294643 0.039902685 ## 7 5.614590 0.039597104 ## 8 6.048042 0.038500997 ## 9 6.025865 0.058033864 ## 10 4.153883 0.054995002 ## 11 6.654355 0.030083535 ## 12 4.953620 0.024113507 ## 13 3.371625 0.057973408 ## 14 3.448870 0.065285817 ## 15 3.456973 -0.006405692 ## 16 3.380525 0.013394579 ## 17 2.722901 0.006260566 ## 18 1.875301 0.011811064 ## 19 3.263679 -0.006150373 ## 20 5.695497 0.040222575 ## ## attr(,"class") ## [1] "coef.mer"
m1 <- lmer(y ~ x+z (x|id)) m2 <- lmer(y ~ x*z (x*z|id)) m3 <- lmer(y ~ x*z+c (x-1|id)) m4 <- lmer(y ~ x*z (x|id)+(1|c))
Warning: Only Your fantasy is the limit!
display(in.sl.model)
## lmer(formula = rating ~ time + (time | id), data = time.data) ## coef.est coef.se ## (Intercept) 4.34 0.29 ## time 0.03 0.00 ## ## Error terms: ## Groups Name Std.Dev. Corr ## id (Intercept) 1.30 ## time 0.02 0.44 ## Residual 0.52 ## --- ## number of obs: 1200, groups: id, 20 ## AIC = 2019.2, DIC = 1988 ## deviance = 1997.6
"I'am not really interested in testing whether things are different from zero. I'??d say just go with the confidence intervals, posterior inferences, cet"??
– Andrew Gelman
“The parameter estimates calculated by lmer are the maximum likelihood or the REML (residual maximum likelihood) estimates and they are not based on observed and expected mean squares or on error strata. And that's a good thing because lmer can handle unbalanced designs with multiple nested or fully crossed or partially crossed grouping factors for the random effects. This is important for analyzing data from large observational studies such as occur in psychometrics.”
– Douglas Bates
“For simple models fit to small, balanced data sets one can calculate a p-value. Not so for unbalanced data. When the number of groups and observations are large, approximations don’t matter — you can consider the ratio as having a standard normal distribution”
“Here, just say a coefficient is “significant” if |t|>2.”
– Douglas Bates
Is there a significant effect of time
?
model1 <- in.sl.model model2 <- update(model1, ~. -time) #Remove time from model
model1 <- lmer(rating~time+(time|id),data=time.data, REML=FALSE) model2 <- lmer(rating~1+(time|id),data=time.data, REML=FALSE)
or
model1 <- update(model1, REML=FALSE) model2 <- update(model2, REML=FALSE)
Is there a significant effect of time
?
display(model2)
## lmer(formula = rating ~ 1 + (time | id), data = time.data, REML = FALSE) ## coef.est coef.se ## 3.63 0.26 ## ## Error terms: ## Groups Name Std.Dev. Corr ## id (Intercept) 1.46 ## time 0.04 0.62 ## Residual 0.52 ## --- ## number of obs: 1200, groups: id, 20 ## AIC = 2028.5, DIC = 2018.5 ## deviance = 2018.5
Is there a significant effect of time
?
anova(model1, model2)
## Data: time.data ## Models: ## model2: rating ~ 1 + (time | id) ## model1: rating ~ time + (time | id) ## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq) ## model2 5 2028.5 2054.0 -1009.3 2018.5 ## model1 6 2009.6 2040.2 -998.8 1997.6 20.939 1 4.741e-06 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
What is the best random stucture?
anova(incp.model,in.sl.model)
## refitting model(s) with ML (instead of REML)
## Data: time.data ## Models: ## incp.model: rating ~ time + (1 | id) ## in.sl.model: rating ~ time + (time | id) ## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq) ## incp.model 4 2441.8 2462.2 -1216.9 2433.8 ## in.sl.model 6 2009.6 2040.2 -998.8 1997.6 436.21 2 < 2.2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
"Essentially, all models are wrong, but some are useful"
-?? George Box
Function | Model-type | Package |
---|---|---|
lm() |
linear regression | "Stats" |
glm() |
Logistic/Poisson regression | "Stats" |
lmer() |
Mixed-effect regression | "lme4" |
glmer() |
Mixed-effect logistic/Poisson regression | "lme4" |
nlme() /nlmer() |
Non-linear mixed-models | "nlme" /"lme4" |