r tutorial CFIN pt. II

06-05-2015

Outline of tutorial part 2

1. Why mixed-models?

2. Random and fixed effects

3. How to make mixed-models in R

4. Tests

Linear models

Explain a variable y as a linear function of x (and z, etc.).

model <- lm(dist ~ speed, data=cars)

Assumptions

Normal of errors
Independence of observations

Residual of errors

Independence of observations

head(my.cars)   #See first observations

##   speed dist id
## 1     4    2  1
## 2     4   10  2
## 3     7    4  3
## 4     7   22  4
## 5     8   16  5
## 6     9   10  6

str(my.cars)    #See datatypes in dataframe

## 'data.frame':    50 obs. of  3 variables:
##  $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
##  $ dist : num  2 10 4 22 16 10 18 26 34 17 ...
##  $ id   : Factor w/ 50 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...

Pseudoreplications

head(my.cars)   #See first observations

##   speed dist id
## 1     4    2  1
## 2     4   10  1
## 3     7    4  1
## 4     7   22  1
## 5     8   16  1
## 6     9   10  1

str(my.cars)    #See datatypes in dataframe

## 'data.frame':    50 obs. of  3 variables:
##  $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
##  $ dist : num  2 10 4 22 16 10 18 26 34 17 ...
##  $ id   : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...

Pseudoreplications

Several observations from one participant:
- Repetition of task
- Repeated measure design
- Time courses across participants
Group/sample cluster

Solutions

Summary statistics
Mixed models

Summary or multilevel

A dataset: How boring is it to participate in a psychophysics experiment?

summary(data)   #summary of data

##      rating            id    
##  Min.   :1.000   1      :10  
##  1st Qu.:3.933   2      :10  
##  Median :5.065   3      :10  
##  Mean   :5.133   4      :10  
##  3rd Qu.:6.624   5      :10  
##  Max.   :9.843   6      :10  
##                  (Other):40

Summary or multilevel

Calculate mean of rating per subject, then model.

summary.data1 <- tapply(rating2,id,mean) 
summary.model1 <- lm(summary.data~1)

Make a linear mixed model using subject as random effect

library(lme4)  #First load package "lme4"
mixed.model1 <- lmer(rating2~1+(1|id))

Summary or multilevel

display(summary.model1)

## lm(formula = summary.data ~ 1)
##             coef.est coef.se
## (Intercept) 4.91     0.43   
## ---
## n = 10, k = 1
## residual sd = 1.36, R-Squared = 0.00

Summary or multilevel

display(mixed.model1)

## lmer(formula = rating2 ~ 1 + (1 | id))
## coef.est  coef.se 
##     4.91     0.43 
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  id       (Intercept) 1.23    
##  Residual             1.83    
## ---
## number of obs: 100, groups: id, 10
## AIC = 426.4, DIC = 420.6
## deviance = 420.5

Summary or multilevel

Unbalanced designs

Calculate mean of rating per subject, then model.

summary.data2 <- aggregate(rating3,by=list(id=id),mean,na.rm=T) 
summary.model2 <- lm(summary.data2$x~1)

Make a linear mixed model using subject as random effect

mixed.model2 <- lmer(rating3~1+(1|id))

Unbalanced designs

display(summary.model2)

## lm(formula = summary.data2$x ~ 1)
##             coef.est coef.se
## (Intercept) 4.81     0.52   
## ---
## n = 10, k = 1
## residual sd = 1.65, R-Squared = 0.00

Unbalanced designs

display(mixed.model2)

## lmer(formula = rating3 ~ 1 + (1 | id))
## coef.est  coef.se 
##     4.85     0.53 
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  id       (Intercept) 1.51    
##  Residual             1.82    
## ---
## number of obs: 69, groups: id, 10
## AIC = 299.8, DIC = 294.8
## deviance = 294.3

Unbalanced designs

table(id)

## id
##  1  2  3  4  5  6  7  8  9 10 
##  3 10  6  5  7  4  7  9  8 10

Estimation

Very short summary

Two-step estimation
One-step estimation
- Pooled
- non-pooled
- partial pooled

Note on terminology

Fixed effects and random effects.
Group level and individual level.
1st level and 2nd level.
Within effects and between effects.
Hieracical models, mixed-models, multilevel models, nested models, etc.

"Because of the conflicting definitions and advice, we avoid the terms "fixed" and "random" entirely, and focus on the description of the model itself […]"
- Andrew Gelman & Jennifer Hill

Specifying models in R

Fixed effects are specified similar to lm/glm

formula <- y ~ x  ## y as a function of x

y ~ 1           ## model the intercept for "y"
y ~ x           ## model the main effect of x and the intercept for y
y ~ x + 1       ## the same as above (+ 1 is implicit)
y ~ x + 0       ## model the main effect of x and no intercept
y ~ x - 1       ## the same as above
y ~ 0           ## doesn't model anything (Thanks Lau for including this!)
y ~ x + z       ## model the main effects x and z (and an intercept)
y ~ x:z         ## model interaction of x and z
y ~ x * z       ## model the main effects x and z and their interaction
y ~ x + z + x:z ## the same as above

Specifying random effects in lmer/glmer

intercept.model           <- lmer(y ~ x + (1|id))
slope.and.intercept.model <- lmer(y ~ x + (x|id))
slope.model               <- lmer(y ~ x + (x-1|id))

Example

str(time.data)

## 'data.frame':    1200 obs. of  3 variables:
##  $ rating: num  4.93 4.96 4.29 5.74 6.01 ...
##  $ time  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ id    : Factor w/ 20 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...

summary(time.data)

##      rating            time             id     
##  Min.   : 1.094   Min.   : 1.00   1      : 60  
##  1st Qu.: 3.599   1st Qu.:15.75   2      : 60  
##  Median : 5.337   Median :30.50   3      : 60  
##  Mean   : 5.236   Mean   :30.50   4      : 60  
##  3rd Qu.: 6.771   3rd Qu.:45.25   5      : 60  
##  Max.   :10.000   Max.   :60.00   6      : 60  
##                                   (Other):840

Example

How to model random effects

incp.model <- lmer(rating~time+(1|id),data=time.data)
slop.model <- lmer(rating~time+(time-1|id),data=time.data)
in.sl.model <- lmer(rating~time+(time|id),data=time.data)

Different types of models

Exploring models

Get a overview of the model using summary() or display() (from the package "arm")

display(in.sl.model)

## lmer(formula = rating ~ time + (time | id), data = time.data)
##             coef.est coef.se
## (Intercept) 4.34     0.29   
## time        0.03     0.00   
## 
## Error terms:
##  Groups   Name        Std.Dev. Corr 
##  id       (Intercept) 1.30          
##           time        0.02     0.44 
##  Residual             0.52          
## ---
## number of obs: 1200, groups: id, 20
## AIC = 2019.2, DIC = 1988
## deviance = 1997.6

Exploring models

fixef(in.sl.model)

## (Intercept)        time 
##  4.34258072  0.02928874

Exploring models

ranef(incp.model) #Random intercept model

## $id
##    (Intercept)
## 1    0.7704406
## 2    0.8992311
## 3   -0.2229189
## 4   -1.6904649
## 5   -1.4370763
## 6    1.2749988
## 7    1.5859591
## 8    1.9865373
## 9    2.5570740
## 10   0.5912011
## 11   2.3379619
## 12   0.4545416
## 13  -0.1013762
## 14   0.1978353
## 15  -1.9695634
## 16  -1.4452323
## 17  -2.3199298
## 18  -2.9998556
## 19  -2.1552827
## 20   1.6859195

Exploring models

coef(incp.model) #Random intercept model

## $id
##    (Intercept)       time
## 1     5.113021 0.02928874
## 2     5.241812 0.02928874
## 3     4.119662 0.02928874
## 4     2.652116 0.02928874
## 5     2.905504 0.02928874
## 6     5.617580 0.02928874
## 7     5.928540 0.02928874
## 8     6.329118 0.02928874
## 9     6.899655 0.02928874
## 10    4.933782 0.02928874
## 11    6.680543 0.02928874
## 12    4.797122 0.02928874
## 13    4.241204 0.02928874
## 14    4.540416 0.02928874
## 15    2.373017 0.02928874
## 16    2.897348 0.02928874
## 17    2.022651 0.02928874
## 18    1.342725 0.02928874
## 19    2.187298 0.02928874
## 20    6.028500 0.02928874
## 
## attr(,"class")
## [1] "coef.mer"

Exploring models

ranef(in.sl.model) #random intercept and slope model

## $id
##    (Intercept)          time
## 1    0.8159900 -0.0015251012
## 2    0.3586853  0.0178025097
## 3    0.0103611 -0.0076877896
## 4   -1.0967770 -0.0195319730
## 5   -0.9099170 -0.0173451118
## 6    0.9520621  0.0106139401
## 7    1.2720089  0.0103083587
## 8    1.7054609  0.0092122524
## 9    1.6832843  0.0287451194
## 10  -0.1886973  0.0257062569
## 11   2.3117742  0.0007947901
## 12   0.6110393 -0.0051752375
## 13  -0.9709554  0.0286846633
## 14  -0.8937112  0.0359970724
## 15  -0.8856078 -0.0356944373
## 16  -0.9620559 -0.0158941655
## 17  -1.6196800 -0.0230281794
## 18  -2.4672793 -0.0174776806
## 19  -1.0789018 -0.0354391179
## 20   1.3529166  0.0109338305

Exploring models

coef(in.sl.model) #random intercept and slope model

## $id
##    (Intercept)         time
## 1     5.158571  0.027763644
## 2     4.701266  0.047091255
## 3     4.352942  0.021600955
## 4     3.245804  0.009756772
## 5     3.432664  0.011943633
## 6     5.294643  0.039902685
## 7     5.614590  0.039597104
## 8     6.048042  0.038500997
## 9     6.025865  0.058033864
## 10    4.153883  0.054995002
## 11    6.654355  0.030083535
## 12    4.953620  0.024113507
## 13    3.371625  0.057973408
## 14    3.448870  0.065285817
## 15    3.456973 -0.006405692
## 16    3.380525  0.013394579
## 17    2.722901  0.006260566
## 18    1.875301  0.011811064
## 19    3.263679 -0.006150373
## 20    5.695497  0.040222575
## 
## attr(,"class")
## [1] "coef.mer"

Specifying random effects in lmer/glmer

m1 <- lmer(y ~ x+z (x|id))
m2 <- lmer(y ~ x*z (x*z|id))
m3 <- lmer(y ~ x*z+c (x-1|id))
m4 <- lmer(y ~ x*z (x|id)+(1|c))

Warning: Only Your fantasy is the limit!

What to include in the random effect

"Keep it maximal"
What would result in pseudoreplications?
Random or fixed effect? Depend on what You want to answer?
- What level estimation of parameters?

Statistical tests

display(in.sl.model)

## lmer(formula = rating ~ time + (time | id), data = time.data)
##             coef.est coef.se
## (Intercept) 4.34     0.29   
## time        0.03     0.00   
## 
## Error terms:
##  Groups   Name        Std.Dev. Corr 
##  id       (Intercept) 1.30          
##           time        0.02     0.44 
##  Residual             0.52          
## ---
## number of obs: 1200, groups: id, 20
## AIC = 2019.2, DIC = 1988
## deviance = 1997.6

Where are my P-values???

Where are my P-values?

"I'am not really interested in testing whether things are different from zero. I'??d say just go with the confidence intervals, posterior inferences, cet"??
– Andrew Gelman

Where are my P-values?

“The parameter estimates calculated by lmer are the maximum likelihood or the REML (residual maximum likelihood) estimates and they are not based on observed and expected mean squares or on error strata. And that's a good thing because lmer can handle unbalanced designs with multiple nested or fully crossed or partially crossed grouping factors for the random effects. This is important for analyzing data from large observational studies such as occur in psychometrics.”
– Douglas Bates

Where are my P-values?

“For simple models fit to small, balanced data sets one can calculate a p-value. Not so for unbalanced data. When the number of groups and observations are large, approximations don’t matter — you can consider the ratio as having a standard normal distribution”
“Here, just say a coefficient is “significant” if |t|>2.”
– Douglas Bates

Model Comparison

Is there a significant effect of time?

model1 <- in.sl.model
model2 <- update(model1, ~. -time) #Remove time from model

REML and ML

model1 <- lmer(rating~time+(time|id),data=time.data, REML=FALSE)
model2 <- lmer(rating~1+(time|id),data=time.data, REML=FALSE)

model1 <- update(model1, REML=FALSE)
model2 <- update(model2, REML=FALSE)

Model Comparison

Is there a significant effect of time?

display(model2)

## lmer(formula = rating ~ 1 + (time | id), data = time.data, REML = FALSE)
## coef.est  coef.se 
##     3.63     0.26 
## 
## Error terms:
##  Groups   Name        Std.Dev. Corr 
##  id       (Intercept) 1.46          
##           time        0.04     0.62 
##  Residual             0.52          
## ---
## number of obs: 1200, groups: id, 20
## AIC = 2028.5, DIC = 2018.5
## deviance = 2018.5

Model Comparison

Is there a significant effect of time?

anova(model1, model2)

## Data: time.data
## Models:
## model2: rating ~ 1 + (time | id)
## model1: rating ~ time + (time | id)
##        Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
## model2  5 2028.5 2054.0 -1009.3   2018.5                             
## model1  6 2009.6 2040.2  -998.8   1997.6 20.939      1  4.741e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model Comparison

What is the best random stucture?

anova(incp.model,in.sl.model)

## refitting model(s) with ML (instead of REML)

## Data: time.data
## Models:
## incp.model: rating ~ time + (1 | id)
## in.sl.model: rating ~ time + (time | id)
##             Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
## incp.model   4 2441.8 2462.2 -1216.9   2433.8                             
## in.sl.model  6 2009.6 2040.2  -998.8   1997.6 436.21      2  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Remember…

"Essentially, all models are wrong, but some are useful"
-?? George Box

Now you are reday to do unbalanced experiments, get P-values for psychology journals - and do valid statistics.

Quick overview of functions in R

Function	Model-type	Package
`lm()`	linear regression	`"Stats"`
`glm()`	Logistic/Poisson regression	`"Stats"`
`lmer()`	Mixed-effect regression	`"lme4"`
`glmer()`	Mixed-effect logistic/Poisson regression	`"lme4"`
`nlme()`/`nlmer()`	Non-linear mixed-models	`"nlme"`/`"lme4"`