<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://blog.gmane.org/gmane.comp.lang.r.linguistics">
    <title>gmane.comp.lang.r.linguistics</title>
    <link>http://blog.gmane.org/gmane.comp.lang.r.linguistics</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/528"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/527"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/526"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/525"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/524"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/523"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/522"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/521"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/520"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/519"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/518"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/517"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/516"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/515"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/514"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/513"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/512"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/511"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/510"/>
        <rdf:li rdf:resource="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/509"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/528">
    <title>Re: Simpler model with random slopes</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/528</link>
    <description>&lt;pre&gt;Hi Sverre,

This is a belated follow-up to your follow-up questions.

On Mar 20, 2012, at 3:49 AM PDT, Sverre Stausland wrote:


Critical question: by "assess the contribution" do you mean assess whether there is a reliable fixed effect of Indep2 (i.e. whether Indep2 has a reliable effect in one direction or another that generalizes across subjects)?  Or assessing whether including Indep2 at all in your model helps its ability to predict Dep, even if Indep2 may matter only in a subset of subjects?


--

Roger Levy                      Email: rlevy-XkckGZ689+c&amp;lt; at &amp;gt;public.gmane.org
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://idiom.ucsd.edu/~rlevy
&lt;/pre&gt;</description>
    <dc:creator>Levy, Roger</dc:creator>
    <dc:date>2012-03-31T00:38:23</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/527">
    <title>Re: Conflicting p-values from pvals.fnc</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/527</link>
    <description>&lt;pre&gt;
On Mar 28, 2012, at 7:13 PM PDT, Levy, Roger wrote:


On Mar 27, 2012, at 3:19 PM PDT, Tom Gijssels wrote:

Hey Roger,

Thanks for the response!

Let me try to address your questions as well as possible (and in reverse order to make things a bit more concrete).

The reason for including weights is that there's a reasonable amount of variance in the relative importance of the data points I'm using. To be a bit more specific: the experiment looks at whether people accommodate their speech rate to a conversational partner's speech rate. Speech rate was calculated by taking the words per second for each of the subject's utterances during Baseline and during Conversation. Since a subject's utterances vary in duration (e.g. some are 2s, others 10s), I want to make sure that the speech rate measure of a long utterance contributes more strongly to the model than that of a short utterance. Therefore, I wanted to include the duration for each utterance as a weight for the corresponding speech rate measure. Does this sound sensible?

As far as I can see, it seems that this is at least conceptually what the weights argument does in lm (the lmer documentation states 'weights' gets implemented exactly as in lm).

The vector supplied as the weights argument is used to calculate weighted least squares (minimizing sum(w*e^2)). The weights are included to smooth out differences in the variance of the different observations ('with the values in weights being inversely proportional to the variances', taken from the lm documentation). If I understand this correctly, in the current data set, the short duration observations would have higher variance than the long duration observations. Including duration as a weight, then, would lead to the former observations exerting less influence on the final fit. Is this correct?

Thanks for the additional information Tom.  I see your justification for using the weights argument as you're using it, but I'm not sure that lmer and pvals.fnc jointly are handling the weights in a totally consistent way.  Note that as you crank the weights up, the t statistic gets larger and larger (which would be what you expect on the reading that a weight of k should simulate k replicates of that datum), but the pMCMC "significance level" goes in the opposite direction.

I will follow up on r-sig-me.

Well, we don't seem to have gotten any enlightenment from r-sig-me.  I guess I would suggest being wary of using weights indiscriminately in lmer until the understanding of exactly what it's doing becomes more clear.  Nathaniel's suggestion of simulating from your data to determine which (if any) of the pMCMC, Wald, and t statistics give normative p-values is definitely worth following, too.

Best &amp;amp; hope this helps.

Roger

--

Roger Levy                      Email: rlevy-XkckGZ689+c&amp;lt; at &amp;gt;public.gmane.org&amp;lt;mailto:rlevy-XkckGZ689+c&amp;lt; at &amp;gt;public.gmane.org&amp;gt;
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://idiom.ucsd.edu/~rlevy
&lt;/pre&gt;</description>
    <dc:creator>Levy, Roger</dc:creator>
    <dc:date>2012-03-30T19:34:32</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/526">
    <title>Re: Conflicting p-values from pvals.fnc</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/526</link>
    <description>&lt;pre&gt;
I'm not sure what it means that two tests of unknown quality disagree
with a third test of unknown quality... but figuring out which tests
might be accurate is actually very easy. You already wrote code to
simulate data from your model; just run it 1000 times with no real
effects put in, and look at the distribution of p values you get. An
accurate test will produce p values that are uniformly distributed,
i.e., 0.05 * 1000 = ~50 of them should be less than 0.05.

(Also I notice your simulation code uses different weights for the
different conditions, but doesn't actually add more noise to one
condition than the other; you might want to fix that first. You just
need something like err &amp;lt;- err * 1/sqrt(weights).)
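
A minimal sketch of the check described above, not taken from the original thread. It assumes a plain weighted lm() as a stand-in for the lmer() model so that only base R is needed, reuses the block, condition, and weights layout from the posted simulation, and scales the noise by 1/sqrt(weights) as suggested. The names n.sim, obs.weights, one.p, and p.values are made up for illustration.

```r
# Assumption: lm() stands in for lmer(); there is no subject random effect here.
set.seed(1)
n.sim = 1000
block = factor(c(rep("a", times = 20), rep("b", times = 200)))
condition = factor(c(rep(c("x", "y"), each = 10), rep(c("x", "y"), each = 100)))
obs.weights = c(rep(1, times = 20), rep(10, times = 200))

# Simulate one null data set (no real effects) and return the p-value
# of the block-by-condition interaction.
one.p = function() {
  # Noise variance is inversely proportional to the weight of each observation
  y = 100 + rnorm(length(block), sd = 20) / sqrt(obs.weights)
  fit = lm(y ~ block * condition, weights = obs.weights)
  coef(summary(fit))["blockb:conditiony", "Pr(>|t|)"]
}

p.values = replicate(n.sim, one.p())

# Share of simulated p-values at or below 0.05
ecdf(p.values)(0.05)
```

If the test is calibrated, the last line should come out near 0.05. Swapping a different model or p-value method into one.p() gives the same check for that method.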

&lt;/pre&gt;</description>
    <dc:creator>Nathaniel Smith</dc:creator>
    <dc:date>2012-03-29T13:45:01</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/525">
    <title>Re: Conflicting p-values from pvals.fnc</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/525</link>
    <description>&lt;pre&gt;
On Mar 27, 2012, at 3:19 PM PDT, Tom Gijssels wrote:

Hey Roger,

Thanks for the response!

Let me try to address your questions as well as possible (and in reverse order to make things a bit more concrete).

The reason for including weights is that there's a reasonable amount of variance in the relative importance of the data points I'm using. To be a bit more specific: the experiment looks at whether people accommodate their speech rate to a conversational partner's speech rate. Speech rate was calculated by taking the words per second for each of the subject's utterances during Baseline and during Conversation. Since a subject's utterances vary in duration (e.g. some are 2s, others 10s), I want to make sure that the speech rate measure of a long utterance contributes more strongly to the model than that of a short utterance. Therefore, I wanted to include the duration for each utterance as a weight for the corresponding speech rate measure. Does this sound sensible?

As far as I can see, it seems that this is at least conceptually what the weights argument does in lm (the lmer documentation states 'weights' gets implemented exactly as in lm).

The vector supplied as the weights argument is used to calculate weighted least squares (minimizing sum(w*e^2)). The weights are included to smooth out differences in the variance of the different observations ('with the values in weights being inversely proportional to the variances', taken from the lm documentation). If I understand this correctly, in the current data set, the short duration observations would have higher variance than the long duration observations. Including duration as a weight, then, would lead to the former observations exerting less influence on the final fit. Is this correct?

Thanks for the additional information Tom.  I see your justification for using the weights argument as you're using it, but I'm not sure that lmer and pvals.fnc jointly are handling the weights in a totally consistent way.  Note that as you crank the weights up, the t statistic gets larger and larger (which would be what you expect on the reading that a weight of k should simulate k replicates of that datum), but the pMCMC "significance level" goes in the opposite direction.

I will follow up on r-sig-me.

Best

Roger



Finally, a colleague of mine posted the message appended below in response to the same post on R-Sig-ME. The correspondence between the Wald chi-square and the pMCMC results indeed seems to suggest that the pvals based on the t-tests are unreliable. Any ideas about this?

Thanks again for the input, it's much appreciated.

Cheers,

Tom


Hi listers,

If you get p-vals using Wald chi-square tests with lme4::anova, they look
pretty close to the pMCMC output.
fm.1 &amp;lt;- lmer(y ~ block * condition + (1 | as.factor(subject)),
             weights = weights, REML = FALSE)
fm.2 &amp;lt;- lmer(y ~ block + condition + (1 | as.factor(subject)),
             weights = weights, REML = FALSE)
anova(fm.1, fm.2)

This gives p = .35 for the block*condition interaction. For comparison,
pvals.fnc gave pMCMC = .3 and p(&amp;lt;|t|) &amp;lt; .001. So it looks like p-vals
derived from t-tests are just way off.

It seems to me like we should just totally ignore the P(&amp;lt;|t|) output. Does
anyone who knows more about how these work think otherwise?

&lt;/pre&gt;</description>
    <dc:creator>Levy, Roger</dc:creator>
    <dc:date>2012-03-29T02:13:14</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/524">
    <title>Re: Conflicting p-values from pvals.fnc</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/524</link>
    <description>&lt;pre&gt;Hey Roger,

Thanks for the response!

Let me try to address your questions as well as possible (and in reverse
order to make things a bit more concrete).

The reason for including weights is that there's a reasonable amount of
variance in the relative importance of the data-points I'm using. To be a
bit more specific: the experiment looks at whether people accommodate their
speech rate to a conversational partner's speech rate. Speech rate was
calculated by taking the words per second for each of the subject's
utterances during Baseline and during Conversation. Since a subject's
utterances vary in duration (e.g. some are 2s, others 10s), I want to make
sure that the speech rate measure of a long utterance contributes more
strongly to the model than a short utterance. Therefore, I wanted to
include the duration for each utterance as a weight for the corresponding
speech rate measure. Does this sound sensible?

As far as I can see, it seems that this is at least conceptually what the
weights argument does in lm (the lmer documentation states 'weights' gets
implemented exactly as in lm).

The vector supplied as the weights argument is used to calculate weighted
least squares (minimizing sum(w*e^2)). The weights are included to smooth
out differences in the variance of the different observations ('with the
values in weights being inversely proportional to the variances', taken
from the lm documentation). If I understand this correctly, in the current
data set, the short duration observations would have higher variance than
the long duration observations. Including duration as a weight, then, would
lead to the former observations exerting less influence on the final fit.
Is this correct?
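
A minimal sketch of the weighted least squares criterion mentioned above, using made-up data and plain lm() rather than lmer(). For integer weights, minimizing sum(w*e^2) gives the same coefficients as an unweighted fit to data in which each row is repeated w times. All names and numbers here are hypothetical.

```r
set.seed(1)
x = c(1, 2, 3, 4, 5)
y = 2 + 3 * x + rnorm(5)
k = c(1, 1, 4, 1, 2)    # hypothetical integer weights

# Weighted fit: minimizes sum(k * residual^2)
fit.weighted = lm(y ~ x, weights = k)

# Unweighted fit to data where row i appears k[i] times
idx = rep(seq_along(x), times = k)
y.rep = y[idx]
x.rep = x[idx]
fit.replicated = lm(y.rep ~ x.rep)

# The coefficient estimates agree up to numerical tolerance
all.equal(unname(coef(fit.weighted)), unname(coef(fit.replicated)))
```

The standard errors of the two fits need not match, because the replicated fit counts sum(k) observations while the weighted fit counts only length(x).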

Finally, a colleague of mine posted the message appended below in response
to the same post on R-Sig-ME. The correspondence between the Wald
chi-square and the pMCMC results indeed seems to suggest that the pvals
based on the t-tests are unreliable. Any ideas about this?

Thanks again for the input, it's much appreciated.

Cheers,

Tom

Hi listers,

If you get p-vals using Wald chi-square tests with lme4::anova, they look
pretty close to the pMCMC output.
fm.1 &amp;lt;- lmer(y ~ block * condition + (1 | as.factor(subject)),
             weights = weights, REML = FALSE)
fm.2 &amp;lt;- lmer(y ~ block + condition + (1 | as.factor(subject)),
             weights = weights, REML = FALSE)
anova(fm.1, fm.2)

This gives p = .35 for the block*condition interaction. For comparison,
pvals.fnc gave pMCMC = .3 and p(&amp;lt;|t|) &amp;lt; .001. So it looks like p-vals
derived from t-tests are just way off.

It seems to me like we should just totally ignore the P(&amp;lt;|t|) output. Does
anyone who knows more about how these work think otherwise?

&lt;/pre&gt;</description>
    <dc:creator>Tom Gijssels</dc:creator>
    <dc:date>2012-03-27T22:19:42</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/523">
    <title>Re: Conflicting p-values from pvals.fnc</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/523</link>
    <description>&lt;pre&gt;Hi Tom,

I'm not sure how lmer uses the weights argument -- I think I know how it *should* be used (setting a weight of k to a given observation should be equivalent to having seen k replicates of that observation), but what you've got here makes me wonder what's actually going on under the hood.  Could you maybe point us to some documentation as to what lmer claims to be doing with the weights argument, and maybe also explain why you want to use the weights you're using?

Best

Roger

On Mar 26, 2012, at 10:15 AM PDT, Tom Gijssels wrote:


Dear R-langers,

I'm trying to run a mixed effect model using the lmer() function and have
run into some issues in interpreting the p-values generated by
pvals.fnc(). The design is a between-subjects design, with two fixed
effects (condition &amp;amp; block; each with two levels), and one random effect
(subject). Additionally, I have a set of weights that I want to include.

When looking at the pvals.fnc() output, there appears to be a large
discrepancy between the pMCMC values and the t-statistic p-values. Whereas
one of the main effects and the interaction are far from significant
judging by the pMCMC values, they are highly significant when looking at
the t-statistic p-values (e.g. Condition: pMCMC = 0.2294; Pr(&amp;gt;|t|) = 0.0000
&amp;amp; Condition*Block: pMCMC = 0.3296; Pr(&amp;gt;|t|) = 0.0000). I have read that
the t-statistic based p-values are less conservative, but the difference
between these two values seems really extreme.

Below is some code that simulates the model and the data. The original data
set has two specific characteristics that might influence the results, so I
tried to simulate those characteristics in the mock data. That is: 1)
there are fewer observations in block A than in block B; and 2) the weights
for observations in block A generally are lower than those for block B.

Running this code reproduces the original observation of conflicting pMCMC
and p-T-test values. However, when excluding the weights argument from the
lmer model, these values seem to converge, suggesting that the weights
specification might be underlying these problems.

In short, my question is whether anyone knows why these values diverge and
what I could do to address this issue.

Many thanks in advance!

Tom

library(lme4)       # provides lmer()
library(languageR)  # provides pvals.fnc()

block &amp;lt;- as.factor(c(rep('a', times = 20), rep('b', times = 200)))
condition &amp;lt;- as.factor(c(rep(c('x', 'y'), each = 10), rep(c('x','y'), each
= 100)))
contrasts(block) &amp;lt;- c(-0.5, 0.5)
contrasts(condition) &amp;lt;- c(-0.5, 0.5)

subject &amp;lt;- c(rep(1:4, each = 5), rep(1:4, each = 50))

intercept &amp;lt;- 100
block.me &amp;lt;- 20
condition.me &amp;lt;- 30
err &amp;lt;- rnorm(length(block), sd = 20)
weights &amp;lt;- c(rep(1, times = 20), rep(10, times = 200))

y &amp;lt;- intercept + ifelse(block == 'a', block.me, 0) + ifelse(condition ==
'x', condition.me, 0) +
    ifelse(block == 'a' &amp;amp; condition == 'x', 30, 0) + (subject * 10) + err


fm.1 &amp;lt;- lmer(y ~ block * condition + (1 | as.factor(subject)),
             weights = weights, REML = FALSE)
fm.1.mcmc &amp;lt;- pvals.fnc(fm.1, addPlot = FALSE)




--

Roger Levy                      Email: rlevy-XkckGZ689+c&amp;lt; at &amp;gt;public.gmane.org&amp;lt;mailto:rlevy-XkckGZ689+c&amp;lt; at &amp;gt;public.gmane.org&amp;gt;
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://idiom.ucsd.edu/~rlevy
&lt;/pre&gt;</description>
    <dc:creator>Levy, Roger</dc:creator>
    <dc:date>2012-03-26T21:56:07</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/522">
    <title>Conflicting p-values from pvals.fnc</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/522</link>
    <description>&lt;pre&gt;Dear R-langers,

I'm trying to run a mixed effect model using the lmer() function and have
run into some issues in interpreting the p-values generated by
pvals.fnc(). The design is a between-subjects design, with two fixed
effects (condition &amp;amp; block; each with two levels), and one random effect
(subject). Additionally, I have a set of weights that I want to include.

When looking at the pvals.fnc() output, there appears to be a large
discrepancy between the pMCMC values and the t-statistic p-values. Whereas
one of the main effects and the interaction are far from significant
judging by the pMCMC values, they are highly significant when looking at
the t-statistic p-values (e.g. Condition: pMCMC = 0.2294; Pr(&amp;gt;|t|) = 0.0000
&amp;amp; Condition*Block: pMCMC = 0.3296; Pr(&amp;gt;|t|) = 0.0000). I have read that
the t-statistic based p-values are less conservative, but the difference
between these two values seems really extreme.

Below is some code that simulates the model and the data. The original data
set has two specific characteristics that might influence the results, so I
tried to simulate those characteristics in the mock data. That is: 1)
there are fewer observations in block A than in block B; and 2) the weights
for observations in block A generally are lower than those for block B.

Running this code reproduces the original observation of conflicting pMCMC
and p-T-test values. However, when excluding the weights argument from the
lmer model, these values seem to converge, suggesting that the weights
specification might be underlying these problems.

In short, my question is whether anyone knows why these values diverge and
what I could do to address this issue.

Many thanks in advance!

Tom

library(lme4)       # provides lmer()
library(languageR)  # provides pvals.fnc()

block &amp;lt;- as.factor(c(rep('a', times = 20), rep('b', times = 200)))
condition &amp;lt;- as.factor(c(rep(c('x', 'y'), each = 10), rep(c('x','y'), each
= 100)))
contrasts(block) &amp;lt;- c(-0.5, 0.5)
contrasts(condition) &amp;lt;- c(-0.5, 0.5)

subject &amp;lt;- c(rep(1:4, each = 5), rep(1:4, each = 50))

intercept &amp;lt;- 100
block.me &amp;lt;- 20
condition.me &amp;lt;- 30
err &amp;lt;- rnorm(length(block), sd = 20)
weights &amp;lt;- c(rep(1, times = 20), rep(10, times = 200))

y &amp;lt;- intercept + ifelse(block == 'a', block.me, 0) + ifelse(condition ==
'x', condition.me, 0) +
    ifelse(block == 'a' &amp;amp; condition == 'x', 30, 0) + (subject * 10) + err


fm.1 &amp;lt;- lmer(y ~ block * condition + (1 | as.factor(subject)),
             weights = weights, REML = FALSE)
fm.1.mcmc &amp;lt;- pvals.fnc(fm.1, addPlot = FALSE)

&lt;/pre&gt;</description>
    <dc:creator>Tom Gijssels</dc:creator>
    <dc:date>2012-03-26T17:15:46</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/521">
    <title>Re: Simpler model with random slopes</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/521</link>
    <description>&lt;pre&gt;An update: I asked below whether one should treat a model with a main
effect Indep2 and random slopes for Indep2 in the same way as one should
treat a model with a main effect Indep2 and an interaction between
Indep2 and another main effect. That is, a model with random slopes
for Indep2 should also contain the main effect Indep2 - one should not
remove the main effect Indep2 and leave the random slopes for Indep2
in.

For what it's worth, I just found the same question asked in R-sig-ME
in 2010: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q2/003876.html,
to which Ben Bolker answers in the affirmative (with an explanation).

Sverre

On Tue, Mar 20, 2012 at 11:49 AM, Sverre Stausland
&amp;lt;johnsen-fWAZDB8bsKe+fmr0zi+kZQ&amp;lt; at &amp;gt;public.gmane.org&amp;gt; wrote:

&lt;/pre&gt;</description>
    <dc:creator>Sverre Stausland</dc:creator>
    <dc:date>2012-03-21T12:58:10</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/520">
    <title>Re: Simpler model with random slopes</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/520</link>
    <description>&lt;pre&gt;Thanks for pointing that out, Florian. I noticed just that when I
built my models with and without the covariance, and it makes sense
from Nathaniel's exemplary explanation.

I do have a follow-up question that I'm taking the liberty to ask in
this thread:

As an example, say my model includes random slopes for items with
respect to a variable Indep2, as in (a)

(a) glmer(formula = Dep ~ 1 + (1 | Subject) + (0 + Indep2 | Item) +
Indep1 + Indep2, data = my.data, family = binomial(link = "logit"))

To assess the contribution of variable Indep2 through model
comparison, should I then remove the main effect Indep2 as well as the
random slopes for items, as in (b) below?

(b) glmer(formula = Dep ~ 1 + (1 | Subject) + Indep1, data = my.data,
family = binomial(link = "logit"))

It seems slightly odd to me to allow the model to include random
variation among items with respect to a variable Indep2 if the model
doesn't assume that Indep2 plays a role as a main effect as well,
similar to how higher-order interactions between variables require
that those variables are also included as main effects in the model.
Is my understanding correct here?

In case the answer is yes, let me ask a follow-up to that [ignore this
if the answer is no]:

The model I have has random slopes (0 + Indep2 | Item) as the only
random effect (others proved unnecessary):

(c) glmer(formula = Dep ~ 1 + (0 + Indep2 | Item) + Indep1 + Indep2,
data = my.data, family = binomial(link = "logit"))

If my subset model without Indep2 needs to take out the random slopes
as well, I need to introduce a dummy random effect, since lmer won't
construct a model without any random effects. The question is, does it
matter what kind of dummy random effect this is? Could it be random
intercepts (d) just as well as random slopes (e), or would something
else be more appropriate?

dummy &amp;lt;- rep(1, nrow(my.data))
(d) glmer(formula = Dep ~ 1 + (1 | dummy) + Indep1, data = my.data,
family = binomial(link = "logit"))
(e) glmer(formula = Dep ~ 1 + (0 + dummy | Item) + Indep1, data =
my.data, family = binomial(link = "logit"))

Thanks
Sverre

On Tue, Mar 20, 2012 at 2:19 AM, T. Florian Jaeger
&amp;lt;tiflo-sMaJTQEtWn7RrdkEUGVx5Ydd74u8MsAO&amp;lt; at &amp;gt;public.gmane.org&amp;gt; wrote:

&lt;/pre&gt;</description>
    <dc:creator>Sverre Stausland</dc:creator>
    <dc:date>2012-03-20T10:49:41</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/519">
    <title>Re: Simpler model with random slopes</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/519</link>
    <description>&lt;pre&gt;just to add to Nathaniel's beautifully clear explanation: going from (a) to
(b) is really the opposite of "reducing" the model. going from (b) to (a)
would remove one parameter (the covariance parameter)

flo

On Mon, Mar 19, 2012 at 2:45 PM, Sverre Stausland
&amp;lt;johnsen-fWAZDB8bsKe+fmr0zi+kZQ&amp;lt; at &amp;gt;public.gmane.org&amp;gt; wrote:

&lt;/pre&gt;</description>
    <dc:creator>T. Florian Jaeger</dc:creator>
    <dc:date>2012-03-20T01:19:31</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/518">
    <title>Re: Simpler model with random slopes</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/518</link>
    <description>&lt;pre&gt;Thanks for the explanation. I think I got it!

On Mon, Mar 19, 2012 at 5:54 PM, Nathaniel Smith &amp;lt;njs-e+AXbWqSrlAAvxtiuMwx3w&amp;lt; at &amp;gt;public.gmane.org&amp;gt; wrote:

&lt;/pre&gt;</description>
    <dc:creator>Sverre Stausland</dc:creator>
    <dc:date>2012-03-19T18:45:38</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/517">
    <title>Re: Simpler model with random slopes</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/517</link>
    <description>&lt;pre&gt;On Mon, Mar 19, 2012 at 4:37 PM, Sverre Stausland
&amp;lt;johnsen-fWAZDB8bsKe+fmr0zi+kZQ&amp;lt; at &amp;gt;public.gmane.org&amp;gt; wrote:

(1 | Item) means that items are assumed to have different intercepts
&lt;/pre&gt;</description>
    <dc:creator>Nathaniel Smith</dc:creator>
    <dc:date>2012-03-19T16:54:25</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/516">
    <title>Simpler model with random slopes</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/516</link>
    <description>&lt;pre&gt;Hi all,

this question is ultimately based on Florian's lecture1 slides here:
http://hlplab.wordpress.com/2010/05/10/mini-womm/

I'm doing a mixed model logistic regression, with random intercepts
for items and random slopes for items with respect to the fixed effect
Indep2 (cf. slide 85):

(a) glmer(formula = Dep ~ 1 + (1 | Item) + (0 + Indep2 | Item) +
Indep1 + Indep2, data = my.data, family = binomial(link = "logit"))

As per slide 88, I can also reduce the random effects to (1 + Indep2 | Item):

(b) glmer(formula = Dep ~ 1 + (1 + Indep2 | Item) + Indep1 + Indep2,
data = my.data, family = binomial(link = "logit"))

It's not exactly clear to me what (1 + Indep2 | Item) does, since the
output of both (a) and (b) includes random intercepts for items and
random slopes for items by Indep2. At the same time, model (a) and (b)
differ in their exact estimates.

I would appreciate it if someone could explain what the difference
between model (a) and (b) is.

Thanks
Sverre

&lt;/pre&gt;</description>
    <dc:creator>Sverre Stausland</dc:creator>
    <dc:date>2012-03-19T16:37:15</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/515">
    <title>Re: Questions about reporting mixed-effects results</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/515</link>
    <description>&lt;pre&gt;Hi Ariel,

I hope that we're moving towards researchers giving full model summaries in
electronic appendices for those of us interested in the details, while
keeping the main text focused on a concise (but interpretable) summary of
the results. The appendices, in my view, should contain detailed
information about how variables were coded and entered into the regression.
Since many researchers in psycholinguistics and its neighboring disciplines
had their statistical training focused on ANOVA, I think using coding that
makes regressions maximally comparable to ANOVA is preferable. This means
that I would recommend to contrast code (or where justified Helmert or
polynomial code) factors rather than to employ the basic treatment coding
(R's default). I think centering of treatment coded factors, while it
doesn't address all concerns, usually doesn't result in reduced
interpretability either although for pretty balanced data sets you might as
well use contrast coding (thereby perhaps removing an issue that makes it
harder for others to understand why you did what you did).

For some examples of how to report results (or rather, my highly
subjective preferences), see
http://wiki.bcs.rochester.edu/HlpLab/StatsCourses?action=AttachFile&amp;amp;do=get&amp;amp;target=Groningen11.pdf.
Perhaps the following article is also helpful in terms of the lingo and in
how to talk about random effects (see in particular the sections where we
introduce GLMMs and re-report the Atkinson model):

  Jaeger, T. F., Graff, P., Croft, B., and Pontillo, D. 2011. Mixed effect
models for genetic and areal dependencies in linguistic typology:
Commentary on Atkinson. Linguistic Typology 15(2), 281–319.

available at e.g.
http://www.degruyter.com/view/j/lity.2011.15.issue-2/lity.2011.021/lity.2011.021.xml?format=INT

or

http://rochester.academia.edu/tiflo/Papers/774232/Jaeger_T._F._Graff_P._Croft_B._and_Pontillo_D._2011._Mixed_effect_models_for_genetic_and_areal_dependencies_in_linguistic_typology_Commentary_on_Atkinson._Linguistic_Typology_15_2_281-319

HTH,

Florian

On Mon, Feb 20, 2012 at 3:15 PM, Goldberg, Ariel M &amp;lt;Ariel.Goldberg-qzQkmLjiwf8&amp;lt; at &amp;gt;public.gmane.orgu

&lt;/pre&gt;</description>
    <dc:creator>T. Florian Jaeger</dc:creator>
    <dc:date>2012-02-23T20:19:35</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/514">
    <title>Re: Questions about reporting mixed-effects results</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/514</link>
    <description>&lt;pre&gt;With respect to power, once you have settled on a model and have the  
parameter estimates in hand, you can always run a simulation of the  
model a jillion times using those parameter estimates and see what  
proportion of the jillion yielded significant results on each of the  
tests you care about.  Those proportions will be the estimated power  
of those tests.
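
A minimal sketch of that simulation loop. To stay self-contained it uses an
ordinary linear model and made-up data (the sample size, slope and residual
SD are assumptions, not values from this thread). For a mixed model the loop
has the same shape, with lmer() in place of lm(); lme4's simulate() and
refit() can stand in for the manual simulation step.

```r
set.seed(42)

# Hypothetical data standing in for a real study (all values are made up)
n = 60
x = runif(n)
y = 0.8 * x + rnorm(n, sd = 1)
fit = lm(y ~ x)

# Parameter estimates taken from the fitted model
b = coef(fit)
s = summary(fit)$sigma
crit = qt(0.975, df = df.residual(fit))  # two-sided 5% critical t-value

# Simulate a new response from the fitted model, refit, and test the slope
one_run = function() {
  y_sim = b[1] + b[2] * x + rnorm(n, sd = s)
  t_val = coef(summary(lm(y_sim ~ x)))["x", "t value"]
  abs(t_val) > crit
}

# Proportion of simulated data sets with a significant slope
n_sim = 2000
power_hat = mean(replicate(n_sim, one_run()))
power_hat
```

The estimate treats the fitted coefficients as the true effect, so it is
only as trustworthy as those estimates.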

Quoting Jason Kahn &amp;lt;jmkahn-vEoVa6Jh+/J4piUD7e9S/g&amp;lt; at &amp;gt;public.gmane.org&amp;gt;:




Richard S. Bogartz
Professor of Psychology
UMASS, Amherst 01003

&lt;/pre&gt;</description>
    <dc:creator>bogartz-1677TEtHUQ3mFkVL2hffgA&lt; at &gt;public.gmane.org</dc:creator>
    <dc:date>2012-02-20T21:27:41</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/513">
    <title>Re: Questions about reporting mixed-effects results</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/513</link>
    <description>&lt;pre&gt;Dear Jason,

Thanks so much for your advice.  I definitely will check out your website.

I should mention that I made a mistake in my previous message… all of my models contain random intercepts, but a few contain random slopes for the variables of interest.  These random slopes were added when model comparisons indicated that they were warranted.

In the cases where the model contained random slopes, I have reported the t-values as well as the results of a model comparison looking at whether the addition of the one variable of interest significantly increases the fit of the model.

Thanks again!
AG




&lt;/pre&gt;</description>
    <dc:creator>Goldberg, Ariel M</dc:creator>
    <dc:date>2012-02-20T20:59:22</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/512">
    <title>Re: Questions about reporting mixed-effects results</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/512</link>
    <description>&lt;pre&gt;Hi Ariel,

Having been through some of these same issues with reviewers recently, I
hope I can say something helpful. Most of this is summarized on my website,
which I would appreciate feedback/commentary on from anyone and everyone,
and which you can find here: http://jmkahn.web.unc.edu/modeling/

You should definitely report centered predictors if their theoretical
interpretation or collinearity-with-interactions depends on that
transformation. Centering will reduce (or, ideally, eliminate) the
collinearity between "first-order" predictors and the interactions they
participate in, unless you have unbalanced conditions. It will also change
the interpretation of your intercept, and the parameter estimates
associated with those variables. If you and/or your readers care about any
of those things, reporting the centered predictors seems necessary.
Otherwise, I don't believe it matters.
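
A small sketch of the collinearity point, using two made-up continuous
predictors (their means and the sample size are assumptions for illustration
only). It compares the correlation between each predictor and the
interaction term before and after centering.

```r
set.seed(7)

# Two hypothetical, independent predictors with non-zero means
x1 = rnorm(200, mean = 5)
x2 = rnorm(200, mean = 10)

# Uncentered: each predictor is correlated with the product term
cor(x1, x1 * x2)
cor(x2, x1 * x2)

# Centered: the same correlations shrink toward zero
c1 = x1 - mean(x1)
c2 = x2 - mean(x2)
cor(c1, c1 * c2)
cor(c2, c1 * c2)
```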

Things may have changed since I last used pvals.fnc(), but it used to work
for models with random intercepts, and even models with random slopes, but
not for ones with /correlations between/ random intercepts and random
slopes. There's reason to believe that much of the time, a model should
include as maximally-specified a random effects structure as possible
(read: as defined first by the experimental design, and second by
convergence of the models - /not/ by model-fitting procedures like
log-likelihood tests; opinions may differ on this, although if the paper
I'm thinking of is in press now, please do chime in - I'd love to be able
to cite it). In many cases, that leaves someone who wants to use MCMC
sampling in the lurch. This list has, in the past, told me to simply use
the *t*-values associated with the parameter estimates, assuming a
reasonably large (at the very least 500 observations, preferably 1000)
dataset. A slightly more conservative estimate is on my website, and comes
courtesy of Roger Levy (if I remember right), who cites Hox (2010).

For the collinearity among the control predictors, as long as they don't
correlate similarly with your predictors of interest, simply laying out
your model's structure should be sufficient. A recent manuscript submission
from my lab used that same technique, and it seems to have flown (knock on
wood).

As for power analysis, I would love to hear about such a thing. I was under
the impression there was no practical and/or agreed-upon method, same as
for overall estimates of variance explained by the model.

Please don't hesitate to correct me if I've said something wrong here!

Best,
Jason

&lt;/pre&gt;</description>
    <dc:creator>Jason Kahn</dc:creator>
    <dc:date>2012-02-20T20:55:03</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/511">
    <title>Questions about reporting mixed-effects results</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/511</link>
    <description>&lt;pre&gt;Dear all,

I have a few questions about how to report the results of mixed-effects analyses for publication. I have been perusing the Jaeger &amp;amp; Kuperman presentation but a few questions remain.  

I have been asked by the reviewers to include a full regression table, which I take to comprise coefficient estimates, MCMC-based confidence intervals and MCMC-based p-value estimations.
-Should the model that I use to report these values contain uncentered predictors, centered predictors, or centered and scaled predictors?
-A few of my models involve random intercepts, and I believe that pvals.fnc() is not currently defined for models with random intercepts.  Do you have any suggestions for how I should report these models? 

My models contain many control variables and only one or two variables that I am actually concerned with.  As such, I have not worried about multicollinearity among the control variables.  I suppose I should just state this somewhere to facilitate the interpretation of the regression tables?

Lastly, is there any way to do a power analysis for mixed-effects models?  One reviewer asked whether this was possible and noted that there may be rough approximations such as "the t-approximation to the coefficient-wise test".

Thank you!
Ariel




&lt;/pre&gt;</description>
    <dc:creator>Goldberg, Ariel M</dc:creator>
    <dc:date>2012-02-20T20:15:19</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/510">
    <title>Re: negative deviances (again)</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/510</link>
    <description>&lt;pre&gt;Argh, sorry, I was too trigger happy. I just adapted your example to mixed
models:

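library(lme4)
N &amp;lt;- 100
z &amp;lt;- rep(1:5, 20)   # grouping factor with 5 levels
x &amp;lt;- runif(N)
# add one random intercept per level of z (5 draws, rnorm's default sd of 1)
y1 &amp;lt;- rnorm(N, mean=x, sd=3) + rnorm(5, 0)[z]    # larger residual SD
y2 &amp;lt;- rnorm(N, mean=x, sd=.3) + rnorm(5, 0)[z]   # smaller residual SD
logLik(lmer(y1 ~ x + (1|z)))
logLik(lmer(y2 ~ x + (1|z)))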

and get the expected behavior (positive LLs for low residual SDs and
negative ones for larger SDs). Seems like I simply got unlucky in choosing
the test data sets I ran before (I should have run the sim you suggested
immediately). In retrospect, perhaps this is because all the test data
contained log-transformed DVs ;). If one unlogs Baayen's RT measure in the
example I sent, the residuals are, of course, large, and then the deviance
is positive. Phew.

So, I guess, higher LLs are still better fits, since negative deviances are
in principle legit. Sorry for pressing the panic button.

Florian

On Sat, Jan 28, 2012 at 1:48 PM, piantado &amp;lt;piantado-3s7WtUTddSA&amp;lt; at &amp;gt;public.gmane.org&amp;gt; wrote:

&lt;/pre&gt;</description>
    <dc:creator>T. Florian Jaeger</dc:creator>
    <dc:date>2012-01-28T19:14:37</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/509">
    <title>Re: negative deviances (again)</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/509</link>
    <description>&lt;pre&gt;Hi Steve,

I know that this can happen. I am worried because it seems to occur in ALL
models I've tried on different data sets.

I also meant to say that I am even more confused (troubled) about the
following. As a reminder, here are the two models - notice that the negative
deviance (deviance at the ML maximum) goes down (i.e. becomes larger in
absolute value) as the additional predictor is added to the model.

Linear mixed model fit by REML
Formula: RT ~ Frequency + (1 | Subject)
   Data: lexdec
    AIC    BIC logLik deviance REMLdev
 -858.4 -836.8  433.2   -880.9  -866.4
[snip]

Linear mixed model fit by REML
Formula: RT ~ Frequency + Trial + (1 | Subject)
   Data: lexdec
    AIC  BIC logLik deviance REMLdev
 -846.1 -819  428.1   -887.2  -856.1
[snip]

When I run an ANOVA over the two models in the previous email (and I've
replicated this for other data), it'll use the log-Likelihood estimate from
the deviance of the ML-fitted (rather than REML-fitted) model, so that the
more complex model has a larger (here: more positive; if the sign was
flipped, more negative) value, which really shouldn't be the case. I know
that this can happen for ML estimates of mixed models, but it happens in
all models I'm running on different data sets (and that did not use to be
the case; the deviance and REMLdeviance usually develop in parallel).

Data: lexdec
Models:
l1: RT ~ Frequency + (1 | Subject)
l2: RT ~ Frequency + Trial + (1 | Subject)
   Df     AIC     BIC logLik  Chisq Chi Df Pr(&amp;gt;Chisq)
l1  4 -872.86 -851.21 440.43
l2  5 -877.22 -850.15 443.61 6.3561      1     0.0117 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I might have tomatoes on my eyes (or in my brain), so if you see the
obvious that I am missing, pls let me know,

Florian

On Sat, Jan 28, 2012 at 1:46 PM, piantado &amp;lt;piantado-3s7WtUTddSA&amp;lt; at &amp;gt;public.gmane.org&amp;gt; wrote:

&lt;/pre&gt;</description>
    <dc:creator>T. Florian Jaeger</dc:creator>
    <dc:date>2012-01-28T18:51:47</dc:date>
  </item>
  <item rdf:about="http://permalink.gmane.org/gmane.comp.lang.r.linguistics/508">
    <title>Re: negative deviances (again)</title>
    <link>http://permalink.gmane.org/gmane.comp.lang.r.linguistics/508</link>
    <description>&lt;pre&gt;(Sorry my email crashed and I'm not sure this went through the first
time)


Hi Florian, 

What makes you sure there's a problem? You can get positive log
likelihoods (and negative deviances) when the residual distribution
has a low enough standard deviation, since the normal density
then rises above 1 near its mean (its peak is 1/(sd*sqrt(2*pi)),
which exceeds 1 whenever sd is below about 0.4).

It looks like that's what's up here:

l &amp;lt;- lmer(RT ~ Frequency + (1 | Subject), lexdec)
sigma(l)  # residual standard deviation
[1] 0.1798177

plot(dnorm(seq(-3,3,0.01), sd=0.18), type="l")


Here's an example where you can twiddle the SD and see the log
likelihood change sign:

library(stats)
N&amp;lt;-100
x &amp;lt;- runif(N)
y &amp;lt;- rnorm(N,mean=x,sd=0.18)
logLik(lm(y ~ x))


++Steve







&lt;/pre&gt;</description>
    <dc:creator>piantado</dc:creator>
    <dc:date>2012-01-28T18:48:27</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.comp.lang.r.linguistics">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.comp.lang.r.linguistics</link>
  </textinput>
</rdf:RDF>

