<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://blog.gmane.org/gmane.comp.lang.r.linguistics">
    <title>gmane.comp.lang.r.linguistics</title>
    <link>http://blog.gmane.org/gmane.comp.lang.r.linguistics</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/522"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/516"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/511"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/506"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/503"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/502"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/496"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/490"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/486"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/485"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/475"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/472"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/467"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/463"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/456"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/439"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/437"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/425"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/412"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.lang.r.linguistics/411"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/522">
    <title>Conflicting p-values from pvals.fnc</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/522</link>
    <description>&lt;pre&gt;Dear R-langers,

I'm trying to run a mixed effect model using the lmer() function and have
run into some issues in interpreting the p-values generated by
pvals.fnc(). The design is a between-subjects design, with two fixed
effects (condition &amp;amp; block; each with two levels), and one random effect
(subject). Additionally, I have a set of weights that I want to include.

When looking at the pvals.fnc() output, there appears to be a large
discrepancy between the pMCMC values and the t-statistic p-values. Whereas
one of the main effects and the interaction are far from significant
judging by the pMCMC values, they are highly significant when looking at
the t-statistic p-values (e.g. Condition: pMCMC = 0.2294; Pr(&amp;gt;|t|) = 0.0000
&amp;amp; Condition*Block: pMCMC = 0.3296; Pr(&amp;gt;|t|) = 0.0000). I have read that
the t-statistic based p-values are less conservative, but the difference
between these two values seems really extreme.

Below is some code that simulates the model and the data. The original data
set has two specific characteristics that might influence the results, so I
tried to simulate those characteristics in the mock data. That is: 1)
there are fewer observations in block A than in block B; and 2) the weights
for observations in block A are generally lower than those for block B.

Running this code reproduces the original observation of conflicting pMCMC
and p-T-test values. However, when excluding the weights argument from the
lmer model, these values seem to converge, suggesting that the weights
specification might be underlying these problems.

In short, my question is whether anyone knows why these values diverge and
what I could do to address this issue.

Many thanks in advance!

Tom

library(lme4)       # lmer()
library(languageR)  # pvals.fnc()

block &amp;lt;- as.factor(c(rep('a', times = 20), rep('b', times = 200)))
condition &amp;lt;- as.factor(c(rep(c('x', 'y'), each = 10), rep(c('x','y'), each
= 100)))
contrasts(block) &amp;lt;- c(-0.5, 0.5)
contrasts(condition) &amp;lt;- c(-0.5, 0.5)

subject &amp;lt;- c(rep(1:4, each = 5), rep(1:4, each = 50))

intercept &amp;lt;- 100
block.me &amp;lt;- 20
condition.me &amp;lt;- 30
err &amp;lt;- rnorm(length(block), sd = 20)
weights &amp;lt;- c(rep(1, times = 20), rep(10, times = 200))

y &amp;lt;- intercept + ifelse(block == 'a', block.me, 0) + ifelse(condition ==
'x', condition.me, 0) +
    ifelse(block == 'a' &amp;amp; condition == 'x', 30, 0) + (subject * 10) + err


fm.1 &amp;lt;- lmer(y ~ block * condition + (1 | as.factor(subject)),
             weights = weights, REML = FALSE)
fm.1.mcmc &amp;lt;- pvals.fnc(fm.1, addPlot = FALSE)

&lt;/pre&gt;</description>
    <dc:creator>Tom Gijssels</dc:creator>
    <dc:date>2012-03-26T17:15:46</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/516">
    <title>Simpler model with random slopes</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/516</link>
    <description>&lt;pre&gt;Hi all,

this question is ultimately based on Florian's lecture1 slides here:
http://hlplab.wordpress.com/2010/05/10/mini-womm/

I'm doing a mixed model logistic regression, with random intercepts
for items and random slopes for items with respect to the fixed effect
Indep2 (cf. slide 85):

(a) glmer(formula = Dep ~ 1 + (1 | Item) + (0 + Indep2 | Item) +
Indep1 + Indep2, data = my.data, family = binomial(link = "logit"))

As per slide 88, I can also reduce the random effects to (1 + Indep2 | Item):

(b) glmer(formula = Dep ~ 1 + (1 + Indep2 | Item) + Indep1 + Indep2,
data = my.data, family = binomial(link = "logit"))

It's not exactly clear to me what (1 + Indep2 | Item) does, since the
output of both (a) and (b) includes random intercepts for items and
random slopes for items by Indep2. At the same time, model (a) and (b)
differ in their exact estimates.

I would appreciate it if someone could explain what the difference
between models (a) and (b) is.
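
For illustration only, here is a minimal base-R sketch of the two by-item
covariance structures that formulas (a) and (b) imply, assuming Indep2 is
numeric. The variances and the correlation are made-up placeholders, not
estimates from any fitted model.

```r
# Hypothetical values, chosen only to make the structures visible
var_int   = 1.0   # by-item intercept variance
var_slope = 0.5   # by-item Indep2 slope variance
rho       = 0.3   # intercept-slope correlation (only estimated in (b))

# (a) (1 | Item) + (0 + Indep2 | Item): two separate terms,
#     so the intercept-slope covariance is fixed at zero
Sigma_a = diag(c(var_int, var_slope))

# (b) (1 + Indep2 | Item): one term with a full 2 x 2 covariance matrix,
#     so the covariance is an additional estimated parameter
cov_is  = rho * sqrt(var_int * var_slope)
Sigma_b = matrix(c(var_int, cov_is, cov_is, var_slope), nrow = 2)

print(Sigma_a)
print(Sigma_b)
```

Under that assumption, (b) has one more variance-covariance parameter than
(a), which is one reason the two fits can give different estimates.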

Thanks
Sverre

&lt;/pre&gt;</description>
    <dc:creator>Sverre Stausland</dc:creator>
    <dc:date>2012-03-19T16:37:15</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/511">
    <title>Questions about reporting mixed-effects results</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/511</link>
    <description>&lt;pre&gt;Dear all,

I have a few questions about how to report the results of mixed-effects analyses for publication. I have been perusing the Jaeger &amp;amp; Kuperman presentation but a few questions remain.  

I have been asked by the reviewers to include a full regression table, which I take to comprise coefficient estimates, MCMC-based confidence intervals and MCMC-based p-value estimations.
-Should the model that I use to report these values contain uncentered predictors, centered predictors, or centered and scaled predictors?
-A few of my models involve random slopes, and I believe that pvals.fnc() is not currently defined for models with random slopes.  Do you have any suggestions for how I should report these models? 

My models contain many control variables and only one or two variables that I am actually concerned with.  As such, I have not worried about multicollinearity among the control variables.  I suppose I should just state this somewhere to facilitate the interpretation of the regression tables?

Lastly, is there any way to do a power analysis for mixed-effects models?  One reviewer asked whether this was possible and noted that there may be rough approximations such as "the t-approximation to the coefficient-wise test".

Thank you!
Ariel




&lt;/pre&gt;</description>
    <dc:creator>Goldberg, Ariel M</dc:creator>
    <dc:date>2012-02-20T20:15:19</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/506">
    <title>negative deviances (again)</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/506</link>
    <description>&lt;pre&gt;Hi,

I am writing because I'm trying to run an lmer model and I keep getting
negative deviances (and positive log-likelihoods, etc.). I've reinstalled R
and updated all packages:

platform       x86_64-pc-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          2
minor          14.1
year           2011
month          12
day            22
svn rev        57956
language       R
version.string R version 2.14.1 (2011-12-22)

lme4 version 0.999375-42

But the problem persists. It's not due to the data set. For example, I've
rerun a simple model from Harald's languageR library (on the data set
lexdec):

library(languageR)
data(lexdec)
lmer(RT ~ Frequency + (1 | Subject), lexdec)

Linear mixed model fit by REML
Formula: RT ~ Frequency + (1 | Subject)
   Data: lexdec
    AIC    BIC logLik deviance REMLdev
 -858.4 -836.8  433.2   -880.9  -866.4
[snip]

Linear mixed model fit by REML
Formula: RT ~ Frequency + Trial + (1 | Subject)
   Data: lexdec
    AIC  BIC logLik deviance REMLdev
 -846.1 -819  428.1   -887.2  -856.1
[snip]
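
As a quick arithmetic check on the two summaries above (a sketch, not part
of the lmer output): the REML deviance is -2 times the log-likelihood, and
the AIC adds 2 per parameter. The parameter counts below are assumptions
(fixed effects plus two variance parameters).

```r
# Values copied from the two REML summaries above
logLik  = c(m1 = 433.2, m2 = 428.1)

# Deviance is -2 * logLik, so a positive logLik gives a negative deviance
REMLdev = -2 * logLik

# Assumed parameter counts: m1 has 2 fixed effects, m2 has 3,
# each plus the subject variance and the residual variance
n_par   = c(m1 = 4, m2 = 5)
AIC_chk = REMLdev + 2 * n_par

print(round(REMLdev, 1))
print(round(AIC_chk, 1))
```

The recomputed values agree with the printed REMLdev and AIC up to rounding
of the log-likelihood, so the signs are at least internally consistent.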

The likelihood seems to develop in the expected (inverted) direction and so
does the deviance estimate for the maximum REML model (REMLdev). Has anyone
on this list been able to resolve this? I saw this behavior for version
2.13.1, but it persists after updating to 2.14.1. Do you get the same? I
thought this had been resolved.

Florian
&lt;/pre&gt;</description>
    <dc:creator>T. Florian Jaeger</dc:creator>
    <dc:date>2012-01-28T18:27:29</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/503">
    <title>questions about logit mixed model with R</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/503</link>
    <description>&lt;pre&gt;(I am resending this message because it seems it was not sent properly the last time.)

In my experiment, subjects were exposed to artificial languages with different word orders (two of them frequent among the world's languages: SOV, SVO; and two of them infrequent: VSO, OSV). After training, subjects had to classify new sentences as "correct" or "incorrect", according to what they had learned. Sentences could either be correct, contain a syntax violation, or contain a semantic violation (a mismatch between a scene and the sentence describing it). The dependent variables were response latency and accuracy (right or wrong answer). I'm trying to analyze the accuracy data (1 = right answer, 0 = wrong answer) using a mixed logit model with "word order" (OSV, SVO, SOV, VSO) and "type of sentence" (correct, semantic violation, syntax violation) as fixed factors, and subject as a random factor. Word order is a between-subjects variable, while type of sentence is a repeated-measures factor. 

My questions are:

1) In order to contrast each level of each factor with all the others, as well as their interactions, should I run different models, changing the reference category? Does this mean I should run 4 x 3 = 12 models?
2) Would it be correct to compare interaction levels with post hoc Tukey contrasts (for instance: OSV - correct vs. OSV semantic violation, SVO correct vs. OSV correct and so on?).
3) How do I interpret a significant interaction? For instance:

ModeloAngel = lmer(respuest=="1" ~ 1 + grupo * tipoF + (1|sujeto), data=DatosAngel, family="binomial") 

Fixed effects:
                       Estimate Std. Error z value Pr(&amp;gt;|z|)    
(Intercept)             1.79585    0.19196   9.356  &amp;lt; 2e-16 ***
grupoOSV                0.25816    0.26740   0.965   0.3343    
grupoSOV                0.70875    0.29315   2.418   0.0156 *  
grupoSVO                0.59607    0.26769   2.227   0.0260 *  
tipoFVsemanti          -1.01756    0.14765  -6.892 5.51e-12 ***
tipoFVsintact          -1.46088    0.14566 -10.029  &amp;lt; 2e-16 ***
grupoOSV:tipoFVsemanti -0.29214    0.20841  -1.402   0.1610    
grupoSOV:tipoFVsemanti -0.39714    0.23265  -1.707   0.0878 .  
grupoSVO:tipoFVsemanti  0.03181    0.21459   0.148   0.8821    
grupoOSV:tipoFVsintact  0.83284    0.21107   3.946 7.95e-05 ***
grupoSOV:tipoFVsintact  0.42079    0.23408   1.798   0.0722 .  
grupoSVO:tipoFVsintact  0.16667    0.21136   0.789   0.4304    

If the reference levels are VSO and "correct", does this mean that the performance of OSV on syntax violation trials is better than that of VSO on syntax violation trials? Or does it mean that OSV performance on syntax violation trials is better than VSO performance on "correct" trials?
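
To make the question concrete, here is a minimal sketch of how cell
predictions are assembled from the coefficients above, assuming treatment
coding with VSO and "correct" as the reference levels and a subject effect
of zero. The variable names are illustrative only.

```r
# Coefficients copied from the fixed-effects table above (log-odds scale)
b0        = 1.79585    # (Intercept): VSO, correct sentences
b_osv     = 0.25816    # grupoOSV
b_syn     = -1.46088   # tipoFVsintact
b_osv_syn = 0.83284    # grupoOSV:tipoFVsintact

# Predicted log-odds for four cells of the design
vso_correct = b0
vso_syntax  = b0 + b_syn
osv_correct = b0 + b_osv
osv_syntax  = b0 + b_osv + b_syn + b_osv_syn

# Under this coding the interaction term is a difference of differences:
# (syntax cost in OSV) minus (syntax cost in VSO)
diff_of_diffs = (osv_syntax - osv_correct) - (vso_syntax - vso_correct)
print(diff_of_diffs)

# Back-transform the cell predictions to probabilities
print(plogis(c(vso_correct, vso_syntax, osv_correct, osv_syntax)))
```

Under these assumptions the interaction coefficient is neither of the two
cell-to-cell comparisons in the question; it says that the drop from
"correct" to syntax violation trials is smaller for OSV than for VSO.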
&lt;/pre&gt;</description>
    <dc:creator>Angel Tabullo</dc:creator>
    <dc:date>2012-01-17T15:07:08</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/502">
    <title>Bielefeld Mixed Models Workshop 2012</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/502</link>
    <description>&lt;pre&gt;Dear colleagues, 
we would like to alert you to our upcoming workshop, which we believe may be interesting for many of the users of this list:


*** BiMM 2012: Bielefeld Mixed Models Workshop ***

Mixed-effects models are a powerful tool for the statistical analysis and modelling of psycho-/linguistic data and are increasingly used in experimental and corpus-oriented work. The Bielefeld Mixed Models Workshop (BiMM 2012) offers an opportunity to gain insight into the application of mixed-effects models. Focus will be placed on combining theoretical background with practical data analysis (using R). Participants can look forward to lively discussions with and between experts about how to use, report and interpret these methods.
The workshop is targeted at students and researchers from psycho-/linguistics and related disciplines working or willing to work with mixed-effects models. No prior knowledge of these models is required. However, participants should be familiar with general statistics and R.

Place and dates: 
Bielefeld University, Germany 
*** 22-24 February 2012 ***

Topics: 
-Random effects models 
-Hierarchical Models 
-ANOVA vs. Mixed models 
-Model validation 
-Unbalanced designs 
-Outliers and data transformation, correlated predictors 
-Interpretation, visualisation and reporting

Experts: 
-Hugo Quené (Utrecht University) 
-Dale Barr (Glasgow University) 
-Holger Mitterer (Max Planck Institute for Psycholinguistics, Nijmegen) 
-Shravan Vasishth (Potsdam University) 
-Marco van de Ven (Radboud Universität)

Registration: 
To request a place, please send an email with a short description of your work and a motivation for participating in the workshop to BiMM2012-gM/Ye1E23mwN+BqQ9rBEUg&amp;lt; at &amp;gt;public.gmane.org by 
*** 22 January 2012 ***. 
The workshop fee (50 EUR) should be paid after receiving confirmation from the organisers.

For further information please see http://www.spectrum.uni-bielefeld.de/BiMM2012/ 

With best regards,
the organizers: 
Annett Jorschick 
Helene Kreysa 
Zofia Malisz 
Andreas Windmann
Marcin Włodarczak 


*********************************************************************************************
Helene Kreysa
Post-doctoral researcher
Language and Cognition Group

Cognitive Interaction Technology (CITEC)
Room H1-132 
Morgenbreede 39
33615 Bielefeld
Germany

tel: +49 (0)521 106 12248 (office)





&lt;/pre&gt;</description>
    <dc:creator>Helene Kreysa</dc:creator>
    <dc:date>2012-01-03T09:37:34</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/496">
    <title>Collinearity and centering multi-level (more than 2 levels) fixed predictors</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/496</link>
    <description>&lt;pre&gt;Dear r-lang users,

I have a set of binary data from a 2 by 3 design study. I centered the
two-level predictor ('LHcenter': local &amp;amp; LD) but did not center the
three-level predictor ('cond': A, B, &amp;amp; C). As you can see below in the
triangular matrix towards the end of the lmer output, there is
substantial collinearity; the absolute values of some of the
correlations are above 0.6.

---8&amp;lt;----------------------------------------------------------------------------------------------------------------------------

Generalized linear mixed model fit by the Laplace approximation
Formula: true ~ cond * LHcenter + (1 | subject) + (1 | items)
   Data: offlineTarget
   AIC   BIC   logLik deviance
  747.8 785.9 -365.9    731.8

Random effects:
 Groups  Name        Variance Std.Dev.
 items   (Intercept) 0.13838  0.37199
 subject (Intercept) 1.44652  1.20271
Number of obs: 864, groups: items, 30; subject, 29

Fixed effects:
               Estimate Std. Error z value Pr(&amp;gt;|z|)
(Intercept)    -0.09775    0.30932  -0.316  0.75199
condB           0.62445    0.25534   2.446  0.01446 *
condC           0.70552    0.27286   2.586  0.00972 **
LHcenter        4.86821    0.42482  11.459  &amp;lt; 2e-16 ***
condB:LHcenter -2.31392    0.51162  -4.523  6.1e-06 ***
condC:LHcenter -1.00381    0.54643  -1.837  0.06621 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) condB  condC  LHcntr cnB:LH
condB       -0.526
condC       -0.491  0.607
LHcenter    -0.085  0.130  0.118
cndB:LHcntr  0.070 -0.053 -0.093 -0.801
cndC:LHcntr  0.066 -0.093  0.033 -0.735  0.607
---8&amp;lt;----------------------------------------------------------------------------------------------------------------------------

I also tested whether the interaction is significant by comparing the
full model with one without the interaction term:

glmer(true ~ cond * LHcenter + (1 | subject) + (1 | items),
data=offlineTarget, family=binomial)
glmer(true ~ cond  +  LHcenter + (1 | subject) + (1 | items),
data=offlineTarget, family=binomial)

And I observed a significant difference, suggesting that there is
a significant interaction.

So I proceeded to conduct planned comparisons. To do so, I created a
new predictor in the table object by merging the two factors (a 3
level factor and a 2 level factor) together, resulting in a new factor
named 'posthoc_cond' with 6 levels: Local_A, Local_B, Local_C, LD_A,
LD_B, &amp;amp; LD_C
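
A minimal base-R sketch of one way to build such a merged factor, using
interaction(). The toy data frame and the column name LH are assumptions
for illustration; only the level labels follow the description above.

```r
# Toy design grid: one row per cell of the 2 x 3 design (made-up data)
d = expand.grid(LH = c("Local", "LD"), cond = c("A", "B", "C"))
d$LH   = factor(d$LH, levels = c("Local", "LD"))
d$cond = factor(d$cond, levels = c("A", "B", "C"))

# interaction() crosses the two factors into one factor with 2 x 3 = 6 levels
d$posthoc_cond = interaction(d$LH, d$cond, sep = "_")

print(levels(d$posthoc_cond))
print(nlevels(d$posthoc_cond))
```

Note that the order of the resulting levels (and hence the reference level
of the merged factor) depends on the level order of the two input factors.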

I ran glmer WITHOUT centering the 6-level fixed predictor 'posthoc_cond':

posthoc_result = glmer(true ~ posthoc_cond + (1 | subject) + (1 | items),
data=offlineTarget, family="binomial")

and then used glht() from 'multcomp' to conduct pairwise comparisons.

The problem with posthoc_result, again, is high collinearity (see below)
---8&amp;lt;----------------------------------------------------------------------------------------------------------------------------
Correlation of Fixed Effects:
            (Intr) pst_A_ p_B_LD pst_B_ p_C_LD
psthc_cndA_ -0.865
psthc_cB_LD -0.835  0.712
psthc_cndB_ -0.944  0.896  0.734
psthc_cC_LD -0.928  0.810  0.779  0.896
psthc_cndC_ -0.891  0.925  0.710  0.914  0.834
---8&amp;lt;----------------------------------------------------------------------------------------------------------------------------

So my question is, in my case, what can be done to reduce
collinearity. It seems that centering those multi-level predictors is
not applicable in my case since I am interested in whether different
levels of the fixed predictors have different means. Centering these
multilevel predictors would not allow me to test that.

Thank you in advance for your help!!


Best,
Xiao


&lt;/pre&gt;</description>
    <dc:creator>Xiao He</dc:creator>
    <dc:date>2011-11-27T23:57:42</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/490">
    <title>Figuring out maximum random effects in mixed-effect regression models</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/490</link>
    <description>&lt;pre&gt;To whom it may concern:

I would like to use a mixed-effect regression approach to examine the
adaptation effect (i.e., learning) during sentence comprehension.

Right now, I am trying to figure out the maximum random effects.

One of the models produced the following summary



Random effects:
 Groups   Name                             Variance Std.Dev.   Corr
 Subject  (Intercept)                      19867.54  140.952
          cPrimeType                         367.20   19.162 -1.000
          cCondition                        5558.49   74.555 -1.000
          clogLength                        8128.49   90.158  0.954
          cLogPresOrder                     3033.62   55.078 -0.216
          cPrimeType:cCondition            17606.58  132.690  0.164
          cPrimeType:cCondition:cPresOrder   351.92   18.759  0.142
 Item     (Intercept)                       2870.65   53.578
          cCondition                        3391.10   58.233  0.668
          cCondition:cPrimeType             1224.53   34.993 -0.652
 Residual                                  31654.56  177.917





Baayen, Davidson, and Bates (2008) noted that "the high correlation of the
intercept and slope for the subject random effects (-1.00) indicates that
the model has been overparameterized" (p. 395).

So, I inspected the correlations and identified three high correlations
from the table. Therefore, I simplified the model by removing the
by-subject adjustments to the slopes of cPrimeType, cCondition, and
clogLength.

The simplified model produced the following summary.



Random effects:
 Groups   Name                             Variance Std.Dev.   Corr
 Subject  (Intercept)                      19443.07  139.438
          cLogPresOrder                     2832.97   53.226 -0.341
          cPrimeType:cCondition             1043.35   32.301 -0.133
          cPrimeType:cCondition:cPresOrder   204.57   14.303  0.364
 Item     (Intercept)                       2935.82   54.183
          cCondition                        3545.04   59.540  0.740
          cCondition:cPrimeType             1878.62   43.343 -0.656
 Residual                                  36990.34  192.329




Now, the correlation values seem to be in the range of acceptable values
(i.e., not too high).

Finally, in order to verify that the simpler model is justified, I carried
out a likelihood ratio test.




                 Df   AIC   BIC logLik  Chisq Chi Df Pr(&amp;gt;Chisq)
priming.lmer     27 22598 22744 -11272
rev_priming.lmer 45 22474 22718 -11192 159.35     18  &amp;lt; 2.2e-16 ***
 ---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

My understanding is that this likelihood ratio test indicates that the
removal of the by-subject adjustments to the slopes is NOT justified.
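
For reference, a minimal sketch of the arithmetic behind the table above,
using only the printed values (the log-likelihoods are rounded, so the
cross-check is approximate).

```r
# Values copied from the likelihood ratio test table above
chisq = 159.35
df    = 45 - 27   # difference in the number of estimated parameters

# Upper-tail chi-square probability, i.e. the p-value of the test
p = pchisq(chisq, df = df, lower.tail = FALSE)
print(p)

# Cross-check: the statistic is twice the difference in log-likelihoods
chisq_from_loglik = 2 * (-11192 - (-11272))
print(chisq_from_loglik)
```

The recomputed statistic matches the reported Chisq up to rounding of the
log-likelihoods, and the p-value falls far below the printed bound of
2.2e-16.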

My question is which model should I choose to report my results?
priming.lmer or rev_priming.lmer?

Thank you very much for your suggestions on this matter in advance!

Best,
Sunfa
&lt;/pre&gt;</description>
    <dc:creator>Sunfa Kim</dc:creator>
    <dc:date>2011-11-19T11:14:18</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/486">
    <title>Helmert (or really any) contrasts, log-likelihood, and specificity</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/486</link>
    <description>&lt;pre&gt;Hello everyone,

I'm trying to set up a model with a three-way contrast: between linguistic,
non-linguistic, and control. I'd like to compare all three to each other,
but for the moment, I'll focus on one set of Helmert contrasts:

  [,1] [,2]
l   -1   -1
n    1   -1
c    0    2

This should compare 1) linguistic to non-linguistic, and 2) 2 times control
to the average of linguistic and non-linguistic. So far so good.
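
For reference, a minimal sketch of how this matrix can be produced in base
R with contr.helmert(). The factor name condition and the level order
l, n, c are assumptions for illustration.

```r
# Hypothetical three-level factor; level order l, n, c as in the matrix above
condition = factor(c("l", "n", "c"), levels = c("l", "n", "c"))

# contr.helmert(3) yields the columns (-1, 1, 0) and (-1, -1, 2)
contrasts(condition) = contr.helmert(3)

print(contrasts(condition))
```

The level order matters here: with R's default alphabetical ordering
(c, l, n) the same call would assign the contrast weights to different
conditions.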

My "baseline" model includes some control predictors, and more importantly,
random intercepts and slopes for subject and item. The slopes are by
"condition," and in this case I've used the contrasts, as follows:

baseline&amp;lt;-lmer(logRT~(1+contr1+contr2|subject)+(1+contr1+contr2|item)+controlpredictors)

This, in order to specify the "maximal" structure allowed/justified by the
data. So far still so good (right?).

Then I add the fixed effect contrasts:

ofInterest&amp;lt;-update(baseline,.~.+contr1+contr2)

The fixed effect output of this type of model indicates that, for one set
of RT's, the t-value for both contrasts "should be" significant. The output
for an identical model for a different set of RT's indicates that one (but
not the other) "should be" significant.

To get a usable significance, I have to use log-likelihood tests, because
pvals.fnc() won't work with random slopes (or is it just covariances?
Regardless...). But because my model comparison includes both contr1 and
contr2, the test isn't going to spit out individual significance values for
each, t-values or not. Is there some way to tell when contr1 is
significant, independent of the significance of contr2, in a model like the
one I've described? Can I rely on those t-values at all?

I'm happy to provide more information if it would clarify anything I've
muddied.

Jason
&lt;/pre&gt;</description>
    <dc:creator>Jason Kahn</dc:creator>
    <dc:date>2011-11-10T21:49:07</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/485">
    <title>pvals.fnc and F statistic</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/485</link>
    <description>&lt;pre&gt;Hello all,

I wonder if anyone can help me with a question about pvals.fnc and lmer.

I'm running some re-analyses on a dataset analysed some years ago (2007), working with the languageR package.

In the earlier analyses, to obtain values for reporting, we used the then current version of pvals.fnc, with the following code:

x=pvals.fnc(model.lmer)
x$summary
x$anova

x$summary gave outputs like:

#                  Estimate Std.Error  DF t.value   pvals ci950 ci990 ci999
#(Intercept)        42.4514    2.1493 973  19.751 0.00000  TRUE  TRUE  TRUE
#typePsPr          -21.2240    1.7903 973 -11.855 0.00000  TRUE  TRUE  TRUE
#stressPN           -5.0640    0.8967 973  -5.648 0.00000  TRUE  TRUE  TRUE
#MorDMis             2.5627    1.7903 973   1.431 0.15275 FALSE FALSE FALSE
#typePsPr:stressPN   5.8140    1.0298 973   5.646 0.00000  TRUE  TRUE  TRUE
#stressPN:MorDMis    1.6569    1.0296 973   1.609 0.10794 FALSE FALSE FALSE

while x$anova gave e.g.

#            Df  SumSq MeanSq Denom         F   pvals
#type         1 7379.4 7379.4   973 113.83044 0.00000
#stress       1  398.6  398.6   973   6.14858 0.01332
#MorD         1  256.3  256.3   973   3.95354 0.04705
#type:stress  1 2069.7 2069.7   973  31.92602 0.00000
#stress:MorD  1  167.9  167.9   973   2.58993 0.10787

These days, based on a more recent version of languageR, my usual practice when running lmer is to do:

pvals.fnc(model.lmer)$fixed

This gives an output broadly similar to that of x$summary above, but with two sets of p-values, one of them based on Markov chain Monte Carlo (MCMC) sampling.

What I can't seem to obtain is an output like that of x$anova above, i.e. one that includes SumSq, MeanSq, df and F.

Does anyone know if this is possible?

If not, what can I report as the equivalent of (in ANOVA) the df, F and p for a main effect or interaction term?

Any advice very gratefully received.

Best wishes,

Rachel
&lt;/pre&gt;</description>
    <dc:creator>Rachel Smith</dc:creator>
    <dc:date>2011-11-09T00:34:33</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/475">
    <title>Main effects of categorical predictors in lmer</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/475</link>
    <description>&lt;pre&gt;Dear R users,

I’m using mixed effects models (lmer) to predict a binary dependent variable
as a function of (1) a categorical predictor (A) with two levels (A1 and A2),
(2) another categorical predictor (B) with three levels (B1, B2 and B3), and
(3) the interaction between these two predictors. I have tried two models, but
they return different results and I’m not sure which one is correct. I’m
interested in the main effect of B and the interaction between A and B
(because A alone has a significant effect in both models). My problem is
that there seem to be two sensible ways of examining the main effect of B:
(1) to Helmert code it and (2) to center it. But these two methods produce
opposite results! I don’t know which one I should use. Here are the two models
with some details and their outputs:


Model 1: ‘A’ is centered. ‘B’ is Helmert coded (‘B1’ (baseline) = 2, ‘B2’ = -1,
‘B3’ = -1) so that I can get a main effect of B by checking whether the
baseline condition in B differs from the mean of B2 and B3. The lmer output
returns a significant effect of B and no significant AxB interaction.
However, as the correlation table below shows, the correlation between B and
the ‘AxB’ interaction is high (-0.54).





Generalized linear mixed model fit by the Laplace approximation
Formula: response ~ A * B + (A + 1 | sub) + (1 | item)
   Data: mydata
 AIC   BIC logLik deviance
 783 822.6 -383.5      767
Random effects:
 Groups Name        Variance Std.Dev. Corr
 item   (Intercept) 0.7293   0.85399
 sub    (Intercept) 2.0871   1.44468
        A           1.3812   1.17524  0.562
Number of obs: 1038, groups: item, 42; sub, 36

Fixed effects:
            Estimate Std. Error z value Pr(&amp;gt;|z|)
(Intercept)  1.05261    0.30283   3.476 0.000509 ***
A           -3.91080    0.32239 -12.131  &amp;lt; 2e-16 ***
B            0.36128    0.09751   3.705 0.000211 ***
A:B         -0.29638    0.18681  -1.586 0.112626
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
    (Intr) A      B
A    0.155
B    0.160 -0.278
A:B -0.156  0.238 -0.540



Model 2: ‘A’ and ‘B’ are both centered. The lmer output returns no
significant effect of B but the A:B interaction is significant. The
correlations between predictors are generally lower and the correlation
between B and A:B is reduced to -26%.


Generalized linear mixed model fit by the Laplace approximation
Formula: response ~ A * B + (A + 1 | sub) + (1 | item)
   Data: mydata
   AIC   BIC logLik deviance
 756.1 795.7 -370.1    740.1
Random effects:
 Groups Name        Variance Std.Dev. Corr
 item   (Intercept) 0.87028  0.93289
 sub    (Intercept) 2.41707  1.55469
        A           1.23669  1.11206  0.533
Number of obs: 1038, groups: item, 42; sub, 36

Fixed effects:
            Estimate Std. Error z value Pr(&amp;gt;|z|)
(Intercept)   1.1004     0.3239   3.398 0.000679 ***
A            -4.0941     0.3248 -12.605  &amp;lt; 2e-16 ***
B            -0.1461     0.1400  -1.043 0.296851
A:B           1.7923     0.2818   6.360 2.01e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
    (Intr) A      B
A    0.138
B   -0.148  0.185
A:B  0.106 -0.292 -0.265



I personally think Model 2 is better, but the thing is that I have centered a
categorical predictor with *three* levels. In my searches on the web, I have
never seen a three-level predictor being centered; the examples all involved
two-level categorical predictors.

I have used the scale() function to center the predictors (I first converted
them to numeric variables and then used the scale () function to center
them). As I mentioned, my problem is that I don’t know how to get a main
effect of B as well as a *main* A:B interaction. On the one hand, it seems
logical to compare ‘B1’ (baseline) with the mean of the other two B
conditions to see if the B manipulation has a general effect. On the other
hand, I hear that one needs to center variables to get a main effect.
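
A minimal sketch of one way to get both comparisons without converting B to
a number, assuming B is a factor with levels B1, B2 and B3. The toy values
and the contrast names are made up for illustration, and lm() on three cell
means stands in for the mixed model. Note that scale(as.numeric(B)) treats B
as a single, equally spaced numeric predictor, whereas a centered contrast
matrix keeps two separate comparisons:

```r
# Toy data: one value per level of B, standing in for the cell means.
B = factor(c("B1", "B2", "B3"))
y = c(1, 2, 6)

# Two centered contrasts (each column sums to zero):
#   column 1: mean of B2 and B3 minus B1
#   column 2: B3 minus B2
contrasts(B) = cbind(rest_vs_B1 = c(-2/3, 1/3, 1/3),
                     B3_vs_B2   = c(0, -1/2, 1/2))

fit = lm(y ~ B)
est = unname(round(coef(fit), 6))
print(est)
# est[1]: unweighted mean of the three level means
# est[2]: mean(B2, B3) minus B1
# est[3]: B3 minus B2
```

With this coding the intercept is the unweighted grand mean, so in a model
with A * B the coefficient for A would be evaluated at the average of the B
levels rather than at B1.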


I would be grateful if you could help me.


Regards,


Hossein
&lt;/pre&gt;</description>
    <dc:creator>hossein karimi</dc:creator>
    <dc:date>2011-10-10T14:05:10</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/472">
    <title>Advice needed on reading large text file in R</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/472</link>
    <description>&lt;pre&gt;I have a text file in CSV format on which I plan to run a series of
linguistic analyses using R. My first thought was to use the read.csv
function to get the data into R, but I suspect that this might be naive, as
the file contains over 250,000 records and occupies over 1.5 GB of disk
space in its raw form. Each record has a large text field that extends over
multiple lines of the file; in some records this is equivalent to a few
paragraphs, in others the text would run to pages. There is a second
multiline text field, but it is much shorter: maybe 20 different phrases.
There appear to be escaped quotes throughout, though with a file this size
verifying this is difficult, and some text editors crash trying to read it
all in. I am therefore dubious that read.csv is the right mechanism and am
seeking a better method.

I am open to suggestions for input method and even segmentation if
necessary, provided segmentation does not prevent analysis of the entire
data. Would access to the records by R be any quicker if I pre-processed the
file into some relational/object database prior to analysis?

I will note in passing that I neither collected the original data nor
decided to use CSV. This is the format in which the file was given to me for
the purposes of my study. CSV is the worst possible format for the
originator to have used for such a large collection of text.

Regards, Trevor.

&amp;lt;&amp;gt;&amp;lt; Re: deemed!
&lt;/pre&gt;</description>
    <dc:creator>Trevor Jenkins</dc:creator>
    <dc:date>2011-09-08T12:34:53</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/467">
    <title>contrast coding and lrm</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/467</link>
    <description>&lt;pre&gt;Dear R users,

We created a logistic regression model to investigate the influence of different factors on Dutch word order variation. To avoid collinearity and to increase interpretability of the effects, we decided to center the predictors. For one predictor with two levels and one ordinal (5 point scale) predictor, we did this by subtracting the mean; for a third predictor with three levels, we used contrast coding. The levels of the latter predictor (called PP_TYPE_3) were coded as follows:

     [,1] [,2]
abs     1    0
loc     0    1
temp   -1   -1

The model summary gives us the following:

                  Coef     S.E.    Wald Z P     
Intercept         -0.26760 0.10303 -2.60  0.0094
cDEF3S            -0.27503 0.06216 -4.42  0.0000
cANIM2_S          -0.17869 0.18324 -0.98  0.3295
PP_TYPE_3=loc      0.60699 0.13471  4.51  0.0000
PP_TYPE_3=temp     0.08269 0.12773  0.65  0.5174
cDEF3S * cANIM2_S -0.38080 0.12298 -3.10  0.0020

The names suggest that the model gives the estimates for the levels 'loc' and 'temp', taking 'abs' as reference level. But this is not what we want, is it? Shouldn't the model have taken the columns of the contrast matrix as the recoded factors?

We are also a bit confused about the correct interpretation of the estimates of the three-level predictor. Do these represent the difference between one level and the intercept (=the grand mean), or between one level and the mean of the other levels of that predictor?
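
On the interpretation question, a toy check may help. The coding shown above
matches what contr.sum(3) produces in base R, and under that coding each
coefficient is the difference between one level's mean and the unweighted
mean of the three level means. The sketch below uses lm() with made-up
values; it only illustrates what the coding itself implies, not what lrm
does with the contrast attribute:

```r
# Toy data: one value per level, standing in for the level means.
pp = factor(c("abs", "loc", "temp"))
y = c(1, 2, 6)

# Sum (deviation) coding: rows abs = (1, 0), loc = (0, 1), temp = (-1, -1).
contrasts(pp) = contr.sum(3)
print(contrasts(pp))

est = unname(round(coef(lm(y ~ pp)), 6))
print(est)
# est[1]: unweighted mean of the three level means
# est[2]: abs minus that mean
# est[3]: loc minus that mean
```

The value for temp is recovered as the intercept minus both coefficients.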

As a side question: is it correct to center the ordinal predictor the way we did?


Thanks for your help!

Best,
Jorrig Vogels
Geertje van Bergen


&lt;/pre&gt;</description>
    <dc:creator>J. Vogels</dc:creator>
    <dc:date>2011-08-30T12:57:21</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/463">
    <title>Question about power</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/463</link>
    <description>&lt;pre&gt;Hi,

I have a question about making inferences when power might be an issue.  I'm examining whether a variable has a significant effect in different parts of the syllable.  To do this, I have 2 different data sets, Onset and Coda, which I'm using to determine if the variable has effects in the syllable onset and coda, respectively.  The variable is significant (very small p-value) in the onset but is marginally significant in the coda (p= .055 in the full model, and model comparison with a baseline model that does not contain this variable gives a p-value of .07).  

While it's always difficult to know how to interpret a marginally significant effect, one issue that complicates the matter is that the Coda dataset has fewer items and trials than the Onset dataset.  One thing that I'd like to do is determine whether the marginal effect could simply be due to a lack of power.  My idea was to take a random sample of the Onset dataset so that it matches the size of the coda dataset and see if the variable of interest remains significant even in this reduced dataset.  I figure that I would need to do this sampling many times (e.g., 10,000 times) to make sure that the effect is robust.

Is this a sensible approach?  Am I going to run into a Type I/II error situation by doing 10,000 model comparisons?

Thank you,
Ariel

&lt;/pre&gt;</description>
    <dc:creator>Ariel M. Goldberg</dc:creator>
    <dc:date>2011-08-03T21:26:13</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/456">
    <title>Positive and negative logLik and BIC in model comparison(lmer)</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/456</link>
    <description>&lt;pre&gt;Dear list,

 I have had a problem with model comparison for several months, so now
 I finally worked up my courage to ask for your help and hope that you
 can settle the question.

 I have frequently encountered positive logLik values and now heard
 that this might be due to a bug in the lmer function. However, I also
 recently found Douglas Bates stating that "a positive log-likelihood
 is acceptable in a model for a continuous response" in an S-list.
 Positive logLiks appear in Baayen's 2008 introductory book, always
 together with negative AIC and BIC. He does not seem to treat them as
 erroneous. Instead, if I understood correctly, he chooses the model
 with more negative AIC/BIC (smaller value) and more positive logLik
 (larger value) as the better model in these comparisons.
 So did I get it right and is this the way to go or is there a bug that
 inverts the polarity of the numbers?
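
 A small illustration of why the sign by itself carries no meaning for a
 continuous response: the log-likelihood is a sum of log densities, and a
 density can exceed 1 when the response has a small spread. The simulated
 data and the use of lm() are assumptions made for brevity; the same
 reasoning should carry over to the Gaussian likelihood used by lmer:

```r
set.seed(1)
x = rnorm(50)
y = 0.01 * (1 + x + rnorm(50))   # response measured on a small scale

# Helper (hypothetical name): log-likelihood of a linear model as a number.
ll = function(formula) as.numeric(logLik(lm(formula)))

small_full = ll(y ~ x)
small_null = ll(y ~ 1)
big_full   = ll(I(1000 * y) ~ x)   # same data, response multiplied by 1000
big_null   = ll(I(1000 * y) ~ 1)

print(c(small_full, big_full))                            # opposite signs
print(c(small_full - small_null, big_full - big_null))    # same difference
```

 Rescaling the response flips the sign of logLik here, but the difference
 between the two models stays the same, so the comparison (higher logLik,
 lower AIC/BIC) is unaffected.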

 As a second question: Is there a general rule of thumb for cases when
 AIC and BIC point in different directions? Does it depend on the
 data set? Or is it a matter of taste how much one wants to avoid
 overfitting? Should one trust the value that agrees with the logLik?

 Many thanks in advance
Anja

&lt;/pre&gt;</description>
    <dc:creator>Anja Arnhold</dc:creator>
    <dc:date>2011-08-01T08:29:40</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/439">
    <title>p-values from pvals.fnc</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/439</link>
    <description>&lt;pre&gt;Dear R-users,

I have been wondering about something with the pvals.fnc function. As we
know, the pvals function gives two p-values, one based on the posterior
distribution (pMCMC) and one based on the t-distribution. In my experience
most of the time the two values are very similar. However, I have recently
come across situations where they are wildly different. I have been
particularly surprised to see t-values above 2 that have associated pMCMC
values that are not even close to significance, while at the same time the
t-distribution based p-value is significant. For example, a recent model I
worked with looked something like this:

model1 = lmer(RT~x*y+(1+x|Subject)+(1|Item))

and gave me a t-value of 2.07 for the interaction, with a pMCMC p-value of
0.4756 and a t-distribution p-value of 0.0381. Obviously I like one of these
better than the other! I know that the latter p-value is anticonservative,
but the magnitude of the discrepancy is nonetheless surprising to me, given
the t-value. I'd be very grateful for any advice on how to proceed in cases
like this. I'm using lme4 version 0.99875-6.

Many thanks,

Jakke
&lt;/pre&gt;</description>
    <dc:creator>Jakke Tamminen</dc:creator>
    <dc:date>2011-07-29T19:58:25</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/437">
    <title>Mixed model for eyetracking data analysis</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/437</link>
    <description>&lt;pre&gt;Dear ling-r-lang users,

I'm writing to get some advice on the use of logit mixed models for
eyetracking data analysis. My experiment has three IVs with two levels
each: language (English vs. Korean), stress pattern (trochaic vs. iambic),
and phonation type (aspirated vs. lax), plus one continuous IV, time. The
DV is binary, either 0 or 1. What I want to test is where in the time
course of word recognition the trochaic and iambic words differ in their
activation of the target words, and how they interact with the phonation
types in the two language groups.

For this, I first tried a logit mixed-effects model, as below.

lmer(gaze~stress*lg*phonation*time+(1|subj)+(1|item), data,
family="binomial")

The issue that I had with this model is that it doesn't show interactions
between specific levels of factors. For example, I couldn't test whether
English speakers' behavior for aspirated trochaic words (default level) is
different from that for aspirated iambic ones.

So I made a dummy combinatorial variable column, "int", which combines the
levels of lg, stress pattern, and phonation type (e.g., "eiasp" for English
speakers' responses to aspirated iambic words) and ran the model as below:

lmer(gaze~int*time+(1|subj)+(1|item), data, family="binomial")

My question is whether having such a dummy combinatorial variable is
legitimate in a mixed-effects model. If it's not legitimate, I'd like to
know how to examine the interactions between specified levels of the
different factors of interest.
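
A combined column of this kind can be built with interaction() in base R.
The sketch below, on a made-up design grid with assumed column names,
suggests that the combined factor spans the same eight cells as the full
factorial, so it is a reparameterization of the same cell means rather than
a different model; relevel() then picks the cell that the others are
compared against:

```r
# Design grid: one row per cell of the 2 x 2 x 2 design (toy stand-in).
d = expand.grid(lg = c("English", "Korean"),
                stress = c("trochaic", "iambic"),
                phonation = c("aspirated", "lax"))

# One factor with a level for every combination of the three IVs.
d$int = interaction(d$lg, d$stress, d$phonation, sep = "_")

# Both design matrices have full rank 8: they describe the same cells.
rank_factorial = qr(model.matrix(~ lg * stress * phonation, data = d))$rank
rank_combined  = qr(model.matrix(~ int, data = d))$rank
print(c(nlevels(d$int), rank_factorial, rank_combined))

# Choose the reference cell; each remaining coefficient is then that cell
# compared with English speakers' aspirated iambic words.
d$int = relevel(d$int, ref = "English_iambic_aspirated")
print(levels(d$int)[1])
```

With this reference level, the coefficient for the English trochaic
aspirated cell would be the specific comparison mentioned above.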

My other question is how we can test where in the time course the two
levels of interest are significantly different from each other (in terms of
slope change). For this, I segmented time into 100 ms windows and treated
the window as a factor. The outcome appears to support the slope change
seen in the plot (in logits), and it compares the two levels of interest in
each time window. But again, I'm not sure whether this is the right way to
examine the time effect. If not, what model or approach should I use?

My questions may have arisen from my misunderstanding of the model, so it
would be greatly appreciated if you could give me your advice.

Thank you!

Best regards,
Jeonghwa Shin




&lt;/pre&gt;</description>
    <dc:creator>Jeonghwa Shin</dc:creator>
    <dc:date>2011-07-28T17:07:56</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/425">
    <title>Embedding phonetic symbols in R</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/425</link>
    <description>&lt;pre&gt;Hi all,

I'm sure I'm not the only Windows user out there trying to embed IPA
symbols in R graphics and then make a pdf out of it. If anyone knows
how to do that, I would appreciate some help. Because I have not
succeeded.

Here is what I have tried. I'm creating a plot in R with IPA phonetic
symbols, using the standard
font Doulos SIL
(http://scripts.sil.org/cms/scripts/page.php?item_id=DoulosSILfont&amp;amp;_sc=1)

When attempting to create a pdf of this plot in R using the pdf()
function, I will get the following error messages:

Error in text.default(my.x, my.y,  :
 Invalid font type
In addition: Warning messages:
1: In text.default(my.x, my.y,  :
 font family not found in PostScript font database

I've been advised that I can resolve this problem using the Cairo
package for R. But I have not succeeded. Here is my call:


It produces a pdf file, but all the IPA symbols come out as boxes.

Please note that R has no difficulties producing this plot in its
graphic window with IPA symbols. It's only the pdf output I'm having
difficulties with.

Any help would be appreciated! Further details follow below.

R version 2.13.1 (2011-07-08)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Cairo_1.4-9

loaded via a namespace (and not attached):
[1] tools_2.13.1

[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

&lt;/pre&gt;</description>
    <dc:creator>Sverre Stausland</dc:creator>
    <dc:date>2011-07-20T19:33:49</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/412">
    <title>lmer: Significant fixed effect only when random slope is included</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/412</link>
    <description>&lt;pre&gt;Dear R users,

I have a logit mixed model with two categorical predictors (two types of salience measures) and a categorical dependent variable (pronoun used Y/N). One predictor has 2 levels, and the other has 3. I centered the 2-level predictor, and transformed the 3-level predictor into two binary predictors using contrast (sum) coding. I determined the random-effects structure by starting from a full model, and eliminating step by step all terms without a significant contribution to the model.

In the final model, I end up with random intercepts for subjects and items, and a by-subject random slope for my 2-level predictor. In this model, I get significant interactions between the fixed factors, which I had not expected to be significant by just looking at the data. Removing the random slope from the model completely eliminates these interactions, but model comparison suggests the random slope should be included. I have attached the two model summaries below.

Now my question is: is it normal to find such a large influence of random effects on the fixed effects structure? How do I know the interaction effects are not spurious? And what exactly do these findings mean? Participants varied greatly in their reaction to predictor B, but when this variation is accounted for, predictor B affects pronoun use, but differently for each level of predictor A?


Jorrig Vogels
PhD candidate
Tilburg Univ., Netherlands

================================================================

Model with random slope: 

Generalized linear mixed model fit by the Laplace approximation
Formula: PRO ~ cAGTOP * cAGVIS + (1 + cAGVIS | SUBJ) + (1 | ITEM)
   Data: vislingag
   AIC   BIC logLik deviance
318.4 361.4 -149.2    298.4
Random effects:
Groups Name        Variance Std.Dev. Corr
SUBJ   (Intercept) 49.6457  7.0460
        cAGVIS      21.8342  4.6727   0.663
ITEM   (Intercept)  1.3205  1.1491
Number of obs: 544, groups: SUBJ, 48; ITEM, 12

Fixed effects:
               Estimate Std. Error z value Pr(&amp;gt;|z|)
(Intercept)      -2.578      1.217  -2.117  0.03422 *
cAGTOP1          -6.627      0.913  -7.259 3.90e-13 ***
cAGTOP2           9.868      1.502   6.569 5.05e-11 ***
cAGVIS           -1.699      1.008  -1.685  0.09207 .
cAGTOP1:cAGVIS   -3.223      1.170  -2.755  0.00587 **
cAGTOP2:cAGVIS    3.120      1.371   2.275  0.02289 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) cAGTOP1 cAGTOP2 cAGVIS cAGTOP1:
cAGTOP1      0.075
cAGTOP2     -0.041 -0.867
cAGVIS       0.535  0.108  -0.059
cAGTOP1:AGV  0.074  0.562  -0.346   0.128
cAGTOP2:AGV -0.049 -0.480   0.528  -0.054 -0.668


Model without random slope:

Generalized linear mixed model fit by the Laplace approximation
Formula: PRO ~ cAGTOP * cAGVIS + (1 | SUBJ) + (1 | ITEM)
   Data: vislingag
   AIC   BIC logLik deviance
324.3 358.7 -154.2    308.3
Random effects:
Groups Name        Variance Std.Dev.
SUBJ   (Intercept) 21.63217 4.65104
ITEM   (Intercept)  0.61539 0.78447
Number of obs: 544, groups: SUBJ, 48; ITEM, 12

Fixed effects:
               Estimate Std. Error z value Pr(&amp;gt;|z|)
(Intercept)    -1.41142    0.77639  -1.818   0.0691 .
cAGTOP1        -4.59707    0.52139  -8.817   &amp;lt;2e-16 ***
cAGTOP2         7.13115    0.84489   8.440   &amp;lt;2e-16 ***
cAGVIS         -0.35538    0.40416  -0.879   0.3792
cAGTOP1:cAGVIS -0.59940    0.58255  -1.029   0.3035
cAGTOP2:cAGVIS -0.08268    0.56682  -0.146   0.8840
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) cAGTOP1 cAGTOP2 cAGVIS cAGTOP1:
cAGTOP1      0.070
cAGTOP2     -0.040 -0.867
cAGVIS       0.000  0.061  -0.036
cAGTOP1:AGV  0.038  0.082  -0.012   0.102
cAGTOP2:AGV -0.020  0.008  -0.037   0.037 -0.575&lt;/pre&gt;</description>
    <dc:creator>Jorrig Vogels</dc:creator>
    <dc:date>2011-05-11T09:50:22</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/411">
    <title>Random effect modelling and Zipf distributed corpus data</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/411</link>
    <description>&lt;pre&gt;Dear R-lang-ers,


I am currently doing some modelling on corpus-extracted data with my students, using a random intercept model.

Our model tries to predict the position of French attributive adjectives with respect to their head noun, given several variables.
Our setting is a logistic regression with a random intercept for adjective lemma.

Using lmer(...) we find that:
(1) the distribution of the conditional modes is anything but normal;
(2) lmer does not converge properly (the deviance function gets flat as the algorithm progresses towards the solution, hence lmer iterates far too many times and yields an overfitted model).

We tried to solve issue (2): we have been able to get better convergence with the lme4a library.

However, problem (1) remains: the lemmas being Zipf-distributed, most random intercepts have estimates close to 0 (the grand mean).
These random intercepts are also mostly those for the words with low frequency in the data (hapax or quasi-hapax words), for which the standard error of estimation is largest.

Words with a higher frequency have better estimates, and their intercepts apparently follow a normal pattern.
But the overall distribution looks like a mixture of (a) a peak around 0 made up mostly (but not only) of poorly estimated ranefs and (b) a flat normal pattern of better-estimated ranefs whose apparent mean is clearly much greater than 0.

We tried to fix this
- by replacing low-frequency word forms in the data (below some given threshold) with a unique word form, say 'hapax', aiming to reduce the above-mentioned estimation problem for low-frequency words;
- by adding and removing another word frequency variable as a fixed effect in the model (with which the random effect could interact);

yet that does not really help. The distribution remains highly skewed...

This question seems to be a general one, since I suspect people using Zipf-distributed corpus data should experience similar problems.
I wonder whether someone has already run into similar problems, and what kind of solution they might have found...
Any hint would help...

many thanks,
Benoit

&lt;/pre&gt;</description>
    <dc:creator>Benoit Crabbé</dc:creator>
    <dc:date>2011-03-03T13:23:56</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.lang.r.linguistics/410">
    <title>ANOVA type main effects</title>
    <link>http://comments.gmane.org/gmane.comp.lang.r.linguistics/410</link>
    <description>&lt;pre&gt;Hi dear r-lang users,

I have a question about ANOVA type of main effects. From what I've read, one
way to obtain the type of main effect one would get with ANOVA is to do
model comparisons, like below:

mod1 &amp;lt;- lmer(dv ~ iv1 + iv2 + (1|subject), data)
mod2 &amp;lt;- lmer(dv ~ iv1 + (1|subject), data)
anova(mod1, mod2)

A significant difference indicates that the factor iv2 makes a significant
contribution to the model.

However, I wonder if there are other ways to obtain the same information.
Specifically, I have a 2x2 design, and I wonder whether, after I center the
two 2-level factors, the results would be equivalent to ANOVA-type main
effects as opposed to simple effects. Thank you in advance!
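
A toy check of the difference between the two codings, as a sketch only: the
cell means below are made up, the level names are assumptions, and lm() on a
balanced 2x2 grid stands in for the mixed model. With default treatment
coding the iv1 coefficient is a simple effect (at the baseline level of
iv2); with both factors coded -0.5/+0.5 it is the difference between the
iv1 marginal means, which is the ANOVA-style main effect in a balanced
design:

```r
# Balanced 2x2 grid with one made-up cell mean per cell.
# Row order: (a1,b1), (a2,b1), (a1,b2), (a2,b2).
d = expand.grid(iv1 = c("a1", "a2"), iv2 = c("b1", "b2"))
d$dv = c(1, 3, 2, 8)

# Treatment coding (R default): effect of iv1 at iv2 = b1 only.
simple = unname(coef(lm(dv ~ iv1 * iv2, data = d))["iv1a2"])

# Centered coding: effect of iv1 averaged over both levels of iv2.
d$c1 = ifelse(d$iv1 == "a2", 0.5, -0.5)
d$c2 = ifelse(d$iv2 == "b2", 0.5, -0.5)
main = unname(coef(lm(dv ~ c1 * c2, data = d))["c1"])

print(c(simple = simple, main = main))
```

Here the simple effect is 3 - 1 and the main effect is the mean of (3 - 1)
and (8 - 2). The p-value from a model comparison and the one from the
coefficient table need not agree exactly, even when the estimate is the
same.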


Xiao
&lt;/pre&gt;</description>
    <dc:creator>Xiao He</dc:creator>
    <dc:date>2011-02-23T19:42:43</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.comp.lang.r.linguistics">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.comp.lang.r.linguistics</link>
  </textinput>
</rdf:RDF>

