gmane.comp.lang.r.general
http://blog.gmane.org/gmane.comp.lang.r.general
Randomly sample data frame points relative to raster grid cells
http://comments.gmane.org/gmane.comp.lang.r.general/311820
<pre>In R, I have a raster entitled "raster_crude" with the following details:
class       : RasterLayer
dimensions  : 320, 392, 125440  (nrow, ncol, ncell)
resolution  : 0.125, 0.125  (x, y)
extent      : -152, -103, 30, 70  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
In R, I also have a data frame entitled "rangewide_absences" of longitude
and latitude points, within the extent of raster_crude, and with the
following details:
      LON              LAT
 Min.   :-134.3   Min.   : 0.00
 1st Qu.:-120.3   1st Qu.:39.56
 Median :-116.0   Median :43.02
 Mean   :-115.0   Mean   :42.72
 3rd Qu.:-110.5   3rd Qu.:46.27
 Max.   :   0.0   Max.   :59.95
Oftentimes, there is more than one point of rangewide_absences falling
within the grid cells of raster_crude. How do I make a new data frame of
points, maintaining the LON and LAT columns, such that only one point is
sampled PER raster grid cell?
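One possible approach to the question above, as a minimal base-R sketch: compute a cell number for every point from the grid's extent and resolution, shuffle the rows, and keep the first point met in each cell. The grid constants are taken from the raster description in this post; the data frame `pts` is a made-up stand-in for rangewide_absences, and `n_col`, `c_idx`, `r_idx` are illustrative names. With the raster package, cellFromXY() should give an equivalent cell number.

```r
# Grid constants taken from the post: x from -152 to -103, y from 30 to 70
xmin <- -152; ymax <- 70; res <- 0.125; n_col <- 392

# Hypothetical stand-in for rangewide_absences
set.seed(1)
pts <- data.frame(LON = runif(500, -134, -110), LAT = runif(500, 35, 60))

# Row-major cell number, counted from the top-left corner of the grid
c_idx <- floor((pts$LON - xmin) / res) + 1
r_idx <- floor((ymax - pts$LAT) / res) + 1
pts$cell <- (r_idx - 1) * n_col + c_idx

# Shuffle the rows, then keep the first point met in each cell
shuffled <- pts[sample(nrow(pts)), ]
one_per_cell <- shuffled[!duplicated(shuffled$cell), c("LON", "LAT")]
```

Because the rows are shuffled before duplicated() is applied, the point kept in each cell is a random one rather than the first in file order.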
I have been using the package "raster." With the function "extract," one
can extract grid cells to poi</pre>
Jennifer Gruhn 2014-07-22T22:08:55
Partition of sums of squares (ANOVA)
http://comments.gmane.org/gmane.comp.lang.r.general/311813
<pre>Hi all R mailing listers:
Can anyone explain the theory (or the formula) behind computing the Sum Sq
column (highlighted below) for the regression terms? The Wikipedia article (
http://en.wikipedia.org/wiki/Partition_of_sums_of_squares) gives an
introduction on how to calculate the total, model, and regression sum of
squares. Is it similar to the Sum Sq computation? Is the regression sum of
squares equal to (0.000437+ 0.002545+ 0.060984+ 0.062330+ 0.060480)?
Any suggestion will be greatly appreciated.
Thank you!
David
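A hedged sketch of how the question above could be checked numerically, reusing the data frame and model from this post. For an lm fit with an intercept, anova() reports sequential (Type I) sums of squares, one row per term, so the term rows should add up to the regression sum of squares, and regression plus residual should give the total. The names `fit`, `tab`, `ss_terms` and the `ss_*` variables are illustrative.

```r
# Data copied from this post
TraingData <- data.frame(
  x1 = c(3.532, 2.868, 2.868, 3.532, 2.868, 2.536, 3.864),
  x2 = c(1.992, 1.992, 1.328, 1.328, 1.328, 1.66, 1.66),
  y  = c(9.040330254, 8.900894412, 8.701929163, 9.057944749,
         8.701929163, 8.74317832, 9.10859913)
)
fit <- lm(y ~ 1 + x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2), data = TraingData)
tab <- anova(fit)

# Sequential sums of squares, one per term, without the residual row
ss_terms <- tab[rownames(tab) != "Residuals", "Sum Sq"]

y <- TraingData$y
ss_regression <- sum((fitted(fit) - mean(y))^2)
ss_residual   <- sum(residuals(fit)^2)
ss_total      <- sum((y - mean(y))^2)

# Term rows add up to the regression sum of squares
all.equal(sum(ss_terms), ss_regression)
# Regression and residual parts partition the total
all.equal(ss_regression + ss_residual, ss_total)
```

Each row is the reduction in residual sum of squares from adding that term to the terms listed before it, so the individual values depend on the order of terms in the formula even though their sum does not.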
TraingData<-data.frame(
x1=c(3.532,2.868,2.868,3.532,2.868,2.536,3.864),
x2=c(1.992,1.992,1.328,1.328,1.328,1.66,1.66),
y=c(9.040330254,8.900894412,8.701929163,9.057944749,8.701929163,8.74317832,9.10859913)
)
lm.sol<-lm(y~1+x1+x2+I(x1^2)+I(x2^2)+I(x1*x2),data=TraingData)
anova(lm.sol)
Analysis of Variance Table
Response: y
Df *Sum Sq* Mean Sq F value Pr(>F)
x1 1 0.000437 0.000437 0.1055 0.8001
x2 1 0.002545 0.002545 </pre>
Marino David 2014-07-22T17:53:44
Multiple Imputation of longitudinal data in MICE and statistical analyses of object type mids
http://comments.gmane.org/gmane.comp.lang.r.general/311807
<pre>Dear all,
I have a problem performing statistical analyses of longitudinal data after imputing missing values with mice. After imputing the missing values in the wide data format, I convert the extracted data to the long format. Because the data are longitudinal, participants have duplicate rows (3 timepoints), and this causes problems when converting the long-format data set into a mids object. Does anyone know how to create a mids object, or something else appropriate, after the imputation? I want to use lmer or lme for pooled fixed effects afterwards. I have tried a lot of different things, but still can't figure it out.
Thanks in advance and see the code below for a minimal reproducible example:
------------------------------------------------------------
# minimal reproducible example
## Make up some data
set.seed(2)
# ID Variable, Group, 3 Timepoints outcome measure (X1-X3)
Data <- data.frame(
ID = sort(sample(1:100)),
GROUP = sample(c(0, 1), 100, replace = TRUE),
matrix(sample(c</pre>
Julian Schulze 2014-07-22T08:00:54
repeated anova
http://comments.gmane.org/gmane.comp.lang.r.general/311806
<pre>Hi,
I have a problem with doing a repeated measures ANOVA. I will first give
you an idea of what my dataset looks like. We have 20 ponds, and for each
pond we took some individuals (waterfleas), say 10 and tested them along
two treatments (A and B). Now, for each pond we also know the fish
background (Fish versus No Fish) and Land use intensity (High versus Low).
Now, we measure the offspring of the first clutch and second clutch of each
individual, and we want to see if there is an effect of each of the other
factors (treatment, background, land use), so we need to do a repeated
measurement.
So I rearranged my data where I put the first and second clutch in one
column, something like this:
Ponds  Treatment  Background  LandUse  Clutch  SizeClutch
Pond1  A          Fish        High     1       10
Pond1  B          Fish        High     1       15
Pond2 A </pre>
Lynn Govaert 2014-07-22T11:59:39
Expressing a multinomial GLM as a series of binomial GLMs
http://comments.gmane.org/gmane.comp.lang.r.general/311803
<pre>Dear all,
I am trying to express a multinomial GLM (using nnet) as a series of GLM models.
However, when I compare the multinom() predictions to those from GLM, I see differences that I can't
explain. Can anyone help me out here?
Here comes a reproducible example:
##
# set up data: (don't care what they are, just for playing)
set.seed(0)
cats=c("oligolectic","polylectic","specialist","generalist")
explan1=c("natural","managed")
explan2=c("meadow","meadow","pasture","pasture")
multicats=factor(sample(cats,replace=T,100,prob=c(0.5,0.2,0.1,0.5)))
multiplan1=factor(rep(explan1,50))
multiplan2=factor(rep(explan2,25))
########################
library(nnet)
m2=multinom(multicats~multiplan1)
# predictions from multinomial model
predict(m2,type="probs")
########################
# now set up contrasts for response variable "multicats" (which has 4 levels):
ii=as.numeric(multicats)
g1=glm(I(ii%in%c(1,2)) ~ multiplan1, family = "binomial")
g2=glm(I(ii%in%c(2,3)) ~ multiplan1, family = "binomial")
g3=glm(I(ii%</pre>
Scherber, Christoph 2014-07-22T14:47:17
Code formatting question - too ugly?
http://comments.gmane.org/gmane.comp.lang.r.general/311802
<pre>I like to keep the individual lines in my source files relatively
short. Mainly so that I can print them, email them, or display them on
a narrow screen without needing to shift left & right. So, for a
really long character string, such as an SQL query, I do something
like:
query=paste0("select CONVERT(smalldatetime,Int_Start_Date,11) as ",
"Int_Start_Date,",
" CONVERT(smalldatetime,CASE WHEN Int_Start_Time is NULL ",
"then '00:00' ",
"else ",
"LEFT(Int_Start_Time,2)+':'+SUBSTRING(Int_Start_Time,3,2) end +",
"':00', 14) as Int_Start_Time",
", Int_duration, RTRIM(INTTYPE) AS INTTYPE,",
" RTRIM(Int_descr) AS Int_descr",
", RTRIM(INTSUBT) as INTSUBT, ",
"INDEXX, RTRIM(Label) AS Label",
", RTRIM(CHANGED) AS CHANGED, ",
"RTRIM(ALERT) AS ALERT, ",
"RTRIM(RELEASE) AS RELEASE",
" FROM CPINTVL where Int_Start_Date BETWEEN '",
startDateChar,"' and '",endDateChar,"'"
</pre>
John McKown 2014-07-22T14:36:18
lattice -xyplot
http://comments.gmane.org/gmane.comp.lang.r.general/311798
<pre>Dear Community,
Just a short and simple question, but the code does not come to my mind.
I want to plot the residuals against quarterly data, but there are many quarters.
Just rotating the labels does not make the plot clearer or easier to read.
Is it possible to show only every 4th value on the x-axis? How do I
change the scale?
require(lattice)
xyplot(resid(fixed.reg1.1) ~ quartal, type="h", data=data.plm,
scales = list(x = list(rot = 90)), ylim=c(-5,5))
Thanks a lot!
And have a nice day.
Katie
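A minimal sketch of one way to thin the x-axis labels, assuming quartal is a factor (lattice draws factor levels at positions 1, 2, 3, ...). The data frame `dat`, its column `res`, and `tick_at` are made-up stand-ins for data.plm and resid(fixed.reg1.1); only the `at` and `labels` components of `scales` are the point here.

```r
library(lattice)

# Hypothetical stand-in for data.plm and the model residuals
set.seed(1)
quarters <- paste0(rep(2000:2009, each = 4), "Q", 1:4)
dat <- data.frame(
  quartal = factor(quarters, levels = quarters),
  res     = rnorm(length(quarters))
)

# Factor levels sit at positions 1, 2, ..., so label every 4th position only
tick_at <- seq(1, nlevels(dat$quartal), by = 4)

p <- xyplot(res ~ quartal, data = dat, type = "h", ylim = c(-5, 5),
            scales = list(x = list(at = tick_at,
                                   labels = levels(dat$quartal)[tick_at],
                                   rot = 90)))
print(p)
```

All residuals are still drawn; only the tick marks and labels are reduced to the first quarter of each year.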
</pre>
Katharina Mersmann 2014-07-22T10:29:30
how to put two plots of scatterplotMatrix side by side in one plot?
http://comments.gmane.org/gmane.comp.lang.r.general/311797
<pre>Hello Friends,
I want to put two scatterplotMatrix plots side by side in one plot.
I have tried the command below, but it does not place them
side by side.
Can anybody suggest a way to join them side by side?
Regards,
mahe
</pre>
MLSC 2014-07-22T10:13:14
2 remaining seats on stats course at Murdoch University
http://comments.gmane.org/gmane.comp.lang.r.general/311796
<pre>There are 2 remaining seats on the following course:
Course: Data exploration, regression, GLM & GAM with
introduction to R.
Location: Murdoch University. Murdoch. Australia
When: 28 July - 1 August, 2014
Course flyer: http://www.highstat.com/Courses/Flyer2014_08Murdoch.pdf
Registration: http://www.highstat.com/CourseReg1.htm
Course website: http://www.highstat.com/statscourse.htm
Kind regards,
Alain
</pre>
Highland Statistics Ltd 2014-07-22T09:21:55
I need help in seeing the code
http://comments.gmane.org/gmane.comp.lang.r.general/311793
<pre>Hi there,
I am Darius and I am taking the R Programming course on Coursera. I have
spent a long time looking for a problem in my code. I wrote the code
and I believe it works perfectly well, because it produces the
result the course asks for. However, when I tried to submit it, it
was marked as wrong. I assume I have made a mistake, but I cannot
find it. The code is as follows:
Complete.R
complete <- function(directory, id = 1:332) {
  file <- list.files(directory, full.names=TRUE)
  nobs <- c()
  for (i in id){
    file1 <- read.csv(file[i])
    nobs1 <- sum(complete.cases(file1))
    nobs <- c(nobs, nobs1)
    df <- data.frame(nobs)
  }
  return(data.frame(id,df))
}
Please give me a hint about where I should look.
Thank you very much for your time and concern. I look forward to hearing back
from you.
Sincerely,
Darius Mulia.
</pre>
Darius Mulia 2014-07-22T04:46:55
odd, even indices of a vector
http://comments.gmane.org/gmane.comp.lang.r.general/311785
<pre>Might be a trivial question but how to identify the odd and even indices of a vector?
x = c(1,z,w,2,6,7)
el of odd indices= 1,w,6
el of even indices= z,2,7
given the def of odd and even in https://stat.ethz.ch/pipermail/r-help/2010-July/244299.html
should a loop be used?
for (i in 1: length(x))
if (is.odd(i)) print (i)
Carol
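No loop should be needed for this; a short sketch using logical indexing on the positions. The values of `x` are written as strings here because z and w are placeholders in the post, and the names `idx`, `odd_elements`, `even_elements` are illustrative.

```r
# Placeholder values from the post, written as strings
x <- c("1", "z", "w", "2", "6", "7")

# Positions 1..length(x); the remainder after division by 2 separates odd from even
idx <- seq_along(x)
odd_elements  <- x[idx %% 2 == 1]   # "1" "w" "6"
even_elements <- x[idx %% 2 == 0]   # "z" "2" "7"

# Same result through recycling of a short logical index
odd_again <- x[c(TRUE, FALSE)]
```

seq_along() is used instead of 1:length(x) so that an empty vector gives an empty result rather than the positions 1 and 0.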
</pre>
carol white 2014-07-21T20:33:05
Maximum likelihood estimation (stats4::mle)
http://comments.gmane.org/gmane.comp.lang.r.general/311784
<pre>Dear R-Community,
I'm trying to estimate the parameters of a probability distribution
function by maximum likelihood estimation (using the stats4 function
mle()) but can't seem to get it working.
For each unit of observation I have a pair of observations (a, r)
which I assume (both) to be log-normal distributed (iid). Taking the
log of both values I have (iid) normally distributed random variables
and the likelihood function to be estimated is:
L = Product(F(x_i) - F(y_i), i=1..n)
where F is the Normal PDF and (x,y) := (log(a), log(r)). Taking the
log and multiplying by -1 gives the negative loglikelihood
l = -Sum(log( F(x_i) - F(y_i) ), i=1..n)
However, estimation by mle() produces the error "vmmin is not finite"
and "NaN have been created", even though I put bounds on the parameters
mu and sigma (see code below).
library("stats4")
gaps <- matrix(nrow=10, ncol=4, dimnames=list(c(1:10),c("r_i", "a_i",
"log(r_i)", "log(a_i)")))
gaps[,1] <- c(2.6, 1.4, 2.2, 2.9, 2.9, 1.7, 1.3, 1.7, 3.8, 4.5)
gaps[,2] <- c</pre>
Ronald Kölpin 2014-07-21T19:10:24
1st el of a list of vectors
http://comments.gmane.org/gmane.comp.lang.r.general/311782
<pre>Hi,
If we have a list of vectors of different lengths, how is it possible to retrieve the first element of the vectors of the list?
l = list(c(1,2), c(3,5,6), c(7))
1,3,7 should be retrieved
Thanks
Carol
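A small sketch of one way to do this with the list from the post: apply the subsetting function "[" to every vector with index 1. The names `first` and `first_checked` are illustrative.

```r
l <- list(c(1, 2), c(3, 5, 6), c(7))

# "[" is the subsetting function; sapply calls it on each vector with index 1
first <- sapply(l, "[", 1)
first   # 1 3 7

# vapply does the same but insists that every result is a single number
first_checked <- vapply(l, function(v) v[1], numeric(1))
```

If one of the vectors could be empty, its first element comes back as NA in both versions.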
</pre>
carol white 2014-07-21T19:55:52
Application design.
http://comments.gmane.org/gmane.comp.lang.r.general/311778
<pre>I'm designing an R based application for my boss. It's not much, but
it might save him some time. What it will be doing is reading data
from an MS-SQL database and creating a number of graphs. At present,
he must log into one server to run a vendor application to display the
data in a grid. He then cuts this data and pastes it into an Excel
spreadsheet. He then generates some graphs in Excel. Which he then
cuts and pastes into a Power Point presentation. Which is the end
result for distribution to others up the food chain.
What I would like to do is read the MS-SQL data base using RODBC and
create the graphs using ggplot2 instead of using Excel. I may end up
being told to create an Excel file as well.
My real question is organizing the R programs to do this. Basically
what I was thinking of was a "master" program. It does the ODBC work
and fetches the data into one, or more, data.frames. I was then
thinking that it would be better to have separate source files for
each graph produced. I would use the sourc</pre>
John McKown 2014-07-22T02:24:22
anova.lme
http://comments.gmane.org/gmane.comp.lang.r.general/311772
<pre>I would like to know the sum of squares for each term in my model. I used
the following call to fit the model
fit.courseCross <- lme(fixed= zGrade ~ Rep + ISE
+P7APrior+Female+White+HSGPA+MATH+Years+Course+Course*P7APrior ,
random= ~1|SID,
data = Master.complete[Master.complete$Course != "P7A",])
and called an anova on it and get:
anova(fit.courseCross)
numDF denDF F-value p-value
(Intercept) 1 58161 1559.6968 <.0001
Rep 1 58161 520.7263 <.0001
ISE 1 6266 21.3713 <.0001
P7APrior 2 58161 358.4827 <.0001
Female 1 6266 89.2614 <.0001
White 1 6266 235.9984 <.0001
HSGPA 1 6266 1156.4116 <.0001
MATH 1 6266 1036.1354 <.0001
Years 1 58161 407.6096 <.0001
Course 12 58161 68.9875 <.0001
P7APrior:Course 24 58161 10.2464 <.0001
The documentation for anova.lme says:
When only one fitted model object i</pre>
Robert Lynch 2014-07-21T19:40:56
Error message for corAR1()
http://comments.gmane.org/gmane.comp.lang.r.general/311768
<pre>Hi,
I am trying to see whether density.km (response) is affected by Direction (continuous, integer), Layer (nominal with 12 levels) and direction (nominal with 8 levels). There is an interaction between Layer and Direction. Platform.field is a list of 9 different platforms and is being treated as a random effect.
I had previously run this model without the correlation argument and checked the residuals in an ACF plot, which showed a high level of autocorrelation.
I am having trouble adding the correlation argument to my model: I keep getting an error message and am not sure what else to try. I should mention that Direction is being applied as the time covariate because each distance has an associated time stamp.
G2<-gamm(density.km~f.Layer+direction+s(Distance.A,by=f.Layer),
random=list(Platform.field=~1),corr=corAR1(form=~Distance.A|Platform.field/f.Layer),
family=poisson,data=fish1)
Maximum number of PQL iterations: 20
iteration 1
Error in Initialize.corAR1(X[[2L]], ...) :
covariate </pre>
Wilson, Jenny 2014-07-21T16:22:20
duplicated rows of a matrix
http://comments.gmane.org/gmane.comp.lang.r.general/311767
<pre>Hi,
Is it possible to find the duplicated rows of a matrix without a loop, or do I have to loop over the rows? duplicated() doesn't seem to be helpful.
Thanks
Carol
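A short sketch showing that duplicated() does work row-wise when it is given a matrix. By default it flags only the second and later copies, which may be why it looked unhelpful; combining it with fromLast = TRUE flags every row that has a twin. The matrix `m` and the names `dup_later`, `dup_all` are made up for illustration.

```r
# Made-up matrix: rows 1 and 3 match, rows 2 and 5 match, row 4 is unique
m <- matrix(c(1, 2,
              3, 4,
              1, 2,
              5, 6,
              3, 4), ncol = 2, byrow = TRUE)

# Flags each row that repeats an earlier row
dup_later <- duplicated(m)

# Flags every row that occurs more than once, first copies included
dup_all <- dup_later | duplicated(m, fromLast = TRUE)

m[dup_all, , drop = FALSE]   # all rows involved in a duplication
unique(m)                    # one copy of each distinct row
```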
</pre>
carol white 2014-07-21T14:54:04
Estimation of Zero Inflated Over dispersed Beta Binomial Using glmmADMB()
http://comments.gmane.org/gmane.comp.lang.r.general/311766
<pre>Dear All,
I am having a problem running the following code with
glmmADMB:
glmmadmb(y_zibb~x+factor(z)+g, data= data_mis_model, family =
"betabinomial", link = "logit", zeroInflation=T)
where "y_zibb" contains the zero-inflated beta-binomial response,
"x" is a normal random variate,
"z" is a binomial random variate,
"g" is an exponential random variate
the error message----
Error in glmmadmb(y_zibb ~ x + factor(z) + g, data = data_mis, family =
"betabinomial", :
The function maximizer failed (couldn't find STD file) Troubleshooting
steps include (1) run with 'save.dir' set and inspect output files; (2)
change run parameters: see '?admbControl'
In addition: Warning message:
running command 'C:\Windows\system32\cmd.exe /c
"C:/Users/rajibulmian/Documents/R/win-library/3.1/glmmADMB/bin/windows64/glmmadmb.exe"
-maxfn 500 -maxph 5 -noinit -shess' had status 22
I have tried the "change run parameters: see '?admbControl'" in different
combinations but couldn't help. I am giving part of the data w</pre>
Rajibul Mian 2014-07-21T15:38:14
Weight, weight - do tell me
http://comments.gmane.org/gmane.comp.lang.r.general/311765
<pre>This is a question only about terminology.
Suppose I have data categorized by three factors A, B, and C, with cell means ybar_ijk and cell frequencies n_ijk, where i, j, and k index A, B, and C respectively. And suppose I want to summarize the results for factor A by computing some sort of weighted means WM_i, averaging over indices j and k with weights w_jk. Consider these four weighting schemes:
1. Use equal weights, w_jk = 1
2. Use weights of w_jk = n_+jk (where "+" shows I summed over that index)
3. Use weights w_jk = n_+j+ * n_++k (outer product of the one-factor marginal frequencies)
4. Use weights w_ijk = n_ijk (only one where we use a different set of weights for each i)
Scheme 1 yields the "unweighted" or "least-squares" means, and scheme 4 yields the ordinary means for A, ignoring B and C altogether. Scheme 3 yields weighted averages over k of weighted averages over j (or vice versa).
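To make the four schemes above concrete, here is a minimal sketch that computes WM_i under each of them. The arrays `ybar` and `n` hold made-up cell means and frequencies, and the helper `wmean_A` and the names `wm1` to `wm4` are illustrative only.

```r
set.seed(42)
nA <- 2; nB <- 3; nC <- 2

# Hypothetical cell means ybar[i, j, k] and cell frequencies n[i, j, k]
ybar <- array(rnorm(nA * nB * nC, mean = 10), dim = c(nA, nB, nC))
n    <- array(sample(1:9, nA * nB * nC, replace = TRUE), dim = c(nA, nB, nC))

# Weighted mean over (j, k) for each level i, using one B x C weight matrix w
wmean_A <- function(w) apply(ybar, 1, function(m) sum(w * m) / sum(w))

wm1 <- wmean_A(matrix(1, nB, nC))                          # scheme 1: equal weights
wm2 <- wmean_A(apply(n, c(2, 3), sum))                     # scheme 2: w_jk = n_+jk
wm3 <- wmean_A(outer(apply(n, 2, sum), apply(n, 3, sum)))  # scheme 3: n_+j+ * n_++k
wm4 <- apply(ybar * n, 1, sum) / apply(n, 1, sum)          # scheme 4: w_ijk = n_ijk

rbind(wm1, wm2, wm3, wm4)
```

Schemes 1 to 3 share one weight matrix across the levels of A, so they go through the same helper; scheme 4 needs its own line because its weights change with i.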
My question is what to call these schemes, e.g., as a character argument in an R fu</pre>
Lenth, Russell V 2014-07-21T15:13:32
Generating nonlinear Poisson time series data
http://comments.gmane.org/gmane.comp.lang.r.general/311764
<pre>We are attempting to create a short Poisson time series (between 10 and 50
datapoints) for a simulation. We want these time series to have no counts
of over 100 and not be zero-inflated. We also are trying to generate various
nonlinearities, particularly of a cyclic nature. We have been attempting to
generate this data using higher order polynomials. An example with a
seventh-order trend (and a treatment effect):
Time <- 0:(T-1) ##T is the desired number of time points in the time series.
beta <- c(B0 = 1.585, B1 = Btrt, B2 = 1.229, B3 = -3.364e-01, B4 =
-6.610e-02, B5 = 2.697e-02, B6 = -2.905e-03, B7 = 1.304e-04, B8 =
-2.130e-06)
pmat <- cbind(const = 1, tx = tx, Time = Time, Time2 = Time^2, Time3 =
Time^3, Time4 = Time^4, Time5 = Time^5,Time6 = Time^6, Time7 = Time^7)
##Btrt is the treatment effect
y <- pmat %*% beta
y <- rpois(T, exp(y))
This code works. However, when manipulating the factors such as the length
of the time series, the same beta coefficients do not always produce the
same desirable prope</pre>
Kristynn Sullivan 2014-07-21T17:46:36
Semi Markov warnings ( for dummies)
http://comments.gmane.org/gmane.comp.lang.r.general/311759
<pre>*Hello,*
I had never worked with R before my supervisor asked me to run a semi-Markov
analysis a month ago. After a long struggle the code now works, but I
still get some warnings. Because of my lack of knowledge of R, I am
not able to work out what causes them or to say anything about their influence
on my results. Hopefully someone can help me with
these (I think basic) questions.
What is the case? I want to do a semi-Markov analysis with 3 states (state
n, state s and state e). I want to run the analysis to see
whether there is a difference in hazard ratio between the transitions n -> e and s -> e.
For this, I have data on nearly 60 persons. In Excel, every worksheet
represents a person. To answer my research question I tried to run the
script for one person.
Unfortunately, I get the following error:
Error in `$<-.data.frame`(`*tmp*`, "state", value = "s") :
replacement has 1 row, data has 0
Called from: `$<-`(`*tmp*`, "state", value = "s")
Furthermore,</pre>
M.A. Pet 2014-07-21T14:04:02