I realize that there is a surfeit of posts on SE/SO about optimizing one's R for() loops. I'm afraid, though, that I don't understand all of the guidance in the answers there, or how to apply their lessons to the loop I'm working on.
First, the code:
HHCovarEvals <- data.frame()
for(k in 1:length(hhcovarmodels))
{
q.lm <- lm(as.formula(c("DeltaHHCGI~",hhcovarmodels[k])))
q.bptest <- bptest(q.lm)
HHCovarEvals[k,1] <- hhcovarmodels[k]
HHCovarEvals[k,2] <- length(split(seq(nchar(hhcovarmodels[k])),unlist(strsplit(hhcovarmodels[k],"")))[['+']])+2
HHCovarEvals[k,3] <- summary(q.lm)$r.squared
HHCovarEvals[k,4] <- summary(q.lm)$adj.r.squared
HHCovarEvals[k,5] <- AIC(q.lm)
HHCovarEvals[k,6] <- BIC(q.lm)
HHCovarEvals[k,7] <- NA
HHCovarEvals[k,8] <- max(colldiag(q.lm)$condindx)
if(HHCovarEvals[k,2]==2) HHCovarEvals[k,9]<-NA else
HHCovarEvals[k,9] <- max(vif(q.lm))
HHCovarEvals[k,10] <- q.bptest$statistic
HHCovarEvals[k,11] <- q.bptest$p.value
q.moran <- lm.morantest(q.lm,q.listw)
HHCovarEvals[k,12] <- q.moran$statistic[1,1]
HHCovarEvals[k,13] <- q.moran$p.value[1,1]
q.lagrange <- lm.LMtests(q.lm,q.listw,test=c("LMerr","RLMerr","LMlag","RLMlag","SARMA"))
HHCovarEvals[k,14] <- q.lagrange$LMerr$statistic
HHCovarEvals[k,15] <- q.lagrange$LMerr$p.value
HHCovarEvals[k,16] <- q.lagrange$RLMerr$statistic
HHCovarEvals[k,17] <- q.lagrange$RLMerr$p.value
HHCovarEvals[k,18] <- q.lagrange$LMlag$statistic
HHCovarEvals[k,19] <- q.lagrange$LMlag$p.value
HHCovarEvals[k,20] <- q.lagrange$RLMlag$statistic
HHCovarEvals[k,21] <- q.lagrange$RLMlag$p.value
HHCovarEvals[k,22] <- q.lagrange$SARMA$statistic
HHCovarEvals[k,23] <- q.lagrange$SARMA$p.value
q.err <- errorsarlm(formula=as.formula(c("DeltaHHCGI~",hhcovarmodels[k])),data=deltasnna,q.listw,tol.solve=1.0e-18)
HHCovarEvals[k,24] <- q.err$LL
HHCovarEvals[k,25] <- 2*(q.err$parameters-2)-2*q.err$LL
HHCovarEvals[k,26] <- -2*q.err$LL+(q.err$parameters-2)*log(nrow(deltasnna))
HHCovarEvals[k,27] <- summary(q.err)$Wald1$statistic
HHCovarEvals[k,28] <- summary(q.err)$LR1$statistic[1]
q.bptest <- bptest.sarlm(q.err)
HHCovarEvals[k,29] <- q.bptest$statistic
HHCovarEvals[k,30] <- q.bptest$p.value
q.durbin <- lagsarlm(formula=as.formula(c("DeltaHHCGI~",hhcovarmodels[k])),data=deltasnna,q.listw,tol.solve=1.0e-18,type="mixed")
durbin.test <- LR.sarlm(q.durbin,q.err)
HHCovarEvals[k,31] <- durbin.test$statistic
HHCovarEvals[k,32] <- 1-pchisq(durbin.test[[1]][1],q.lm$rank-1)
HHCovarEvals[k,33] <- (summary(q.err)$LR1$statistic)[1]
HHCovarEvals[k,34] <- (summary(q.err)$LR1$p.value)[1]
q.lmtest <- lm.LMtests(q.err$residuals,q.listw,test="LMerr")
HHCovarEvals[k,35] <- q.lmtest$LMerr$statistic[1,1]
HHCovarEvals[k,36] <- q.lmtest$LMerr$p.value[1,1]
}
This loop takes hhcovarmodels, a list of the right-hand side of 5000 linear models (as characters, e.g. "Intercept + X + Y + Z", fits lm objects and writes some diagnostics to a data frame, and then fits errorsarlm objects (the Spatial Error Model) and writes some other diagnostics to the same data frame.
One suggestion I've seen elsewhere is to instantiate the output data frame to its full size before running the loop, as opposed to growing it by accretion. Would this be properly accomplished with something like
HHCovarEvals <- data.frame(matrix(ncol=36,row=5000))
?
I've also seen it suggested that any code that can be vectorized should be vectorized. I'm afraid that I'm not sure what this means in my own example. Are there components that could clearly be vectorized?
Does my repeated use of
as.formulato take a character-formatted definition of the right-hand side of these models and fitlmanderrorsarlmslow things down?
(For reference, with 5000 iterations, this operation takes about 3 hours.)
- I've also seen it suggested that it may be faster to write the results as a vector and then add that vector to the data frame. I understand how and why this might work if the result is a column that needs to be added to the output data frame. In this instance, though, my loop adds rows to the data frame, and I can't see how I would add to the data frame, except with
rbind, which seems inefficient.
My apologies if the code is so profoundly ugly that it causes you aesthetic or moral harm =) I'm a beginner.
Thanks in advance ...