I'm very confused by this, and I'm sure it's something simple; hopefully someone can point me in the right direction.
I am working on a text mining project with the tm package. When I run the code line by line in the console it works perfectly; however, when I call the function itself, the final output is empty.
Here's some sample code:
    func <- function(filename, count=100, full=FALSE){
      ## install any missing packages, then load them
      packages <- c("ggplot2", "tm")
      if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
        install.packages(setdiff(packages, rownames(installed.packages())))
      }
      library(tm)
      library(ggplot2)
      ## get data
      data <- read.csv(filename)
      ## create corpus and remove formatting from text
      Tickets <- Corpus(DataframeSource(data))
      Tickets = tm_map(Tickets, removePunctuation)
      Tickets = tm_map(Tickets, tolower)
      ## create stopwords vector to remove complete list from data
      stopwords <- read.csv("stopwords.csv", header=FALSE)
      stopwords <- as.character(stopwords[,1])
      stopwords <- c(stopwords("english"), stopwords)
      ## create full analysis of whole data, if selected by user
      if(full==TRUE){
        Tickets = tm_map(Tickets, PlainTextDocument) ## convert back to a text document we can analyse
        Tickets.TDM <- TermDocumentMatrix(Tickets) ## create matrix for analysis
        TDM.frame <- data.frame(as.matrix(Tickets.TDM))
        write.csv(TDM.frame, "Full_queue_analysis.csv")
      }
      ## remove stopwords and irrelevant data, then convert to TDM for analysis
      Tickets = tm_map(Tickets, removeWords, stopwords)
      Tickets = tm_map(Tickets, removeNumbers)
      Tickets = tm_map(Tickets, stripWhitespace)
      Tickets = tm_map(Tickets, PlainTextDocument)
      Tickets.TDM <- TermDocumentMatrix(Tickets)
      ## matrix to frame for additional calculations
      TDM.frame <- data.frame(as.matrix(Tickets.TDM))
      ## count each word once per entry and only keep those which appear more than the user-specified amount
      Counts.df <- data.frame(rowSums(TDM.frame > 0))
      colnames(Counts.df) <- "count"
      Counts.df <- subset(Counts.df, count > count)
      ## create csv file for final counts
      write.csv(Counts.df, "Queue_analysis.csv")
      ## print basic analysis based on user option
      cat("Terms which appear more than",count,"times:")
      findFreqTerms(Tickets.TDM, count)
    }
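For reference, the counting step `rowSums(TDM.frame > 0)` counts, for each term, the number of entries (documents) it appears in, not its total number of occurrences. The base-R sketch below runs that step on a small made-up matrix (the terms and numbers are illustrative assumptions, not taken from the real ticket data) to show what `Counts.df` should hold before any filtering:

```r
# Toy term-document matrix: rows are terms, columns are documents (tickets).
# The terms and counts are made up purely for illustration.
tdm <- matrix(
  c(2, 0, 1,
    0, 0, 3,
    1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("printer", "password", "email"), c("doc1", "doc2", "doc3"))
)
TDM.frame <- data.frame(tdm)

# TDM.frame > 0 turns each cell into TRUE/FALSE ("does the term occur in this
# document at all?"), so rowSums() counts documents, not total occurrences.
Counts.df <- data.frame(count = rowSums(TDM.frame > 0))
print(Counts.df)
```

Here "printer" occurs 3 times in total but only in 2 documents, so its `count` is 2.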
Things seem to go wrong when creating the Counts.df data frame. When I run the code through the console it populates with the correct data, but when it runs inside the function it is completely empty, though it does exist.
There are no errors and the function finishes as expected, but when I open the csv file it's empty apart from the "count" header.
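One likely cause, worth checking: `subset()` evaluates its condition inside the data frame first, so in `subset(Counts.df, count > count)` both sides refer to the `count` column rather than the function argument, and no value is ever greater than itself. Possibly the console version used a literal threshold such as `100` instead. The sketch below reproduces this outside `tm` with made-up numbers; `min_count` is a hypothetical name chosen only to avoid the clash.

```r
# Hypothetical stand-in for Counts.df and the function's `count` argument.
Counts.df <- data.frame(count = c(5, 150, 300),
                        row.names = c("printer", "password", "email"))
count <- 100

# subset() looks up names in the data frame first, so both sides of
# `count > count` are the column, and x > x is never TRUE.
empty <- subset(Counts.df, count > count)
nrow(empty)  # 0

# Option 1: give the threshold a name that does not clash with a column.
min_count <- count
kept1 <- subset(Counts.df, count > min_count)

# Option 2: use ordinary indexing, where `count` is the variable and the column
# is reached explicitly with $; drop = FALSE keeps a one-column data frame.
kept2 <- Counts.df[Counts.df$count > count, , drop = FALSE]
```

Both options keep only "password" and "email". Inside the function, renaming the argument itself (for example to `min_count`) would have the same effect as Option 1.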
Thanks for any advice!
Edit: added the function itself, sorry!