
I have a couple of R scripts that process data in a particular input folder. I have a few folders I need to run these scripts on, so I started writing a bash script to loop through the folders and run the R scripts on each one.

I'm not familiar with R at all (the scripts were written by a previous worker and are basically a black box to me), and I'm inexperienced with passing variables between scripts, especially across languages. There's also an issue when I call source("$SWS_output/Step_1_Setup.R") below: R isn't reading my $SWS_output as a variable, but rather as a literal string.

Here's my bash script:

#!/bin/bash

# Inputs
workspace="`pwd`"
preprocessed="$workspace/6_preprocessed"

# Output
SWS_output="$workspace/7_SKSattempt4_results/"

# create output directory
mkdir -p $SWS_output

# Copy data from preprocessed to SWS_output
cp -a $preprocessed/* $SWS_output

# Loop through folders in the output and run the R code on each folder
for qdir in $SWS_output/*/; do
        qdir_name=`basename $qdir`
        echo -e 'source("$SWS_output/Step_1_Setup.R") \n source("$SWS_output/Step_2_data.R") \n q()' | R --no-save

done

I need to pass the variable "qdir" into the second R script (Step_2_data.R) to tell it which folder to process.

Thanks!

3 Comments

  • I think it will be helpful for you: milanor.net/blog/… Commented Jun 26, 2019 at 17:53
  • In general, it's much better practice not to generate R code from bash. Instead, write your R code to retrieve the variables it needs from the environment with Sys.getenv(), and export those variables from bash, thus making them available to getenv() calls in any other language you run (see the sketch after these comments). Commented Jun 26, 2019 at 18:15
  • ...the code-generation approach is inherently open to security vulnerabilities; someone who controlled your filenames (even with no control at all of the contents!) could run arbitrary code. Commented Jun 26, 2019 at 18:16
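
For illustration, a minimal sketch of the export/Sys.getenv pattern that comment describes (the variable name QDIR and the inline R call are hypothetical, not taken from the OP's scripts):

#!/bin/bash
# Export the value in bash so that any child process can see it...
export QDIR="/path/to/folder"
# ...then have R read it back with Sys.getenv(); no R code is generated from bash.
Rscript -e 'cat("processing:", Sys.getenv("QDIR"), "\n")'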

2 Answers


My previous answer was incomplete. Here is a better effort to explain command line parsing.

It is pretty easy to use R's commandArgs function to process command-line arguments. I wrote a small tutorial: https://gitlab.crmda.ku.edu/crmda/hpcexample/tree/master/Ex51-R-ManySerialJobs. In cluster computing this approach works very well for us, and the whole hpcexample repo is open source/free.

The basic idea is that in the command line you can run R with command line arguments, as in:

R --vanilla -f r-clargs-3.R --args runI=13 parmsC="params.csv" xN=33.45

In this case, my R program is the file r-clargs-3.R, and the arguments the file will import are three space-separated elements: runI, parmsC, and xN. You can add as many of these space-separated parameters as you like. What they are called is entirely at your discretion, but they must be separated by spaces and there must be NO SPACE around the equals signs. Character string values should be quoted.

My habit is to name arguments with the suffix "I" to hint that the value is an integer, "C" for character, and "N" for floating-point numbers.

In the file r-clargs-3.R, include some code to read the arguments and sort through them. For example, from my tutorial:

cli <- commandArgs(trailingOnly = TRUE) 
args <- strsplit(cli, "=", fixed = TRUE)

The rest of the work is sorting through the args. This is my most evolved stanza for doing so: it looks for the suffixes "I", "N", "C", and "L" (for logical) and coerces the inputs to the corresponding variable types (all incoming values are character strings unless we coerce them with as.integer(), etc.):

for (e in args) {
    argname <- e[1]
    if (! is.na(e[2])) {
        argval <- e[2]
        ## regular expression to delete initial \" and trailing \"
        argval <- gsub("(^\\\"|\\\"$)", "", argval)
    }
    else {
        # If arg specified without value, assume it is bool type and TRUE
        argval <- TRUE
    }

    # Infer type from last character of argname, cast val
    type <- substring(argname, nchar(argname), nchar(argname))
    if (type == "I") {
        argval <- as.integer(argval)
    }
    if (type == "N") {
        argval <- as.numeric(argval)
    }
    if (type == "L") {
        argval <- as.logical(argval)
    }
    assign(argname, argval)
    cat("Assigned", argname, "=", argval, "\n")
}

That will create variables in the R session named paramsC, runI, and xN.

The convenience of this approach is that the same base R code can be run with 100s or 1000s of command parameter variations. Good for Monte Carlo simulation, etc.
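
For instance, a hedged sketch of such a sweep, reusing the r-clargs-3.R example from above (the loop bounds and argument values are made up for illustration):

#!/bin/bash
# Run the same R file repeatedly, varying only the command-line arguments.
for run in $(seq 1 100); do
    R --vanilla -f r-clargs-3.R --args runI="$run" parmsC="params.csv" xN=33.45
done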


5 Comments

"Works for me" means I think it is secure. If you write a bash script that you own, nobody else can run it but you, and it uses data files that you provide from your own user account, then it is secure enough. You seem to suppose the author is offering the script on the web to strangers.
What do you think is going to go wrong if I use commandArgs like this: gitlab.crmda.ku.edu/crmda/hpcexample/blob/master/…
The code you linked isn't doing any of the things I'm telling the OP not to do, so I don't know what "advice discouraging your proposed strategy" you're talking about. There's no generation of R code from bash in there at all; just passing content through the argv, which is perfectly safe.
I understand. This was my second -1 today; I shouldn't be trying to read these things on a phone.

Thanks for all the answers; they were very helpful. I was able to get a solution that works. Here's my completed script.

#!/bin/bash

# Inputs
workspace="`pwd`"
preprocessed="$workspace/6_preprocessed"

# Output
SWS_output="$workspace/7_SKSattempt4_results"

# create output directory
mkdir -p $SWS_output

# Copy data from preprocessed to SWS_output
cp -a $preprocessed/* $SWS_output

cd $SWS_output

# Loop through folders in the output and run the R code on each folder
for qdir in $SWS_output/*/; do
        qdir_name=`basename $qdir`
        echo $qdir_name
        export VARIABLENAME=$qdir
        echo -e 'source("Step_1_Setup.R") \n source("Step_2_Data.R") \n q()' | R --no-save --slave

done

And then the R script looks like this:

qdir <- Sys.getenv("VARIABLENAME")
pathname <- qdir[1]

As a couple of comments have pointed out, this isn't best practice, but this worked exactly as I wanted it to. Thanks!

2 Comments

This is actually much better / less-insecure code than what you had in the question! There are still some bugs in it (try running this from a directory with spaces in its name), but those are pretty easy to fix; shellcheck.net will point them out.
Do you mind if I edit a bit to tighten up quoting, avoid using echo -e, &c. while otherwise keeping the semantics the same?
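
For reference, here is a hedged sketch of what those suggested fixes might look like, keeping the OP's paths and overall flow: variables are quoted, $(...) replaces backticks, echo -e gives way to a here-document, and the cd is guarded so a failure stops the run.

#!/bin/bash

# Inputs
workspace="$(pwd)"
preprocessed="$workspace/6_preprocessed"

# Output
SWS_output="$workspace/7_SKSattempt4_results"

# Create the output directory and copy the preprocessed data into it
mkdir -p "$SWS_output"
cp -a "$preprocessed"/* "$SWS_output"

cd "$SWS_output" || exit 1

# Loop through folders in the output and run the R code on each folder
for qdir in "$SWS_output"/*/; do
    qdir_name="$(basename "$qdir")"
    echo "$qdir_name"
    export VARIABLENAME="$qdir"
    # The here-document replaces echo -e; the quoted 'EOF' keeps bash from
    # expanding anything, and the delimiter must start at the beginning of a line.
    R --no-save --slave <<'EOF'
source("Step_1_Setup.R")
source("Step_2_Data.R")
EOF
done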
