Currently, my bash script splits by number of lines. However, I'd like to split a file into X pieces, each of those having total lines equal to the file length divided by X. The script is run as follows:

./script.sh input_file.tsv

So far, in the script, I have this:

INPUT_FILE=$1
SPLIT_NUM_THREADS=15
TOTAL_LINES=$(wc -l < $INPUT_FILE)
SPLIT_NUM=$( echo "scale=6; $TOTAL_LINES / $SPLIT_NUM_THREADS" | bc)

The following issues exist:

  • Using $INPUT_FILE to get TOTAL_LINES gets me the error "ambiguous redirect", but using simply "input.tsv" does not. What's wrong there?
  • SPLIT_NUM is a float. How do I convert it to an int so it can be used to split by lines?

How can I resolve these issues and split a file by number of pieces?

  • I don't get that "ambiguous redirect" error (GNU bash, version 4.2.53). It appears if an unset or empty variable is used. Please put echo "$INPUT_FILE" before the line with the error (though I don't see a possible problem yet). Commented Nov 19, 2014 at 23:09
  • Oh FFS I was running the script and forgetting to put the input file in the command, DERP. That's fixed, thank you. All I need to do is get a rounded number for splitting, any idea there? Commented Nov 19, 2014 at 23:15
  • Try SPLIT_NUM=$(expr '(' $TOTAL_LINES + $SPLIT_NUM_THREADS - 1 ')' / $SPLIT_NUM_THREADS ). There are more compact ways to do this, depending on your shell. Commented Nov 19, 2014 at 23:26
  • @MarkPlotnick that worked perfectly, thanks so much! Commented Nov 19, 2014 at 23:53
  • Maybe you can also make use of something like this for the float part. Commented Nov 20, 2014 at 3:14
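
The two points from the comments above can be sketched together in bash. This is only a sketch, not the asker's final script: `ceil_div` and `count_lines` are hypothetical helper names, and the usage message is made up for illustration. `ceil_div` uses the same round-up formula as the `expr` call, written with shell arithmetic. `count_lines` quotes the file name and checks for an empty argument, since an empty unquoted variable in a redirect is what triggers "ambiguous redirect".

```shell
#!/usr/bin/env bash

# Hypothetical helper: integer ceiling of $1 / $2.
# Same formula as the expr call in the comment: (a + b - 1) / b.
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Hypothetical helper: print the number of lines in file $1.
# Fails with a message if no file name was given, instead of letting
# an empty variable reach the redirect.
count_lines() {
  local file=$1
  if [[ -z $file ]]; then
    echo "usage: script.sh input_file.tsv" >&2
    return 1
  fi
  # The outer $(( )) strips the padding some wc implementations add.
  echo $(( $(wc -l < "$file") ))
}

ceil_div 52 15   # prints 4
```

In the script from the question this might be used as `TOTAL_LINES=$(count_lines "$1") || exit 1` followed by `SPLIT_NUM=$(ceil_div "$TOTAL_LINES" "$SPLIT_NUM_THREADS")`.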

1 Answer

Each part gets the integer quotient of the line count divided by the number of parts ($((a/b))). If the remainder ($((a%b))) is not zero, the leftover lines have to be distributed over the parts. One solution is to give one additional line to as many parts as the remainder. With 52 lines and 15 parts, for example, 7 parts get 4 lines and the other 8 parts get 3 lines.

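SPLIT_NUM_THREADS=15
TOTAL_LINES=52
# The first TOTAL_LINES % SPLIT_NUM_THREADS parts (7 here) get one extra line each.
for ((i = 0; i < TOTAL_LINES % SPLIT_NUM_THREADS; i++)); do
  echo $((TOTAL_LINES / SPLIT_NUM_THREADS + 1))
done
4
4
4
4
4
4
4
# The remaining parts (8 here) get the plain integer quotient.
for ((i = TOTAL_LINES % SPLIT_NUM_THREADS; i < SPLIT_NUM_THREADS; i++)); do
  echo $((TOTAL_LINES / SPLIT_NUM_THREADS))
done
3
3
3
3
3
3
3
3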
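
To go from these part sizes to actual files, something along the following lines might work. It is a rough sketch under a few assumptions: `split_even` and its three arguments are hypothetical names, the input ends with a trailing newline (otherwise `wc -l` does not count the last line), and rereading the file once per part with `sed` is acceptable for the file sizes involved.

```shell
#!/usr/bin/env bash

# Hypothetical helper: split_even FILE PARTS PREFIX
# Writes PREFIX0 .. PREFIX(PARTS-1). The first (total % PARTS) files get
# one line more than the rest, as in the loops above.
split_even() {
  local file=$1 parts=$2 prefix=$3
  local total base extra size i start=1

  total=$(( $(wc -l < "$file") ))
  base=$(( total / parts ))    # lines every part gets
  extra=$(( total % parts ))   # number of parts that get one more line

  for (( i = 0; i < parts; i++ )); do
    size=$(( base + (i < extra ? 1 : 0) ))
    if (( size > 0 )); then
      # Print only lines start .. start+size-1 of the input.
      sed -n "${start},$(( start + size - 1 ))p" "$file" > "${prefix}${i}"
    else
      # More parts than lines: create an empty file for this part.
      : > "${prefix}${i}"
    fi
    start=$(( start + size ))
  done
}
```

Called as `split_even input_file.tsv 15 part.`, a 52-line file would give part.0 to part.6 with 4 lines each and part.7 to part.14 with 3 lines each. If GNU coreutils is available, `split -n l/15 input_file.tsv` also produces 15 pieces without breaking lines, but it balances the pieces by size in bytes rather than by line count.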
