Grouping lines into heterogeneous subsets

Question

I have a file with n lines. (Each line refers to a “question”, and therefore they are labeled Q.1, Q.2, Q.3, ..., Q.n.) Each line (question) has a “Marks” attribute, which has the value 2, 3, 4, 5, or 6. There are ⁿ⁄₅ lines with each value.

For example: A 10-line file (i.e., n=10) might look like

amol@mypc:~$ cat questions.txt
Q.1 2 Marks
Q.2 5 Marks
Q.3 4 Marks
Q.4 3 Marks
Q.5 6 Marks
Q.6 4 Marks
Q.7 3 Marks
Q.8 2 Marks
Q.9 6 Marks
Q.10 5 Marks

I know I can split this into five homogeneous (i.e., all the same) files with something like

amol@mypc:~$ grep " 2 Marks" questions.txt > questions2Marks.txt
amol@mypc:~$ grep " 3 Marks" questions.txt > questions3Marks.txt
amol@mypc:~$ grep " 4 Marks" questions.txt > questions4Marks.txt
amol@mypc:~$ grep " 5 Marks" questions.txt > questions5Marks.txt
amol@mypc:~$ grep " 6 Marks" questions.txt > questions6Marks.txt

Each of the resulting files will have ⁿ⁄₅ lines.

I want to do the inverse operation – i.e., produce a transpose of the above result. I want to split my questions.txt file into ⁿ⁄₅ files: questions1.txt, questions2.txt, questions3.txt, ..., questionsM.txt (using M to represent ⁿ⁄₅) where each file is five lines long and is heterogeneous (i.e., all different).

questions1.txt should contain

the first line in questions.txt with 2 Marks,
the first line in questions.txt with 3 Marks,
the first line in questions.txt with 4 Marks,
the first line in questions.txt with 5 Marks, and
the first line in questions.txt with 6 Marks,

in that order. questions2.txt should contain the second line of each, etc.

So, for n=10, M obviously is 2. I would want my example questions.txt from above broken down into these two files:

amol@mypc:~$ cat questions1.txt            
Q.1 2 Marks
Q.4 3 Marks
Q.3 4 Marks
Q.2 5 Marks
Q.5 6 Marks

amol@mypc:~$ cat questions2.txt            
Q.8 2 Marks
Q.7 3 Marks
Q.6 4 Marks
Q.10 5 Marks
Q.9 6 Marks

How can I achieve that using *nix tools (sed, awk, perl, shell script, etc...)?

So you want to read the file sequentially, and each time you get a group of values 2-3-4-5-6 from the second column, sort the group on that column, and write it to a numbered file?
– dhag, Commented Jul 24, 2015 at 13:33

chaos · Accepted Answer · 2015-07-24 09:33:02Z

6

sort -n -k2 -k1.3 file | awk '{$2!=a?x=1:x++} {print > "file"x; a=$2}'

First, we need to sort the file correctly. -n sorts numerically, -k2 sorts on the second field (the marks, 2-6), and -k1.3 then sorts, within that order, on the first field starting from its 3rd character (ignoring the leading Q.). awk then splits the sorted output across ascending files (file1, file2, file3, ...): the counter x resets to 1 whenever the marks value in the second field changes and increments otherwise, so the k-th line of each marks group is written to the k-th file.

The output looks like this, file1:

$ cat file1
Q.1 2 Marks
Q.4 3 Marks
Q.3 4 Marks
Q.2 5 Marks
Q.5 6 Marks

And file2:

$ cat file2
Q.8 2 Marks
Q.7 3 Marks
Q.6 4 Marks
Q.10 5 Marks
Q.9 6 Marks
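
For readers who want to step through it, the same pipeline can be spelled out with comments. This is only a sketch under a few assumptions that are not part of the answer above: the sample data is written to a scratch directory made with mktemp -d, each sort key is limited to its own field (-k2,2n and -k1.3,1n), and the output file name is parenthesized because an unparenthesized concatenation after > is not handled the same way by every awk implementation.

```shell
#!/bin/sh
# Work in a scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample input copied from the question.
cat > questions.txt <<'EOF'
Q.1 2 Marks
Q.2 5 Marks
Q.3 4 Marks
Q.4 3 Marks
Q.5 6 Marks
Q.6 4 Marks
Q.7 3 Marks
Q.8 2 Marks
Q.9 6 Marks
Q.10 5 Marks
EOF

# Sort by marks (field 2), then by question number
# (field 1 starting at its 3rd character, which skips the "Q.").
sort -k2,2n -k1.3,1n questions.txt |
awk '
    $2 != prev { x = 0 }       # marks value changed: restart the file counter
    {
        x++                    # position of this line within its marks group
        print > ("file" x)     # k-th line of every group goes to file<k>
        prev = $2
    }
'
```

With the ten-line sample this should leave file1 and file2 in the scratch directory, with the contents shown above.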

edited Jul 24, 2015 at 9:33

answered Jul 24, 2015 at 8:16


Could also do awk '{print > "file"!(NR%2)+1}'
– 123, Commented Jul 24, 2015 at 8:50

@chaos: You are quite correct. But my file has many questions, say 100, with 2, 3, 4, 5, and 6 Marks. How can I divide them into file1, file2, file3, ... up to file20, so that each of the 20 files holds one question of each of 2, 3, 4, 5, and 6 Marks? I hope you understand my query.
– amolveer, Commented Jul 24, 2015 at 9:21

@amolveer I edited my answer; it now works with multiple files.
– chaos, Commented Jul 24, 2015 at 9:33

@chaos: Perfect, cool man
– amolveer, Commented Jul 24, 2015 at 9:37


glenn jackman · Answer · 2015-07-24 15:02:05Z

3

An awk answer: this keeps the questions in the same order as in the source file.

$ awk '{filename = "questions" ++n[$2] ".txt"; print > filename}' questions.txt 
$ cat questions1.txt 
Q.1 2 Marks
Q.2 5 Marks
Q.3 4 Marks
Q.4 3 Marks
Q.5 6 Marks
$ cat questions2.txt 
Q.6 4 Marks
Q.7 3 Marks
Q.8 2 Marks
Q.9 6 Marks
Q.10 5 Marks
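
If the lines inside each file should also appear in 2, 3, 4, 5, 6 order, as in the question's expected output, one possibility is to sort by marks first and then apply the same per-marks counter. The following is a sketch rather than part of the original answer. It assumes a sort that supports the -s (stable) option, as GNU and BSD sort do, so that lines with equal marks keep their original relative order. The scratch directory is only there to keep the example self-contained.

```shell
#!/bin/sh
# Work in a scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample input copied from the question.
cat > questions.txt <<'EOF'
Q.1 2 Marks
Q.2 5 Marks
Q.3 4 Marks
Q.4 3 Marks
Q.5 6 Marks
Q.6 4 Marks
Q.7 3 Marks
Q.8 2 Marks
Q.9 6 Marks
Q.10 5 Marks
EOF

# Stable numeric sort on the marks field only: groups the lines by marks
# while leaving each group in source-file order.
sort -s -k2,2n questions.txt |
awk '{
    # n[$2] counts how many lines with this marks value have been seen,
    # so the k-th line of each marks value goes to questions<k>.txt.
    filename = "questions" (++n[$2]) ".txt"
    print > filename
}'
```

Because the counter is kept per marks value, this variant does not depend on the question numbers being sortable, only on the order of the lines in questions.txt.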

answered Jul 24, 2015 at 15:02
