10

Input:

1
hgh
h2b
h4h
2
ok
koko
lkopk
3
uh
ju
nfjvn
4

Expected output:

1
2
3
4

So I need only the 1st, 5th, 9th, 13th, ... lines of the file in the output file. How can I do this?

2

8 Answers

29

Using AWK:

awk '!((NR - 1) % 4)' input > output

Figuring out how this works is left as an exercise for the reader.
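For readers who want to check the arithmetic, here is a small sketch. It assumes a stand-in input made with seq (the numbers 1 to 13, one per line) rather than the file from the question; the variable names are only illustrative.

```shell
# Stand-in input: the numbers 1..13, one per line (not the question's file).
sample=$(seq 1 13)

# (NR - 1) % 4 is 0 on lines 1, 5, 9, ...; "!" turns that 0 into a true
# pattern, and an awk pattern with no action prints the current line.
obfuscated=$(printf '%s\n' "$sample" | awk '!((NR - 1) % 4)')

# The same selection written as an explicit comparison.
explicit=$(printf '%s\n' "$sample" | awk 'NR % 4 == 1')

printf '%s\n' "$obfuscated"
```

Both commands should select the same lines, since (NR - 1) % 4 == 0 exactly when NR % 4 == 1.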

3
  • 23
    NR % 4 == 1 would be more legible IMO. Commented May 9, 2019 at 8:42
  • 13
    Agreed @Stéphane; this is probably questionable on my part, but for potentially homework questions I try to obfuscate my answers a little... Commented May 9, 2019 at 9:47
  • 1
    @StephenKitt obfuscate your answers? Really? This is not the place to do that. Commented Jun 3, 2019 at 10:26
23

Using split (GNU coreutils):

split -nr/1/4 input > output
  • -n generate CHUNKS output files

and CHUNKS as

  • r/K/N use round robin distribution and only output Kth of N to stdout without splitting lines/records
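As a rough illustration of the round-robin mode, the sketch below assumes GNU coreutils split and a scratch file created with mktemp and seq (both are stand-ins, not part of the original answer). Changing K picks a different starting line.

```shell
# Assumes GNU coreutils; other split implementations may lack -n r/K/N.
tmp=$(mktemp)
seq 1 13 > "$tmp"          # stand-in input: 1..13, one per line

# Chunk 1 of 4 in round-robin order: lines 1, 5, 9, 13.
first_of_four=$(split -n r/1/4 "$tmp")

# Chunk 3 of 4: lines 3, 7, 11.
third_of_four=$(split -n r/3/4 "$tmp")

rm -f "$tmp"
printf '%s\n' "$first_of_four"
```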
1
  • 1
    Mind blown. Answers like this are why I love this SE. Thanks! Commented May 9, 2019 at 12:04
22

With GNU sed:

sed '1~4!d' < input > output

With standard sed:

sed -n 'p;n;n;n' < input > output

With 1 and 4 in $n and $i variables:

sed "$n~$i!d" # GNU only
awk -v n="$n" -v i="$i" 'NR >= n && (NR % i) == (n % i)'
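A minimal sketch of the parameterised awk form, wrapped in a helper function. The name every_nth is hypothetical and the input comes from seq, both chosen only for illustration.

```shell
# every_nth START STEP: print line START and every STEP-th line after it.
# (Hypothetical helper name; reads standard input.)
every_nth() {
    awk -v n="$1" -v i="$2" 'NR >= n && (NR % i) == (n % i)'
}

# Start at line 2, step 4: lines 2, 6, 10.
from_two=$(seq 1 13 | every_nth 2 4)

# Start at line 5, step 4: NR >= n keeps line 1 out, leaving 5, 9, 13.
from_five=$(seq 1 13 | every_nth 5 4)

printf '%s\n' "$from_two"
```

The NR >= n test matters only when the start line is larger than the step. Without it, line 1 would also match in the second call.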
7

Adding the obligatory perl solution:

perl -ne 'print if $. % 4 == 1' input > output
4

Python version, just for fun:

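with open('input.txt') as f:
    for i, line in enumerate(f.readlines()):
        # i is 0-based, so i % 4 == 0 selects lines 1, 5, 9, ...
        if i % 4 == 0:
            # Drop only the trailing newline; strip() would also remove other whitespace.
            print(line.rstrip('\n'))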
3
  • enumerate(f) should be able to do the job while consuming less memory Commented May 10, 2019 at 20:17
  • @iruvar That's so neat! Never realized that before; will be using in the future. Feel free to edit it into this answer; I'm not really going to maintain it with optimizations since the other Bash answers (especially this one) are definitely the way to go. Commented May 10, 2019 at 20:37
  • If you're going to use readlines (hence slurping the whole file into memory), you can use f.readlines()[::4] to get every fourth line. So you can use print(''.join(f.readlines()[::4])). Commented May 14, 2019 at 16:40
3

POSIX sed: this method uses only POSIX sed features, so it can be run everywhere, or at least on those seds that respect POSIX.

 $ sed -ne '
   /\n/!{
    H;s/.*//;x
   }

   :loop
       $bdone
       N;s/\n/&/4
       tdone
   bloop

   :done
   s/.//;P
 ' input.file

Another approach generates the sed code programmatically, so that it scales to other intervals:

$ code=$(yes n | head -n 4 | paste -sd\; - | sed s/n/p/)
$ sed -ne "$code" input.file

Perl: we fill up the array @A until it holds 4 lines, then print its first element and clear the array.

$ perl -pe '
   $A[@A] = @A ? <> : $_ while @A < 4;
   $_ = (splice @A)[0];
' input.file
1

Call it as scriptname filename skip (skip is 4 in your case). It works by pulling iter lines from the top of the file and outputting only the last of them. It then increments iter by skips and repeats for as long as iter does not exceed the number of lines in the file.

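#!/bin/bash
file="$1"
skips="${2:-4}"            # interval between printed lines; defaults to 4
lines=$(wc -l < "$file")
iter=1
while [ "$iter" -le "$lines" ]; do
    # Print only line number $iter: take the first $iter lines, keep the last.
    head -n "$iter" "$file" | tail -n 1
    iter=$(( iter + skips ))
done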
1

Pure Bash:

mapfile -t lines < input
for (( i=0; i < ${#lines[@]}; i+=4 ))
do printf "%s\n" "${lines[$i]}"
done

mapfile is a builtin added in Bash 4 which reads standard input into an array, here named lines, with one line per entry. The -t option strips the trailing newline from each line.

If you want to print every fourth line starting from line 4, then you can do that in one command using mapfile's callback option -C, which runs the provided code every so many lines, with the interval given by -c. The current array index and the next line to be assigned are given to the code as arguments.

mapfile -t -c4 -C 'printf "%.0s%s\n"' < input

This uses the printf builtin; the format code %.0s suppresses the first argument (the index), so only the line is printed.

You could use the same command to print every fourth line starting from line 1, 2, or 3, but you'd have to prepend 3, 2, or 1 lines to input before feeding it to mapfile, which I think is more trouble than it's worth.
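For what it's worth, the prepending trick could look like the sketch below. It assumes Bash 4 or later (for mapfile and its -C callback) and uses seq as stand-in input; three empty lines are prepended so that the original line 1 becomes the fourth line read.

```shell
# Requires Bash 4+; mapfile is not available in plain sh.
# Three padding lines shift the input so original lines 1, 5, 9, ...
# land on positions 4, 8, 12, ..., which is where the callback fires.
from_line_1=$(
    { printf '\n\n\n'; seq 1 13; } |
        mapfile -t -c 4 -C 'printf "%.0s%s\n"'
)

printf '%s\n' "$from_line_1"
```

The callback's output is captured here only to make the result easy to inspect. Running mapfile at the end of a pipeline is fine in this case because nothing reads the array afterwards.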

This also works:

mapfile -t lines < input
printf "%s%.0s%.0s%.0s\n" "${lines[@]}"

Here, printf consumes four entries of the array lines at a time, only printing the first and skipping the other three with %.0s. I don't like this since you have to manually fiddle with the format string for different intervals or starting points.
