
I have two files. File1 contains some sentences, and File2 contains the line numbers I want to keep in File1.

For example, File1:

He is a boy.
She is a cook.
Okay.
She went to school.
She is pretty.

File2:

1
4

Output:

He is a boy.
She went to school.

Is there a way I could do that using sed, grep, or awk? I don't want to write the line numbers manually, as I did here.

4 Answers


We could transform the list of numbers into a sequence of sed commands and run them as a sed editing script in a single sed invocation:

sed 's/$/p/' lines.list | sed -n -f /dev/stdin file.txt

Here, the first sed creates a sed script consisting of commands such as 1p, 4p etc., by simply inserting p at the end of each line. This script is then sent to the second sed after the pipe, which reads it with -f /dev/stdin and applies it with the text file as input.

This would require reading each file only once.
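With the example data from the question (here assuming the files are called lines.list and file.txt, as in the command above), the intermediate script and the final result would look like this:

$ sed 's/$/p/' lines.list
1p
4p
$ sed 's/$/p/' lines.list | sed -n -f /dev/stdin file.txt
He is a boy.
She went to school.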


Using awk, read the line numbers into an associative array as keys, then, while reading the other file, see if the current line number is one of the ones that was previously made a key in the array:

awk 'FNR == NR { lines[$0]; next } (FNR in lines)' lines.list file.txt

In awk, the special variables NR and FNR are the total number of records (lines) read so far, and the total number of records (lines) read in the current file, respectively. If NR is equal to FNR, we're reading from the first input file, and we create an array entry using the current line, $0, as the key (no value is given), and immediately skip to the next line of input.

If we're not reading from the first file (i.e., NR is no longer equal to FNR), we test with FNR in lines to see whether FNR, the line number in the current file, is a key in the array called lines. If it is, the current line will be printed.
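The same logic, written out as a commented sketch (equivalent to the one-liner above):

awk '
    FNR == NR {      # true only while reading the first file (lines.list)
        lines[$0]    # store the wanted line number as an array key
        next         # skip the rule below for these lines
    }
    FNR in lines     # second file: print the line if its number is a key
' lines.list file.txt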


Without heavy support from other tools, the grep utility is not really made for performing this type of task. It extracts lines from text files whose contents match (or do not match) a given pattern. The pattern is therefore supposed to match the line, not the line number.

The following is just for fun and should not be considered a suggestion for how to actually solve this issue.

You can insert line numbers with grep using

grep -n '.*' file.txt

This inserts line numbers at the start of all lines in the file, directly followed by : and the original contents of the line.
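For the example file, that numbering step produces:

$ grep -n '.*' file.txt
1:He is a boy.
2:She is a cook.
3:Okay.
4:She went to school.
5:She is pretty.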

We may then, as with the sed solution, modify the pattern file to make it match a selection of those specific numbers:

sed 's/.*/^&:/' lines.list

This would output regular expressions such as ^1: and ^4:, each matching a particular line number at the start of a line.

We may then get grep to use these expressions (here with the help of a process substitution). Finally, we remove the temporary line numbers using cut:

grep -n '.*' file.txt | grep -f <(sed 's/.*/^&:/' lines.list) | cut -d : -f 2-

... but this is too contrived to even be considered a reasonable solution.
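For completeness, a test run of the whole pipeline with the example data:

$ grep -n '.*' file.txt | grep -f <(sed 's/.*/^&:/' lines.list) | cut -d : -f 2-
He is a boy.
She went to school.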


Each of the above solutions will always display the selected lines in the order in which they occur in the text file. If you want the lines output in the order in which they occur in the line number file, you may instead use ed (or awk, see further down):

sed 's/$/p/' lines.list | ed -s file.txt

Again, we create an editing script from our line number file by simply adding p at the end of each line.

This script is then passed as the command input to the ed editor, which applies the commands, in order, to the text file.

Testing:

$ cat lines.list
4
1
$ sed 's/$/p/' lines.list | ed -s file.txt
She went to school.
He is a boy.

Note that ed reads the whole file into memory, just like the following equivalent awk program does:

awk 'NR == FNR { lines[FNR] = $0; next } { print lines[$0] }' file.txt lines.list

Note that the input files are switched compared to the previous awk solution. This allows us to first read the text file into the lines array, line by line, and then pick lines out of that array in arbitrary order while reading the file with line numbers.
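Testing it in the same way as the ed variant above, with the reversed lines.list:

$ awk 'NR == FNR { lines[FNR] = $0; next } { print lines[$0] }' file.txt lines.list
She went to school.
He is a boy.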

  • FWIW to avoid reading the whole of file.txt into memory if you want to print the matching lines in the order of lines.list, you could do awk 'NR==FNR{lines[$0]; ordr[++cnt]=$0; next} {if (FNR in lines) lines[FNR]=$0; else delete lines[FNR]} END{for (i=1; i<=cnt; i++) if (ordr[i] in lines) print lines[ordr[i]]}' lines.list file.txt so you only save in memory the lines from file.txt that match the line numbers from lines.list. Commented Aug 29, 2022 at 15:05
  • @EdMorton I gave the various answers due to the user's list of "sed, grep, or awk" in the question. I agree that a single awk invocation might be the most useful one. Commented Aug 29, 2022 at 15:54

Let's say your file is file.txt and lines.txt contains line numbers. Using xargs:

# extract digit sequences from lines.txt and make sed arguments
sed 's/[^[:digit:]]*\([[:digit:]]\+\)[^[:digit:]]*/-e \1p /g' lines.txt \
    | xargs /bin/sh -c '[ $# -gt 0 ] && sed -n "$@" file.txt' sh
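With a lines.txt holding 1 and 4, the first sed should turn each number into a pair of arguments for sed, which xargs then hands to a single sed -n invocation, roughly like this:

$ sed 's/[^[:digit:]]*\([[:digit:]]\+\)[^[:digit:]]*/-e \1p /g' lines.txt
-e 1p
-e 4p
$ sed 's/[^[:digit:]]*\([[:digit:]]\+\)[^[:digit:]]*/-e \1p /g' lines.txt \
    | xargs /bin/sh -c '[ $# -gt 0 ] && sed -n "$@" file.txt' sh
He is a boy.
She went to school.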

Using Raku (formerly known as Perl_6)

Sample Input:

~$ cat data.txt
He is a boy.
She is a cook.
Okay.
She went to school.
She is pretty.

Successive examples to show indexing:

~$ raku -e '.put for lines[ 0,3 ];'  data.txt
He is a boy.
She went to school.

Take line-number index inline (subtract 1 to make it zero-indexed):

~$ raku -e 'my @index = <1 4>; .put for lines[ @index.map: *-1 ];' data.txt
He is a boy.
She went to school.

Take index as a file-path, take data off command-line:

~$ raku -e 'my @index = "/path/to/index.txt".IO.lines;
            .put for lines[ @index.map: *-1 ];'  data.txt
He is a boy.
She went to school.

Take both files off the command-line (data.txt then index.txt):

~$  raku -e 'my @data  = @*ARGS[0].IO.lines;
             my @index = @*ARGS[1].IO.lines;
             .put for @data[ @index.map: *-1 ];'  data.txt  index.txt
He is a boy.
She went to school.

https://docs.raku.org
https://raku.org


In case the target line numbers in File2 are in increasing order, this approach will do.

sed -e 's/$/b/;$a d' < File2 |
sed -f - File1

This generates a series of sed commands:

1b
4b
d
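With the example files from the question, that looks like:

$ sed -e 's/$/b/;$a d' < File2 | sed -f - File1
He is a boy.
She went to school.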

In the generalized case of printing lines from File1 in the order given in File2, we can use either of the two approaches given below.

Using awk, we build an associative array keyed on line number in File1, with the line contents as values; only the lines mentioned in File2 are saved (non-matching lines all collapse into one harmless extra entry). Another array is maintained, keyed on an increasing numerical index starting from 1, whose values are the line numbers targeted in File2.

awk '
{if (X) c[b[FNR]]=$0; else b[a[NR]=$1]=$1}
END {for (i=1; i in a; i++) print c[a[i]]}
' File2 X=1 File1
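For instance, with a File2 that lists 4 before 1, the output should follow File2's order:

$ cat File2
4
1
$ awk '{if (X) c[b[FNR]]=$0; else b[a[NR]=$1]=$1} END {for (i=1; i in a; i++) print c[a[i]]}' File2 X=1 File1
She went to school.
He is a boy.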

Using GNU sed in slurp mode (-z). First, we generate the editing commands from the contents of File2:

sed -e 's:.*:s/^(.*\\n){&}/\\1\\d0/M;P;g:' File2 |
sed -zEn -e h -f - File1 | tr -d '\0'
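For the sample File2 containing 1 and 4, the first sed should emit a script along these lines, one substitution per requested line number:

s/^(.*\n){1}/\1\d0/M;P;g
s/^(.*\n){4}/\1\d0/M;P;g

Roughly speaking, each generated command replaces the first N lines of the slurped file with a copy of line N, prints it up to the first newline with P, and then restores the original input (saved by the initial h) with g, ready for the next command.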
