
I have a file containing multiple records, each with a number of fields. The file content looks like this:

# cat inputfile

name: AAA
age:  38
city: C1
state: S1

age: 29
city: C2
name: BBBbbbB
state: S2

state: S3
age: 21
city: C3
name: ccccccC 

I would like to arrange the fields of each record in the order given by the arguments to a shell script.

If I run the script like this:

# sh sortout.sh <inputfile> name age city state

the output should look like this:

name: AAA
age:  38
city: C1
state: S1

name: BBBbbbB
age: 29
city: C2
state: S2

name: ccccccC 
age: 21
city: C3
state: S3
  • Describe your sorting algorithm. Commented Aug 18, 2019 at 10:50
  • I am looking for that sorting algorithm only :) Can anyone help me ? Please.. Commented Aug 18, 2019 at 11:03

2 Answers


With Perl you can operate in paragraph mode, that is, let perl read one paragraph at a time, by using the -00 option.

Then, for every line of the current record, capture the field name (the text before the colon) together with the whole line, and store them in a hash keyed by field name.

$ perl -l -00ane '
    my %h = reverse /^(([^:]+):.*)$/mg;
    print $h{$_} for qw/name age city state/;
' input.file

With your specific requirements, you could do this:

cat - <<\eof > code.sh
if=$1;shift
perl -ls -00ane '
  my %h = reverse /^(([^:]+):.*)$/mg;
  print $h{$_} for split /\s+/, $order;
' -- -order="$*" "$if"
eof

Then after having created the code file, execute it:

sh code.sh inputfile name age city state
  • Thank you for the response, Rakesh. How do I call this from within a perl script? Commented Aug 18, 2019 at 12:26
  • I have added the solution for the specific requirement. Commented Aug 18, 2019 at 13:05
  • Thanks a lot, Rakesh. Much appreciated. Could you help me understand the logic? I am not familiar with perl. Commented Aug 18, 2019 at 13:25

Since you aren't familiar with Perl, I'll be slightly verbose.

First off, Perl is a programming language, available on virtually every Linux system, whose interpreter takes your input file and transforms it by way of its commands to generate the desired output.

Normally Perl examines the input file a line at a time. A line is separated from the next by the ASCII character \012, aka \n, called a newline. But in this case we'd rather be reading a paragraph at a time. And how does Perl identify a para?

The -00 option makes Perl read paras: chunks of text separated by one or more blank lines. Each one gets stored in the current-record scalar $_.

Note that a record can now have multiple lines in it.

I visualize a record as: ^....$ ^...$ ^....$ Basically, contiguous islands of text, one per line. Within a record the islands are separated by a single \n; a blank line ends the record.

Perl options used:

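-l does two things: it removes the input record separator from the current record, $_, and it appends the output record separator, $\, on every print. Because -l comes before -00 on the command line, $\ gets the value $/ had at that point, which is "\n".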

-s turns on rudimentary command-line switch parsing. With it we can set the $order variable, which holds the order in which to print the fields, from the command line itself.

-00 sets the input record separator, $/, to the empty string, which means paragraph mode. Perl then reads the input data one paragraph at a time and stores it in $_ for each iteration.

-n puts a loop around the code, meaning that Perl reads the input file (actually a file handle, but that's immaterial at our level) record by record, but does not print the record once the transformations have been applied to it. You have to print explicitly.

-e tells perl that what follows is Perl code, which will be applied to the current record.

-- marks the end of Perl's own command-line options. What follows are the switches for -s (which begin with a dash) and then file names all the way. If you might have file names starting with a dash, it is better to prefix them with ./ or give a full or relative path, or to place one more -- to signal the end of the switches.


Now comes the algorithm part:

my %h = reverse /^(([^:]+):.*)$/mg;

In Perl, hashes (associative arrays) are identified by a percent sign % before their name. So in our case, we are building a hash %h, and placing my before it makes it lexical: it goes out of scope whenever the next record is read in. Meaning, a brand new hash is created for every record.

What does the expression /..../mg mean? First off, a regex match is always tied to some scalar variable or expression by means of the =~ operator. But here we don't see one. Implicitly it is tied to the $_ variable, which in this case holds the current record.

To be continued---

  • You may want to merge this with your accepted answer. This is not in itself an answer to the given question. Commented Aug 19, 2019 at 5:59
