edited Mar 26, 2013 at 7:31

@Gilles makes a lot of good points. Here are some other notes:

sed is a stream editor: (s)tream (ed)itor. Read the introduction section of the Wikipedia article. The important part is concepts like the pattern space.

For the most part I assume you know regular expressions and won't go into much detail on them.

This became a bit long, but OK.

The "easy" part is replacing the GID for users based on their shell. That is covered in the first section. The more interesting part is translating the first letter of the account/user name to a number and appending it to the GID. That is section two, sed - lookup tables, below, finishing with Listing 6, which has a more or less functional procedure for alpha to digit in the GID.

A lot of this might seem "why oh WHY?", but it is good training in concepts (for lack of a better word).

##Section 1: Swapping GID by shell##

You could add a function to get the GID for a named group, here using sed instead of cut, IFS or other "easier" ways:

#!/bin/bash

get_gnr()
{
    # -n    Do not print unless I say so.
    # s///  On the line beginning with argv 1, replace the whole
    #       line with its third field (the GID).
    # p     Print if there was a substitution.
    # $1    Arg 1 to bash function.
    sed -n 's/^'"$1"':[^:]*:\([0-9]*\):.*/\1/p' /etc/group
}

# Assign whatever get_gnr() prints to gnr_bash and gnr_tcsh
gnr_bash=$(get_gnr "bash")
gnr_tcsh=$(get_gnr "tcsh")

printf "Group %5s = %d\n" "bash" "$gnr_bash"
printf "Group %5s = %d\n" "tcsh" "$gnr_tcsh"

You should have more error checking. E.g. test that you actually have a group named bash.
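One way that error check could look is sketched below. The name get_gnr_checked and the optional second argument (a group file, defaulting to /etc/group) are my own assumptions for illustration, not part of the original task:

```shell
#!/bin/bash

# Hypothetical variant of get_gnr(): arg 1 is the group name, optional
# arg 2 is the group file to search (defaults to /etc/group).
get_gnr_checked()
{
    local name=$1 file=${2:-/etc/group} gid
    # Replace the matching line with its third field (the GID).
    gid=$(sed -n 's/^'"$name"':[^:]*:\([0-9]*\):.*/\1/p' "$file")
    if [ -z "$gid" ]; then
        printf 'No group named "%s" in %s\n' "$name" "$file" >&2
        return 1
    fi
    printf '%s\n' "$gid"
}
```

It could then be used as gnr_bash=$(get_gnr_checked bash) || exit 1. Note that the group name is spliced into the regular expression as is, so a name containing regex characters such as . would need escaping first.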

Then you would probably have some variable to store the GID for which the first letter of the account name should be translated and tacked onto the end of the GID. It is unclear from your task description, however, whether this should be done before or after the bash/tcsh switch of groups.

Anyhow. One thing you can utilize if you wrap the sed in a bash script is bash variables: temporarily close the single-quoted sed script, insert the variable, and reopen the quote. Further, you can group sed commands, as in awk, using e.g.:

/pattern/ { exec if match}
/pattern/ ! { exec if no match }
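As a small self-contained illustration of both points (splicing a bash variable into the sed script, and grouping commands under an address and its negation), here is a minimal sketch. The function name tag_by_shell and the sample input are made up for the demonstration:

```shell
#!/bin/bash

# Hypothetical helper: tag each line by whether it ends in /<shell name>.
tag_by_shell()
{
    local shell_name=$1
    # The single-quoted sed script is closed, "$shell_name" is inserted
    # by bash, and the quote is reopened.
    sed '
/\/'"$shell_name"'$/ {
    s/^/match:    /
}
/\/'"$shell_name"'$/ ! {
    s/^/no match: /
}
'
}

printf 'alice:/bin/bash\nbob:/bin/tcsh\n' | tag_by_shell bash
```

The first line comes out prefixed with "match:" and the second with "no match:".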

Here is a sample showing what I mean. In this concrete example though, it becomes a bit redundant. I have also added some extra output, which can be nice while writing to clearly and quickly see what is done:

gid_tr_to_uname=121

sed '
/:\/bin\/bash$/ {
    # Add an arrow only to visualize that line has changed
    s/^/--> /p
    # Substitute group
    s/\(^[^:]*:[^:]*:[^:]*:\)\([^:]*\)/\1'$gnr_bash'/
}
/:\/bin\/tcsh$/ {
    # Add an arrow only to visualize that line has changed
    s/^/--> /p
    # Substitute group
    s/\(^[^:]*:[^:]*:[^:]*:\)\([^:]*\)/\1'$gnr_tcsh'/
}
/^[^:]*:[^:]*:[^:]*:'$gid_tr_to_uname':/ {
    # Insert line to visualize change [ old/new ]
    i\
tr group alpha name [
    p
    # ^ ties each letter to the first character of the account name;
    # without it "bart" would match the a, b, r and t rules in turn.
    s/^a\([^:]*:[^:]*:[^:]*:[^:]*\)\([0-9]\)\(:[^:]*:[^:]*:.*\)/a\11\3/
    s/^b\([^:]*:[^:]*:[^:]*:[^:]*\)\([0-9]\)\(:[^:]*:[^:]*:.*\)/b\12\3/
    s/^c\([^:]*:[^:]*:[^:]*:[^:]*\)\([0-9]\)\(:[^:]*:[^:]*:.*\)/c\13\3/
    s/^d\([^:]*:[^:]*:[^:]*:[^:]*\)\([0-9]\)\(:[^:]*:[^:]*:.*\)/d\14\3/
    s/^e\([^:]*:[^:]*:[^:]*:[^:]*\)\([0-9]\)\(:[^:]*:[^:]*:.*\)/e\15\3/
    s/^f\([^:]*:[^:]*:[^:]*:[^:]*\)\([0-9]\)\(:[^:]*:[^:]*:.*\)/f\16\3/
    s/^g\([^:]*:[^:]*:[^:]*:[^:]*\)\([0-9]\)\(:[^:]*:[^:]*:.*\)/g\17\3/
    s/^h\([^:]*:[^:]*:[^:]*:[^:]*\)\([0-9]\)\(:[^:]*:[^:]*:.*\)/h\18\3/
    # ....
    # Append line to visualize end
    a\
]
}
' "$in_file"

The alpha thing doesn't look too nice, hence section two below.

If you are allowed to use bash alongside sed, you can simplify the alpha translation by piping the result into a bash loop where IFS (like FS, the field separator, in awk) is set to a colon:

#      capture group 1             capture group 2
#   s (everything before gid) gid (everything after gid) trigger / \1 new gnr \2
sed \
-e 's/\(^[^:]*:[^:]*:[^:]*:\)[^:]*\(.*:\/bin\/bash$\)/\1'$gnr_bash'\2/' \
-e 's/\(^[^:]*:[^:]*:[^:]*:\)[^:]*\(.*:\/bin\/tcsh$\)/\1'$gnr_tcsh'\2/' \
"$1" |
while IFS=: read -r account password uid gid gecos directory shell; do
    case "$gid" in
    "$gid_tr_to_uname")
        gid=$(translate "$account" "$gid")
    esac
    printf "%s:%s:%d:%d:%s:%s:%s\n"\
        "$account" "$password" "$uid" "$gid" "$gecos" "$directory" "$shell"
done

And some translate function as in:

ascii_a=$(printf "%d" "'a")
ascii_A=$(printf "%d" "'A")
translate()
{
    local first_letter="${1:0:1}"    # First character in arg 1
    local -i gid_lhs="${2:0: -1}"    # Everything but last digit in arg 2
                                     # Get ascii 10 base value / digit
    local -i ascii_val=$(printf "%d" "'$first_letter")
    local -i alphanr                 # a=1 b=2, A=27 etc
    if (( $ascii_val >= ascii_a )); then
        (( alphanr = ascii_val - ascii_a + 1 ))
    else
        (( alphanr = ascii_val - ascii_A + 27 ))
    fi
    # If you want to debug:
    # printf "[[[%s = %d => %d || %d ]]]"\
    #        "$first_letter" "$ascii_val" "$alphanr" "$gid_lhs"
    printf "%d%d" "$gid_lhs" "$alphanr"
}

But then one could also easily add a case switch for shell as well and sed goes completely out of the picture.
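A sketch of what that sed-free version might look like follows. The GID values and the helper names alpha_nr and rewrite_passwd are assumptions for illustration. Unlike translate above, it handles lowercase first letters only, and it applies the shell switch before the letter translation, which is just one of the two possible readings of the task:

```shell
#!/bin/bash

# Hypothetical values; in the text above these come from get_gnr().
gnr_bash=1001
gnr_tcsh=1002
gid_tr_to_uname=121

# Print the alphabet index of the first character of $1 (a=1 ... z=26).
# Returns 1 if that character is not a lowercase ASCII letter.
alpha_nr()
{
    local letters=abcdefghijklmnopqrstuvwxyz first=${1:0:1} prefix
    [[ -n $first && $letters == *"$first"* ]] || return 1
    prefix=${letters%%"$first"*}      # Everything before the letter
    printf '%d\n' $(( ${#prefix} + 1 ))
}

# Read passwd-style lines on stdin, write the rewritten lines to stdout.
rewrite_passwd()
{
    local account password uid gid gecos directory shell nr
    while IFS=: read -r account password uid gid gecos directory shell; do
        case "$shell" in
            /bin/bash) gid=$gnr_bash ;;
            /bin/tcsh) gid=$gnr_tcsh ;;
        esac
        if [ "$gid" = "$gid_tr_to_uname" ] && nr=$(alpha_nr "$account"); then
            gid=${gid%?}$nr           # Replace last digit with letter number
        fi
        printf '%s:%s:%s:%s:%s:%s:%s\n' \
            "$account" "$password" "$uid" "$gid" "$gecos" "$directory" "$shell"
    done
}
```

It would be run as rewrite_passwd < "$in_file".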

In sed you also have tr-like functionality with the y command:

sed '/0x[0-9a-zA-Z]*/ y/abcdef/ABCDEF/' file

But y maps single characters one to one, and both lists must have the same length, so you can't use it for a -> 1, ... p -> 16 etc.
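A quick runnable sketch of y in action. The helper name upper_hex and the sample input are made up for the demonstration:

```shell
#!/bin/bash

# Hypothetical helper: on lines containing 0x followed by a hex digit,
# translate the characters a-f to A-F.
upper_hex()
{
    sed '/0x[0-9a-fA-F]/ y/abcdef/ABCDEF/'
}

printf '0xdeadbeef\nplain text\n' | upper_hex
```

The first line becomes 0xDEADBEEF and the second is left alone because the address does not match. Keep in mind that y works on the whole pattern space, so any other a-f characters on a matching line are upper-cased as well.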

##Section 2: sed - lookup tables##

So far, the only way I can think of to append the first letter of the account to the GID in sed is a lookup table.

To simplify I'm taking this in stages:

###Listing 1###

#!/bin/bash

listing1()
{
sed '
    # Pad line with lookup table
    s/$/0zero1one2two3three4four5five6six7seven8eight9nine/

    # Match something (here 1) and match it again in lookup-table
    # and grab the letters following 1 (in lookup-table) as
    # group 2. Finally replace the whole line with \2
    s/\(.\).*\1\([^0-9]*\).*/\2/

' < <(printf "12345\n" )
}

printf "Listing 1:\n"
listing1

Result:

Listing 1:
one

The idea is to pad our line with a lookup-table and replace first match in input with corresponding pair in table.

We can expand this by repeating the substitution:

###Listing 2###

listing2()
{
sed '
    s/$/.0zero1one2two3three4four5five6six7seven8eight9nine/
    s/\([0-9]\)\(.*\)\1\([^0-9]*\)\(.*\)/\3\2\4/
    s/\([0-9]\)\(.*\)\1\([^0-9]*\)\(.*\)/\3\2\4/
    s/\([0-9]\)\(.*\)\1\([^0-9]*\)\(.*\)/\3\2\4/
    s/\([0-9]\)\(.*\)\1\([^0-9]*\)\(.*\)/\3\2\4/
    s/\([0-9]\)\(.*\)\1\([^0-9]*\)\(.*\)/\3\2\4/
    s/\([0-9]\)\(.*\)\1\([^0-9]*\)\(.*\)/\3\2\4/
' < <(printf "12345\n" )
}

Result:

Listing 2:
onetwothreefourfive.0zero6six7seven8eight9nine

But this doesn't look much better than what we started with in section one.

###Labels / Branches###

Here is where labels come in. In sed one can define labels and branch to them with two commands. The first is b (branch):

:my_label
     s/foo/bar/
     b my_label

The b simply says: jump to my_label. In this example that would mean an infinite loop, so b is mostly used inside a condition, as in:

:my_label
/\./ {          # If . exists in line
     s/#/+/     # substitute # with +
     s/\./P/    # substitute . with P
     b my_label # goto my_label
}

Not the best example but hopefully you get the idea.
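A slightly smaller variant of the same pattern that can be run as is. The function name dots_to_P and the input are assumptions for the demonstration:

```shell
#!/bin/bash

# Hypothetical helper: replace every . with P, one per pass, using a
# conditional branch instead of the g flag.
dots_to_P()
{
    sed '
:again
/\./ {
    s/\./P/
    b again
}
'
}

printf 'a.b.c\n' | dots_to_P
```

The loop runs twice for this input and then falls through once no dot is left, giving aPbPc.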

The second way is using test, or t. It says: if a substitution has been made on the current line (since the line was read or since the last t), go to the label.

:my_label
     s/foo/bar/   # Substitute foo with bar
     t my_label   # If a substitution was done, goto my_label.
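To see t terminate on its own, here is a minimal sketch of a classic use: inserting thousands separators into a number at the start of a line. The function name add_commas is made up for the demonstration:

```shell
#!/bin/bash

# Hypothetical helper: insert a comma before the last three digits of
# the leading digit run, and repeat for as long as that succeeds.
add_commas()
{
    sed '
:again
s/^\([0-9][0-9]*\)\([0-9]\{3\}\)/\1,\2/
t again
'
}

printf '1234567\n' | add_commas
```

The pattern space goes 1234567, then 1234,567, then 1,234,567. On the next pass the substitution fails, so t does not branch and the line is printed.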

By this we can simplify our previous listing as follows. Here with added comma to make it more pleasant to read:

###Listing 3###

listing3()
{
sed '
    s/$/.0zero1one2two3three4four5five6six7seven8eight9nine/
:loop
    s/\([0-9]\)\(.*\)\1\([^0-9]*\)\(.*\)/\3,\2\4/
    t loop    # If we had a substitution, goto loop

    s/,\..*// # Remove trailing comma and our lookup table rest.

' < <(printf "123458\n" )
}

Result:

Listing 3:
one,two,three,four,five,eight

We want alpha to digit. Also, using a dot as separator can be somewhat risky, as our input can have a . in it, so we change it to ASCII 0x7f, or DEL.

It also works with e.g. 0x00

###Listing 4###

listing4()
{
sed '
    p # Print original line to visualize
    
    # Our new lookup-table:
    s/$/\x7fa1b2c3d4e5f6g7h8i9j10k11l12m13n14o15p16q17r18s19t20u21v22w23x24y25z26/
:loop
    s/\([a-z]\)\(.*\)\1\([^a-z]*\)\(.*\)/\3,\2\4/
    t loop

    s/,\x7f.*//
' < <(printf "abcdefghijklmnopqrstuvwxyz\n" )
}

Result:

Listing 4:
abcdefghijklmnopqrstuvwxyz
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26

If the input had more than one of each letter, we would keep the table entry in place and change it to something like this:

###Listing 5###

listing5()
{
sed '
    i\
input:
    p
    s/$/\x7fa1b2c3d4e5f6/
:loop
    s/\([a-z]\)\(.*\)\1\([^a-z]*\)\(.*\)/\3,\2\1\3\4/g
    t loop
    i\
output:
    s/,\x7f.*//
' < <(printf "aabcdefac\n" )
}

Result:

Listing 5:
input:
aabcdefac
output:
1,1,2,3,4,5,6,1,3

Now we are finally ready to implement it in the task. Here by example:

###Listing 6###

listing6()
{
sed '
    i\
input:
    p
    s/$/\x7fa1b2c3d4e5f6g7h8i9j10k11l12m13n14o15p16q17r18s19t20u21v22w23x24y25z26/
    s/^\(.\)\([^:]*\)\(:[^:]*\)\(:[^:]*\)\(:[^:]*\)\([0-9]\)\(:.*\)\x7f.*\1\([^a-z]*\).*/\1\2\3\4\5\8\7/
    #  1alpha 2rest    3pwd      4uid      5gid    6last-digit 7rest           8number
    i\
output:
    s/\x7f.*//    # Strip the lookup table if the substitution did not match
' < <(printf "master:power:110:118:Light Display Manager:/var/lib/lightdm:/bin/false\n" )
}

Output:

Listing 6:
input:
master:power:110:118:Light Display Manager:/var/lib/lightdm:/bin/false
output:
master:power:110:1113:Light Display Manager:/var/lib/lightdm:/bin/false
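To tie this back to section one, the same substitution can be restricted to lines with one particular GID by putting it inside an address block. The following is only a sketch under assumptions: the function name append_alpha_to_gid is made up, the GID is passed as an argument, GNU sed is assumed for \x7f, and only lowercase first letters are in the table:

```shell
#!/bin/bash

# Hypothetical helper: for passwd-style lines whose GID equals $1,
# replace the last GID digit with the alphabet index of the first
# letter of the account name. Assumes GNU sed (for \x7f).
append_alpha_to_gid()
{
    local gid=$1
    sed '
/^[^:]*:[^:]*:[^:]*:'"$gid"':/ {
    s/$/\x7fa1b2c3d4e5f6g7h8i9j10k11l12m13n14o15p16q17r18s19t20u21v22w23x24y25z26/
    s/^\(.\)\([^:]*:[^:]*:[^:]*:[^:]*\)[0-9]\(:.*\)\x7f.*\1\([^a-z]*\).*/\1\2\4\3/
    s/\x7f.*//
}
'
}

printf 'master:x:110:121:LDM:/var/lib:/bin/false\n' | append_alpha_to_gid 121
```

The last s command only has something to do when the lookup fails, for instance for an account starting with an uppercase letter. In that case the table is stripped again and the line is left unchanged.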

That's it.

You should read Bruce Barnett's sed introduction.

Other refs:

For some more hard-core things look at e.g.:

Greg Ubben's sed dc. With a short explanation.
Sed tetris, with an accompanying bash wrapper.

Good luck.

answered Mar 26, 2013 at 7:25

Runium
