2

I am trying to edit a big text file. What I am trying to achieve is swap lines from positions between 3 and 14 to positions between 16 and 27. I have extracted UPDATE statements from MySQL bin-logs and put them in a file. The problem is that, the position of WHERE clause and SET clause is reversed in bin-logs. I need to put SET clause before WHERE clause. Currently WHERE clause is before SET clause. There are 27 lines in one UPDATE statement and there are thousands of such UPDATE statements. Thus all the 13 lines from the SET clause (including the SET line from lines 15 to 27) in each UPDATE statement should be put at positions 2 and 14 and vice-a-versa.

I have no clue if this is possible or not. I tried to understand it from the following article, but don't understand it at all. I don't understand regular expressions or SED or AWK completely. Swap lines in text file only where containing strings using sed or ed?

Following are the sample output and the desired output. Sample UPDATE statement

### UPDATE `db`.`tb`
### WHERE
###   @1=39741631
###   @2=49969113
###   @3=1
###   @4=34
###   @5='{"CustomerName":"S","CustomerEmail":"[email protected]","CustomerMobile":"9","VersionId":"5","InSrc":"3","Eagerness":"-1"}'
###   @6=NULL
###   @7=0
###   @8='2021-11-09 19:11:49'
###   @9=NULL
###   @10=1
###   @11=29
###   @12=NULL
### SET
###   @1=39741631
###   @2=49969113
###   @3=1
###   @4=34
###   @5='{\n  "CustomerName": "S",\n  "CustomerEmail": "[email protected]",\n  "CustomerMobile": "9",\n  "VersionId": "5",\n  "InSrc": "39",\n  "Eagerness": "-1"}'
###   @6='33195861'
###   @7=1
###   @8='2021-11-09 19:11:49'
###   @9='2021-11-09 19:11:50'
###   @10=1
###   @11=20
###   @12='Pushed to CVL panel'
--

Desired output

### UPDATE `db`.`tb`
### SET
###   @1=39741631
###   @2=49969113
###   @3=1
###   @4=34
###   @5='{\n  "CustomerName": "S",\n  "CustomerEmail": "[email protected]",\n  "CustomerMobile": "9",\n  "VersionId": "5",\n  "InSrc": "39",\n  "Eagerness": "-1"}'
###   @6='33195861'
###   @7=1
###   @8='2021-11-09 19:11:49'
###   @9='2021-11-09 19:11:50'
###   @10=1
###   @11=20
###   @12='Pushed to CVL panel'
### WHERE
###   @1=39741631
###   @2=49969113
###   @3=1
###   @4=34
### @5='{"CustomerName":"S","CustomerEmail":"[email protected]","CustomerMobile":"9","VersionId":"5","InSrc":"3","Eagerness":"-1"}'
###   @6=NULL
###   @7=0
###   @8='2021-11-09 19:11:49'
###   @9=NULL
###   @10=1
###   @11=29
###   @12=NULL
--
4
  • Are the line numbers actually part of the file? Does each line begin with a number, a space and ###? Commented Nov 15, 2021 at 12:58
  • @terdon I have used vi editor, to make it easy to understand with line numbers. The line numbers are not part of the file. The ### are indeed part of the file. I can replace those later and the 28th line I will convert later into semi colon. I have updated the question to mention the same. Commented Nov 15, 2021 at 13:03
  • 2
    Please remove the numbers then. We need to see the exact input and output formats. Commented Nov 15, 2021 at 13:04
  • Does a line with just -- really terminate the "SET" section of the input? If not please replace it with whatever does terminate the "SET" section. Commented Nov 15, 2021 at 16:03

2 Answers 2

2

Using sed:

#n

/^### WHERE/ {
        h
        :again
        n
        /^### SET/ !{
                H
                b again
        }
}

/^--$/ {
        H
        x
}

p

This sed editing script first turns off the default outputting of the buffer at the end of each cycle using #n on line 1. This stops the n command from outputting anything (we use this to read a new line into the buffer).

The script then prints each line read until it finds a line containing the string ### WHERE. When that string is located at the start of a line, it stores that line in the "hold space" (a secondary buffer whose contents is not modified between cycles of the editing script). It then appends lines from the input to the hold space until a line containing the string ### SET is found at the start of the line. This is done using an explicit loop (the again label and the b again conditional branch instruction).

The script then continues to read and print lines until it finds a line containing only --. At this point, it adds that line to the hold space and swaps the hold space into the primary buffer using x. This is then printed by the p at the end (which also prints all other lines not directly affected by the rest of the code).

You would run this from the command line like so:

sed -f script file

... where script is the file containing the sed editing script, and file is the file containing your data.


A shorter sed script that, instead of an explicit loop to read lines that needs to be saved for later, simply uses a range address and stores all the lines in the range (except for the last) in the hold space:

/^### WHERE/,/^### SET/ {
        /^### SET/ !H
        /^### WHERE/ h
        d
}

/^--$/ { H; x; }

I'm not really suggesting you use ed to solve this issue, but you could, and you tagged your question with , so here we go:

g/SET/ .,/--/-1 m ?WHERE?-1

This single ed expression applies an m command to all lines containing the string SET.

The m command in the ed editor moves one or several lines to another location.

The lines to move are addressed in front of (to the left of) the m command itself, and the destination for those lines is given after the m.

In this case, we're moving the lines ranging from the current line (the line containing SET) down to the line before the line containing the string --. The address of this range of lines is written .,/--/-1. You could also use .,/@12=/ if the last line to move always contains the string @12=, or .,+13 if you always want to move a fixed number of lines.

The line above the most recent line containing the word WHERE is where to move these lines. The ?WHERE? expression searches backward in the buffer for a line matching that expression and the -1 selects the line previous to that matching line. You could also use -14 for the destination address if you always move a fixed number of lines.

You would use this from the command line in the following way:

printf '%s\n' 'g/SET/ .,/--/-1 m ?WHERE?-1' ',p' 'Q' | ed -s file

This adds the two commands ,p (print/output the whole buffer) and Q (quit unconditionally), meaning the result would be printed onto standard output.

If you change the ,p and Q into the single command wq, or w and q, you'd perform an in-place edit.

Always test things like this on data that is properly backed up.

To make the expressions more specific, use

g/^### SET/ .,/^--$/-1 m ?^### WHERE?-1

This applies the m command to all lines starting with the string ### SET and moves the range of lines from there to the next line containing only --, to before the most recent line starting with the string ### WHERE.

The issue with using ed for edits like these is that the editor reads the document into memory. Therefore, using ed is not as memory-efficient as using, e.g., sed or other stream-editing approaches, and maybe impossible if the data is too big.

1

Try

gawk '
  BEGIN {RS = "--\n"; FS = "###[[:blank:]]+(WHERE|SET)\n"}
  {printf "%s### SET\n%s### WHERE\n%s%s", $1, $3, $2, RS}
' file

To edit the file in-place (DON'T do this until you're confident this code works)

  1. with a temp file:
    f=$(mktemp)
    gawk '...' file > "$f" && mv "$f" file
    
  2. with the moreutils package:
    gawk '...' file | sponge file
    
  3. with GNU awk
    gawk -i inplace '...' file
    
3
  • Thanks Mr Jackman, the position of WHERE and SET is not changed. Rest I think worked but it's very complicated to compare. I used the following command gawk 'BEGIN {RS = "--\n"; FS = "###[[:blank:]]+(WHERE|SET)\n"} {printf "%s### WHERE\n%s### SET\n%s%s", $1, $3, $2, RS}' update_test.sql and will it also modify the file on which the operation is done. I don't understand AWK or GAWK :-( Commented Nov 15, 2021 at 13:22
  • @AvinashPawar, awk does not update the file in-place. A good place to start with awk is stackoverflow.com/tags/awk/info Commented Nov 15, 2021 at 14:14
  • Thanks for the tip Commented Nov 16, 2021 at 6:07

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.