Minor edit.
Source Link
macxpat
  • 29
  • 1
  • 10

So here we are. About one week ago, I didn't know anything about awk and very little about RegExp. Now, thanks to @Ed's contribution, I've been able to write "my" first awk script, better understand the RegExp world, and complete the task at hand. More importantly, I'm now confident enough to dive deeper by myself into RegExp, awk and other text processing shell tools. It also motivates me to contribute more to SE.
I just wanted to share my personal experience and give hope to others who, like me, may be stuck on a problem that feels like facing a mountain.

Added code to escape special characters in file paths. Minor text edits.
Source Link
macxpat

Here's the code that gives the desired output mentioned in the question, for people who'd be interested. It's just a tiny adaptation of @Ed's really smart code.

BEGIN { print "#!/bin/bash" }
/^#/ { prt(); print; next }
{ files[$0] }
END { prt() }

function prt(   file, isDate, isKeep, isDelete, backup, latest, pats) {
    # file exists in a current backup directory (yes|no)
    backup = "no"
    # latest historical backup date
    latest = "000000"
    for (file in files) {
        if ( file ~ /\/Library\// ) {
            # files to check manually
            isKeep[file]
        }
        else if ( file ~ /\/(labs data|backup-current)\// ) {
            # backup files to keep
            isKeep[file]
            backup = "yes"
        }
        else if ( match(file, /\/(backup-disk-name\/|backup-)([0-2][0-9][0-1][0-9][0-3][0-9])\//, pats) != 0 ) {
            # files in historical backup directories
            if ( pats[2] > latest ) {
                latest = pats[2]
            }
            isDate[file] = pats[2]
        }
        else {
            # unclassified files to check manually
            isKeep[file]
        }
    }
    for (file in isDate) {
        if ( isDate[file] == latest && backup == "no") {
            isKeep[file]
        }
        else {
            isDelete[file]
        }
    }
    for (file in isKeep) {
        print "#", file
    }
    for (file in isDelete) {
        # wrap the path in single quotes so the shell takes special characters literally
        # gensub() (a gawk extension) rewrites each embedded ' as '\''
        print "rm", "'" gensub(/'/,"'\\\\''", "g", file) "'"
    }
    delete files
}
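
The quoting step in the last loop can be tried in isolation. What follows is only a minimal sketch, not the script above: the names `quote_rm_lines` and `shquote` and the sample path are hypothetical, and it assumes some POSIX awk is installed as `awk`. It swaps gawk's gensub() for the portable gsub(), which needs only a single escaped backslash in the replacement string.

```shell
#!/bin/sh
# quote_rm_lines (hypothetical name): read one path per line on stdin
# and print a shell-safe "rm" command for each path.
quote_rm_lines() {
    awk -v q="'" '
    # shquote (hypothetical name) wraps s in single quotes. Each embedded
    # single quote is replaced by: closing quote, backslash-escaped quote,
    # reopening quote.
    function shquote(s) {
        gsub(q, q "\\" q q, s)
        return q s q
    }
    { print "rm", shquote($0) }
    '
}

# Sample path made up for illustration.
printf '%s\n' "/Volumes/backup-170301/it's a file.txt" | quote_rm_lines
# prints: rm '/Volumes/backup-170301/it'\''s a file.txt'
```

The single quote is passed in with `-v q="'"` because the awk program itself sits inside shell single quotes and so cannot contain one literally.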

Finally, I would like to share some thoughts. I hope I'm not digressing too much.
A few weeks ago I resolved to finally clean up that monstrous backup data (some files have more than 10 duplicates). But I couldn't find a tool to automate the task. I didn't want to write a C program for that, and I didn't want to go the Perl way. So I knew I had to (and I wanted to) go the shell way. But I didn't know where to start and got stuck on the first lines.

After reading a lot, I was still very confused. So I decided to post my question on SE.
When I first read @Ed's code I thought "What the hell!". Then, when I got it, I realized it's a brilliant piece of code, highly efficient and clear.

So here we are. About one week ago, I didn't know anything about awk and very little about RegExp. Now, thanks to @Ed's contribution, I've been able to write "my" first awk script, better understand the RegExp world, and complete the task at hand. More importantly, I'm now confident enough to dive deeper by myself into RegExp, awk and other text processing shell tools. It also motivates me to contribute to SE.
I just wanted to share my personal experience and give hope to others who, like me, may be stuck on a problem that feels like facing a mountain.

Source Link
macxpat

Here is the code that gives the desired output mentioned in the question, for people who'd be interested. It's just a tiny adaptation of @Ed's really smart code.

BEGIN { print "#!/bin/bash" }
/^#/ { prt(); print; next }
{ files[$0] }
END { prt() }

function prt(   file, isDate, isKeep, isDelete, backup, latest) {
    # file exists in a current backup directory (yes|no)
    backup = "no"
    # latest historical backup date
    latest = "000000"
    for (file in files) {
        if ( file ~ /\/Library\// ) {
            # files to check manually
            isKeep[file]
        }
        else if ( file ~ /\/(labs data|backup-current)\// ) {
            # backup files to keep
            isKeep[file]
            backup = "yes"
        }
        else if ( match(file, /\/(backup-disk-name\/|backup-)([0-2][0-9][0-1][0-9][0-3][0-9])\//, pats) > 0 ) {
            # files in historical backup directories
            if ( pats[2] > latest ) {
                latest = pats[2]
            }
            isDate[file] = pats[2]
        }
        else {
            # unclassified file to check manually
            isKeep[file]
        }
    }
    for (file in isDate) {
        if ( isDate[file] == latest && backup == "no") {
            isKeep[file]
        }
        else {
            isDelete[file]
        }
    }
    for (file in isKeep) {
        print "#", file
    }
    for (file in isDelete) {
        print "rm", file
    }
    delete files
}

Finally, I would like to share some thoughts. I hope I'm not digressing too much.
A few weeks ago I resolved to finally clean up that monstrous backup data (some files have more than 10 duplicates). But I couldn't find a tool to automate the task. I didn't want to write a C program for that, and I didn't want to go the Perl way. So I knew I had to (and I wanted to) go the shell way. But I didn't know where to start and got stuck on the first lines.
So here we are. About one week ago, I didn't know anything about awk and very little about RegExp. Now, thanks to @Ed's contribution, I've been able to write "my" first awk script, better understand the RegExp world, and complete the task at hand. More importantly, I'm now confident enough to dive deeper into RegExp, awk and other text processing shell tools. It also motivates me to contribute to StackExchange.

When I first read @Ed's code I thought "What the hell!". Then, when I got it, I realized it's a brilliant piece of code, highly efficient and clear. I just wanted to share my personal experience and give hope to others who, like me, may get stuck on a problem that feels like facing a mountain.