-2

I'm looking for a sed solution like the one here: Print everything after nth delimiter

sed -E 's/^([^\]*[\]){3}//' infile

but one that extracts the text before the nth delimiter, instead of the text after it as in that example. Something that works with all sed variants. And it should operate on all lines, like the example.

The delimiter in this example is \ but could be for any other. Should work with any version of sed.

6
  • 1
    Please edit your question and i) tell us your operating system or at least which sed implementation you are using, ii) show us an example input file and iii) the output you expect. We cannot help you parse data you don't show. You mention a delimiter, what is that delimiter? Should we guess you have `\` as the delimiter? Commented Nov 15, 2023 at 13:02
  • @terdon the delimiter is clearly the backslash as the link shows, but could use any other. Unless stated, should work with any version of sed. Commented Nov 15, 2023 at 13:31
  • Regarding "The delimiter in this example is \ but could be for any other. Should work with any version of sed." - There are no possible solutions that would work for any delimiter in any version of sed. Reduce the scope of your question and provide sample input and expected output if you'd like a solution that works portably for your actual needs. Commented Nov 15, 2023 at 21:55
  • @EdMorton the solution provided by "don_crissti" works perfectly with both GNU sed and BSD sed: sed -E 's/(^([^\]*[\]){3}[^\]*).*/\1/' infile. You can replace "\" with a letter like "e" or a number and keeps running: sed -E 's/(^([^e]*[e]){3}[^e]*).*/\1/' infile. So you are not willing to put the effort to elaborate your answer dude. Commented Nov 16, 2023 at 6:53
  • I haven't posted an answer dude because there is no answer to the question you asked and so far you haven't done what I suggested in my comment which is to reduce the scope of your question. GNU and BSD sed aren't "any version of sed" and "a letter like e or a number" aren't "any other" delimiter. If you want a solution that'll work in GNU or BSD sed with any single letter or number as a delimiter then say THAT in your question as there obviously is a solution for THAT scope, but not for "The delimiter in this example is \ but could be for any other. Should work with any version of sed.". Commented Nov 16, 2023 at 12:15

4 Answers

5

Why don't you use cut?

cut -d '\' -f 1-3 infile

With sed, instead of deleting the match, capture it and use a back-reference to replace the whole line with the captured group:

sed -E 's/(^([^\]*[\]){3}).*/\1/' infile

Though that would also print the trailing backslash... To avoid that you could run

sed -E 's/(^([^\]*[\]){2}[^\]*).*/\1/' infile
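As a quick check, here is a minimal sketch that runs both commands on some made-up input. The sample lines and the variable names (`sample`, `sed_out`, `cut_out`) are only for illustration. A line with fewer than three delimiters should come out unchanged from both tools.

```shell
# Hypothetical sample input: backslash-delimited fields, one record per line.
sample='a\b\c\d\e
one\two\three\four
short\line'

# Text before the 3rd backslash, using the sed command above.
sed_out=$(printf '%s\n' "$sample" | sed -E 's/(^([^\]*[\]){2}[^\]*).*/\1/')

# The same selection with cut: fields 1 through 3.
cut_out=$(printf '%s\n' "$sample" | cut -d '\' -f 1-3)

printf '%s\n' "$sed_out"
# a\b\c
# one\two\three
# short\line
```

printf with a %s format is used instead of echo so that the backslashes in the data are not interpreted by the shell.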
0
2

a shorter awk:

awk NF=3 FS='\\' OFS='\\'
  • define input and output field separators
  • set number of fields to keep
5
  • 1
    What happens when you change NF is undefined behavior per POSIX so don't count on it doing any one thing across all awk variants. In most awks increasing NF will create empty fields at the end (same as $NF=$7 to increase NF to 7 which IS portable to all awks so there's no point doing NF=7), in some awks decreasing it will remove fields from the end, in other awks decreasing it will do nothing. There's no portable way to decrease NF other than sub() or match() to remove some number of fields. Commented Nov 15, 2023 at 21:49
  • @EdMorton good point. freebsd awk is such an example of doing nothing, but even there awk 'NF=3,$3=$3' ... seems to work. do you know any other implementations that don't truncate? Commented Nov 16, 2023 at 5:12
  • no, sorry, I'd want to check /usr/xpg[46]/bin/awk and nawk on Solaris, tawk, mawk1 (I expect mawk2 behaves like gawk as they share a lot of code), and busybox awk as a start and there's also mks awk and the awk variants written in Go (goawk) and Rust (frawk) but I don't have access to any of those. Commented Nov 16, 2023 at 12:00
  • 1
    Rather than having to know the behavior (which could change by release) of all the awk variants in my answers I try to always say either a) "using any awk" if my script uses all POSIX constructs except RE intervals and character classes, or b) "any POSIX awk" if it also uses those, or c) "using GNU awk or other that supports ..." and state the extensions/POSIX-undefined-behavior otherwise and then it's up to the user to determine if it'd also work in their awk if it doesn't fit that description. For your script I'd say "using GNU awk or other that supports truncating fields by assigning NF". Commented Nov 16, 2023 at 12:03
  • 1
    I exclude RE intervals and character classes from my "any awk" category as when I used to use Solaris we had nawk and /usr/xpg4/bin/awk and one of them supported RE intervals while the other supported character classes but neither supported both, I think I've heard of other awks that don't support both (maybe mawk1?), and older versions of gawk didn't support RE intervals unless you added the --posix or --re-intervals options. Oh, and I also exclude a script that does gsub(/^"|"$/,"") from "any awk" as that fails in mawk1 and tawk. Commented Nov 16, 2023 at 12:08
2

You could replace the nth delimiter with newline (which cannot otherwise occur in the pattern space) and then delete everything starting with that newline. Here for n == 3:

sed 's/delim/\
/3; P; d'

Or if the nth delimiter must be retained in the output:

sed 's/delim/&\
/3; P; d'

To skip the lines that don't have n delimiters:

sed -n 's/delim/\
/3; t1
d; :1
P'
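A small sketch of the first and third variants on made-up input, with a two-character delimiter `::` standing in for `delim`. The sample data and the variable names are assumptions for illustration only.

```shell
# Hypothetical sample input: "::" as a multi-character delimiter.
sample='a::b::c::d::e
x::y'

# First variant: cut at the 3rd "::"; lines with fewer delimiters pass through whole.
all_lines=$(printf '%s\n' "$sample" | sed 's/::/\
/3; P; d')

# Third variant: same cut, but only lines that contain a 3rd "::" are printed.
matched_only=$(printf '%s\n' "$sample" | sed -n 's/::/\
/3; t1
d; :1
P')

printf '%s\n' "$all_lines"
# a::b::c
# x::y
printf '%s\n' "$matched_only"
# a::b::c
```

If the delimiter contains characters that are special in a basic regular expression (such as `.` or `*`), they would need escaping in the `s` command.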
2
  • 1
    This one has the advantage (over mine) that it works with multi-char delimiters... I think the 1st and 2nd could be golfed shorter with P;d instead of s/\n.*// Commented Nov 15, 2023 at 11:16
  • @don_crissti, thanks. I'll add that in as it would address the potential problems with .* choking on non-characters. Commented Nov 15, 2023 at 13:26
2

Using awk:

$ awk -v var=3 'BEGIN{FS=OFS="\\"}
(NF>=var){ split($0,arr,OFS); 
$0=""; 
for (i=1; i<=var; ++i) $(NF+1)=arr[i];
print}'

To keep the nth delimiter, either of the following commands may be used.

$ awk -v var=3 'BEGIN{FS=OFS="\\"} 
(NF>=var){ for (i=1; i<=var; ++i) printf "%s%s", $i, OFS; print ""}'

$ nawk 'match($0, /^([^\\]*[\\]){3}/) { print substr($0,RSTART,RLENGTH) }'
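If your awk lacks RE intervals such as `{3}`, one possible workaround is to locate the nth delimiter with index() and substr() only. This is a rough sketch, not part of the commands above: the function name `before_nth` and the awk variable names are made up, the delimiter is hardcoded as a single backslash, the nth delimiter itself is dropped, and lines with fewer than n delimiters are printed unchanged.

```shell
# before_nth N: print the text before the Nth backslash on each input line.
# (Illustrative helper; the name and the variables are assumptions.)
before_nth() {
  awk -v n="$1" '{
    d = "\\"                      # delimiter: a single backslash
    L = length(d)
    rest = $0                     # part of the line not yet scanned
    pos = 0                       # offset in $0 where the last delimiter found ends
    for (i = 1; i <= n; i++) {
      k = index(rest, d)          # position of the next delimiter in rest
      if (k == 0) break           # fewer than n delimiters on this line
      pos += k + L - 1
      rest = substr(rest, k + L)
    }
    if (i > n) print substr($0, 1, pos - L)   # text before the nth delimiter
    else print                                # leave short lines unchanged
  }'
}

printf '%s\n' 'a\b\c\d\e' 'short\line' | before_nth 3
# a\b\c
# short\line
```

Because index() does plain string matching, no regex escaping of the delimiter is involved; changing `d` to another string should work the same way.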

With GNU awk:

The following command uses back-referencing a captured group. This is an awk command taken from this answer. Thanks to @don_crissti

$ awk -F "\\" -v col=3 '(NF>=col){print gensub(/(^([^\\]*[\\]){3}).*/, "\\1", "g")}'
