4

I have predictable piped input. I want to iterate over each row and change two characters. The character positions are 19 and 20 (on every row). The 19th character is a comma, and I want to cut that. The 20th character is a space, and I want to replace it with an 'x'.

775448167763476486, 783834143007506433, 35972, 35972,
775448167763476486, 844395243412914178, 408008, 408008,
775448167763476486, 891964514511355905, 8003, 8003,
783834143007506433, 891655551753846784, 66633, 66633,

Should become

775448167763476486x783834143007506433, 35972, 35972,
775448167763476486x844395243412914178, 408008, 408008,
775448167763476486x891964514511355905, 8003, 8003,
783834143007506433x891655551753846784, 66633, 66633,

If there is an even simpler way to achieve the same effect, that'd be even better.

Edit.

So far I have tried a multitude of different approaches with

sed primarily..

sed 's/^ *([^,]*) //;s/, \([^,]*\),/\1x,/g'

As I understand it, this is useful for finding the first instance of a comma with *([^,]*), but I'm not sure how the rest of the expression follows on after the ;

sed 's/, \([^,]*\),/\1x,/g'

I tried modifying the command this way but found an appended 'x' at the end of each row.

sed 's/\(.*\)\(.\{18\}\)\(.*\)/\1x\3/'

Same problem here. Two other expressions I tried to look at..

sed -E 's/^ *([^,]*) //'
sed -e 's/\(.\)$/,\1/'

with the 'regular expression' flag '-e', but to be honest I don't really understand what that means yet.

cut

cut -c-18 -c21-  

When I tried this, the error I got was:

cut: only one type of list may be specified
Try 'cut --help' for more information.

As I understand this is condensing or compressing with the '-c' flag, I'm assuming that the error might be related to the piped input coming from sed but I'm not sure, have to investigate with the man pages.

awk

awk '{gsub(/ /, "", $1); gsub(/ /, "x", $2); print}'

I felt this was a pretty clear way of expressing the idea of replacing whitespace with the desired character, but it had no effect on the input (unchanged).

Which is why I posted here, not just looking for answers (sorry if the original post sounded like that, I just try to be brief and to the point.) Really just looking for ideas on how to approach such a problem.

9
  • 1
    pretty simple. Have you tried anything? What tools are you familiar with? We're not a code writing service, but we'll gladly help you understand a way to a solution. Commented Jun 26, 2024 at 12:15
  • 5
    People: please don't close vote questions you don't like. This is not a case of "needs more focus". If you feel the OP didn't put enough effort or research into it, that is what voting is for. Commented Jun 26, 2024 at 12:26
  • 2
    Why does the subject of the question say every Nth character? That doesn't seem to tie up with the description. Commented Jun 26, 2024 at 12:54
  • 1
    @MarcusMüller I have expanded on my question. Apologies if it came across like I'm not here to learn. Far from it. I actually view Unix & Linux stackexchange as a valued resource that I only consult with questions when I'm really at a dead end. I'm really far from having the background of a programmer or an experienced, computer-literate user, with the exception of self-learning over the recent few years. Commented Jun 26, 2024 at 12:58
  • 1
    @StéphaneChazelas Perhaps you're right. The exact thing I'm struggling with is the ability to use text-processing tools to evaluate rows/columns and modify characters at particular positions. I could not think of a more accurate description. I assumed the text processor would count from the beginning of each line, or so many characters in between each instance, therefore would that not be every Nth character? Commented Jun 26, 2024 at 13:04

5 Answers

12

The simplest way given your example input is to just replace the first comma-and-space with an x (here, file has your example):

$ sed 's/, /x/' file 
775448167763476486x783834143007506433, 35972, 35972,
775448167763476486x844395243412914178, 408008, 408008,
775448167763476486x891964514511355905, 8003, 8003,
783834143007506433x891655551753846784, 66633, 66633,

That has the benefit of working irrespective of the length of the first field. If you must change the 19th and 20th characters, you could do:

$ sed 's/../x/10' file 
775448167763476486x783834143007506433, 35972, 35972,
775448167763476486x844395243412914178, 408008, 408008,
775448167763476486x891964514511355905, 8003, 8003,
783834143007506433x891655551753846784, 66633, 66633,

The trick here is that the /10 means "replace the tenth occurrence of the pattern", and since the pattern is .. (that is, two characters), it will change the 19th and 20th.
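
To see the counting on a shorter string, here is a small sketch. The helper name replace_pair and the abcdefgh input are made up for illustration, and it assumes a sed that supports a numeric flag on s (POSIX does specify this):

```shell
# Hypothetical helper: replace the Nth two-character pair of each line with "x".
replace_pair() {
    # $1 is N; the pair replaced covers characters 2N-1 and 2N.
    sed "s/../x/$1"
}

# Pairs are ab, cd, ef, gh; the 3rd pair (characters 5 and 6) is "ef".
printf 'abcdefgh\n' | replace_pair 3     # prints abcdxgh

# With N=10 the pair replaced is characters 19 and 20, as in the question.
printf '775448167763476486, 783834143007506433, 35972, 35972,\n' | replace_pair 10
```

Note that this counts pairs blindly: whatever sits at positions 19 and 20 is replaced, whether or not it is a comma and a space.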

6
  • Thanks. I'm stunned at how simple a solution that is. I don't understand why it just operates on the first occurrence on each row, rather than finding and modifying each occurrence of , Commented Jun 26, 2024 at 13:23
  • The second example also works as expected for my need, but I don't understand what the pattern .. and 10 do in terms of finding the 19th and 20th character, Commented Jun 26, 2024 at 13:23
  • 4
    @XJMZX because that's how the s/old/new/ substitution operator works. It would only replace all occurrences if you explicitly tell it to do so using g, like this: sed 's/, /x/g' file . Commented Jun 26, 2024 at 13:24
  • 2
    @XJMZX since the pattern is .., which is two characters long, the first match for .. would be the first two characters. Consider echo abcd | sed 's/../xx/2' which returns abxx. That is because the first occurrence of .. would be ab and the second is cd. In other words, the second pair of characters corresponds to characters 3 and 4, just like the first pair would be 1 and 2. Similarly, the 10th pair of characters corresponds to the 19th and 20th character. Commented Jun 26, 2024 at 13:28
  • 1
    Thanks for the explanations. Makes sense now. I guess I was approaching this all wrong from not understanding how it works. Since this is the simplest solution, I'll mark this as the correct answer. Perhaps I should have written the question differently. My bad. Commented Jun 26, 2024 at 13:36
7

If you need to replace the 19th and 20th character of each line with x if and only if they are , and space respectively, you can do:

sed 's/^\(.\{18\}\), /\1x/'

With most sed implementations (all those compliant with POSIX 2024), that can be made more legible by switching from basic to extended regexp with -E:

sed -E 's/^(.{18}), /\1x/'
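
As a quick check of the "if and only if" behaviour, the sketch below runs two made-up lines through the basic-regexp form. The helper name fix_sep and the second input line are illustrative only:

```shell
# Hypothetical helper wrapping the basic-regexp command shown above.
fix_sep() {
    sed 's/^\(.\{18\}\), /\1x/'
}

# Line 1 has ", " at positions 19-20, so it is changed.
# Line 2 has ", " somewhere else, so it passes through untouched.
printf '%s\n' \
    '775448167763476486, 783834143007506433,' \
    'short, line' | fix_sep
```

Because the pattern is anchored with ^ and requires exactly 18 characters before the comma, a line that is too short or has something else at that position is printed as it came in.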
4
  • 1. In this case, ^ is saying to sed to start from the beginning of the row. I just wonder if a similar method could be used for columns. 2. Regular expression is a more succinct way of writing expressions (if POSIX compliant shell)? Commented Jun 26, 2024 at 13:29
  • This is also a direct solution to the problem I had, so perhaps is a more fitting answer to the problem. Now I don't know which to mark as the answer as both yours and the other fit. Anyway, thank you so much Commented Jun 26, 2024 at 13:37
  • @XJMZX vim regexps have operators to match at specific columns, but standard BRE or ERE do not, not even perl's, but you can easily work around it by using . (matches any character) followed by a repetition operator like {18} here to match 18 characters from the start of the line. Commented Jun 26, 2024 at 14:11
  • Thanks for explaining the syntax. I've learned much from looking at these examples today. Very helpful to get a variety of methods to substitute. Commented Jun 26, 2024 at 23:52
4

As sed is neither in the title nor tagged, and the input is predictable, I would offer:

awk '{ print substr ($0, 1, 18) "x" substr ($0, 21); }'

Has the advantage of being explicit, and the character positions can be passed as args to awk without substituting into the syntax of a sed pattern.
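
A minimal sketch of passing the positions as arguments, with a guard so that only lines with ", " at that position are changed. The wrapper name replace_at and the variable names pos, len and rep are illustrative, not part of the command above:

```shell
# Hypothetical wrapper: replace `len` characters starting at 1-based position
# `pos` with `rep`, but only when those characters are exactly ", ".
replace_at() {
    awk -v pos="$1" -v len="$2" -v rep="$3" '
        substr($0, pos, len) == ", " {
            # substr(s, start) with no length returns the rest of the string
            print substr($0, 1, pos - 1) rep substr($0, pos + len)
            next
        }
        { print }   # anything else, including short lines, is left alone
    '
}

printf '%s\n' '775448167763476486, 783834143007506433,' 'short' | replace_at 19 2 x
```

The guard costs one extra substr call per line, and it avoids appending an x to lines that are shorter than expected.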

5
  • 1
    Note that it also appends a x to lines that have fewer than 19 characters. Commented Jun 26, 2024 at 19:14
  • I was relying on the "predictable" part of the question. I considered first checking substr ($0, 19, 2) != ", " { print; next; } : wish I had now. Commented Jun 26, 2024 at 21:26
  • Thanks for the answer here, also tested and works for my needs. I don't understand what the 21 refers to here. Commented Jun 26, 2024 at 23:48
  • substr (string, from, to) returns a substring of the string between char indexes a and b (both counting from 1). If b is omitted, the operation continues to the end of string. So 21 is the index for the 21st character of the input string. The concatenation of the three consecutive parts is printed. Chars 19-20 are not part of either substr and are omitted, and the x is injected in their place. Commented Jun 27, 2024 at 8:26
  • 1
    Paul+@XJMZ: awk substr is from,len -- like perl and JS/ES, and like string-slice in some shells, except that awk is 1-origin where most of the others are 0-origin. (For old farts, this is also like PL/I.) Java is the only major thing I know that uses from,to. But when you start from the origin len and to are the same, and when omitting it gives a default of rest/last you don't need to know which. Commented Sep 3 at 23:47
2

Using Raku (formerly known as Perl_6)

With sed-like code (using Raku's -pe autoprinting flags):

~$ raku -pe 's/ \,\h /x/;'  file

#OR

~$ raku -pe 's:10th/ .. /x/;'  file

#OR

~$ raku -pe ' $_.=subst( / \,\h /, "x");'  file

OR:

With awk-like code (using Raku's -ne non-autoprinting flags):

~$ raku -ne 'put S/ \,\h /x/;'  file

#OR

~$ raku -ne 'put S:10th/ .. /x/;'  file

#OR

~$ raku -ne ' .subst( / \,\h /, "x").put;'  file

Above are answers written in Raku, a member of the Perl-family of programming languages. Raku features a powerful regex engine and high-level support for Unicode, built-in.

Above 'escaping' rules are simplified in Raku, such that all non-alnum characters must be escaped/backslashed to be understood literally. So \, represents a literal comma, while \h represents a single horizontal whitespace character. As with PCRE engines, the unescaped . dot represents "any-character" (a metacharacter). There are a plethora of regex-modifiers, such as :g or :global. To replace the first "x" times a pattern is seen use the :x() modifier.

To only replace an "nth" occurrence use the :nth() modifier, which accepts very readable forms such as :1st (or variant :st(1) ), :2nd (or variant :nd(2) ), :3rd (or variant :rd(3) ) or even :10th (acceptable variants are :nth(10) or just :th(10) ).

Sample Input:

775448167763476486, 783834143007506433, 35972, 35972,
775448167763476486, 844395243412914178, 408008, 408008,
775448167763476486, 891964514511355905, 8003, 8003,
783834143007506433, 891655551753846784, 66633, 66633,
783834143007506433, 891655551753846784, 66633, 66633,

Sample Output:

775448167763476486x783834143007506433, 35972, 35972,
775448167763476486x844395243412914178, 408008, 408008,
775448167763476486x891964514511355905, 8003, 8003,
783834143007506433x891655551753846784, 66633, 66633,
783834143007506433x891655551753846784, 66633, 66633,

Once you munge your file with Raku, you can chop off trailing characters, split/join on different characters, or even use an authentic CSV parser such as Raku's Text::CSV module. More info below.

https://docs.raku.org/language/regexes
https://raku.land/zef:Tux/Text::CSV
https://raku.org

1
  • 1
    I see you already replied. Thanks I haven't had time to look at Raku, but it's on the list. From what you wrote it seems a very versatile language, though Perl is not something I read much about. But thank you for the suggestion, it's always useful to have more solutions to a problem. Commented Sep 13, 2024 at 13:53
2

Using GNU awk:

The following command is used taking cue from the answer:

$ awk '{print gensub(/(^.{18})..(.*)/, "\\1x\\2", "g")}' file

If the input contains no TAB characters, then cut may be used:

$ cut --output-delimiter="x" -c -18,21- file
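
For context, the error quoted in the question appears to come from giving -c twice; cut takes a single list, with ranges separated by commas. Here is a small sketch using only the portable part of the command (the helper name drop_19_20 is made up, and --output-delimiter is left out because it is a GNU extension), so the two characters are dropped rather than replaced:

```shell
# Hypothetical helper: keep characters 1-18 and 21 to end of line,
# which drops characters 19 and 20.
drop_19_20() {
    cut -c 1-18,21-
}

printf '775448167763476486, 783834143007506433,\n' | drop_19_20
# prints 775448167763476486783834143007506433,
```

With GNU cut, --output-delimiter then supplies the x between the two selected ranges, as in the command above.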

Using perl:

$ perl -pe 's/^.{18}\K, /x/' file
$ perl -pe 'substr $_,18,2,"x"' file

In the first command, \K keeps what is matched to its left, i.e. the first 18 characters, so only the comma and the space that follow are replaced with x.

In the latter, substr() replaces the 19th and 20th characters using the form substr(EXPR,OFFSET,LENGTH,REPLACEMENT). The arguments are $_,18,2 because the counting of characters starts from 0.

1
  • Thanks. The cut command works and is a perfect solution to the question. When I run the awk command: awk: function gensub never defined. Is this me trying to run a gawk in regular awk? Commented Jun 28, 2024 at 22:39
