
I would like to read the bit range inside the square brackets, and I also want to keep the square brackets. The tricky part is class4: sample1[1] is not a bit range. The bit range only appears at the end of the line.

Example:

File1.txt
class1->Signal = sample1_sample2.sample3_sample4[4:4];
class2->Signal = sample1.sample2.sample3_sample4_sample5[2];
class3->Signal = sample1+sample2_sample3.sample4.sample5sample7[7:3];
class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];

Expected result:

class1 bit = [4:4]
class2 bit = [2]
class3 bit = [7:3]
class4 bit = [7:3]

I tried a regular expression, but I cannot get it to match the square brackets. As I understand it, [] is used for a set of characters, and . matches any character except a newline (ref: https://www.geeksforgeeks.org/perl-regex-cheat-sheet/).

My code:

my $file = "File1.txt";

open (FILE, "<", $file) or die "Cannot open a file: $!";
while (<FILE>){
    my $line = $_;
    if ($line =~ m/[..]/){
        $line = $&;
        print $line;
    }
}
close (FILE);

The result only shows: .........

I hope you can help me with some ideas. Thanks.

  • Try this one, regex101.com/r/NEcKF4/1, with the PCRE regex ^([^-]*)->.*?(\[[^]]*\]); and let me know how it goes. Cheers. Commented May 18, 2023 at 4:07
  • Is the square bracket always last before ; as shown in examples?
    – Destroy666
    Commented May 18, 2023 at 5:45
  • @Destroy666 Yes. The square bracket containing the bit range is always last, right before ;. But there may be square brackets in the middle of the words.
    – DM 256
    Commented May 18, 2023 at 5:59
  • perl -nE 'say "$1 bit = $2" if /(class\d+)->Signal.*(\[[:\d]+\])\s*;/' file.txt
    – jhnc
    Commented May 18, 2023 at 9:15

4 Answers


With your shown samples, please try the following regex in PCRE.

^([^-]*)->.*?(\[[^]]*\]);$

Here is the online demo for the above regex.

Explanation: adding a detailed explanation of the above regex.

^            ##Matching from starting of the value here.
(            ##Creating 1st capturing group here.
  [^-]*      ##Matching everything before very next occurrence of - here.
)            ##Closing capturing group here.
->           ##Matching literal -> here.
.*?          ##Using lazy match to match till next occurrence of [ mentioned below.
(            ##Creating 2nd capturing group here.
  \[[^]]*    ##Matching literal [ followed by everything up to (not including) the very first ] here.
  \]         ##Matching literal ] here.
)            ##Closing 2nd capturing group here.
;$           ##Mentioning literal ; at the end of the value here.
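For reference, a minimal Perl sketch of the same pattern applied line by line. It assumes the sample lines are held in an in-memory array instead of being read from File1.txt; Perl, like PCRE, treats a ] placed right after [^ as a literal character, so the pattern works unchanged.

```perl
use strict;
use warnings;

# Sample lines from the question, hard-coded instead of reading File1.txt
my @lines = (
    'class1->Signal = sample1_sample2.sample3_sample4[4:4];',
    'class2->Signal = sample1.sample2.sample3_sample4_sample5[2];',
    'class3->Signal = sample1+sample2_sample3.sample4.sample5sample7[7:3];',
    'class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];',
);

my @results;
for my $line (@lines) {
    # $1 = everything before "->", $2 = the [...] directly before the final ";"
    if ($line =~ /^([^-]*)->.*?(\[[^]]*\]);$/) {
        push @results, "$1 bit = $2";
    }
}
print "$_\n" for @results;
```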
  • Hey man, all is well? This .*? does not have to be non-greedy, as the last part of the pattern is already anchored at the end of the string. Commented May 18, 2023 at 13:08
  • @Thefourthbird, with GOD'S grace on me I am OK, how are you? Yeah, I know it's the one and only occurrence, but I thought to use a lazy match since it's PCRE. Cheers. Commented May 18, 2023 at 13:30
  • Yes, I agree that .*? should be just .* as you already have an end anchor.
    – anubhava
    Commented May 18, 2023 at 15:47
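The point made in these comments can be checked directly. A small Perl sketch (the class4 line from the question is hard-coded) showing that, with ;$ anchoring the end, the lazy and greedy versions capture the same bracket:

```perl
use strict;
use warnings;

# The tricky class4 line: one bracket pair in the middle, one at the end
my $line = 'class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];';

# Lazy .*? tries "[1]" first, but it is not followed by ";" at the end of the
# string, so the engine keeps extending .*? until "[7:3];" satisfies the anchor.
my ($lazy) = $line =~ /^[^-]*->.*?(\[[^]]*\]);$/;

# Greedy .* consumes the whole line, then backtracks to the last "[" that fits.
my ($greedy) = $line =~ /^[^-]*->.*(\[[^]]*\]);$/;

print "lazy:   $lazy\n";
print "greedy: $greedy\n";
```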

You could match the part that you want to remove, and replace it with " bit = ":

^[^-]*\K->.*(?=\[[^][]*\];$)

Explanation

  • ^ Start of string
  • [^-]*\K Match optional chars other than - and forget what was matched so far using \K
  • ->.* Match -> and the rest of the line
  • (?=\[[^][]*\];$) Positive lookahead, assert [...]; at the end of the line

See a regex demo and a Perl demo

Example

use strict;
use warnings;

while (<DATA>)
{
  s/^[^-]*\K->.*(?=\[[^][]*\];$)/ bit = /;
  print $_;
}

__DATA__
class1->Signal = sample1_sample2.sample3_sample4[4:4];
class2->Signal = sample1.sample2.sample3_sample4_sample5[2];
class3->Signal = sample1+sample2_sample3.sample4.sample5sample7[7:3];
class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];

Output

class1 bit = [4:4];
class2 bit = [2];
class3 bit = [7:3];
class4 bit = [7:3];

Or a bit more specific regex:

^class\d+\K->.*(?=\[[^][]*\];$)

See another regex demo.


[..] is a character class: it matches any single character listed within the brackets, which in this case is only a period.

Since you are only matching literal periods, that is all you see.
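A short Perl sketch of the difference, using the class2 line from the question as a hard-coded string: [..] only ever matches a single period, while \[ and \] match the literal brackets.

```perl
use strict;
use warnings;

my $line = 'class2->Signal = sample1.sample2.sample3_sample4_sample5[2];';

# [..] is a character class containing only ".", so it matches one period
my ($first) = $line =~ /([..])/;

# With /g in list context, every match is returned: just the periods
my @all = $line =~ /[..]/g;

# Escaping the brackets as \[ and \] matches them literally
my ($bits) = $line =~ /(\[[^\]]*\])/;

print "first match: $first\n";
print "all matches: ", join('', @all), "\n";
print "escaped:     $bits\n";
```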

This problem can be solved with a fairly simple regex.

Since you only want the last bracket, you can rely on the greediness of .* to skip any brackets in the middle:

use strict;
use warnings;

my $file = "File1.txt";
my $line;

open (FILE, "<", $file) or die "Cannot open a file: $!";
while (<FILE>){
    $line = $_;
    # .* is greedy, so any brackets in the middle of the line are skipped
    if( $line =~ /(class\d).*(\[[^\]]*\]);/ ){
        $line = "$1 bit = $2";
        print "$line\n";
    }
}
close (FILE);

The regex /(class\d).*(\[[^\]]*\]);/ matches class followed by a digit; then .* matches the rest of the line (because it is greedy) and gives back just enough characters for (\[[^\]]*\]); to match.
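To see the greediness at work, here is a sketch that drops the trailing ; from the pattern (a deliberate simplification, for illustration only) and compares a greedy and a lazy quantifier on the class4 line:

```perl
use strict;
use warnings;

# The class4 line from the question
my $line = 'class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];';

# Greedy .* first consumes the whole line, then backtracks to the LAST "["
my ($greedy) = $line =~ /class\d.*(\[[^\]]*\])/;

# Lazy .*? stops at the FIRST "[" that lets the rest of the pattern match
my ($lazy) = $line =~ /class\d.*?(\[[^\]]*\])/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

Without the ; anchor, only the greedy version finds the bit range; the lazy one stops at sample1[1].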

Using ^ as the first character in a character class makes it match anything EXCEPT the characters within. To match a literal [ you have to escape it, like \[.
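A sketch of why the negated class matters, run on a made-up string (not from the question) that has an extra ] after the bit range:

```perl
use strict;
use warnings;

# Hypothetical input with an extra "]" after the bit range
my $text = '[7:3]; trailing ] text';

# [^\]]* cannot cross a "]", so the capture ends at the first closing bracket
my ($negated) = $text =~ /(\[[^\]]*\])/;

# .* can cross a "]" and, being greedy, runs on to the LAST "]" in the string
my ($dotstar) = $text =~ /(\[.*\])/;

print "negated class: $negated\n";
print "dot-star:      $dotstar\n";
```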

(              # capture to $1 
    class\d    # match "class" followed by a digit
)              # end capture
.*             # match anything (greedy)
(              # capture to $2
    \[         # literal [
    [^\]]*     # match anything except ] (greedy)
    \]         # literal ]
)              # end capture
;              # match ;

The parentheses save what they match into the variables $1, $2, etc.
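As an alternative to reading $1 and $2 afterwards, a match in list context returns the captured groups directly. A minimal sketch, with the class3 line hard-coded:

```perl
use strict;
use warnings;

my $line = 'class3->Signal = sample1+sample2_sample3.sample4.sample5sample7[7:3];';

# In list context a successful match returns ($1, $2); a failed match returns
# an empty list, which leaves both variables undefined.
my ($class, $bit) = $line =~ /(class\d).*(\[[^\]]*\]);/;

if (defined $class) {
    print "$class bit = $bit\n";
}
```

This avoids accidentally reusing $1 and $2 left over from an earlier successful match when the current one fails.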

This can also be done with a substitution, using the same regex and the /r flag, which returns the modified string instead of changing $_ in place:

while (<FILE>){
    # s///r leaves $_ unchanged and returns the modified copy
    $line = s/(class\d).*(\[[^\]]*\]);/$1 bit = $2/r;
    print $line;
}

Here's a simple command line one-liner that'll do the same:

perl -wlp -e 's/(class\d).*(\[[^\]]*\]);/$1 bit = $2/' File1.txt

Change ' to " to run it on Windows.

cat /tmp/a.txt
class1->Signal = sample1_sample2.sample3_sample4[4:4];
class2->Signal = sample1.sample2.sample3_sample4_sample5[2];
class3->Signal = sample1+sample2_sample3.sample4.sample5sample7[7:3];
class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];

sed -e 's/->.*\[/ bit = [/g' -e 's/;//g'  /tmp/a.txt
class1 bit = [4:4]
class2 bit = [2]
class3 bit = [7:3]
class4 bit = [7:3]
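For comparison, the same two substitutions written in Perl, as a sketch assuming the sample lines are held in an array rather than read from /tmp/a.txt. As in the sed version, the greedy .* before \[ runs to the last [ on the line, which is what handles class4:

```perl
use strict;
use warnings;

my @lines = (
    'class1->Signal = sample1_sample2.sample3_sample4[4:4];',
    'class2->Signal = sample1.sample2.sample3_sample4_sample5[2];',
    'class3->Signal = sample1+sample2_sample3.sample4.sample5sample7[7:3];',
    'class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];',
);

my @out;
for my $line (@lines) {
    # Replace "->" through the last "[" with " bit = [" (greedy .*)
    (my $copy = $line) =~ s/->.*\[/ bit = [/;
    # Remove every ";" as the second sed expression does
    $copy =~ s/;//g;
    push @out, $copy;
}
print "$_\n" for @out;
```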
