Regex to extract one part of the URI

Question

Here's the full URI:

https://example.com/entry/#/view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/aHR0cHM6Ly9lcGwuaXJpY2EuZ292LmlyL0ltZWlBZnRlclJlZ2lzdGVyP2ltZWk9MzU5NzQ0MzkxMDc2Mjg4

I want to extract the base64 string that comes after /view/ and before the numeric part (366792786 in this case).

This is the part I'm trying to match:

TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z

I've managed to get this far:

my $uri = "https://example.com/entry/#/view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/aHR0cHM6Ly9lcGwuaXJpY2EuZ292LmlyL0ltZWlBZnRlclJlZ2lzdGVyP2ltZWk9MzU5NzQ0MzkxMDc2Mjg4";
if ($uri =~ m/view\/(.+)\//g) {
    print $&;
}

But it prints the whole match instead of just the part I want:

view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/

Please help me find the regex.

toolic · Accepted Answer · 2025-02-02 11:50:16Z

7

You should use $1 instead of $&. $& holds the entire matched text, while $1 holds only what the capturing parentheses matched.

Also:

Do not use .+ because it is greedy and matches slashes too, so it runs on to the last slash. Use [^/]+ to match only non-slash characters.
There is no need for the //g global modifier.
You can use alternate delimiters {} to avoid escaping the slash characters.

my $uri = "https://example.com/entry/#/view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/aHR0cHM6Ly9lcGwuaXJpY2EuZ292LmlyL0ltZWlBZnRlclJlZ2lzdGVyP2ltZWk9MzU5NzQ0MzkxMDc2Mjg4";
if ($uri =~ m{view/([^/]+)}) {
    print $1;
}

Outputs:

TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z
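
To make the difference between the two patterns visible, here is a minimal sketch that runs both against the same URI and prints what ends up in $1. The helper name capture_after_view is made up for illustration; it is not part of any module.

```perl
use strict;
use warnings;

my $uri = "https://example.com/entry/#/view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/aHR0cHM6Ly9lcGwuaXJpY2EuZ292LmlyL0ltZWlBZnRlclJlZ2lzdGVyP2ltZWk9MzU5NzQ0MzkxMDc2Mjg4";

# Hypothetical helper: return the first capture group of $re, or undef on no match.
sub capture_after_view {
    my ($string, $re) = @_;
    return $string =~ $re ? $1 : undef;
}

# Greedy .+ backtracks only as far as the LAST slash, so the number is included.
my $greedy = capture_after_view($uri, qr{view/(.+)/});

# [^/]+ cannot cross a slash, so it stops at the end of the first segment.
my $negated = capture_after_view($uri, qr{view/([^/]+)});

print "greedy:  $greedy\n";
print "negated: $negated\n";
```

The greedy line prints the token plus /366792786, and the negated line prints the token alone.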

edited Feb 2 at 11:50

answered Feb 2 at 11:43

toolic


4 Comments

ysth Feb 3 at 19:55

@ikegami from the example, this is URL-safe base64, so no escaping is needed. And while there could be unnecessary escaping, I would trust that there is none, given the example.

ikegami Feb 3 at 19:58

@ysth, / is a base64 character, and it would need to be escaped since it's acting as a separator. It could be a variant that doesn't use /, but we're now stacking assumptions on top of assumptions.

ysth Feb 3 at 20:06

@ikegami from the example this is definitely the url safe variant, using _-, no assumptions needed.

ikegami Feb 3 at 20:55

@ysth, No additional assumptions other than the two you already stated, you mean, and that's not counting the one that was incorrect.
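
If you would rather have the pattern fail than silently rely on the assumptions debated in these comments, one option is to spell them out in the regex. The sketch below assumes the token uses only the URL-safe base64 alphabet (letters, digits, - and _) and is followed by an all-digit segment; both assumptions come from the single example URI, and the sub name extract_token is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the token after /view/ only if it consists of
# URL-safe base64 characters and is followed by an all-digit segment.
# Returns undef otherwise.
sub extract_token {
    my ($uri) = @_;
    return $uri =~ m{/view/([A-Za-z0-9_-]+)/[0-9]+(?:/|\z)} ? $1 : undef;
}

my $uri = "https://example.com/entry/#/view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/aHR0cHM6Ly9lcGwuaXJpY2EuZ292LmlyL0ltZWlBZnRlclJlZ2lzdGVyP2ltZWk9MzU5NzQ0MzkxMDc2Mjg4";

my $token = extract_token($uri);
print defined $token ? "$token\n" : "no match\n";
```

A token containing an unescaped / or + (the standard base64 alphabet) makes extract_token return undef rather than a truncated value.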

brian d foy · Accepted Answer · 2025-03-04 16:09:20Z

I must have tested this originally with some different URL then added the long one from the answer. My initial answer makes no sense because the interesting piece of the answer is no longer interesting: that Mojo breaks up the path for you. If you can't use that feature, who cares? You're back to playing with a big string, neither has an advantage from

use v5.10;

my $url = "https://example.com/entry/#/view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/aHR0cHM6Ly9lcGwuaXJpY2EuZ292LmlyL0ltZWlBZnRlclJlZ2lzdGVyP2ltZWk9MzU5NzQ0MzkxMDc2Mjg4";

use Mojo::URL;
my( $p1 ) = Mojo::URL->new($url)->fragment =~ m|/view/(.*?)/|;
say "Mojo: $p1";

use URI;
my( $p2 ) = URI->new($url)->fragment =~ m|/view/(.*?)/|;
say "URI: $p2";

Here's the answer that wasn't helpful and didn't work with the actual data:

A regex can be fine, but if you're already doing a lot of web stuff, a proper URL module can help. Mojo::URL is nice because it already breaks up the path components so you can look at them individually:

use v5.10;
use Mojo::URL;

my $url = "https://example.com/entry/#/view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/aHR0cHM6Ly9lcGwuaXJpY2EuZ292LmlyL0ltZWlBZnRlclJlZ2lzdGVyP2ltZWk9MzU5NzQ0MzkxMDc2Mjg4";

my @parts = Mojo::URL->new($url)->path->parts;

my $last;
for (my $i=0; $i <= $#parts; $i++ ) {
    $last = $parts[$i] if $parts[$i-1] eq 'view';
    }

say $last;

This approach is nice when you need the other parts, such as the host, at the same time.

The URI module can be used if you don't want the weight of Mojo.

Hi @brian, the code doesn't work for me. It prints nothing. Dumper(@parts) shows: $VAR1 = [ 'entry' ];. I think we need to grab it via fragment method.

Yeah, I must have done something weird testing this, then re-pasted the URL from the answer right before I posted.

Dan · Accepted Answer · 2025-02-03 14:26:23Z

4

Here's the working example originally suggested by brian d foy:

    use Mojo::URL;

    my $url = "https://example.com/entry/#/view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/aHR0cHM6Ly9lcGwuaXJpY2EuZ292LmlyL0ltZWlBZnRlclJlZ2lzdGVyP2ltZWk9MzU5NzQ0MzkxMDc2Mjg4";

    # The fragment is everything after '#': /view/<token>/<number>/<base64>
    my $fragment = Mojo::URL->new($url)->fragment;

    # Collect every run of non-slash characters; $parts[0] is 'view'.
    my @parts = $fragment =~ m{([^/]+)}g;

    print $parts[1];

answered Feb 3 at 14:26

Dan


1 Comment

brian d foy Feb 4 at 15:22

Thanks. I used a different pattern in my improved answer so you don't have to assume that you are looking at the second component.
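
For comparison, here is a minimal core-Perl sketch of the same idea: look for the segment that follows 'view' wherever it sits, instead of relying on a fixed index. The sub name segment_after is hypothetical, and a plain regex stands in for a URL module when pulling out the fragment.

```perl
use strict;
use warnings;

# Hypothetical helper: return the segment that follows $marker in the
# fragment (the part after '#'), or undef if there is none.
sub segment_after {
    my ($url, $marker) = @_;

    # Take everything after the first '#'.
    my ($fragment) = $url =~ /#(.*)\z/s;
    return undef unless defined $fragment;

    # Drop empty strings produced by leading or doubled slashes.
    my @segments = grep { length } split m{/}, $fragment;

    # Stop one short of the end so $i + 1 is always a valid index.
    for my $i (0 .. $#segments - 1) {
        return $segments[$i + 1] if $segments[$i] eq $marker;
    }
    return undef;
}

my $url = "https://example.com/entry/#/view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/aHR0cHM6Ly9lcGwuaXJpY2EuZ292LmlyL0ltZWlBZnRlclJlZ2lzdGVyP2ltZWk9MzU5NzQ0MzkxMDc2Mjg4";

my $token = segment_after($url, 'view');
print defined $token ? "$token\n" : "not found\n";
```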

Rawley Fowler · Accepted Answer · 2025-02-05 03:20:31Z

0

You can do this more concisely than Mojo::URL, just using some Perl builtins:

use 5.036;

my $uri = 'https://example.com/entry/#/view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/aHR0cHM6Ly9lcGwuaXJpY2EuZ292LmlyL0ltZWlBZnRlclJlZ2lzdGVyP2ltZWk9MzU5NzQ0MzkxMDc2Mjg4';
my @parts = split '/', $uri;

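# Test that @parts is non-empty rather than the truthiness of $parts[0]:
# the empty field between the two slashes of "https://" is false and would stop the loop early.
shift @parts while (@parts && $parts[0] ne 'view');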

my $frag = $parts[1];
say $frag;

edited Feb 5 at 3:20

answered Feb 4 at 16:11

Rawley Fowler


barefootcoder · Accepted Answer · 2025-05-09 03:10:35Z

I'm not saying this is a better answer, but it feels like a simpler answer, and (arguably) a more intuitive one:

my $uri = "https://example.com/entry/#/view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/aHR0cHM6Ly9lcGwuaXJpY2EuZ292LmlyL0ltZWlBZnRlclJlZ2lzdGVyP2ltZWk9MzU5NzQ0MzkxMDc2Mjg4";
if ($uri =~ m{view/(.+?)/}) {
    print $1;
}

Non-greedy matching isn't always the answer, but I find that it often is.
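
One case worth checking before settling on the non-greedy form: it needs a slash after the token. The sketch below uses a made-up short URI where the token is the last segment, and two hypothetical helpers, lazy_token and negated_token, to compare the non-greedy pattern with the [^/]+ pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: non-greedy capture, requires a slash after the token.
sub lazy_token {
    my ($s) = @_;
    my ($t) = $s =~ m{view/(.+?)/};
    return $t;
}

# Hypothetical helper: negated class, stops at a slash or at end of string.
sub negated_token {
    my ($s) = @_;
    my ($t) = $s =~ m{view/([^/]+)};
    return $t;
}

# Made-up URI with nothing after the token.
my $short = "https://example.com/entry/#/view/abc123";

my $lazy    = lazy_token($short);
my $negated = negated_token($short);

print defined $lazy    ? "lazy: $lazy\n"       : "lazy: no match\n";
print defined $negated ? "negated: $negated\n" : "negated: no match\n";
```

Here the non-greedy pattern finds no match while the negated class returns abc123. On the URI from the question both return the same token, so the choice only matters if the trailing segments can be absent.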
