Here's the full URI:

https://example.com/entry/#/view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/aHR0cHM6Ly9lcGwuaXJpY2EuZ292LmlyL0ltZWlBZnRlclJlZ2lzdGVyP2ltZWk9MzU5NzQ0MzkxMDc2Mjg4

I want to extract the base64 string that comes after /view/ and before the numeric part (366792786 in this case).

This is the part I'm trying to match:

TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z

I've managed to go this far:

my $uri = "https://example.com/entry/#/view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/aHR0cHM6Ly9lcGwuaXJpY2EuZ292LmlyL0ltZWlBZnRlclJlZ2lzdGVyP2ltZWk9MzU5NzQ0MzkxMDc2Mjg4";
if ($uri =~ m/view\/(.+)\//g) {
    print $&;
}

But it prints the whole match instead of just the part I want:

view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/

Please help me find the regex.

5 Answers

7

You should use $1 instead of $&. $& holds the entire matched text, while $1 holds only what the first pair of capturing parentheses matched.

Also:

  • Do not use .+ because it is greedy and matches slashes too, so it runs through to the last slash it can find. Use [^/]+ to match only non-slash characters.
  • There is no need for the //g global modifier.
  • You can use alternate delimiters {} to avoid escaping the slash characters.

my $uri = "https://example.com/entry/#/view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/aHR0cHM6Ly9lcGwuaXJpY2EuZ292LmlyL0ltZWlBZnRlclJlZ2lzdGVyP2ltZWk9MzU5NzQ0MzkxMDc2Mjg4";
if ($uri =~ m{view/([^/]+)}) {
    print $1;
}

Outputs:

TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z
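
To make the difference between the three points above concrete, here is a minimal sketch. The shortened $uri is made up for readability (it only mimics the shape of the URI in the question), and the variable names are just illustrative.

```perl
use strict;
use warnings;

# Shortened, made-up URI with the same shape as the one in the question.
my $uri = "https://example.com/entry/#/view/AbC_123-z/366792786/aHR0cHM6";

# Greedy .+ runs to the last slash, so the capture swallows the numeric part.
my ($greedy) = $uri =~ m{view/(.+)/};

# [^/]+ cannot cross a slash, so the capture stops at the end of the segment.
my ($segment) = $uri =~ m{view/([^/]+)};

# $& is the whole match, including the literal "view/"; $1 is only the capture.
my ($whole, $first) = $uri =~ m{view/([^/]+)} ? ($&, $1) : ();

print "greedy:  $greedy\n";   # AbC_123-z/366792786
print "segment: $segment\n";  # AbC_123-z
print "whole:   $whole\n";    # view/AbC_123-z
print "first:   $first\n";    # AbC_123-z
```

The last two lines show why print $& in the question printed more than expected: the literal view/ is part of the match even though it is outside the parentheses.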

4 Comments

@ikegami from the example, this is URL-safe base64, so no escaping is needed. And while there could be unnecessary escaping, I would trust that there isn't, given the example.
@ysth, / is a base64 character, and it would need to be escaped since it's acting as a separator. It could be a variant that doesn't use /, but we're now stacking assumptions atop assumptions.
@ikegami from the example this is definitely the url safe variant, using _-, no assumptions needed.
@ysth, No additional assumptions other than the two you already stated, you mean, and that's not counting the one that was incorrect.
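
For what it's worth, the assumption argued over in these comments can be written into the pattern itself. This is only a sketch: it assumes the segment is unpadded URL-safe base64 (letters, digits, - and _), which the example suggests but the question never states, and the short $uri is made up.

```perl
use strict;
use warnings;

# Made-up URI; the segment after view/ uses only URL-safe base64 characters.
my $uri = "https://example.com/entry/#/view/AbC_123-z/366792786/aHR0cHM6";

# Assumption: the segment is URL-safe base64, so it never contains "/" or "+".
# Requiring the numeric segment right after it makes the match stricter.
my ($id) = $uri =~ m{/view/([A-Za-z0-9_-]+)/\d+/};

print defined $id ? "$id\n" : "no match\n";   # AbC_123-z
```

If the data ever used the standard alphabet with an unescaped / inside the segment, this pattern would fail to match rather than quietly return a truncated value.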
5

I must have originally tested this with a different URL and then pasted in the long one from the answer. My initial answer makes no sense because its interesting part, that Mojo breaks up the path for you, no longer applies: the data here is in the fragment, not the path. If you can't use that feature, who cares? You're back to matching against a big string, and neither module has an advantage over the other:

use v5.10;

my $url = "https://example.com/entry/#/view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/aHR0cHM6Ly9lcGwuaXJpY2EuZ292LmlyL0ltZWlBZnRlclJlZ2lzdGVyP2ltZWk9MzU5NzQ0MzkxMDc2Mjg4";

use Mojo::URL;
my( $p1 ) = Mojo::URL->new($url)->fragment =~ m|/view/(.*?)/|;
say "Mojo: $p1";

use URI;
my( $p2 ) = URI->new($url)->fragment =~ m|/view/(.*?)/|;
say "URI: $p2";

Here's the answer that wasn't helpful and didn't work with the actual data:

A regex can be fine, but if you're already doing a lot of web stuff, a proper URL module can help. Mojo::URL is nice because it already breaks up the path components so you can look at them individually:

use v5.10;
use Mojo::URL;

my $url = "https://example.com/entry/#/view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/aHR0cHM6Ly9lcGwuaXJpY2EuZ292LmlyL0ltZWlBZnRlclJlZ2lzdGVyP2ltZWk9MzU5NzQ0MzkxMDc2Mjg4";

my @parts = Mojo::URL->new($url)->path->parts;

my $last;
for my $i (1 .. $#parts) {    # start at 1 so $parts[$i-1] never wraps around to the last element
    $last = $parts[$i] if $parts[$i-1] eq 'view';
}

say $last;

This approach is nice when you need the other parts, such as the host, at the same time.

5 Comments

The URI module can be used if you don't want the weight of Mojo.
Hi @brian, the code doesn't work for me. It prints nothing. Dumper(@parts) shows: $VAR1 = [ 'entry' ];. I think we need to grab it via fragment method.
This needs ->fragment instead of ->path.
Yep, just dropped an answer.
Yeah, I must have done something weird testing this, then re-pasted the URL from the answer right before I posted.
4

Here's the working example originally suggested by brian d foy:

    use Mojo::URL;
    use Data::Dumper;
    
    my $url = "https://example.com/entry/#/view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/aHR0cHM6Ly9lcGwuaXJpY2EuZ292LmlyL0ltZWlBZnRlclJlZ2lzdGVyP2ltZWk9MzU5NzQ0MzkxMDc2Mjg4";
    
    my $fragment = Mojo::URL->new($url)->fragment;
    my @parts = $fragment =~ m{([^/]+)}g;
    
    print $parts[1];

1 Comment

Thanks. I used a different pattern in my improved answer so you don't have to assume that you are looking at the second component.
0

You can do this more concisely than Mojo::URL, just using some Perl builtins:

use 5.036;

my $uri = 'https://example.com/entry/#/view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/aHR0cHM6Ly9lcGwuaXJpY2EuZ292LmlyL0ltZWlBZnRlclJlZ2lzdGVyP2ltZWk9MzU5NzQ0MzkxMDc2Mjg4';
my @parts = split '/', $uri;

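# Test @parts rather than $parts[0]: the empty field after "https:" is false
# and would stop the loop before it ever reaches 'view'.
shift @parts while (@parts && $parts[0] ne 'view');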

my $frag = $parts[1];
say $frag;

0

I'm not saying this is a better answer, but it feels like a simpler answer, and (arguably) a more intuitive one:

my $uri = "https://example.com/entry/#/view/TCMaftR7cPYyC3q61TnI6_Mx8PwDTsnVyo9Z6nsXHDRzrN5ftuXxHN7NvIGK34-z/366792786/aHR0cHM6Ly9lcGwuaXJpY2EuZ292LmlyL0ltZWlBZnRlclJlZ2lzdGVyP2ltZWk9MzU5NzQ0MzkxMDc2Mjg4";
if ($uri =~ m{view/(.+?)/}) {
    print $1;
}

Non-greedy matching isn't always the answer, but I find that it often is.
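
One case where it isn't: the non-greedy pattern needs a slash after the segment, so it fails if the segment is the last thing in the string. A small sketch with made-up URIs (not from the question) compares it with the negated character class:

```perl
use strict;
use warnings;

# Made-up URIs: one with more segments after the ID, one where the ID is last.
my $with_tail    = "https://example.com/entry/#/view/AbC_123-z/366792786";
my $without_tail = "https://example.com/entry/#/view/AbC_123-z";

my @lines;
for my $uri ($with_tail, $without_tail) {
    my ($lazy)    = $uri =~ m{view/(.+?)/};    # needs a slash after the ID
    my ($negated) = $uri =~ m{view/([^/]+)};   # stops at a slash or end of string
    push @lines, sprintf "lazy=%s negated=%s",
        $lazy // 'no match', $negated // 'no match';
}

print "$_\n" for @lines;
# lazy=AbC_123-z negated=AbC_123-z
# lazy=no match negated=AbC_123-z
```

For the URI in the question both patterns give the same result, since a slash always follows the segment there.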
