3

I have an array of strings like

urls_parts=['week', 'weeklytop', 'week/day']

I need to check which of these strings occur in my URL. This example should be triggered by the 'weeklytop' part only:

url='www.mysite.com/weeklytop/2'
for part in urls_parts:
    if part in url:
        print part

But it is of course triggered by 'week' too. What is the right way to do this?

Oops, let me clarify my question. The code must not trigger when url='www.mysite.com/week/day/2' and part='week'. With part='week', it should trigger only for URLs such as 'www.mysite.com/week/2' or 'www.mysite.com/week/2-second'.

2
  • 1
    Parse the URL using urlparse.urlparse() (urllib.parse.urlparse() in Python 3), split the path into parts, and then compare segment by segment. Is this homework? Commented Aug 13, 2012 at 7:29
  • The pattern "week" occurs in every entry of your urls_parts, so how could the computer tell them apart without tokenizing the URL? You need to at least define a word boundary before you can match the way you do above, or use a regex. Commented Aug 13, 2012 at 7:33
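The tokenizing approach suggested in these comments could be sketched roughly as follows. This is only an illustration, not code from the thread: it assumes Python 3, it assumes the part of interest sits at the start of the URL path, and the helper names path_segments and leading_part are made up for the example.

```python
from urllib.parse import urlparse  # Python 3; in Python 2 use: from urlparse import urlparse


def path_segments(url):
    """Return the non-empty path segments of a URL as a list of strings."""
    # urlparse only recognises the host when the URL has a scheme or starts with '//'.
    if '://' not in url and not url.startswith('//'):
        url = '//' + url
    return [seg for seg in urlparse(url).path.split('/') if seg]


def leading_part(url, parts):
    """Return the longest part whose segments open the URL path, or None."""
    segments = path_segments(url)
    best = None
    for part in parts:
        wanted = part.split('/')
        # Compare whole segments, so 'week' never matches 'weeklytop'.
        if segments[:len(wanted)] == wanted:
            if best is None or len(wanted) > len(best.split('/')):
                best = part
    return best


urls_parts = ['week', 'weeklytop', 'week/day']
for url in ['www.mysite.com/weeklytop/2',
            'www.mysite.com/week/day/2',
            'www.mysite.com/week/2-second']:
    print(url, '->', leading_part(url, urls_parts))
```

With the sample URLs this reports 'weeklytop', 'week/day' and 'week' respectively. Preferring the part with the most segments is what keeps 'week' from being reported for the 'week/day' URL.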

5 Answers

5

This is how I would do it.

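import re

urls_parts = ['week', 'weeklytop', 'week/day']
# Try the longest parts first so that 'week/day' is tested before 'week'.
urls_parts = sorted(urls_parts, key=len, reverse=True)
# re.escape keeps any regex metacharacters in a part literal;
# \b requires a word boundary right after the part.
rexes = [re.compile(re.escape(part) + r'\b') for part in urls_parts]

urls = ['www.mysite.com/weeklytop/2', 'www.mysite.com/week/day/2', 'www.mysite.com/week/4']
for url in urls:
    for i, rex in enumerate(rexes):
        if rex.search(url):
            print url
            print urls_parts[i]
            print
            break  # stop at the first (longest) matching part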

OUTPUT

www.mysite.com/weeklytop/2
weeklytop

www.mysite.com/week/day/2
week/day

www.mysite.com/week/4
week

Suggestion to sort by length came from @Roman


3

Sort your list by length (longest first) and break out of the loop at the first match.
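A minimal sketch of that idea, using the plain substring test from the question. The function name first_match is made up for illustration, and returning from the function stands in for the break:

```python
urls_parts = ['week', 'weeklytop', 'week/day']


def first_match(url, parts):
    """Return the longest part that is a substring of url, or None."""
    for part in sorted(parts, key=len, reverse=True):
        if part in url:
            return part  # returning here plays the role of `break`
    return None


print(first_match('www.mysite.com/weeklytop/2', urls_parts))  # weeklytop
print(first_match('www.mysite.com/week/day/2', urls_parts))   # week/day
print(first_match('www.mysite.com/week/2', urls_parts))       # week
```

Because this is still a substring test, a URL such as 'www.mysite.com/weekly/2' would also report 'week'. Sorting only settles conflicts between the listed parts.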


2

Try something like this:

>>> import re
>>> print(re.findall(r'\bweeklytop\b', 'www.mysite.com/weeklytop/2'))
['weeklytop']
>>> print(re.findall(r'\bweek\b', 'www.mysite.com/weeklytop/2'))
[]

program:

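>>> import re
>>> urls_parts = ['week', 'weeklytop', 'week/day']
>>> url = 'www.mysite.com/weeklytop/2'
>>> for part in urls_parts:
...     # \b on both sides anchors the part at word boundaries
...     if re.findall(r'\b' + re.escape(part) + r'\b', url):
...         print(part)
...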

output:

weeklytop


0

Why not use urls_parts like this?

 ['/week/', '/weeklytop/', '/week/day/']

1 Comment

I use this; it was just an example.

-1

A slight change in your code would solve this issue -

>>> for part in urls_parts:
...     if part in url.split('/'):  # split the URL on '/' and compare whole segments
...         print part
...
weeklytop

1 Comment

It was not me, but for example the 'week/day' can never be found this way.
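One way to keep the split-based idea while still finding multi-segment parts such as 'week/day' is to split the part as well and look for its pieces as a consecutive run. This is a sketch only, and contains_segments is a hypothetical helper name:

```python
def contains_segments(url, part):
    """True if the '/'-separated pieces of part occur consecutively in url."""
    haystack = url.split('/')
    needle = part.split('/')
    n = len(needle)
    # Slide a window of n segments over the URL and compare whole segments.
    return any(haystack[i:i + n] == needle
               for i in range(len(haystack) - n + 1))


url = 'www.mysite.com/week/day/2'
for part in ['week', 'weeklytop', 'week/day']:
    print(part, contains_segments(url, part))
```

For this URL both 'week' and 'week/day' are reported as present, so testing the longest parts first and stopping at the first hit is still needed when only the most specific part should trigger.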
