String comparison in Python

Question

I have a list of strings like

urls_parts=['week', 'weeklytop', 'week/day']

And I need to detect when one of these strings is included in my URL. In this example, only the weeklytop part should trigger:

url='www.mysite.com/weeklytop/2'
for part in urls_parts:
    if part in url:
       print part

But of course it is triggered by 'week' too. What is the right way to do this?

Oops, let me clarify my question. The code must not trigger when url='www.mysite.com/week/day/2' and part='week'. With part='week', it should trigger only for URLs such as 'www.mysite.com/week/2' or 'www.mysite.com/week/2-second'.

Parse the URL using urlparse.urlparse() (urllib.parse.urlparse() in Python 3), split the path into parts and then compare string by string. Is this homework?
– user2665694, Commented Aug 13, 2012 at 7:29
The pattern "week" appears in every entry of your urls_parts, so how could you expect the computer to tell them apart without tokenizing the URL? You need to at least define a word boundary before you can match the way you do above, or use a regex.
– Yang, Commented Aug 13, 2012 at 7:33

sberry · Accepted Answer · 2012-08-13 07:45:06Z

5

This is how I would do it.

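import re

urls_parts = ['week', 'weeklytop', 'week/day']
# Try longer parts first so that 'week/day' is tested before 'week'.
urls_parts = sorted(urls_parts, key=len, reverse=True)
# \b requires a word boundary right after the part, so 'week' cannot match 'weeklytop'.
rexes = [re.compile(r'{part}\b'.format(part=re.escape(part))) for part in urls_parts]

urls = ['www.mysite.com/weeklytop/2', 'www.mysite.com/week/day/2', 'www.mysite.com/week/4']
for url in urls:
    for i, rex in enumerate(rexes):
        if rex.search(url):
            print(url)
            print(urls_parts[i])
            print()
            break  # stop at the first (longest) matching part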

OUTPUT

www.mysite.com/weeklytop/2
weeklytop

www.mysite.com/week/day/2
week/day

www.mysite.com/week/4
week

Suggestion to sort by length came from @Roman

edited Aug 13, 2012 at 7:45

answered Aug 13, 2012 at 7:35

sberry


Roman Bodnarchuk · Accepted Answer · 2012-08-13 07:34:10Z

3

Sort your list by length (longest first) and break out of the loop at the first match.
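
A minimal sketch of that idea in Python 3, keeping the plain substring test from the question. The function name first_longest_match is made up for this example.

```python
def first_longest_match(url, parts):
    """Return the longest part that occurs in url, or None."""
    # Longest first: 'weeklytop' and 'week/day' are tried before 'week'.
    for part in sorted(parts, key=len, reverse=True):
        if part in url:
            return part  # equivalent to breaking at the first match
    return None


urls_parts = ['week', 'weeklytop', 'week/day']
for url in ['www.mysite.com/weeklytop/2',
            'www.mysite.com/week/day/2',
            'www.mysite.com/week/4']:
    print(url, '->', first_longest_match(url, urls_parts))
```

This handles the three URLs from the question, but since it is still a substring test, a URL such as 'www.mysite.com/weekend/4' would also be reported as 'week'.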

answered Aug 13, 2012 at 7:34

Roman Bodnarchuk


Ashwini Chaudhary · Accepted Answer · 2012-08-13 08:02:15Z

2

Try something like this:

>>> import re
>>> print(re.findall(r'\bweeklytop\b', 'www.mysite.com/weeklytop/2'))
['weeklytop']
>>> print(re.findall(r'\bweek\b', 'www.mysite.com/weeklytop/2'))
[]

Program:

>>> urls_parts = ['week', 'weeklytop', 'week/day']
>>> url = 'www.mysite.com/weeklytop/2'
>>> for part in urls_parts:
...     # \b on both sides: the part must start and end at a word boundary
...     if re.findall(r'\b' + re.escape(part) + r'\b', url):
...         print(part)

output:

weeklytop
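
One caveat with \b is that characters such as '-' also count as word boundaries, so a 'week' pattern would match 'week-end' as well. If that matters, a stricter variant can anchor the part to whole '/'-delimited segments. The sketch below is an assumption about what "matching a part" should mean, not something stated in the question, and segment_regex is a made-up helper name.

```python
import re


def segment_regex(part):
    """Compile a pattern that matches `part` only as whole path segments."""
    # (?:^|/)  : the part must start at the string start or right after a '/'
    # (?=/|$)  : and must be followed by a '/' or the end of the string
    return re.compile(r'(?:^|/)' + re.escape(part) + r'(?=/|$)')


week = segment_regex('week')
print(bool(week.search('www.mysite.com/weeklytop/2')))
print(bool(week.search('www.mysite.com/week/2-second')))
print(bool(week.search('www.mysite.com/week-end/2')))
```

Note that 'week' still matches 'www.mysite.com/week/day/2' with this pattern, so trying the longest part first is needed here too.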

edited Aug 13, 2012 at 8:02

answered Aug 13, 2012 at 7:33

Ashwini Chaudhary


Martijn Pieters · Accepted Answer · 2012-11-16 09:50:16Z

0

Why not use urls_parts like this?

 ['/week/', '/weeklytop/', '/week/day/']
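
As a rough Python 3 illustration of how the delimited form behaves, the sketch below wraps each part in slashes before the substring test. The function name delimited_matches is made up for this example, and it assumes every part is followed by another '/' in the URL.

```python
def delimited_matches(url, parts):
    """Return the parts that occur in url as '/part/'."""
    # strip('/') lets the list hold either 'week' or '/week/'.
    return [part for part in parts if '/' + part.strip('/') + '/' in url]


urls_parts = ['week', 'weeklytop', 'week/day']
print(delimited_matches('www.mysite.com/weeklytop/2', urls_parts))
print(delimited_matches('www.mysite.com/week/day/2', urls_parts))
```

The first call reports only 'weeklytop'. The second reports both 'week' and 'week/day', so the delimiters alone do not settle the 'week' versus 'week/day' case, and a URL that ends right after the part (no trailing '/') is not matched at all.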

edited Nov 16, 2012 at 9:50

Martijn Pieters

answered Aug 13, 2012 at 8:19

Scott 混合理论

1 Comment

Feanor Over a year ago

I use this; it was just an example.

theharshest · Accepted Answer · 2012-08-13 07:38:24Z

-1

A slight change to your code would solve this issue:

>>> for part in urls_parts:
...     if part in url.split('/'):  # split the URL on '/' and compare whole segments
...         print(part)
...
weeklytop

answered Aug 13, 2012 at 7:38

theharshest

1 Comment

pepr Over a year ago

It was not me, but for example the 'week/day' can never be found this way.
