In my last question I asked for help parsing the links out of the XML in an RSS feed. Using the suggestions I received here, combined with some extra research, I was able to write this:

import urllib
from xml.dom import minidom

def GetRSS(RSSurl):
    # Fetch the feed and parse it into a DOM document (Python 2)
    url_info = urllib.urlopen(RSSurl)
    if (url_info):
        xmldoc = minidom.parse(url_info)
    if (xmldoc):
        channel = xmldoc.getElementsByTagName('channel')
        for node in channel:
            item = xmldoc.getElementsByTagName('item')
            for node in item:
                alist = xmldoc.getElementsByTagName('link')
                for a in alist:
                    linktext = a.firstChild.data
                    print linktext

As I mentioned in the other question, I wrote this for obtaining the links from the RSS feed on Redlettermedia.com. The code works fine and the output I receive is:

http://redlettermedia.com
http://redlettermedia.com/half-in-the-bag-b-fest-2012/
http://redlettermedia.com/an-update-from-red-letter-media/
http://redlettermedia.com/half-in-the-bag-red-tails/
http://redlettermedia.com/half-in-the-bag-the-devil-inside-and-flyin-ryan/
http://redlettermedia.com/newly-found-episode-iii-review-behind-the-scenes-footage/
http://redlettermedia.com/half-in-the-bag-the-girl-with-the-dragon-tattoo-and-2011-re-cap/
http://redlettermedia.com/mr-plinetts-indiana-jones-and-the-kingdom-of-the-crystal-skull-review/
http://redlettermedia.com/new-mr-plinkett-review-trailer/
http://redlettermedia.com/plinkett-fest/
http://redlettermedia.com/update/
http://redlettermedia.com
http://redlettermedia.com/half-in-the-bag-b-fest-2012/
http://redlettermedia.com/an-update-from-red-letter-media/
http://redlettermedia.com/half-in-the-bag-red-tails/
http://redlettermedia.com/half-in-the-bag-the-devil-inside-and-flyin-ryan/
http://redlettermedia.com/newly-found-episode-iii-review-behind-the-scenes-footage/

And so on. What I would like to do now is have the function print only the newest update link (the second line of the output, "http://redlettermedia.com/half-in-the-bag-b-fest-2012/" in this case). How would I print only that line?

  • Can you install non-stdlib modules? How do you define newest update link? Commented Feb 9, 2012 at 5:29

1 Answer

If it's always the second item in the list, you could try:

url = xmldoc.getElementsByTagName('link')[1].firstChild.data
print url
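
A minimal sketch of a variant that does not depend on the index 1, assuming Python 3 and that the newest entry is the first item in the feed. The sample feed and the function name newest_item_link are made up for illustration; the idea is to call getElementsByTagName on the first item element rather than on the whole document:

```python
from xml.dom import minidom

# Hypothetical sample feed, shaped like a typical RSS 2.0 document.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example</title>
    <link>http://example.com</link>
    <item>
      <title>Newest post</title>
      <link>http://example.com/newest/</link>
    </item>
    <item>
      <title>Older post</title>
      <link>http://example.com/older/</link>
    </item>
  </channel>
</rss>"""


def newest_item_link(xml_text):
    """Return the link of the first <item> in the feed, or None if absent."""
    xmldoc = minidom.parseString(xml_text)
    items = xmldoc.getElementsByTagName('item')
    if not items:
        return None
    # Search inside the first item only, not the whole document.
    links = items[0].getElementsByTagName('link')
    if not links or links[0].firstChild is None:
        return None
    return links[0].firstChild.data.strip()


print(newest_item_link(SAMPLE_FEED))
```

Because the lookup is scoped to the first item, the channel-level link no longer matters, and a feed with no items gives None instead of an IndexError.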

4 Comments

This works pretty much perfectly, except that I get ten lines repeating the URL I was trying to get. What am I doing to cause that, instead of receiving the URL just once?
That's because you're printing it once for every item in the list. You would most likely replace what comes after 'for node in item:' with my suggestion, but I'm unable to test at the moment...
Well, I figured that's what I should do, actually. I completely replaced everything beneath for node in item: with what you suggested, but I still seem to be getting ten lines for some reason.
Looking more closely, you would probably put it directly after if (xmldoc):
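
To illustrate the repetition discussed in these comments, here is a small sketch, assuming Python 3 and a made-up three-item feed (FEED and second_link are hypothetical names). A lookup on xmldoc searches the whole document, so running it inside a loop over the items yields the same URL once per item, while running it once outside the loop yields it a single time:

```python
from xml.dom import minidom

# Hypothetical feed with one channel-level link and three items.
FEED = """<rss><channel><link>http://example.com</link>
<item><link>http://example.com/a/</link></item>
<item><link>http://example.com/b/</link></item>
<item><link>http://example.com/c/</link></item>
</channel></rss>"""

xmldoc = minidom.parseString(FEED)


def second_link(doc):
    # Document-wide lookup: index 0 is the channel link, index 1 the first item.
    return doc.getElementsByTagName('link')[1].firstChild.data


# Inside a loop over the items, the same document-wide result repeats.
inside_loop = [second_link(xmldoc) for _ in xmldoc.getElementsByTagName('item')]

# Called once, outside any loop, it produces a single URL.
outside_loop = second_link(xmldoc)

print(len(inside_loop), outside_loop)
```

With ten items in the real feed, the same effect would account for the ten repeated lines.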
