There's a lot of help on here but some of it goes over my head, so hopefully by asking my question and getting a tailored answer I will better understand.

So far I have managed to connect to a website, authenticate as a user, fill in a form and then pull down the HTML. The HTML contains a table I want. I just want to say something like:

Read the HTML; when you reach the table start tag, keep going until you reach the table end tag, and then display that, or write it to a new HTML file and open it, keeping the tags so it's formatted for me.

Here is the code I have so far.

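import requests

# LOGINURL, APURL, login and findaps are defined elsewhere in the script.

# Use 'with' to ensure the session context is closed after use.
with requests.Session() as s:
    # Log in; the session keeps the authentication cookies.
    s.post(LOGINURL, data=login)
    r = s.get(LOGINURL)
    print(r.url)

    # An authorised request.
    r = s.get(APURL)
    print(r.url)
    # etc...

    # Submit the form, then fetch the page that contains the table.
    s.post(APURL)
    r = s.post(APURL, data=findaps)
    r = s.get(APURL)
    # print(r.text)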




# 'with' closes the file even if the write fails.
with open("makethisfile.html", "w") as f:
    f.write('\n'.join([
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">',
        '<html>',
        ' <head>',
        ' <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">',
        ' <title>THE TITLE</title>',
        ' <link rel="stylesheet" href="css/displayEventLists.css" type="text/css">',
        r.text,  # this just writes everything; I need to get only the table.
    ]))

1 Answer

Although it's best to parse the file properly, a quick-and-dirty method uses a regex.

import re

# Non-greedy body, so the match stops at the first </table> it finds.
m = re.search(r"<table\b[^>]*>(.+?)</table>", r.text, re.S)
if m:
    print(m.group())
else:
    print("Error: table not found")

As an example of why parsing is better, the regex as written will fail with the following (rather contrived!) example:

<!-- <table> -->
blah
blah
<table>
this is the actual
table
</table>
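
For comparison, here is a minimal sketch of the parsing approach, using only the standard library's html.parser module (Python 3; in Python 2 the module is called HTMLParser). The names FirstTableExtractor and extract_first_table are made up for this illustration, and the sample string is the contrived page above. Because the parser reports comments separately from tags, the commented-out <table> is never treated as a real start tag.

```python
from html.parser import HTMLParser


class FirstTableExtractor(HTMLParser):
    """Collect the markup of the first <table>, nested tables included."""

    def __init__(self):
        # Keep entity references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.depth = 0      # number of <table> elements currently open
        self.done = False   # True once the first table has been closed
        self.parts = []     # pieces of markup collected so far

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "table":
            self.depth += 1
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        self.parts.append("</%s>" % tag)
        if tag == "table":
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        if self.depth:
            self.parts.append("&#%s;" % name)


def extract_first_table(html_text):
    """Return the first table's markup, or None if no complete table exists."""
    parser = FirstTableExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts) if parser.done else None


sample = """<!-- <table> -->
blah
blah
<table>
this is the actual
table
</table>
"""

print(extract_first_table(sample))
```

In the question's script you would pass r.text instead of sample and write the returned string where r.text is written now. Comments inside the table are dropped by this sketch, which is usually harmless for display.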

Also, re.search returns only the first match in the file. You could loop over the matches (for example with re.finditer) to get the second table, and so on, or make the regex specific to the table you want if possible, so that's not a problem.
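
A rough sketch of that loop, assuming the tables are not nested inside each other (a regex cannot pair up nested tags reliably). The pattern name TABLE_RE, the helper find_tables and the sample page string are invented for the example; the body is matched non-greedily so each match ends at the nearest </table>.

```python
import re

# Non-greedy body so each match stops at the nearest </table>.
# re.S lets '.' span newlines; re.I also accepts <TABLE>.
TABLE_RE = re.compile(r"<table\b[^>]*>.*?</table>", re.S | re.I)


def find_tables(html_text):
    """Return the markup of every non-nested table, in document order."""
    return [m.group() for m in TABLE_RE.finditer(html_text)]


# Hypothetical page with two tables.
page = (
    "<p>intro</p>"
    "<table id='a'><tr><td>1</td></tr></table>"
    "<p>gap</p>"
    "<table id='b'><tr><td>2</td></tr></table>"
)

tables = find_tables(page)
print(len(tables))
print(tables[1])  # the second table
```

If the table you want has a distinguishing attribute, such as an id, putting that into the pattern is often simpler than indexing into the list.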


2 Comments

That did it perfectly first time, thanks for the help. Does anyone mind explaining to me why this isn't the best method? Could I run into issues, say if there is more than 1 table, is that the issue?
I see, OK, cool. Thanks for your help! Luckily there is only one table on my page and no comments or anything, so it's working perfectly. Loving Python!
