0

I am currently working on my first Python script, which is supposed to check a URL every XX seconds and notify me if the text at that URL has changed.

My problem is that I can't find a way to refer to a variable outside the function it was defined in.

I tried to use a global variable, but this resulted in errors as well.

The current version refers to the variable soup within the scrape function (scrape.soup = doesn't return errors, while soup = does).

However, line 15 still can't find the variable soup, and gives me this notification:

Cannot find reference 'soup' in 'function'

from bs4 import BeautifulSoup
import requests
import time

sleeptime = 15

def scrape():
    url = "http://www.pythonforbeginners.com"
    source_code = requests.get(url)
    plain_text = source_code.text
    scrape.soup = BeautifulSoup(plain_text, 'html.parser')

while 1:
    if scrape() == scrape.soup:
        print('Nothing Changed')
    else:
        print("Something Changed!")
        break
    time.sleep(sleeptime)

I expect the script to save the html_text of 'url' in the variable 'soup'.

The script should compare the latest scrape with the old scrape and print notifications for each result.

In case nothing changed, it should print "nothing changed".

In case it changed, it should print "Something Changed".

The script runs without any errors. However, when running the script, it always returns "Something changed".

I am pretty sure this is not correct, as it wouldn't make sense for the content on the site to change every 15 seconds. In addition, I suspect there is a problem with time.sleep, as the script runs only once and doesn't repeat every 15 seconds.

I would really appreciate any clues that would point me into the right direction.

1
  • scrape() doesn't return anything Commented Apr 19, 2019 at 11:50

3 Answers

2

I think you're missing the concept of return.

def scrape():
    url = "http://www.pythonforbeginners.com"
    source_code = requests.get(url)
    plain_text = source_code.text
    return BeautifulSoup(plain_text, 'html.parser')

Now scrape() will always return a new object every time it is called. You can't simply check if the function returns the same thing (to mean the page content hasn't changed) because it never will.

If you only care that the content has changed (at all), then you don't even need to use Beautiful Soup. Just store the page content and compare that each cycle.
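A minimal sketch of that idea, assuming the page arrives as plain text and that comparing a hash of it is good enough. The names `fingerprint` and `watch`, and the `fetch` parameter, are made up for illustration. In the question's script, `fetch` could be something like `lambda: requests.get(url).text`.

```python
import hashlib
import time


def fingerprint(page_text):
    """Return a short, comparable digest of the page text."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def watch(fetch, interval=15, max_checks=None):
    """Call fetch() every `interval` seconds until its result changes.

    `fetch` is any zero-argument callable returning the page text
    (a hypothetical hook, so the loop can be tried without a network).
    Returns True if a change was seen, False if max_checks ran out first.
    """
    previous = fingerprint(fetch())  # initial snapshot to compare against
    checks = 0
    while max_checks is None or checks < max_checks:
        time.sleep(interval)         # wait before looking again
        current = fingerprint(fetch())
        if current != previous:
            print("Something Changed!")
            return True
        print("Nothing Changed")
        checks += 1
    return False
```

Storing a digest instead of the full text keeps the saved state small. Comparing the raw strings directly would work just as well for a single page.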

Otherwise, use your Beautiful Soup object to dig into the page content and extract just the parts you want to watch for changes. Then save that text and compare it each cycle.
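A rough sketch of watching only one part of the page. To keep it free of extra packages it uses the standard library's `html.parser` rather than Beautiful Soup. With Beautiful Soup, the same extraction would be a `find(id=...)` lookup followed by `get_text()`. The class `TextById`, the helper `watched_text`, and the id `"news"` are hypothetical names chosen for the example.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the first element carrying a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0         # nesting depth inside the target, 0 = outside
        self.finished = False  # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.finished or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> contain no text

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.finished = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def watched_text(html, target_id):
    """Return the whitespace-normalised text of the watched element."""
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())
```

Saving `watched_text(page, "news")` on each cycle and comparing it with the previous value ignores changes elsewhere on the page, such as a clock or a rotating banner.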

1

Your code

def scrape():
    url = "http://www.pythonforbeginners.com"
    source_code = requests.get(url)
    plain_text = source_code.text
    scrape.soup = BeautifulSoup(plain_text, 'html.parser')

does not return anything, hence it returns None implicitly.

When comparing

if scrape() == scrape.soup:

the two sides will always differ, because scrape() returns None while scrape.soup holds a BeautifulSoup object, which is never equal to None.

It would be better to do:

def scrape():
    url = "http://www.pythonforbeginners.com"
    source_code = requests.get(url)
    plain_text = source_code.text
    return BeautifulSoup(plain_text, 'html.parser')

s = scrape()   # get initial value

while True:
    time.sleep(sleeptime)         # sleep before testing again
    if s.text == scrape().text:   # compare the text of bs
        print('Nothing Changed')
    else:
        print("Something Changed!")
        break

Docs: https://docs.python.org/3/tutorial/controlflow.html#defining-functions

[...] The return statement returns with a value from a function. return without an expression argument returns None. Falling off the end of a function also returns None.
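A tiny illustration of that rule, using two made-up functions that differ only in whether they end with a return statement:

```python
def scrape_without_return():
    result = "page text"   # assigned, but never handed back to the caller


def scrape_with_return():
    result = "page text"
    return result          # the caller receives this value


print(scrape_without_return())   # None
print(scrape_with_return())      # page text
```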

0

In addition to the 'return' answers: you must declare (and initialize) the variable in the correct scope. A variable first assigned inside a function is local to that function. Assign it outside the function instead, and compare it with the value the function returns.

from bs4 import BeautifulSoup
import requests
import time


sleeptime = 15
output = ""

def scrape():
    url = "http://www.pythonforbeginners.com"
    source_code = requests.get(url)
    plain_text = source_code.text
    # get_text() returns the page's text as a plain string, which == can compare
    return BeautifulSoup(plain_text, 'html.parser').get_text()

while 1:
    new_output = scrape() 
    if output == new_output:
        print('Nothing Changed')
    else:
        print("Something Changed!")
        # change output to new output
        output = new_output
    time.sleep(sleeptime)

2 Comments

You've corrected the pattern, yes, but not the logic. scrape() will always return a new object and == will always return false.
Yes, you are right. I forgot to mention that I expect a string as the return value and not an object, but I am not familiar with the BeautifulSoup API.
