Display output as CSV in Python using Pandas

Question

Below is my code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.vanglaini.org/').text
soup = BeautifulSoup(source, 'lxml')
for article in soup.find_all('article'):
    headline = article.a.text
    summary=article.p.text
    link = "https://www.vanglaini.org" +article.a['href']
    #print(headline)
    #print(summary)
    #print(link)

#print()

news_csv = pd.DataFrame({'Headline': headline,
                         'Summary': summary,
                        'Link' : link,


                         })
print(news_csv)

I got this error:

headline = article.a.text
AttributeError: 'NoneType' object has no attribute 'text'

Help!

Are you sure there are article elements in the page you are scraping?
– Robin Nicole, Commented Oct 12, 2019 at 13:03
I can print headline, summary and link. I just want to display them as CSV or save them to a CSV file.
– user12205480, Commented Oct 12, 2019 at 13:05
You got the answer in my comment on your previous question: it seems not all articles have a link, so article.a gives None and you can't get None.text. You have to check if article.a is None. The error ValueError("If using all scalar values, you must pass an index") has a different cause.
– furas, Commented Oct 12, 2019 at 13:30

furas · Accepted Answer · 2019-10-12 13:48:15Z

As my comments and the (deleted) answer by @AmiTavory already pointed out, not all articles have a link, so article.a is sometimes None. Accessing None.text then raises the error.

You have to check whether article.a is None and skip that article, for example:

import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.vanglaini.org/').text
soup = BeautifulSoup(source, 'lxml')

for article in soup.find_all('article'):
    if article.a is None:
        continue        

    headline = article.a.text
    summary = article.p.text
    link = "https://www.vanglaini.org" + article.a['href']
    print(headline)
    print(summary)
    print(link)

With this check, the loop runs without the AttributeError.
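The same check can be tried offline. The sketch below is only an illustration: the HTML string is made up (it is not the real markup of the site), it uses the built-in 'html.parser' instead of 'lxml', and the extra check on article.p is an assumption added here, since a missing p tag would fail in the same way.

```python
from bs4 import BeautifulSoup

# Made-up HTML: the second article has no <a> tag, like some articles on the page.
html = """
<article><a href="/first">First headline</a><p>First summary</p></article>
<article><p>No link in this one</p></article>
"""

soup = BeautifulSoup(html, 'html.parser')

rows = []
for article in soup.find_all('article'):
    # article.a (or article.p) is None when the tag is missing,
    # so skip such articles before touching .text
    if article.a is None or article.p is None:
        continue
    rows.append((article.a.text, article.p.text, article.a['href']))

print(rows)
```

Only the first article ends up in rows; the second one is skipped instead of raising AttributeError.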

EDIT: You can also get the error

    raise ValueError("If using all scalar values, you must pass an index")
ValueError: If using all scalar values, you must pass an index

but it has a completely different cause, and it should really be asked as a separate question.

The problem is in the DataFrame: after the loop, headline, summary and link hold only the last values (single strings), but DataFrame expects lists, as in

{
    'Headline': list_with_headlines,
    'Summary': list_with_summaries,
    'Link' : list_with_links,
}
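The difference can be seen in isolation with a small sketch. The values below are placeholders, not data from the site.

```python
import pandas as pd

scalar_error = None

# Only single strings: pandas cannot tell how many rows to create.
try:
    pd.DataFrame({'Headline': 'h', 'Summary': 's', 'Link': 'l'})
except ValueError as error:
    scalar_error = str(error)

print(scalar_error)

# Lists of equal length: one row per position in the lists.
frame = pd.DataFrame({
    'Headline': ['h1', 'h2'],
    'Summary': ['s1', 's2'],
    'Link': ['l1', 'l2'],
})

print(frame)
```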

Create empty lists before the for-loop

list_with_headlines = []
list_with_summaries = []
list_with_links = []

and inside the for-loop append() the values to the lists

list_with_headlines.append(headline)
list_with_summaries.append(summary)
list_with_links.append(link)

and afterwards create the DataFrame from the lists

news_csv = pd.DataFrame({
    'Headline': list_with_headlines,
    'Summary': list_with_summaries,
    'Link' : list_with_links,
})

Full code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.vanglaini.org/').text
soup = BeautifulSoup(source, 'lxml')

list_with_headlines = []
list_with_summaries = []
list_with_links = []

for article in soup.find_all('article'):
    if article.a is None:
        continue        
    headline = article.a.text.strip()
    summary = article.p.text.strip()
    link = "https://www.vanglaini.org" + article.a['href']
    list_with_headlines.append(headline)
    list_with_summaries.append(summary)
    list_with_links.append(link)

news_csv = pd.DataFrame({
    'Headline': list_with_headlines,
    'Summary': list_with_summaries,
    'Link' : list_with_links,
})

print(news_csv)
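Since the question asks for CSV output, the DataFrame can probably be converted with to_csv(). This is a sketch with placeholder rows standing in for the scraped data; the file name news.csv in the comment is only an example.

```python
import pandas as pd

# Placeholder rows standing in for the scraped lists.
news_csv = pd.DataFrame({
    'Headline': ['First headline', 'Second headline'],
    'Summary': ['First summary', 'Second summary'],
    'Link': ['https://www.vanglaini.org/first', 'https://www.vanglaini.org/second'],
})

# Without a path, to_csv() returns the CSV as a string.
# index=False leaves out the row numbers.
csv_text = news_csv.to_csv(index=False)
print(csv_text)

# To save to a file instead (example file name):
# news_csv.to_csv('news.csv', index=False, encoding='utf-8')
```

Fields that contain commas or quotes are quoted by to_csv() automatically, so summaries with commas should still end up in a single column.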
