As you already saw in my comments and in @AmiTavory's (deleted) answer, not every article has a link, so sometimes article.a gives None, and then you have None.text, which raises an error.
You have to check whether article.a is None before using it, like this:
import requests
from bs4 import BeautifulSoup
source = requests.get('https://www.vanglaini.org/').text
soup = BeautifulSoup(source, 'lxml')
for article in soup.find_all('article'):
    if article.a is None:
        continue
    headline = article.a.text
    summary = article.p.text
    link = "https://www.vanglaini.org" + article.a['href']
    print(headline)
    print(summary)
    print(link)
and it works.
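If you want to see the check in isolation without downloading the page, here is a minimal offline sketch. The HTML string is made up for illustration (it is not copied from the real site), and it uses the built-in 'html.parser' instead of 'lxml' so it needs no extra parser.

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML: the second <article> has no <a> tag
html = """
<article><a href="/news/1">First headline</a><p>First summary</p></article>
<article><p>Block without a link</p></article>
<article><a href="/news/2">Second headline</a><p>Second summary</p></article>
"""

soup = BeautifulSoup(html, 'html.parser')

headlines = []
for article in soup.find_all('article'):
    # article.a is None when the article contains no <a> tag,
    # so article.a.text would raise AttributeError here
    if article.a is None:
        continue
    headlines.append(article.a.text)

print(headlines)
```

Only the two articles with a link end up in the list; the one without a link is skipped instead of raising an error.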
EDIT: The error
raise ValueError("If using all scalar values, you must pass an index")
ValueError: If using all scalar values, you must pass an index
has a completely different cause, and you should ask about it in a new question on a new page.
The problem is in how you create the DataFrame: after the loop, headline, summary and link hold only the last value (a single string each), but DataFrame expects lists, as in
{
    'Headline': list_with_headlines,
    'Summary': list_with_summaries,
    'Link': list_with_links,
}
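A minimal sketch that reproduces the difference, assuming a reasonably recent pandas. The values are made-up placeholders, not real data from the page.

```python
import pandas as pd

# Single strings (scalars), like the variables left over after the loop.
# The values are hypothetical placeholders.
headline = 'Last headline'
summary = 'Last summary'
link = 'https://www.vanglaini.org/example'

error_message = None
try:
    # every value is a scalar, so pandas cannot tell how many rows to create
    pd.DataFrame({'Headline': headline, 'Summary': summary, 'Link': link})
except ValueError as ex:
    error_message = str(ex)
    print('ValueError:', ex)

# with lists, the length of the lists gives the number of rows
df = pd.DataFrame({'Headline': [headline], 'Summary': [summary], 'Link': [link]})
print(df.shape)
```

The first call raises the ValueError, and the second one creates a DataFrame with one row and three columns.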
You should create empty lists before the for-loop
list_with_headlines = []
list_with_summaries = []
list_with_links = []
and inside the for-loop you should append() the values to the lists
list_with_headlines.append(headline)
list_with_summaries.append(summary)
list_with_links.append(link)
and later, after the for-loop, create the DataFrame using the lists
news_csv = pd.DataFrame({
    'Headline': list_with_headlines,
    'Summary': list_with_summaries,
    'Link': list_with_links,
})
Full code:
import pandas as pd
import requests
from bs4 import BeautifulSoup
source = requests.get('https://www.vanglaini.org/').text
soup = BeautifulSoup(source, 'lxml')
list_with_headlines = []
list_with_summaries = []
list_with_links = []
for article in soup.find_all('article'):
    if article.a is None:
        continue
    headline = article.a.text.strip()
    summary = article.p.text.strip()
    link = "https://www.vanglaini.org" + article.a['href']
    list_with_headlines.append(headline)
    list_with_summaries.append(summary)
    list_with_links.append(link)
news_csv = pd.DataFrame({
    'Headline': list_with_headlines,
    'Summary': list_with_summaries,
    'Link': list_with_links,
})
print(news_csv)
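As a possible variation (not required for the fix), you could keep one list of dictionaries instead of three separate lists, so the values of one article always stay together. This is only a sketch: the HTML string is made up so it runs offline, the names rows, base_url and news are my own, and the extra check for article.p is an assumption in case some article has a link but no paragraph.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample HTML standing in for the downloaded page
html = """
<article><a href="/news/1"> First headline </a><p> First summary </p></article>
<article><p>Block without a link</p></article>
<article><a href="/news/2">Second headline</a><p>Second summary</p></article>
"""
base_url = 'https://www.vanglaini.org'

soup = BeautifulSoup(html, 'html.parser')

rows = []
for article in soup.find_all('article'):
    # skip articles without a link (and, as an assumption, without a <p>)
    if article.a is None or article.p is None:
        continue
    rows.append({
        'Headline': article.a.text.strip(),
        'Summary': article.p.text.strip(),
        'Link': base_url + article.a['href'],
    })

# a list of dicts becomes one row per dict, with the keys as columns
news = pd.DataFrame(rows)
print(news)
```

With the sample HTML this gives two rows, because the block without a link is skipped.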
article.a gives None and you can't get None.text. You have to check if article.a is None: first. And the error raise ValueError("If using all scalar values, you must pass an index") has a different cause.