I am using the code below to scrape news articles from an RSS feed and store them in a CSV file.
from bs4 import BeautifulSoup
import pandas as pd
import requests

url = 'https://www.business-standard.com/rss/companies-101.rss'
soup = BeautifulSoup(requests.get(url).content, 'xml')

news_items = []
for item in soup.find_all('item'):
    news_item = {}
    news_item['title'] = item.title.text
    news_item['excerpt'] = item.description.text
    print(item.link.text)
    # Fetch the full article page and extract its body text
    s = BeautifulSoup(requests.get(item.link.text).content, 'html.parser')
    news_item['text'] = s.select_one('.p-content').get_text(strip=True, separator=' ')
    news_item['link'] = item.link.text
    news_item['pubDate'] = item.pubDate.text
    news_item['Category'] = 'Company'
    news_items.append(news_item)

df = pd.DataFrame(news_items)
df.to_csv('company_data.csv', index=False)
When I display the DataFrame, the results look fine, as in the first attached screenshot. But when I open the CSV file, the columns are not as expected, as in the second screenshot. Can anyone tell me the reason?
df.to_excel('file.xlsx').
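If the DataFrame looks right but the opened file does not, one possibility is that the CSV itself is valid and the spreadsheet program is misreading it. This is only a guess, since the screenshots are not available here. Article text usually contains commas, quotes, and line breaks, and some viewers split columns on those or mis-detect the encoding. Below is a minimal sketch to check this. The sample rows are made up and stand in for the scraped data. Quoting every field and writing a UTF-8 byte-order mark are assumptions about what the viewer needs, not a confirmed fix.

```python
import csv
import os
import tempfile

import pandas as pd

# Hypothetical sample rows standing in for the scraped articles. The text
# deliberately contains commas, double quotes, and a line break.
news_items = [
    {
        "title": "Acme posts record profit",
        "text": 'Revenue rose 12%, the firm said.\n"Strong demand", it added.',
        "Category": "Company",
    },
    {
        "title": "Widget Co, Gadget Ltd to merge",
        "text": "The deal, if approved, closes next year.",
        "Category": "Company",
    },
]
df = pd.DataFrame(news_items)

# Write to a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "company_data.csv")

# Quote every field and prepend a UTF-8 byte-order mark ("utf-8-sig"),
# which some spreadsheet programs use to detect the encoding.
df.to_csv(path, index=False, quoting=csv.QUOTE_ALL, encoding="utf-8-sig")

# Read the file back with pandas. If the columns and text survive the
# round trip, the file is well formed and the problem is in the viewer.
df_check = pd.read_csv(path, encoding="utf-8-sig")
round_trip_ok = df_check["text"].tolist() == df["text"].tolist()

print(list(df_check.columns))
print(df_check.shape)
print(round_trip_ok)
```

If the round trip succeeds on the real data as well, importing the file through the spreadsheet program's text-import dialog (choosing comma as the delimiter and UTF-8 as the encoding) or writing an .xlsx file with df.to_excel, as suggested above, avoids the viewer's CSV guessing altogether.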