Problem in storing dataframe to csv format

Question

I am currently using the below code to web scrape data and then store it in a CSV file.

from bs4 import BeautifulSoup 
import requests

url='https://www.business-standard.com/rss/companies-101.rss'
soup = BeautifulSoup(requests.get(url).content, 'xml')

news_items = []

for item in soup.findAll('item'):
    news_item = {}
    news_item['title'] = item.title.text
    news_item['excerpt'] = item.description.text

    print(item.link.text)
    s = BeautifulSoup(requests.get(item.link.text).content, 'html.parser')

    news_item['text'] = s.select_one('.p-content').get_text(strip=True, separator=' ')
    news_item['link'] = item.link.text
    news_item['pubDate'] = item.pubDate.text
    news_item['Category'] = 'Company'
    news_items.append(news_item)

import pandas as pd
df = pd.DataFrame(news_items)
df.to_csv('company_data.csv',index = False)

When displaying the data frame, the results look fine as attached.enter image description here But while opening the csv file, the columns are not as expected. enter image description hereCan anyone tell me the reason.

This is a problem with Excel. Try going through Data/New Query/From File/From CSV. — divingTobi
– divingTobi, Commented Jul 15, 2020 at 12:57
@divingTobi Thanks. It is working now. Is there a way where I can directly open it? — Varun S
– Varun S, Commented Jul 15, 2020 at 13:02
No, not aware of any way. But why don't you save the file as a xlsx to begin with? df.to_excel('file.xlsx'). — divingTobi
– divingTobi, Commented Jul 15, 2020 at 13:16

RamWill · Accepted Answer · 2020-07-15 13:23:36Z

2

The issue is that your data contains commas and the default seperator for to_csv is "," So each comma in your data set is treated as a seperate column.

If you perform df.to_excel('company_data.xlsx', index=False) you won't have this issue since it is not comma seperated.

answered Jul 15, 2020 at 13:23

RamWill

3281 gold badge3 silver badges7 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Serge Ballesta Over a year ago

NO! By default the CSV delimiter of MS Excel is locale dependant, and is the semicolon ; on some West European locales

Serge Ballesta · Accepted Answer · 2020-07-15 13:42:11Z

A csv file is not an Excel file despite what Microsoft pretends but is a text file containing records and fields. The records are separated with sequences of \r\n and the fields are separated with a delimiter, normally the comma. Fields can contain new lines or delimiters provided they are enclosed in quotation marks.

But Excel is known to have a very poor csv handling module. More exactly it can read what it has written, or what is formatted the way it would have written it. To be locale friendly, MS folks decided that they will use the locale decimal separator (which is the comma , in some West European locales) and will use another separator (the semicolon ; when the comma is the decimal separator). As a result, using Excel to read CSV files produced by other tools is a nightmare with possible workarounds like changing the separator when writing the CSV file, but no clean way. LibreOffice is far behind MS Office for most features except CSV handling. So my advice is to avoid using Excel for CSV files but use LibreOffice Calc instead.

Collectives™ on Stack Overflow

Problem in storing dataframe to csv format

2 Answers 2

1 Comment

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Related